Opened 12 years ago

Closed 12 years ago

# Plot function in the annotation summary table in experiment explorer

Reported by: Nicklas Nordborg · Owned by: Nicklas Nordborg · Priority: major · Milestone: BASE 2.14 · Component: web

A plot button in the annotation summary table. The plot should be a box plot of user-selected values grouped by annotation, i.e., one box for each annotation value. Replaces 3 from #1375. This should be synchronized with #1386.

### comment:1 Changed 12 years ago by Nicklas Nordborg

I don't understand what kind of plot you would like. Please specify and give an example.

### comment:2 Changed 12 years ago by Johan Vallon-Christersson

Example in pdf file

### comment:3 Changed 12 years ago by Nicklas Nordborg

• Description: modified (diff)
• Milestone: → BASE 2.14
• Owner: changed from everyone to Nicklas Nordborg
• Status: new → assigned

### comment:4 Changed 12 years ago by Nicklas Nordborg

I have investigated what kind of help we can get from the `JFreeChart` plot package that we are using. It has a box-and-whisker chart type that is relatively easy to use, but I don't know if all calculations are made exactly as in the PDF that Johan submitted. From reading the `JFreeChart` source code, here is what I think it does:

• It calculates the mean and median as usual. The plot can show one or both values: the median as a line and the mean as a circle.
• The 1st (Q1) and 3rd (Q3) quartiles are calculated as the median of the lower/upper half of the (sorted) list of values. The two values define the bottom and top of the box. E.g., if we have 10 sorted data values, then Q1 = median of values 1-5 and Q3 = median of values 6-10.
• Then, upper and lower threshold (TU/TL) values are calculated as:
```
TU = Q3 + (Q3-Q1)*1.5
TL = Q1 - (Q3-Q1)*1.5
```
• The highest data value that is less than or equal to TU defines the upper whisker and the lowest data value that is greater than or equal to TL defines the lower whisker.
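
The steps listed above can be sketched in Java roughly as follows. This is only an illustration of the description in this comment, not the actual `JFreeChart` code: the class name `BoxStatsSketch` and the helper `medianOfRange` are hypothetical, and for an odd number of values the sketch leaves the middle element out of both halves, which is an assumption the description does not settle.

```java
import java.util.Arrays;

/** Hypothetical sketch of the box-and-whisker statistics described above. */
final class BoxStatsSketch {
    final double mean, median, q1, q3, lowerWhisker, upperWhisker;

    BoxStatsSketch(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double[] v = values.clone();
        Arrays.sort(v);
        int n = v.length;
        int half = n / 2; // size of each half; middle element skipped when n is odd (assumption)

        double sum = 0;
        for (double x : v) sum += x;
        mean = sum / n;
        median = medianOfRange(v, 0, n);
        q1 = medianOfRange(v, 0, half);     // median of the lower half
        q3 = medianOfRange(v, n - half, n); // median of the upper half

        // Thresholds as given in the comment.
        double tu = q3 + (q3 - q1) * 1.5;
        double tl = q1 - (q3 - q1) * 1.5;

        // Lower whisker: lowest value >= TL. Upper whisker: highest value <= TU.
        double lo = v[0];
        double hi = v[n - 1];
        for (double x : v) {
            if (x >= tl) { lo = x; break; }
        }
        for (int i = n - 1; i >= 0; i--) {
            if (v[i] <= tu) { hi = v[i]; break; }
        }
        lowerWhisker = lo;
        upperWhisker = hi;
    }

    /** Median of sorted[from..to), with 'to' exclusive. */
    static double medianOfRange(double[] sorted, int from, int to) {
        int len = to - from;
        int mid = from + len / 2;
        return (len % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

With the ten values 1, 2, ..., 9, 100 this gives Q1 = 3 and Q3 = 8, so TU = 15.5 and the upper whisker stops at 9, leaving 100 outside the whiskers.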

So, now my question is whether this algorithm is what you want. If not, it would be nice if someone could post an alternative algorithm for calculating the values.

### comment:5 Changed 12 years ago by Jari Häkkinen

We would prefer that the TU and TL are calculated differently. The lower value should be the 5th percentile and the upper value should be the 95th percentile. If you have all the values in a sorted vector simply use the value at index 0.05*vector_size and 0.95*vector_size, respectively. Ties should be solved by taking the arithmetic average of the two neighbouring values.

(Q1 and Q3 are calculated similarly with factors 0.25 and 0.75, but the way outlined above works also.)

### comment:6 follow-up:  7 Changed 12 years ago by Nicklas Nordborg

> Ties should be solved by taking the arithmetic average of the two neighbouring values.

What exactly does this mean?

### comment:7 in reply to:  6 Changed 12 years ago by Jari Häkkinen

> > Ties should be solved by taking the arithmetic average of the two neighbouring values.
>
> What exactly does this mean?

The counting for the 20th percentile may end up between two elements in the vector. Say you have a vector with 6 elements:

```
1 4 12 53 100 126
```

the 20th percentile falls between elements 1 and 2 (counting from 1), so the value should be (1+4)/2 = 2.5.
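
As a minimal sketch, the rule described here (position = factor * vector size, with a position between two elements resolved by the plain average of the neighbours) could look like this in Java. The class and method names are hypothetical. What to do when the position lands exactly on an element, or outside the first or last element, is not specified in the comment, so the choices below are assumptions.

```java
/** Hypothetical sketch of the "factor * vector_size" percentile rule. */
final class NaivePercentileSketch {

    /** 'sorted' must be sorted ascending; 'factor' is e.g. 0.20 for the 20th percentile. */
    static double percentile(double[] sorted, double factor) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        double pos = factor * sorted.length; // position counted from 1
        int lower = (int) Math.floor(pos);

        if (lower < 1) return sorted[0];                              // assumption: clamp to first
        if (lower >= sorted.length) return sorted[sorted.length - 1]; // assumption: clamp to last
        if (pos == lower) return sorted[lower - 1];                   // assumption: exact hit

        // Between two elements: unweighted average of the neighbours.
        return (sorted[lower - 1] + sorted[lower]) / 2.0;
    }
}
```

For the six-element vector above, `percentile(v, 0.20)` lands at position 1.2 and returns (1+4)/2 = 2.5.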

### comment:8 Changed 12 years ago by Nicklas Nordborg

Hmmm... so if we use `factor * vector_size` we get the index for that percentile... It doesn't seem to work for medians, which I guess is the same as the 50th percentile. And what about boundaries when we are close to the first and last element in the list?

• 25th percentile: 6 * 0.25 = 1.5 --> average of element 1+2
• median: 6 * 0.5 = 3 --> but the median should be the average of element 3+4
• 5th percentile: 6 * 0.05 = 0.3 --> value of element 1?
• 95th percentile: 6 * 0.95 = 5.7 --> average of element 5+6... but this is not symmetric with the 5th percentile??

What if we have 7 elements?

• 25th percentile: 7 * 0.25 = 1.75 --> average of element 1+2
• median: 7 * 0.5 = 3.5 --> but the median should be the value of element 4

What am I missing?

### comment:9 Changed 12 years ago by Nicklas Nordborg

Does that algorithm make sense?

### comment:10 Changed 12 years ago by Jari Häkkinen

The index determination should use (vector.length+1) and you will get the proper index.

The code seems okay, the difference lies in the calculation of ties. I suggested a non-weighted average whereas the code interpolates between the values in the two neighbouring elements. Either will do, just document the choice made.
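
To make the two tie-handling options concrete, here is a rough Java sketch that uses `factor * (vector.length + 1)` as the position and supports both the non-weighted average and linear interpolation. It is not the code that was committed to BASE. The names are hypothetical, and clamping to the first or last element when the position falls outside the vector is an assumption.

```java
/** Hypothetical sketch of percentile lookup with (length + 1) indexing. */
final class PercentileSketch {

    /**
     * 'sorted' must be sorted ascending. If 'interpolate' is true, a position
     * between two elements is resolved by linear interpolation; otherwise by
     * the unweighted average of the two neighbours.
     */
    static double percentile(double[] sorted, double factor, boolean interpolate) {
        int n = sorted.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        double pos = factor * (n + 1); // position counted from 1

        if (pos <= 1) return sorted[0];     // assumption: clamp at the low end
        if (pos >= n) return sorted[n - 1]; // assumption: clamp at the high end

        int lower = (int) Math.floor(pos);  // 1 <= lower <= n - 1 here
        double frac = pos - lower;
        double a = sorted[lower - 1];
        double b = sorted[lower];

        if (frac == 0) return a;            // exact hit on an element
        return interpolate ? a + frac * (b - a) : (a + b) / 2.0;
    }
}
```

With six elements the median position is 0.5 * 7 = 3.5, i.e. the average of elements 3 and 4, and with seven elements it is 0.5 * 8 = 4, i.e. element 4, which matches the usual definition. The two strategies differ for the 25th percentile of `1 4 12 53 100 126` (position 1.75): the average gives 2.5 while interpolation gives 3.25.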

### comment:11 Changed 12 years ago by Nicklas Nordborg

(In ) References #1385 and #1386. Plot functions in experiment explorer

Both types of plots can now be generated and I think the percentile values are correctly calculated.

### comment:12 Changed 12 years ago by Nicklas Nordborg

(In ) References #1385 and #1386. Plot functions in experiment explorer

The current reporter name is used as a default subtitle.

### comment:13 Changed 12 years ago by Nicklas Nordborg

• Resolution: → fixed
• Status: assigned → closed

Everything seems to be ok now.
