Opened 15 years ago

Closed 15 years ago

# Plot function in the annotation summary table in experiment explorer

Reported by: Nicklas Nordborg. Owned by: Nicklas Nordborg. Priority: major. Milestone: BASE 2.14. Component: web.

Add a plot button to the annotation summary table. The plot should be a box plot of user-selected values grouped by annotation, i.e., one box for each annotation value. Replaces 3 from #1375. This should be synchronized with #1386.

### comment:1 by Nicklas Nordborg, 15 years ago

I don't understand what kind of plot you would like. Please specify and give an example.

### comment:2 by Johan Vallon-Christersson, 15 years ago

Example in pdf file

### comment:3 by Nicklas Nordborg, 15 years ago

Description: modified (diff). Milestone: → BASE 2.14. Owner: changed from everyone to Nicklas Nordborg. Status: new → assigned.

### comment:4 by Nicklas Nordborg, 15 years ago

I have investigated what kind of help we can get from the `JFreeChart` plot package that we are using. It has a box-and-whisker chart type that is relatively easy to use, but I don't know if all calculations are made exactly as in the PDF that Johan submitted. From reading the `JFreeChart` source code, here is what I think it does:

• It calculates the mean and median as usual. The plot can show one or both values: the median as a line and the mean as a circle.
• The 1st (Q1) and 3rd (Q3) quartiles are calculated as the median of the lower/upper half of the (sorted) list of values. The two values define the bottom and top of the box. E.g., if we have 10 sorted data values, then Q1 = median of values 1-5 and Q3 = median of values 6-10.
• Then, upper and lower threshold (TU/TL) values are calculated as:
```
TU = Q3 + (Q3-Q1)*1.5
TL = Q1 - (Q3-Q1)*1.5
```
• The highest data value that is less than or equal to TU defines the upper whisker and the lowest data value that is greater than or equal to TL defines the lower whisker.
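The steps listed above can be sketched in Python as follows. This is only an illustration of the description in this comment, not the `JFreeChart` code itself. The function name `box_stats` and the returned keys are hypothetical, and the handling of an odd number of values (the middle value is counted in both halves) is an assumption that the comment does not cover.

```python
from statistics import mean, median


def box_stats(values):
    """Box-and-whisker values following the steps described in comment:4."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")

    half = n // 2
    # Assumption: for an odd count the middle value belongs to both halves.
    lower_half = data[:half + (n % 2)]
    upper_half = data[half:]

    q1 = median(lower_half)   # bottom of the box
    q3 = median(upper_half)   # top of the box
    tu = q3 + (q3 - q1) * 1.5  # upper threshold
    tl = q1 - (q3 - q1) * 1.5  # lower threshold

    return {
        "mean": mean(data),
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Highest value <= TU and lowest value >= TL define the whiskers.
        "upper_whisker": max(v for v in data if v <= tu),
        "lower_whisker": min(v for v in data if v >= tl),
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

With these ten values, Q1 is the median of values 1-5 and Q3 the median of values 6-10, so the value 100 lies above TU and is excluded from the upper whisker.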

So my question is: is this algorithm what you want? If not, it would be nice if someone could post an alternative algorithm for calculating the values.

### comment:5 by Jari Häkkinen, 15 years ago

We would prefer that the TU and TL are calculated differently. The lower value should be the 5th percentile and the upper value should be the 95th percentile. If you have all the values in a sorted vector simply use the value at index 0.05*vector_size and 0.95*vector_size, respectively. Ties should be solved by taking the arithmetic average of the two neighbouring values.

(Q1 and Q3 are calculated similarly with factors 0.25 and 0.75, but the way outlined above works also.)

### comment:6 by Nicklas Nordborg, 15 years ago (follow-up: comment:7)

> Ties should be solved by taking the arithmetic average of the two neighbouring values.

What exactly does this mean?

### comment:7 by Jari Häkkinen, 15 years ago (in reply to: comment:6)

> > Ties should be solved by taking the arithmetic average of the two neighbouring values.
>
> What exactly does this mean?

The counting for the 20th percentile may end up between two elements in the vector. Say you have a vector with 6 elements:

```
1 4 12 53 100 126
```

The 20th percentile falls between elements 1 and 2 (counting from 1), so the value should be (1+4)/2 = 2.5.

### comment:8 by Nicklas Nordborg, 15 years ago

Hmmm... so if we use `factor * vector_size` we get the index for that percentile... It doesn't seem to work for medians, which I guess are the same as the 50th percentile. And what about boundaries when we are close to the first and last element in the list?

• 25th percentile: 6 * 0.25 = 1.5 --> average of element 1+2
• median: 6 * 0.5 = 3 --> but the median should be the average of element 3+4
• 5th percentile: 6 * 0.05 = 0.3 --> value of element 1?
• 95th percentile: 6 * 0.95 = 5.7 --> average of element 5+6... but this is not symmetric with the 5th percentile??

What if we have 7 elements?

• 25th percentile: 7 * 0.25 = 1.75 --> average of element 1+2
• median: 7 * 0.5 = 3.5 --> but the median should be the value of element 4

What am I missing?
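The median mismatch can be checked with a few lines of Python. This is a minimal sketch: `naive_position` is a hypothetical helper that only encodes the `factor * vector_size` rule, the six values are the ones from comment:7, and the seventh value (200) is made up for illustration.

```python
from statistics import median


def naive_position(fraction, size):
    """1-based position from the rule in comment:5: fraction * vector_size."""
    return fraction * size


six = [1, 4, 12, 53, 100, 126]
seven = six + [200]  # hypothetical seventh value, added only for illustration

for data in (six, seven):
    pos = naive_position(0.5, len(data))
    print(len(data), "values: naive median position", pos,
          "conventional median", median(data))
```

For six values the rule points exactly at element 3 (the value 12), while the conventional median is the average of elements 3 and 4. For seven values it points between elements 3 and 4, while the conventional median is element 4.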

### comment:9 by Nicklas Nordborg, 15 years ago

Does that algorithm make sense?

### comment:10 by Jari Häkkinen, 15 years ago

The index determination should use (vector.length+1) and you will get the proper index.

The code seems okay, the difference lies in the calculation of ties. I suggested a non-weighted average whereas the code interpolates between the values in the two neighbouring elements. Either will do, just document the choice made.
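A minimal Python sketch of the `(vector.length+1)` rule with both ways of handling a position that falls between two elements is given below. It is not the code committed to BASE. The function name `percentile` and the `interpolate` flag are hypothetical, and clamping positions that fall outside the vector to the first or last element is an assumption, since the boundary behaviour is not spelled out in this ticket.

```python
import math


def percentile(sorted_values, fraction, interpolate=False):
    """Percentile using the 1-based position fraction * (n + 1)."""
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one value")

    pos = fraction * (n + 1)
    # Assumption: positions outside 1..n are clamped to the end elements.
    pos = min(max(pos, 1.0), float(n))

    lower = math.floor(pos)
    upper = math.ceil(pos)
    low_value = sorted_values[lower - 1]
    high_value = sorted_values[upper - 1]

    if lower == upper:
        return low_value  # position hits an element exactly
    if interpolate:
        # Weighted by how far the position lies between the two elements.
        return low_value + (pos - lower) * (high_value - low_value)
    # Non-weighted average of the two neighbouring elements.
    return (low_value + high_value) / 2


data = [1, 4, 12, 53, 100, 126]
for fraction in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(fraction, percentile(data, fraction), percentile(data, fraction, True))
```

With this rule the median of six values falls at position 3.5 (average of elements 3 and 4) and the median of seven values at position 4, which matches the conventional median. The two tie strategies agree at the median but differ elsewhere, e.g. at the 25th percentile of the six values above (2.5 versus 3.25).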

### comment:11 by Nicklas Nordborg, 15 years ago

(In [5138]) References #1385 and #1386. Plot functions in experiment explorer

Both types of plots can now be generated and I think the percentile values are correctly calculated.

### comment:12 by Nicklas Nordborg, 15 years ago

(In [5142]) References #1385 and #1386. Plot functions in experiment explorer

The current reporter name is used as a default subtitle.

### comment:13 by Nicklas Nordborg, 15 years ago

Resolution: → fixed. Status: assigned → closed.

Everything seems to be ok now.
