Opened 11 years ago

Closed 11 years ago

#1385 closed enhancement (fixed)

Plot function in the annotation summary table in experiment explorer

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: major Milestone: BASE 2.14
Component: web Version:
Keywords: Cc:

Description (last modified by Nicklas Nordborg)

A plot button in annotation summary table. The plot should be a box plot of user selected values grouped on annotation, i.e., one box for each annotation value. Replaces 3 from #1375. This should be synchronized with #1386.

Attachments (1)

plotfunction_EE_box.pdf (317.3 KB) - added by Johan Vallon-Christersson 11 years ago.

Change History (14)

comment:1 Changed 11 years ago by Nicklas Nordborg

I don't understand what kind of plot you would like. Please specify and give an example.

comment:2 Changed 11 years ago by Johan Vallon-Christersson

Example in pdf file

Changed 11 years ago by Johan Vallon-Christersson

Attachment: plotfunction_EE_box.pdf added

comment:3 Changed 11 years ago by Nicklas Nordborg

Description: modified (diff)
Milestone: BASE 2.14
Owner: changed from everyone to Nicklas Nordborg
Status: changed from new to assigned

comment:4 Changed 11 years ago by Nicklas Nordborg

I have investigated what kind of help we can get from the JFreeChart plot package that we are using. It has a box-and-whisker type chart that is relatively easy to use. But I don't know if all calculations are made exactly as in the pdf that Johan submitted. By looking at the JFreeChart source code here is what I think it does:

  • It calculates the mean and median as usual. The plot can show one or both values. The median as a line and the mean as a circle.
  • The 1st (Q1) and 3rd (Q3) quartiles are calculated as the median of the lower/upper half of the (sorted) list of values. The two values define the bottom and top of the box. E.g., if we have 10 sorted data values, then Q1 = median of values 1-5 and Q3 = median of values 6-10.
  • Then, upper and lower threshold (TU/TL) values are calculated as:
       TU = Q3 + (Q3-Q1)*1.5
       TL = Q1 - (Q3-Q1)*1.5
    
  • The highest data value that is less than or equal to TU defines the upper whisker and the lowest data value that is greater than or equal to TL defines the lower whisker.

So, now my question is whether this algorithm is what you want. If not, it would be nice if someone could post an alternative algorithm for calculating the values.
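For reference, the steps listed in this comment can be sketched in a few lines of Python. This is only an illustration of the description above, not JFreeChart's actual code. The function name `box_stats` and the returned keys are made up for the example, and the handling of an odd number of values (the middle value is counted in both halves) is an assumption, since the description does not cover that case.

```python
from statistics import mean, median


def box_stats(values, factor=1.5):
    """Box-and-whisker values following the steps described above.

    Hypothetical helper for illustration; not part of JFreeChart or BASE.
    """
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")

    # Assumption: with an odd number of values, the middle value
    # belongs to both the lower and the upper half.
    half = (n + 1) // 2
    q1 = median(data[:half])        # median of the lower half
    q3 = median(data[n - half:])    # median of the upper half

    # Thresholds: TU = Q3 + (Q3-Q1)*1.5, TL = Q1 - (Q3-Q1)*1.5
    iqr = q3 - q1
    tu = q3 + iqr * factor
    tl = q1 - iqr * factor

    return {
        "mean": mean(data),
        "median": median(data),
        "q1": q1,
        "q3": q3,
        # Highest value <= TU and lowest value >= TL define the whiskers.
        "upper_whisker": max(v for v in data if v <= tu),
        "lower_whisker": min(v for v in data if v >= tl),
    }
```

With the ten values 1 to 10 this gives Q1 = 3 and Q3 = 8, matching the "median of values 1-5" and "median of values 6-10" example. A value far above TU, such as 100, would be excluded from the upper whisker.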

comment:5 Changed 11 years ago by jari

We would prefer that the TU and TL are calculated differently. The lower value should be the 5th percentile and the upper value should be the 95th percentile. If you have all the values in a sorted vector simply use the value at index 0.05*vector_size and 0.95*vector_size, respectively. Ties should be solved by taking the arithmetic average of the two neighbouring values.

(Q1 and Q3 are calculated similarly with factors 0.25 and 0.75, but the way outlined above works also.)

comment:6 Changed 11 years ago by Nicklas Nordborg

Ties should be solved by taking the arithmetic average of the two neighbouring values.

What exactly does this mean?

comment:7 in reply to:  6 Changed 11 years ago by jari

Replying to nicklas:

Ties should be solved by taking the arithmetic average of the two neighbouring values.

What exactly does this mean?

The counting for the 20th percentile may end up between two elements in the vector. Say you have a vector with 6 elements:

1 4 12 53 100 126

the 20th percentile is between index 1 and 2 ... the value should be (1+4)/2=2.5.

comment:8 Changed 11 years ago by Nicklas Nordborg

Hmmm... so if we use factor * vector_size we get the index for that percentile... It doesn't seem to work for medians, which I guess are the same as the 50th percentile. And what about boundaries when we are close to the first and last element in the list?

  • 25th percentile: 6 * 0.25 = 1.5 --> average of element 1+2
  • median: 6 * 0.5 = 3 --> but the median should be the average of element 3+4
  • 5th percentile: 6 * 0.05 = 0.3 --> value of element 1?
  • 95th percentile: 6 * 0.95 = 5.7 --> average of element 5+6... but this is not symmetric with the 5th percentile??

What if we have 7 elements?

  • 25th percentile: 7 * 0.25 = 1.75 --> average of element 1+2
  • median: 7 * 0.5 = 3.5 --> but the median should be the value of element 4

What am I missing?
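The arithmetic in the two lists above can be reproduced with a small Python sketch. It assumes the reading used in this comment: the position is factor * vector_size counted from 1, a fractional position means the unweighted average of the two neighbouring elements, and positions outside the list are clamped to the first or last element. The name `naive_percentile` is made up for the example.

```python
import math


def naive_percentile(sorted_values, factor):
    """Percentile using position = factor * size (1-based).

    Hypothetical helper that mirrors the calculation questioned above.
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one value")
    pos = factor * n  # 1-based position, possibly fractional

    # Neighbouring element numbers, clamped to the range 1..n.
    lo = max(1, min(n, math.floor(pos)))
    hi = max(1, min(n, math.ceil(pos)))

    # Unweighted average; lo == hi when the position is a whole number.
    return (sorted_values[lo - 1] + sorted_values[hi - 1]) / 2
```

For the six-element vector 1 4 12 53 100 126 this returns element 3 (12) for factor 0.5, whereas the median is the average of elements 3 and 4 (32.5), which is the mismatch pointed out above.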

comment:9 Changed 11 years ago by Nicklas Nordborg

comment:10 Changed 11 years ago by jari

The index determination should use (vector.length+1) and you will get the proper index.

The code seems okay, the difference lies in the calculation of ties. I suggested a non-weighted average whereas the code interpolates between the values in the two neighbouring elements. Either will do, just document the choice made.
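To make the two options concrete, here is a minimal Python sketch of the (vector.length+1) rule with both tie treatments. It is not the code referred to in this comment. The name `percentile`, the `interpolate` flag, the clamping of positions outside the list to the first or last element, and the rounding step are all assumptions made for the example.

```python
import math


def percentile(sorted_values, factor, interpolate=False):
    """Percentile using position = factor * (size + 1), counted from 1.

    Hypothetical helper; interpolate=False gives the unweighted average
    of the two neighbours, interpolate=True weights them by distance.
    """
    n = len(sorted_values)
    if n == 0:
        raise ValueError("need at least one value")

    # Round away floating-point noise so that whole-number positions
    # are recognised as such (an assumption of this sketch).
    pos = round(factor * (n + 1), 9)
    # Assumption: positions before the first or after the last element
    # are clamped to that element.
    pos = max(1.0, min(float(n), pos))

    lo = math.floor(pos)
    hi = math.ceil(pos)
    a = sorted_values[lo - 1]
    b = sorted_values[hi - 1]

    if interpolate:
        return a + (pos - lo) * (b - a)
    return (a + b) / 2
```

For the vector 1 4 12 53 100 126, the 20th percentile has position 0.2 * 7 = 1.4. The unweighted average gives (1+4)/2 = 2.5 as in the earlier example, while interpolation gives 1 + 0.4 * 3 = 2.2. The median has position 3.5 and comes out as 32.5 either way, and with seven elements the median position is exactly 4.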

comment:11 Changed 11 years ago by Nicklas Nordborg

(In [5138]) References #1385 and #1386. Plot functions in experiment explorer

Both types of plots can now be generated and I think the percentile values are correctly calculated.

comment:12 Changed 11 years ago by Nicklas Nordborg

(In [5142]) References #1385 and #1386. Plot functions in experiment explorer

The current reporter name is used as a default subtitle.

comment:13 Changed 11 years ago by Nicklas Nordborg

Resolution: fixed
Status: changed from assigned to closed

Everything seems to be ok now.
