Opened 12 years ago
Closed 11 years ago
#1385 closed enhancement (fixed)
Plot function in the annotation summary table in experiment explorer
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | major | Milestone: | BASE 2.14 |
Component: | web | Version: | |
Keywords: | Cc: |
Description (last modified by )
Attachments (1)
Change History (14)
comment:1 Changed 12 years ago by
Changed 12 years ago by
Attachment: | plotfunction_EE_box.pdf added |
---|
comment:3 Changed 11 years ago by
Description: | modified (diff) |
---|---|
Milestone: | → BASE 2.14 |
Owner: | changed from everyone to Nicklas Nordborg |
Status: | new → assigned |
comment:4 Changed 11 years ago by
I have investigated what kind of help we can get from the JFreeChart plot package that we are using. It has a box-and-whisker chart type that is relatively easy to use, but I don't know if all calculations are made exactly as in the PDF that Johan submitted. From reading the JFreeChart source code, here is what I think it does:
- It calculates the mean and median as usual. The plot can show one or both values. The median as a line and the mean as a circle.
- The 1st (Q1) and 3rd (Q3) quartiles are calculated as the median of the lower/upper half of the (sorted) list of values. The two values define the bottom and top of the box. E.g. if we have 10 sorted data values, then Q1 = median of values 1-5 and Q3 = median of values 6-10.
- Then, upper and lower threshold (TU/TL) values are calculated as:
  TU = Q3 + (Q3-Q1)*1.5
  TL = Q1 - (Q3-Q1)*1.5
- The highest data value that is less than or equal to TU defines the upper whisker and the lowest data value that is greater than or equal to TL defines the lower whisker.
So, now my question is whether this algorithm is what you want. If not, it would be nice if someone could post an alternative algorithm for how to calculate the values.
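To make the steps listed above concrete, here is a minimal Java sketch of the calculation as described in this comment. It is not JFreeChart's actual code: the class and method names (`BoxWhiskerSketch`, `median`, `summarize`) are made up for illustration, and the handling of an odd number of values (the middle value is left out of both halves) is an assumption, since the description only covers the even case.

```java
import java.util.Arrays;

/** Illustrative sketch of the box-and-whisker statistics described above. */
final class BoxWhiskerSketch {

    /** Median of sorted[from..to), with 'to' exclusive; the range must be non-empty. */
    static double median(double[] sorted, int from, int to) {
        int n = to - from;
        int mid = from + n / 2;
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Returns {lowerWhisker, Q1, median, Q3, upperWhisker}; needs at least 2 values. */
    static double[] summarize(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("need at least 2 values");
        }
        double[] s = values.clone();
        Arrays.sort(s);
        int n = s.length;

        double med = median(s, 0, n);
        // Assumption: for an odd count the middle value belongs to neither half.
        double q1 = median(s, 0, n / 2);
        double q3 = median(s, (n + 1) / 2, n);

        // Thresholds as given in the comment: 1.5 times the interquartile range.
        double tl = q1 - (q3 - q1) * 1.5;
        double tu = q3 + (q3 - q1) * 1.5;

        // Lower whisker: lowest data value that is >= TL.
        double lower = s[0];
        for (double v : s) {
            if (v >= tl) { lower = v; break; }
        }
        // Upper whisker: highest data value that is <= TU.
        double upper = s[n - 1];
        for (int i = n - 1; i >= 0; i--) {
            if (s[i] <= tu) { upper = s[i]; break; }
        }
        return new double[] {lower, q1, med, q3, upper};
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 100};
        System.out.println(Arrays.toString(summarize(data)));
    }
}
```

With the ten values in `main`, Q1 is the median of values 1-5 and Q3 the median of values 6-10, and the value 100 lies above TU, so the upper whisker stops at 9.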
comment:5 Changed 11 years ago by
We would prefer that the TU and TL are calculated differently. The lower value should be the 5th percentile and the upper value should be the 95th percentile. If you have all the values in a sorted vector simply use the value at index 0.05*vector_size and 0.95*vector_size, respectively. Ties should be solved by taking the arithmetic average of the two neighbouring values.
(Q1 and Q3 are calculated similarly with factors 0.25 and 0.75, but the way outlined above works also.)
comment:6 follow-up: 7 Changed 11 years ago by
> Ties should be solved by taking the arithmetic average of the two neighbouring values.
What exactly does this mean?
comment:7 Changed 11 years ago by
Replying to nicklas:
> > Ties should be solved by taking the arithmetic average of the two neighbouring values.
>
> What exactly does this mean?
The counting for the 20th percentile may end up between two elements in the vector. Say you have a vector with 6 elements:
1 4 12 53 100 126
The 20th percentile falls between elements 1 and 2, so the value should be (1+4)/2 = 2.5.
comment:8 Changed 11 years ago by
Hmmm... so if we use factor * vector_size we get the index for that percentile...
It doesn't seem to work for the median, which I guess is the same as the 50th percentile. And what about boundaries when we are close to the first and last element in the list?
- 25th percentile: 6 * 0.25 = 1.5 --> average of element 1+2
- median: 6 * 0.5 = 3 --> but the median should be the average of element 3+4
- 5th percentile: 6 * 0.05 = 0.3 --> value of element 1?
- 95th percentile: 6 * 0.95 = 5.7 --> average of element 5+6... but this is not symmetric with the 5th percentile??
What if we have 7 elements?
- 25th percentile: 7 * 0.25 = 1.75 --> average of element 1+2
- median: 7 * 0.5 = 3.5 --> but the median should be the value of element 4
What am I missing?
comment:9 Changed 11 years ago by
I found this: http://www.koders.com/java/fid867FA235DAF49EE794B20334EF719CE6C69E17E5.aspx
Does that algorithm make sense?
comment:10 Changed 11 years ago by
The index determination should use (vector.length+1) and you will get the proper index.
The code seems okay, the difference lies in the calculation of ties. I suggested a non-weighted average whereas the code interpolates between the values in the two neighbouring elements. Either will do, just document the choice made.
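As an illustration of the two options, here is a minimal Java sketch of a percentile lookup that uses the 1-based position factor * (vector.length + 1) and supports both the non-weighted average and linear interpolation. It is not the code from the link above or the code that was committed for this ticket: the names (`PercentileSketch`, `percentile`, `interpolate`) are hypothetical, and clamping to the first/last element when the position falls outside the vector is an assumption made for the sketch.

```java
/** Illustrative sketch of a percentile lookup using position p * (n + 1). */
final class PercentileSketch {

    /**
     * @param sorted      values sorted in ascending order, non-empty
     * @param p           percentile as a fraction, e.g. 0.05, 0.25, 0.5
     * @param interpolate false: plain average of the two neighbours;
     *                    true: linear interpolation between them
     */
    static double percentile(double[] sorted, double p, boolean interpolate) {
        int n = sorted.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty vector");
        }
        if (p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        // 1-based position in the sorted vector.
        double pos = p * (n + 1);
        // Assumption: positions outside the vector are clamped to its ends.
        if (pos <= 1.0) return sorted[0];
        if (pos >= n) return sorted[n - 1];

        int k = (int) Math.floor(pos);   // 1-based index of the lower neighbour
        double frac = pos - k;           // how far we are towards the upper neighbour
        double lo = sorted[k - 1];
        double hi = sorted[k];
        if (frac == 0.0) {
            return lo;                   // position hits an element exactly
        }
        return interpolate ? lo + frac * (hi - lo) : (lo + hi) / 2.0;
    }

    public static void main(String[] args) {
        double[] v = {1, 4, 12, 53, 100, 126};
        System.out.println(percentile(v, 0.20, false)); // average of elements 1 and 2
        System.out.println(percentile(v, 0.50, false)); // average of elements 3 and 4
        System.out.println(percentile(v, 0.05, false)); // clamped to the first element
        System.out.println(percentile(v, 0.95, false)); // clamped to the last element
    }
}
```

With this position rule the median of 6 elements lands at 3.5 (average of elements 3 and 4) and the median of 7 elements at exactly 4, and the 5th and 95th percentiles are treated symmetrically. The two tie strategies only differ when the position is not halfway between two elements: for the 20th percentile of the example vector the plain average gives 2.5, while interpolation gives a value closer to the first element.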
comment:11 Changed 11 years ago by
comment:12 Changed 11 years ago by
comment:13 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Everything seems to be ok now.
I don't understand what kind of plot you would like. Please specify and give an example.