Changes between Initial Version and Version 1 of Ticket #1120


Timestamp:
Apr 3, 2009, 10:36:34 AM (16 years ago)
Author:
Jari Häkkinen
Comment:

More arguments for storing information in BASE about how data is stored.

If users run the Jep Intensity Transformer plug-in and transform all intensities to logged intensities, the values are stored as logged values in BASE. From this point on the user must remember that all work on this branch of the analysis tree is in log space (at least until the data is transformed again). All that is fine, but what if the user chooses to normalize with 'Average Normalization' from the net.sf.basedb.normalizers package? This normalization always uses the geometric mean as the averager. The geometric mean makes sense for non-logged data, but the arithmetic mean makes sense for logged data, since the arithmetic mean of logged values is the logarithm of the geometric mean of the original values.
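The relationship between the two means can be checked with a small sketch. This is plain Python for illustration only, not code from BASE or the normalizer plug-in; the function names and the sample intensities are made up.

```python
import math

def arithmetic_mean(values):
    """Plain average: the natural choice for values already in log space."""
    return sum(values) / len(values)

def geometric_mean(values):
    """Geometric mean; only defined here for strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical raw (non-logged) intensities.
raw = [100.0, 400.0, 1600.0]
# The same intensities after a log2 transform.
logged = [math.log2(v) for v in raw]

gm_raw = geometric_mean(raw)
am_logged = arithmetic_mean(logged)

# Averaging logged values arithmetically matches the geometric mean of raw data.
print(math.isclose(2 ** am_logged, gm_raw))

# Applying the geometric mean to logged values gives a different number, and
# would fail outright for logged values <= 0 (raw intensities <= 1).
print(math.isclose(geometric_mean(logged), am_logged))
```

The first check prints True and the second prints False, which is the mismatch described above: a geometric-mean averager applied to data that is already logged does not compute the intended average.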

There are two solutions to this: i) if BASE knows whether data is in log space or not, the plug-in could use this information to automatically select the appropriate averaging method; ii) add 'select average method' functionality to the normalization plug-in. In the latter case the user needs to be informed about how to choose the averaging method, whereas if BASE stores log space information the plug-in selects the appropriate averaging method automatically. Of course, i) and ii) can co-exist, with the plug-in using the log space information to set an appropriate default.
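How i) and ii) could co-exist is sketched below. This is a minimal illustration, not the BASE plug-in API: `Averager`, `choose_averager`, `is_log_space` and `user_choice` are all hypothetical names, and the fallback to the geometric mean when nothing is known simply mirrors the current behaviour described above.

```python
import math
from enum import Enum

class Averager(Enum):
    ARITHMETIC = "arithmetic"
    GEOMETRIC = "geometric"

def choose_averager(is_log_space=None, user_choice=None):
    """Pick an averaging method.

    user_choice  -- explicit selection by the user (solution ii), wins if given
    is_log_space -- what the system knows about the data (solution i);
                    None means the information is not stored
    """
    if user_choice is not None:
        return user_choice
    if is_log_space is None:
        # Assumption: keep today's behaviour when the space is unknown.
        return Averager.GEOMETRIC
    return Averager.ARITHMETIC if is_log_space else Averager.GEOMETRIC

def average(values, averager):
    """Average values with the selected method."""
    if averager is Averager.ARITHMETIC:
        return sum(values) / len(values)
    # Geometric mean; requires strictly positive values.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Log-space data and no user override: the arithmetic mean becomes the default.
print(choose_averager(is_log_space=True))
# The user can still override the default.
print(choose_averager(is_log_space=True, user_choice=Averager.GEOMETRIC))
```

With this arrangement the stored log space information only sets the default, so a user who does nothing gets the appropriate averager, while an explicit choice is still respected.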

  • Ticket #1120

    • Property Summary changed from "The dynamic part of BASE should keep track whether data is stored logged or not" to "The dynamic part of BASE should keep track whether intensity data is in log space or not"
  • Ticket #1120 – Description

    Unmodified (line 1 in both initial and v1):
    I know the tradition is to store data non-logged, but there have been many cases previously where this informal rule has not been followed, creating confusion. In many cases numbers are used in logged form only, and now data has to be converted back and forth.

    Added in v1 (lines 2-3):
    One could ask why log space should be treated differently from other spaces. A valid point; this ticket actually questions why we think the Euclidean space should have a special position in the database. If one is not satisfied with simply adding support for log/Euclidean space tracking, one can imagine an enumeration of pre-defined spaces that is extended as further transforms need to be supported ... just a thought.
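The idea of an enumeration of pre-defined spaces could look roughly like the sketch below. It is only an illustration of the thought, not a proposed BASE schema: `DataSpace`, its members and `convert` are hypothetical, and `LINEAR` stands for what the ticket calls Euclidean (non-logged) space.

```python
import math
from enum import Enum

class DataSpace(Enum):
    LINEAR = "linear"   # non-logged data ("Euclidean space" in the ticket)
    LOG2 = "log2"
    LOG10 = "log10"
    # Further spaces would be added here as new transforms are supported.

# How to get from each space back to linear space, and from linear into each space.
_TO_LINEAR = {
    DataSpace.LINEAR: lambda x: x,
    DataSpace.LOG2: lambda x: 2.0 ** x,
    DataSpace.LOG10: lambda x: 10.0 ** x,
}
_FROM_LINEAR = {
    DataSpace.LINEAR: lambda x: x,
    DataSpace.LOG2: math.log2,
    DataSpace.LOG10: math.log10,
}

def convert(value, source, target):
    """Convert a single value between spaces, going through linear space."""
    if source is target:
        return value
    return _FROM_LINEAR[target](_TO_LINEAR[source](value))

# Data tagged with its space can be converted without the user remembering it.
print(convert(8.0, DataSpace.LINEAR, DataSpace.LOG2))
print(convert(3.0, DataSpace.LOG2, DataSpace.LINEAR))
```

Routing every conversion through one reference space keeps the number of transforms linear in the number of spaces, at the cost of giving that reference space exactly the special position the ticket questions.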