Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#1120 closed enhancement (fixed)

The dynamic part of BASE should keep track whether intensity data is in log space or not

Reported by: Jari Häkkinen
Owned by: Nicklas Nordborg
Priority: major
Milestone: BASE 2.12
Component: core

Description (last modified by Jari Häkkinen)

I know the tradition is to store data non-logged, but there have been many cases where this informal rule has not been followed, creating confusion. In many cases numbers are only used in logged form, and data now has to be converted back and forth.

One could ask why log space should be treated differently from other spaces. That is a valid point; this ticket actually questions why we think the Euclidean space should have a special position in the database. If one is not satisfied with simply adding support for tracking log/Euclidean space, one can imagine an enumeration of pre-defined spaces that is extended as further transforms need to be supported ... just a thought.
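A minimal sketch of what such an enumeration could look like, assuming each space is defined only by how it maps values to and from linear (non-logged) intensities. The enum `IntensitySpace`, its constants, and its methods are hypothetical names chosen for illustration; they are not part of the BASE API.

```java
/**
 * Hypothetical enumeration of pre-defined intensity spaces.
 * Not the BASE API: an illustration of the idea in the description.
 */
enum IntensitySpace {
    LINEAR {
        @Override double toLinear(double stored) { return stored; }
        @Override double fromLinear(double linear) { return linear; }
    },
    LOG2 {
        @Override double toLinear(double stored) { return Math.pow(2.0, stored); }
        @Override double fromLinear(double linear) { return Math.log(linear) / Math.log(2.0); }
    },
    LOG10 {
        @Override double toLinear(double stored) { return Math.pow(10.0, stored); }
        @Override double fromLinear(double linear) { return Math.log10(linear); }
    };

    /** Converts a value stored in this space to linear (non-logged) space. */
    abstract double toLinear(double stored);

    /** Converts a linear (non-logged) value into this space. */
    abstract double fromLinear(double linear);

    /** Re-expresses a value stored in this space in another space, going via linear space. */
    double convertTo(IntensitySpace target, double stored) {
        return target.fromLinear(toLinear(stored));
    }
}
```

Supporting a further transform would then mean adding one constant with its two mapping methods; code that only calls `toLinear` or `convertTo` would not need to change.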

Change History (15)

comment:1 by Jari Häkkinen, 16 years ago

Description: modified (diff)
Summary: changed from "The dynamic part of BASE should keep track whether data is stored logged or not" to "The dynamic part of BASE should keep track whether intensity data is in log space or not"

More arguments for storing information in BASE about how data is stored.

If a user runs the Jep Intensity Transformer plug-in and transforms all intensities to logged intensities, the values are stored as logged values in BASE. From that point on, the user must remember that all work on this branch of the analysis tree is in log space (at least until the data is transformed again). That is all fine, but what if the user chooses to normalize with 'Average Normalization' from the net.sf.basedb.normalizers package? This normalization always uses the geometric mean as the averager. The geometric mean makes sense for non-logged data, but the arithmetic mean makes sense for logged data.
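The reason the choice of mean depends on the space is that the geometric mean of linear values equals the back-transformed arithmetic mean of their logarithms. The sketch below is plain Java with no BASE classes (`MeanDemo` and its methods are made-up names); it checks this numerically and shows that applying the geometric mean to values that are already log2-transformed gives a different, meaningless result.

```java
import java.util.Arrays;

/** Illustrates why the averaging method must match the intensity space. */
final class MeanDemo {
    /** Arithmetic mean: the natural average for logged values. */
    static double arithmeticMean(double[] values) {
        return Arrays.stream(values).average().orElse(Double.NaN);
    }

    /** Geometric mean: the natural average for linear (non-logged) values; requires values > 0. */
    static double geometricMean(double[] values) {
        double logSum = 0.0;
        for (double v : values) {
            logSum += Math.log(v);
        }
        return Math.exp(logSum / values.length);
    }

    /** Transforms linear values to log2 space. */
    static double[] log2(double[] values) {
        return Arrays.stream(values).map(v -> Math.log(v) / Math.log(2.0)).toArray();
    }

    public static void main(String[] args) {
        double[] linear = {100.0, 400.0, 1600.0};
        double[] logged = log2(linear);

        // Geometric mean of the linear data ...
        System.out.println(geometricMean(linear));
        // ... equals 2^(arithmetic mean of the log2 data).
        System.out.println(Math.pow(2.0, arithmeticMean(logged)));
        // Using the geometric mean on already-logged data gives a different value.
        System.out.println(Math.pow(2.0, geometricMean(logged)));
    }
}
```

The first two printed values are both (up to rounding) 400, while the third is noticeably smaller, which is exactly the error a geometric-mean normalizer would introduce on logged data.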

There are two solutions to this: i) if BASE knows whether data is in log space, the plug-in could use this information to automatically select the appropriate averaging method; ii) add 'select average method' functionality to the normalization plug-in. In the latter case the user needs to be informed about how to choose the average method, whereas if BASE stores log-space information the plug-in selects the appropriate average method automatically. Of course, i) and ii) can co-exist, with the plug-in using the log-space information to set an appropriate default.
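A minimal sketch of combining i) and ii), assuming the plug-in can read a boolean "data is logged" flag from the bioassay set and that a missing user selection is represented as `null`. `AverageMethod` and `AverageMethodChooser` are hypothetical names, not classes from the normalizers package.

```java
/** Hypothetical averaging choices offered by a normalization plug-in. */
enum AverageMethod { ARITHMETIC_MEAN, GEOMETRIC_MEAN }

/** Sketch of option i) (automatic default) combined with option ii) (user override). */
final class AverageMethodChooser {
    /**
     * @param dataIsLogged whether the bioassay set is flagged as logged (option i)
     * @param userChoice   an explicit user selection, or null if none was made (option ii)
     */
    static AverageMethod choose(boolean dataIsLogged, AverageMethod userChoice) {
        if (userChoice != null) {
            return userChoice; // an explicit selection always wins
        }
        // Default derived from the log-space flag stored in BASE.
        return dataIsLogged ? AverageMethod.ARITHMETIC_MEAN : AverageMethod.GEOMETRIC_MEAN;
    }
}
```

Letting the explicit selection win keeps option ii) useful for advanced users, while the flag-based default means most users never need to understand the choice.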

comment:2 by Jari Häkkinen, 16 years ago

Milestone: BASE 2.12

comment:3 by Nicklas Nordborg, 16 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: changed from new to assigned

comment:4 by Nicklas Nordborg, 16 years ago

This change will cause some backwards compatibility problems for existing plug-ins and other functions. I'll try to give a brief overview.

The current API contract only allows non-logged intensity values. If a user or plug-in produces logged values, downstream plug-ins and tools cannot be expected to behave correctly. All core plug-ins, the plot tool, the experiment explorer, etc. expect non-logged values. If they are fed logged values, the results are no longer correct.

With this ticket we are changing the API contract to allow logged values, as long as we tell BASE that the values are logged. With this follows an expectation that downstream plug-ins and tools should continue to work, or at least detect logged data and give an appropriate error message. But this will not happen automatically. It is up to each plug-in/tool (and its developer) to handle this.
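For a plug-in that has not (yet) been adapted, the "detect and give an appropriate error message" option could be as simple as the guard below. This is a sketch under assumptions: `StoredTransform` is a hypothetical stand-in for whatever flag the bioassay set exposes, and `TransformGuard` is an invented helper, not a BASE class.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for the stored-transform flag on a bioassay set. */
enum StoredTransform { NONE, LOG2, LOG10 }

/** Sketch of a plug-in refusing data it has not been written to handle. */
final class TransformGuard {
    private final Set<StoredTransform> supported;

    /** @param supported a non-empty set of transforms the plug-in can handle */
    TransformGuard(Set<StoredTransform> supported) {
        this.supported = EnumSet.copyOf(supported);
    }

    /** Fails early with a readable message instead of silently producing wrong results. */
    void check(StoredTransform actual) {
        if (!supported.contains(actual)) {
            throw new IllegalStateException(
                "This plug-in does not support data stored as " + actual
                + "; supported: " + supported);
        }
    }
}
```

A legacy plug-in would construct the guard with only `NONE` and call `check` before doing any work, turning a silently wrong result into an explicit error.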

The changes in the inner core are not very big:

  • The BioAssaySet class gets a new property that tells us whether its data is stored non-logged or logged (base 2 or base 10).
  • The VirtualColumn.channel() method will be deprecated and replaced with a new method: VirtualColumn.channelRaw(). This change is purely to make it easier for external plug-in developers to detect whether they are using the core API in an unsafe way. Plug-ins that need the unlogged intensity values may use a new method: VirtualColumn.channelIntensity().
  • The constructors for the ChannelFunction class will be deprecated and replaced with new variants that also take a parameter telling whether the current bioassay set has logged data or not. This is needed so that the ch() function in JEP expressions can continue to work as expected (it must convert logged values to non-logged). A new JEP function, rawCh(), is created to allow formulas based on the stored values as they are.
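The intended difference between ch() and rawCh() can be modelled in a few lines. This is a hypothetical sketch, not the BASE or JEP API: `ChannelReader` is an invented class, and it assumes the log base is passed in at construction time (with 0 meaning "stored non-logged"), mirroring the extra constructor parameter described above.

```java
/**
 * Hypothetical model of the two ways of reading a channel value.
 * Method names mirror the ticket (ch/rawCh); this is not the BASE or JEP API.
 */
final class ChannelReader {
    /** Logarithm base of the stored data, or 0 if the data is stored non-logged. */
    private final double logBase;

    ChannelReader(double logBase) {
        this.logBase = logBase;
    }

    /** Like rawCh(): returns the stored value unchanged. */
    double rawCh(double stored) {
        return stored;
    }

    /** Like ch(): always returns a non-logged intensity, converting if necessary. */
    double ch(double stored) {
        return logBase == 0.0 ? stored : Math.pow(logBase, stored);
    }
}
```

With this split, existing formulas written against ch() keep their meaning regardless of how the data is stored, and only formulas that deliberately use rawCh() see the stored representation.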

The last two changes will propagate to a lot of places in the core, core plug-ins, and web client tools. In this phase we will focus on getting everything to work as before, which means that logged values are converted to non-logged before they are used by each plug-in or tool. Here is a preliminary list of things that are affected:

  • Experiment explorer: Its averaging functionality needs to be changed
  • Plot servlet: Needs data conversion if plotted formulas use the ch() function
  • BioAssaySetExporter (MeV format): Needs data conversion for logged values
  • JepIntensityTransformer: Needs data conversion if formulas use the ch() function
  • LowessNormalization: Needs data conversion for logged values
  • MedianRatioNormalization: Needs data conversion for logged values
  • Spot data listing: Needs data conversion for columns that are based on formulas that use the ch() function
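The "convert first, then reuse the existing code" strategy for this phase can be sketched as follows. It is an illustration under assumptions: `ConvertBeforeUse` is a hypothetical helper, a log base of 0 is taken to mean "already non-logged", and `log2Ratios` merely stands in for any existing routine that was written for non-logged input.

```java
import java.util.Arrays;

/** Sketch of converting stored values before handing them to unchanged legacy code. */
final class ConvertBeforeUse {
    /** Converts stored values to non-logged intensities; logBase 0 means already non-logged. */
    static double[] toIntensities(double[] stored, double logBase) {
        if (logBase == 0.0) {
            return stored.clone();
        }
        return Arrays.stream(stored).map(v -> Math.pow(logBase, v)).toArray();
    }

    /** Stand-in for existing code that assumes non-logged input: log2 ratio per spot. */
    static double[] log2Ratios(double[] ch1, double[] ch2) {
        double[] m = new double[ch1.length];
        for (int i = 0; i < m.length; i++) {
            m[i] = Math.log(ch1[i] / ch2[i]) / Math.log(2.0);
        }
        return m;
    }
}
```

The legacy routine stays untouched; only the call site changes to `log2Ratios(toIntensities(a, base), toIntensities(b, base))`. This costs an extra conversion but keeps the behavior identical to before, which is the stated goal of this phase.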

The above list only contains the required changes. In addition to this there is a need for some more "cosmetic" changes. For example:

  • We need a way to access and use the rawCh() function in formulas, spot data lists, the experiment explorer, etc.
  • We need an addition to the Query API for calculating an arbitrary exponential. The current API only has the natural exponential.
  • And possibly more things that I have not yet discovered.
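Mathematically, an arbitrary exponential can always be expressed through the natural one, since b^x = e^(x ln b) for b > 0. Because the bases that matter here (2 and 10) are constants, ln b can be precomputed, so only the natural exponential and a multiplication are required. The plain-Java sketch below illustrates the identity; `ArbitraryExp` is a hypothetical name and says nothing about how the Query API addition was actually implemented.

```java
/** Arbitrary-base exponential built from the natural exponential only. */
final class ArbitraryExp {
    /** ln(2), precomputed once because the base is a known constant. */
    static final double LN2 = Math.log(2.0);

    /** Returns base^x for base > 0, using exp(x * ln(base)). */
    static double pow(double base, double x) {
        if (base <= 0.0) {
            throw new IllegalArgumentException("base must be positive: " + base);
        }
        return Math.exp(x * Math.log(base));
    }

    /** Returns 2^x using only exp() and a multiplication by a constant. */
    static double pow2(double x) {
        return Math.exp(x * LN2);
    }
}
```

Results agree with `Math.pow` up to floating-point rounding, which is acceptable for converting logged intensities back to linear values.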

comment:5 by Nicklas Nordborg, 16 years ago

(In [4910]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Added this flag to bioassay set and made it possible to modify it in the web interface.

comment:6 by Nicklas Nordborg, 16 years ago

(In [4912]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Deprecated VirtualColumn.channel(). Added functionality in the core to make it possible to convert transformed intensities back to regular intensities. Added RawChannelFunction for cases where the stored intensity values are needed without conversion. What remains now is to check plug-ins and tools to make sure that they work correctly with both transformed and non-transformed data.

comment:7 by Nicklas Nordborg, 16 years ago

(In [4913]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

BioAssaySetExporter can now handle transformed data.

comment:8 by Nicklas Nordborg, 16 years ago

(In [4914]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Lowess and median ratio normalizations have been fixed.

comment:9 by Nicklas Nordborg, 16 years ago

(In [4915]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Additional changes in the core and test program fixes.

comment:10 by Nicklas Nordborg, 16 years ago

(In [4916]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Fixes for spot data listings and experiment explorer.

comment:11 by Nicklas Nordborg, 16 years ago

(In [4917]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Added intensity transformation information to formulas. Changed places where formulas are used so that only the proper type of formulas are displayed. It remains to change the GUI for editing formulas so that this information can be specified by users.
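Showing "only the proper type of formulas" amounts to filtering on the transform each formula expects. The sketch below is hypothetical: `Transform`, `FormulaInfo`, and `FormulaFilter` are invented names, only the source transform is modelled, and the 'any' option mentioned in comment 15 is represented as `null`.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a stored intensity transform. */
enum Transform { NONE, LOG2, LOG10 }

/** Hypothetical formula metadata; a null source transform models the 'any' option. */
final class FormulaInfo {
    final String name;
    final Transform sourceTransform;

    FormulaInfo(String name, Transform sourceTransform) {
        this.name = name;
        this.sourceTransform = sourceTransform;
    }

    /** True if this formula can be evaluated on data stored with the given transform. */
    boolean appliesTo(Transform dataTransform) {
        return sourceTransform == null || sourceTransform == dataTransform;
    }
}

final class FormulaFilter {
    /** Keeps only the formulas that make sense for the current bioassay set, in original order. */
    static List<FormulaInfo> usable(List<FormulaInfo> all, Transform dataTransform) {
        return all.stream()
                .filter(f -> f.appliesTo(dataTransform))
                .collect(Collectors.toList());
    }
}
```

Treating 'any' as a wildcard keeps transform-independent formulas (for example ones that do not read intensities at all) visible everywhere, while transform-specific formulas only appear where they are valid.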

comment:12 by Nicklas Nordborg, 16 years ago

(In [4918]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

GUI for formulas has been updated.

comment:13 by Nicklas Nordborg, 16 years ago

(In [4919]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Base1PluginExecuter can now handle BASE 1 plug-ins that work with logged data.

comment:14 by Nicklas Nordborg, 16 years ago

Resolution: fixed
Status: changed from assigned to closed

(In [4920]) Fixes #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Updated documentation.

comment:15 by Nicklas Nordborg, 16 years ago

(In [4935]) References #1120: The dynamic part of BASE should keep track whether intensity data is in log space or not

Fixed a problem when selecting the 'any' option for source/result intensity transform when editing formulas.
