Opened 17 years ago
Closed 17 years ago
#867 closed enhancement (wontfix)
Support for array designs without coordinate/position information for features — at Version 4
Reported by: | Nicklas Nordborg | Owned by: | Nicklas Nordborg |
---|---|---|---|
Priority: | blocker | Milestone: | |
Component: | core | Version: | |
Keywords: | Cc: |
Description (last modified by )
This ticket has been replaced by #894.
The current way of using an ArrayDesign with features is to look up the reporter positioned at a given coordinate. This is a problem for platforms that don't have a positioning system, for example Illumina. In this case you only know that a given set of reporters is present, but not their exact location. The current solution has been to generate "fake" coordinates, for example by using block=1, column=1 and row=row number in file. The problem with this solution is that it only works if all data files are sorted in the same way, which is not always the case.
We propose that BASE should somehow be made aware that the positioning is irrelevant for some array designs. We can call these virtual array designs.
For backwards compatibility we must still generate fake coordinates, but this should be done in the background and need not be visible to importer file configurations. This means that the reporter map importer doesn't have to provide mappings for the block, row and column coordinates, since BASE generates them automatically. For the raw data importer this means that array design validation should be done only with the reporter IDs, not the coordinates. BASE still uses fake coordinates internally, but they should never be visible externally.
This approach requires that a single reporter is present only once on the array design, or that some pre-processing step has been performed that averages over identical reporters. This is exactly what is done with Illumina Beadstudio.
Change History (4)
comment:1 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:2 by , 17 years ago
comment:3 by , 17 years ago
Priority: | critical → blocker |
---|
comment:4 by , 17 years ago
Description: | modified (diff) |
---|---|
Milestone: | BASE 2.6 |
Resolution: | → wontfix |
Status: | assigned → closed |
I have been thinking a bit about how to do this. Here is a first draft.
Array design
Add a boolean flag, `isVirtual()`, to the `ArrayDesign` class. If set to `true`, the `FeatureBatcher` ignores any coordinate values and assigns fake ones instead. The batcher also verifies that a reporter is only used once. There could be an option for whether a duplicate should be an error condition or be ignored. We should add a new method to the feature batcher with a reporter as the only argument. This method should only be usable if `isVirtual()==true`. For backwards compatibility reasons, the old method, which takes coordinates and a reporter, should delegate to the new method if the flag is set. This should be documented as a possibly incompatible change in the interface (ignoring coordinates was never part of the old promise).
The `isVirtual()` flag can't be changed after features have been added.
In the web interface, coordinate information for features should not be visible. The possibility to link with plates should be disabled, since that implies coordinate information.
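The batcher behaviour described above could be sketched roughly as follows. This is only an illustration of the idea, not the actual BASE API: the class `VirtualFeatureBatcherSketch`, its `Feature` type, the `failOnDuplicate` option, the `newFeature` method names and the choice to return `null` for an ignored duplicate are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for the feature batcher; not the actual BASE API. */
class VirtualFeatureBatcherSketch {

    /** Minimal feature: a reporter ID plus its (possibly fake) coordinates. */
    static final class Feature {
        final int block, row, column;
        final String reporterId;

        Feature(int block, int row, int column, String reporterId) {
            this.block = block;
            this.row = row;
            this.column = column;
            this.reporterId = reporterId;
        }
    }

    private final boolean virtual;          // mirrors the proposed isVirtual() flag
    private final boolean failOnDuplicate;  // assumed "error or ignore" option
    private final Set<String> usedReporters = new HashSet<>();
    private final List<Feature> features = new ArrayList<>();

    VirtualFeatureBatcherSketch(boolean virtual, boolean failOnDuplicate) {
        this.virtual = virtual;
        this.failOnDuplicate = failOnDuplicate;
    }

    /** New method: reporter only. Returns null when a duplicate is ignored. */
    Feature newFeature(String reporterId) {
        if (!virtual) {
            throw new IllegalStateException("Only usable for virtual array designs");
        }
        if (!usedReporters.add(reporterId)) {
            if (failOnDuplicate) {
                throw new IllegalArgumentException("Reporter used more than once: " + reporterId);
            }
            return null;
        }
        // Fake coordinates: block=1, column=1, row=running counter.
        Feature f = new Feature(1, usedReporters.size(), 1, reporterId);
        features.add(f);
        return f;
    }

    /** Old method: delegates to the new one when the design is virtual. */
    Feature newFeature(int block, int row, int column, String reporterId) {
        if (virtual) {
            return newFeature(reporterId); // supplied coordinates are ignored
        }
        Feature f = new Feature(block, row, column, reporterId);
        features.add(f);
        return f;
    }

    int size() {
        return features.size();
    }
}
```

With this shape, an old importer that still passes coordinates keeps working against a virtual design; the coordinates it supplies simply have no effect.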
Raw data
Importing data to a raw bioassay which uses a virtual array design is affected. The `RawDataBatcher` needs to use the reporter ID instead of the coordinates as the key in the preloaded cache. The `doInsert()` method should use the reporter ID instead of the coordinates to look up the feature. Coordinate information in the `RawData` object should be ignored and the fake coordinates from the feature copied instead. This should be documented as a possibly incompatible change.
In the web interface, coordinate information should not be visible.
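A minimal sketch of the reporter-keyed lookup, to show why the sort order of the data file stops mattering. Everything here is hypothetical: `VirtualRawDataLookupSketch`, `preloadFeature` and the simplified `RawData` class only imitate the idea and are not the real `RawDataBatcher` interface.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a reporter-keyed feature cache; not the BASE API. */
class VirtualRawDataLookupSketch {

    /** Simplified raw data record carrying the coordinates that get stored. */
    static final class RawData {
        final String reporterId;
        final int block, row, column;
        final double intensity;

        RawData(String reporterId, int block, int row, int column, double intensity) {
            this.reporterId = reporterId;
            this.block = block;
            this.row = row;
            this.column = column;
            this.intensity = intensity;
        }
    }

    // Preloaded cache: reporter ID -> fake {block, row, column} of the feature.
    private final Map<String, int[]> featureByReporter = new HashMap<>();

    void preloadFeature(String reporterId, int block, int row, int column) {
        featureByReporter.put(reporterId, new int[] {block, row, column});
    }

    /**
     * Looks the feature up by reporter ID only. The coordinates read from the
     * data file are ignored and replaced by the feature's fake coordinates.
     */
    RawData doInsert(String reporterId, int fileBlock, int fileRow, int fileColumn,
                     double intensity) {
        int[] fake = featureByReporter.get(reporterId);
        if (fake == null) {
            throw new IllegalArgumentException("Reporter not on array design: " + reporterId);
        }
        return new RawData(reporterId, fake[0], fake[1], fake[2], intensity);
    }
}
```

Two files listing the same reporters in different orders would then end up with identical stored coordinates, which is the property the row-number-based fake coordinates could not guarantee.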
A remaining issue is how to handle the case where a user changes to another array design. This is more or less impossible right now, since the change validates that all coordinates and reporters are the same for both the new and old array design. This will probably never be the case if fake coordinates are generated by the core. This also includes the case of changing from a null array design to some specific array design. What we really need to do is to match only by reporter and to change the fake coordinates of the raw data.
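Matching by reporter only when switching array design might look something like the sketch below. It is an assumption-laden illustration: the class and method names are invented, fake coordinates are reduced to a single row number, and the rule that every raw data reporter must exist on the new design is a guess at reasonable validation rather than something decided in this ticket.

```java
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of switching array design by reporter; not the BASE API. */
class ArrayDesignSwitchSketch {

    /**
     * Returns the new fake row for each raw data reporter, taken from the new
     * array design. Old coordinates play no part in the validation.
     */
    static Map<String, Integer> remapRows(Collection<String> rawDataReporters,
                                          Map<String, Integer> newDesignRowByReporter) {
        Map<String, Integer> newRows = new LinkedHashMap<>();
        for (String reporterId : rawDataReporters) {
            Integer newRow = newDesignRowByReporter.get(reporterId);
            if (newRow == null) {
                // Assumed rule: a reporter missing from the new design is an error.
                throw new IllegalArgumentException(
                    "Reporter not on new array design: " + reporterId);
            }
            newRows.put(reporterId, newRow);
        }
        return newRows;
    }
}
```

The same routine would cover the change from a null array design, since nothing in it depends on what coordinates the raw data had before.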
I think this is all we NEED to do. Existing plug-ins will probably work as they are, but for better user experience some changes may be needed.
`isInContext()` method.
Update script
In the first step, the 'isVirtual' flag should be set to false for all existing array designs.
It would be nice to be able to detect which array designs are Illumina designs and set 'isVirtual' to true for them. How can we do this? By checking if a raw bioassay with the proper raw data type is linking to them? What if raw bioassays of other raw data types are also linking to them?