Opened 16 years ago

Closed 16 years ago

#867 closed enhancement (wontfix)

Support for array designs without coordinate/position information for features

Reported by: Nicklas Nordborg Owned by: Nicklas Nordborg
Priority: blocker Milestone:
Component: core Version:
Keywords: Cc:

Description (last modified by Nicklas Nordborg)

This ticket has been replaced by #894.

The current way of using an ArrayDesign with features is to look up the reporter positioned at a given coordinate. This is a problem for platforms that don't have a positioning system, for example Illumina. In this case you only know that a given set of reporters is present, but not their exact location. The current solution has been to generate "fake" coordinates, for example by using block=1, column=1 and row=row number in file. The problem with this solution is that it only works if all data files are sorted in the same way, which is not always the case.
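
To illustrate the sorting problem, here is a minimal sketch in Java. The class and method names are made up for this example and are not part of BASE. It treats the row number in each file as the fake coordinate and counts how many rows end up pointing at different reporters in the design file and in a data file.

```java
import java.util.List;

// Hypothetical helper for illustration only; not a BASE class.
final class FakeCoordinateDemo {
    // The fake coordinate is the row number in the file (block=1, column=1).
    // Counts the rows where the design file and the data file disagree
    // about which reporter sits at that fake position.
    static int countMismatches(List<String> designOrder, List<String> dataOrder) {
        int mismatches = 0;
        int rows = Math.min(designOrder.size(), dataOrder.size());
        for (int row = 0; row < rows; row++) {
            if (!designOrder.get(row).equals(dataOrder.get(row))) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

With identically sorted files the count is zero. As soon as a data file lists the same reporters in another order, the coordinate lookup pairs raw data with the wrong reporters.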

We propose that BASE should be made aware that positioning is irrelevant for some array designs. We can call these virtual array designs.

For backwards compatibility we must still generate fake coordinates, but this should be done in the background and need not be visible to importer file configurations. This means that the reporter map importer doesn't have to provide mappings for the block, row and column coordinates; BASE assigns them automatically. For the raw data importer this means that array design validation should be done only with the reporter IDs, not the coordinates. BASE still uses fake coordinates internally, but they should never be visible externally.

This approach requires that each reporter is present only once on the array design, or that some pre-processing step has been performed that averages over identical reporters. This is exactly what is done with Illumina Beadstudio.

Change History (4)

comment:1 by Nicklas Nordborg, 16 years ago

Owner: changed from everyone to Nicklas Nordborg
Status: new → assigned

comment:2 by Nicklas Nordborg, 16 years ago

I have been thinking a bit about how to do this. Here is a first draft.

Array design

Add a boolean flag, isVirtual(), to the ArrayDesign class. If set to true, the FeatureBatcher ignores any coordinate values and assigns fake ones instead. The batcher also verifies that a reporter is only used once. There could be an option controlling whether a duplicate is an error condition or is ignored. We should add a new method to the feature batcher with a reporter as the only argument. This method should only be usable if isVirtual()==true. For backwards compatibility reasons, the old method, which takes coordinates and a reporter, should delegate to the new method if the flag is set. This should be documented as a possibly incompatible change in the interface (ignoring coordinates was never part of the old promise).

The isVirtual() flag can't be changed after features have been added.
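
A rough sketch of how this could look, using simplified stand-in classes. Everything named Sketch* is hypothetical, as are the method names newFeature() and setVirtual(); the real ArrayDesign and FeatureBatcher have different signatures. The fake coordinates follow the block=1, column=1, row=running number scheme from the description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a feature; not the real BASE class.
final class SketchFeature {
    final int block, row, column;
    final String reporterId;
    SketchFeature(int block, int row, int column, String reporterId) {
        this.block = block; this.row = row; this.column = column;
        this.reporterId = reporterId;
    }
}

// Hypothetical stand-in for ArrayDesign with the proposed flag.
final class SketchArrayDesign {
    final List<SketchFeature> features = new ArrayList<>();
    private boolean virtual;

    boolean isVirtual() { return virtual; }

    // The flag is locked as soon as the design has features.
    void setVirtual(boolean virtual) {
        if (!features.isEmpty()) {
            throw new IllegalStateException("Design already has features");
        }
        this.virtual = virtual;
    }
}

// Hypothetical stand-in for FeatureBatcher.
final class SketchFeatureBatcher {
    private final SketchArrayDesign design;
    private final boolean failOnDuplicate; // the "error or ignore" option
    private final Set<String> seenReporters = new HashSet<>();
    private int nextRow = 1;

    SketchFeatureBatcher(SketchArrayDesign design, boolean failOnDuplicate) {
        this.design = design;
        this.failOnDuplicate = failOnDuplicate;
    }

    // New method: reporter only, usable only for virtual designs.
    // Returns null when a duplicate reporter is ignored.
    SketchFeature newFeature(String reporterId) {
        if (!design.isVirtual()) {
            throw new IllegalStateException("Design is not virtual");
        }
        if (!seenReporters.add(reporterId)) {
            if (failOnDuplicate) {
                throw new IllegalArgumentException("Duplicate reporter: " + reporterId);
            }
            return null;
        }
        SketchFeature f = new SketchFeature(1, nextRow++, 1, reporterId);
        design.features.add(f);
        return f;
    }

    // Old method: delegates to the new one when the flag is set,
    // so the supplied coordinates are silently ignored.
    SketchFeature newFeature(int block, int row, int column, String reporterId) {
        if (design.isVirtual()) {
            return newFeature(reporterId);
        }
        SketchFeature f = new SketchFeature(block, row, column, reporterId);
        design.features.add(f);
        return f;
    }
}
```

Returning null for an ignored duplicate is just one possible convention; the real batcher may prefer to report it some other way.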

In the web interface, coordinate information for features should not be visible. The possibility to link with plates should be disabled, since that implies coordinate information.

Raw data

Importing data to a raw bioassay which uses a virtual array design is affected. The RawDataBatcher needs to use the reporter ID instead of the coordinates as the key in the preloaded cache. The doInsert() method should use the reporter ID instead of the coordinates to look up the feature. Coordinate information in the RawData object should be ignored and the fake coordinates from the feature are copied instead. This should be documented as a possibly incompatible change.
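
As a sketch of the lookup change, assuming simplified stand-in classes (RdFeature, RdRawData and ReporterKeyedRawDataBatcher are hypothetical names, and insert() only mimics what doInsert() would do):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a feature on a virtual array design.
final class RdFeature {
    final String reporterId;
    final int block, row, column; // fake coordinates assigned by the core
    RdFeature(String reporterId, int block, int row, int column) {
        this.reporterId = reporterId;
        this.block = block; this.row = row; this.column = column;
    }
}

// Hypothetical stand-in for a raw data row.
final class RdRawData {
    final String reporterId;
    int block, row, column; // whatever the importer happened to set
    RdFeature feature;
    RdRawData(String reporterId, int block, int row, int column) {
        this.reporterId = reporterId;
        this.block = block; this.row = row; this.column = column;
    }
}

// Hypothetical stand-in for RawDataBatcher in "virtual" mode.
final class ReporterKeyedRawDataBatcher {
    // Preloaded cache keyed by reporter ID instead of coordinates.
    private final Map<String, RdFeature> featuresByReporter = new HashMap<>();

    ReporterKeyedRawDataBatcher(Iterable<RdFeature> features) {
        for (RdFeature f : features) {
            featuresByReporter.put(f.reporterId, f);
        }
    }

    // Looks up the feature by reporter ID and overwrites the coordinates
    // on the raw data with the fake ones stored on the feature.
    void insert(RdRawData raw) {
        RdFeature f = featuresByReporter.get(raw.reporterId);
        if (f == null) {
            throw new IllegalArgumentException(
                "Reporter not on array design: " + raw.reporterId);
        }
        raw.feature = f;
        raw.block = f.block;
        raw.row = f.row;
        raw.column = f.column;
    }
}
```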

In the web interface, coordinate information should not be visible.

A remaining issue is how to handle the case where a user changes to another array design. This is more or less impossible right now, since the change validates that all coordinates and reporters are the same for both the new and the old array design. This will probably never be the case if fake coordinates are generated by the core. This also includes the case of changing from a null array design to some specific array design. What we really need to do is to only match by reporter and to change the fake coordinates of the raw data.
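
One way the reporter-only matching could work, as a sketch. SwRawData and DesignSwitcher are hypothetical names, and the new design is reduced to a map from reporter ID to fake row (block and column are assumed to be fixed at 1). All rows are validated before any of them is modified, so a failed switch leaves the raw data untouched.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a raw data row.
final class SwRawData {
    final String reporterId;
    int row; // fake row; block and column are assumed to be 1
    SwRawData(String reporterId, int row) {
        this.reporterId = reporterId;
        this.row = row;
    }
}

// Hypothetical helper; not a BASE class.
final class DesignSwitcher {
    // newDesignRows maps reporter ID -> fake row on the new array design.
    static void switchDesign(List<SwRawData> rawData, Map<String, Integer> newDesignRows) {
        // Pass 1: match by reporter only; coordinates are not compared.
        for (SwRawData raw : rawData) {
            if (!newDesignRows.containsKey(raw.reporterId)) {
                throw new IllegalArgumentException(
                    "Reporter missing on new design: " + raw.reporterId);
            }
        }
        // Pass 2: take over the fake coordinates of the new design.
        for (SwRawData raw : rawData) {
            raw.row = newDesignRows.get(raw.reporterId);
        }
    }
}
```

Changing from a null array design would go through the same path, since only the reporter IDs of the raw data are used.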

I think this is all we NEED to do. Existing plug-ins will probably work as they are, but for a better user experience some changes may be needed.

  • Illumina raw data importer:
    • If an array design is selected, it should be verified that the 'isVirtual' flag is set.
    • The importer doesn't have to set coordinates if an array design is selected. It should continue generating fake coordinates if no array design is selected.
  • Reporter map importer
    • Should ignore coordinate mappings if importing to an array design with 'isVirtual=true'
    • Use the new method in the feature batcher
  • Print map importer
    • Can't be used with virtual array designs. This should be reported by the isInContext() method.
  • Plug-in configurations
    • The reporter map configuration for Illumina should not have mappings for coordinates.

Update script

In the first step, the 'isVirtual' flag should be set to false for all existing array designs.

It would be nice to be able to detect which array designs are Illumina designs and set 'isVirtual' to true for them. How can we do this? By checking if a raw bioassay with the proper raw data type is linking to them? What if raw bioassays of other raw data types are also linking to them?
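
One conservative option, sketched below as an assumption rather than a decision: flag a design as virtual only if at least one raw bioassay links to it and every linked raw bioassay has the Illumina raw data type. The class name, the method name and the way the raw data type is identified are all hypothetical.

```java
import java.util.Collection;

// Hypothetical helper for the update script; not a BASE class.
final class VirtualFlagHeuristic {
    // linkedRawDataTypes: raw data type IDs of all raw bioassays linking
    // to one array design. illuminaType: the ID used for Illumina data.
    static boolean shouldBeVirtual(Collection<String> linkedRawDataTypes, String illuminaType) {
        if (linkedRawDataTypes.isEmpty()) {
            return false; // nothing to go on, keep isVirtual=false
        }
        for (String type : linkedRawDataTypes) {
            if (!illuminaType.equals(type)) {
                return false; // mixed usage, leave the design alone
            }
        }
        return true;
    }
}
```

Designs with mixed or no links would keep isVirtual=false and would need to be handled manually.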

comment:3 by Nicklas Nordborg, 16 years ago

Priority: critical → blocker

comment:4 by Nicklas Nordborg, 16 years ago

Description: modified (diff)
Milestone: BASE 2.6
Resolution: wontfix
Status: assigned → closed