Base1PluginExecuter doesn't import plug-in generated BASEfiles as expected
The first case was a filtering plug-in which caused the raw data links to get lost. The reason was that the plug-in wrote an "assays" section to the generated file. This triggered a code path in the Base1 plug-in executor where a new data cube is created, which in turn means that the spot->raw data mapping is lost. Further analysis of the Base1 plug-in executor code has shown that there are four cases, but only two of them generate the correct result:
- (1) The output has no "assays" section and all positions have the same reporter as in the input file. This works as expected. The same data cube is used and no mappings are lost.
- (2) The output has no "assays" section, but one or more positions have a different reporter than in the input. This doesn't work as expected. The same data cube is still used, which means that the old position->reporter mapping is kept. The new mapping in the output file is ignored.
- (3) There is an "assays" section and all positions have the same reporter as in the input file. This doesn't work as expected. A new data cube is created and also a new position->reporter mapping. The mapping to raw data is lost.
- (4) There is an "assays" section and one or more positions have a different reporter than in the input. This works as expected. A new data cube is created and also a new position->reporter mapping. The mapping to raw data is lost.
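The four cases above can be summarized in a small truth table. This is only an illustrative sketch in Python, not the executor's actual (Java) code; the function names are hypothetical, and the "wanted" rule is a simplification that covers only these four cases.

```python
def current_creates_new_cube(has_assays_section, reporters_changed):
    """Behavior described in this ticket: only the "assays" section matters."""
    return has_assays_section


def wanted_creates_new_cube(has_assays_section, reporters_changed):
    """Simplified expectation for the four cases: a new data cube is
    only needed when the position->reporter mapping has changed."""
    return reporters_changed


def summarize_cases():
    """Return (has_assays, reporters_changed, new_cube_now, works_as_expected)
    for the four cases, in the same order as the list above."""
    rows = []
    for has_assays in (False, True):
        for changed in (False, True):
            now = current_creates_new_cube(has_assays, changed)
            wanted = wanted_creates_new_cube(has_assays, changed)
            rows.append((has_assays, changed, now, now == wanted))
    return rows


if __name__ == "__main__":
    for has_assays, changed, now, ok in summarize_cases():
        print(f"assays={has_assays!s:5} changed={changed!s:5} "
              f"new_cube={now!s:5} works_as_expected={ok}")
```

Only the first and the last row come out as working as expected, which matches the analysis of the executor code.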
In addition to the four cases above there are some conditions that lead to an error, for example if the position->reporter mapping is not consistent in the output file or if the file refers to a non-existing parent bioassay.
Basing the decision about creating a new data cube or not purely on the existence of the "assays" section is clearly not a good idea. It is a bit more complicated than that. Here is a first attempt to create a list of conditions. There may be more that I don't know of yet. Anyway, a new data cube should be created if at least one of the conditions below is true. If none of them is true, the data should be imported into the existing data cube.
- (A) One or more positions have a different reporter in the output than in the input
- (B) There are new positions in the output that didn't exist in the input
- (C) There is an "assays" section AND at least one assay has more than one "parent"
In the case where C is the only condition that is true it should be possible to preserve the raw data mapping, but it may be a question of performance as well.
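As a rough illustration of the proposed decision, here is a minimal Python sketch. It is not the executor's code: the function name and the data structures (plain dicts for the position->reporter mappings and for assay->parents) are assumptions made for the example only.

```python
def new_cube_conditions(input_reporters, output_reporters, assay_parents=None):
    """Return the set of conditions ("A", "B", "C") that are true.

    input_reporters / output_reporters: dict mapping position -> reporter id.
    assay_parents: dict mapping assay id -> list of parent bioassay ids,
                   or None if the output has no "assays" section.
    A new data cube is needed if the returned set is non-empty.
    """
    conditions = set()

    # A: a position that exists in the input has a different reporter in the output
    if any(pos in input_reporters and input_reporters[pos] != reporter
           for pos, reporter in output_reporters.items()):
        conditions.add("A")

    # B: the output contains positions that did not exist in the input
    if any(pos not in input_reporters for pos in output_reporters):
        conditions.add("B")

    # C: there is an "assays" section and some assay has more than one parent
    if assay_parents is not None and any(
            len(parents) > 1 for parents in assay_parents.values()):
        conditions.add("C")

    return conditions


if __name__ == "__main__":
    inp = {1: "rep1", 2: "rep2", 3: "rep3"}
    # Filtering plug-in: fewer positions, same reporters, one parent per assay
    print(new_cube_conditions(inp, {1: "rep1", 3: "rep3"}, {"a1": ["p1"]}))
    # Merge-like plug-in: same mapping, but one assay has two parents
    print(new_cube_conditions(inp, inp, {"m1": ["p1", "p2"]}))
```

Returning the set of true conditions rather than a single boolean makes it possible to treat the "only C is true" situation separately, for instance to try to preserve the raw data mapping in that case.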
Basing the decision about creating a new data cube or not on conditions, e.g., A-C above, seems to be the right way to go. This would ensure that case scenario 2 (above) wouldn’t happen and also take care of the erroneous behavior described in case scenario 3.
Decision-making conditions A-C appear pretty solid and cover the possibilities I can foresee based on the BASE1 plugins we’ve used in Lund (though I have probably overlooked something).
However, there are plugins (such as Merge bioassays) that would return true for condition C although in this case one would want to retain the mapping to raw data. For this plugin (and other BASE1 plugins that behave similarly), might it be better to re-implement the functionality as a BASE2 plugin rather than having the Base1 Plugin Executer handle the mapping?
Example files (stdin and stdout) for a number of plugins when running in BASE1 are attached.
I do not think any existing plugins (at least those used in Lund) fall under case 2 or 4.