Ticket #1135: multi-rawdata-importer-1.txt

File multi-rawdata-importer-1.txt, 5.1 KB (added by Nicklas Nordborg, 15 years ago)

Implementation specifications v.1

Generic multi-importer for raw data
===================================

What it is
----------
A plug-in that can import data to multiple raw bioassays in
one go. The plug-in doesn't do the data import itself, but uses
already existing plug-ins that import raw data to a single raw
bioassay at a time.

The plug-in can only import data of a single type at a time. This
means that the actual plug-in/file format that performs the data
import must be the same for all raw bioassays that are imported
in a single session. We believe that this limitation is not a
very severe one, since in the typical use case there is only a
limited number of data types/file formats present in an experiment.
Most experiments probably only have one, and a dye-swap experiment
two.

In the text below we will call the actual import plug-in the
"worker plug-in".


When and where to use the plug-in
---------------------------------
The importer should be an import-type plug-in that works from
the single-item view of an experiment.
E.g. GuiContext = EXPERIMENT, ITEM

It should only work for platforms that support importing raw
data into the database. E.g. isInContext() should check:
Experiment.getRawDataType().isStoredInDb() == true.

It should also check that at least one of the raw bioassays in the
experiment doesn't have imported data already, and that there
are files attached that it is possible to use for the raw data import.
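
The checks above can be sketched roughly as follows. This is only an
illustration: RawBioAssayInfo and ContextCheck are hypothetical stand-ins
and not classes from the core API, and returning null for "ok" and a
message otherwise is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a raw bioassay; NOT a class from the core API.
final class RawBioAssayInfo {
    final String name;
    final boolean hasData;    // true if raw data has already been imported
    final int attachedFiles;  // number of attached files usable for import

    RawBioAssayInfo(String name, boolean hasData, int attachedFiles) {
        this.name = name;
        this.hasData = hasData;
        this.attachedFiles = attachedFiles;
    }
}

final class ContextCheck {
    // Raw bioassays that still lack data and have at least one file attached.
    static List<RawBioAssayInfo> importable(List<RawBioAssayInfo> all) {
        List<RawBioAssayInfo> result = new ArrayList<>();
        for (RawBioAssayInfo rba : all) {
            if (!rba.hasData && rba.attachedFiles > 0) {
                result.add(rba);
            }
        }
        return result;
    }

    // Assumed convention: null means the plug-in can be used,
    // any other value is a message explaining why it cannot.
    static String isInContext(boolean storedInDb, List<RawBioAssayInfo> all) {
        if (!storedInDb) {
            return "The raw data type is not stored in the database";
        }
        if (importable(all).isEmpty()) {
            return "No raw bioassay without data has files attached";
        }
        return null;
    }
}
```

The same importable() filter could also provide the list of raw bioassays
offered to the user in step 1 of the parameter input.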



Parameter input
---------------

Step 1.
The first step is to select which raw bioassays to import raw data
to. Only raw bioassays that don't already have raw data and that
have files attached to them can be selected.

In this step the user should also select whether the worker plug-in
and/or file format should be selected manually or by trying to
auto-detect a suitable file format.


Step 2.
In manual selection mode, the user is allowed to select a worker
plug-in/file format.

In auto-detection mode, the user should confirm the result of the
auto-detection or select a worker plug-in/file format if more than
one was found.
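
The decision in this step could look something like the sketch below.
FormatSelection and its Action values are hypothetical names. Falling
back to manual selection when auto-detection finds nothing is an
assumption; the text above does not say what should happen in that case.

```java
import java.util.List;

final class FormatSelection {
    // What the user interface should ask for in step 2 (hypothetical names).
    enum Action { CONFIRM_SINGLE, CHOOSE_AMONG_MATCHES, SELECT_MANUALLY }

    static Action nextAction(boolean autoDetect, List<String> detectedFormats) {
        if (!autoDetect) {
            return Action.SELECT_MANUALLY;
        }
        if (detectedFormats.size() == 1) {
            return Action.CONFIRM_SINGLE;        // user confirms the only match
        }
        if (detectedFormats.size() > 1) {
            return Action.CHOOSE_AMONG_MATCHES;  // user picks one of the matches
        }
        // Assumption: no match at all falls back to manual selection.
        return Action.SELECT_MANUALLY;
    }
}
```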

Step 3.
This invokes the job configuration sequence for the selected worker plug-in.
This is not straightforward, since the worker plug-in most likely has
parameters that are hard to provide values for. For example, the worker
plug-in most likely requires a single raw bioassay item and a single file
item. But we have many raw bioassays, each one with different files.
Another case is the Illumina IBS platform, which may have more than one
file.

We need some kind of proxy wrapper so we can "fool" the job configuration
sequence into completing. Then, when the multi-import is executing, we need
to replace the wrapper parameters with the real raw bioassay and file(s).
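
One possible shape for such a wrapper is sketched below, assuming that the
configured parameters can be viewed as a simple name/value map. The
ProxyParameters class, the Placeholder markers and the use of plain
strings for items are all hypothetical simplifications, not part of the
existing plug-in API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProxyParameters {
    // Markers stored instead of real items while the job is configured.
    enum Placeholder { RAW_BIOASSAY, RAW_DATA_FILE }

    // Returns a copy of the configured parameters in which every placeholder
    // has been replaced with the real value for the current raw bioassay.
    // The configured map itself is left untouched so it can be reused.
    static Map<String, Object> resolve(Map<String, Object> configured,
                                       String rawBioAssay, String file) {
        Map<String, Object> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : configured.entrySet()) {
            Object value = entry.getValue();
            if (value == Placeholder.RAW_BIOASSAY) {
                value = rawBioAssay;
            } else if (value == Placeholder.RAW_DATA_FILE) {
                value = file;
            }
            resolved.put(entry.getKey(), value);
        }
        return resolved;
    }
}
```

Parameters that are not placeholders, such as error handling options,
pass through unchanged, so they only have to be configured once.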

This parameter wrapping can be designed in a couple of different ways.
The best would be if we could let the worker plug-in tell us which
parameters to wrap and how to provide values for them. This can be done by
defining an interface that the worker plug-ins can implement. But
we also need to support existing plug-ins, and this means that we somehow
need to guess/assume a few things about them. I can think of the following:

1. The worker plug-in must have a parameter asking for a single raw bioassay.
   The value for this parameter is set to each of the selected raw bioassays
   in turn.

2. The worker plug-in must have at least one parameter asking for a file.
   The value for this parameter is set to the file with the generic type
   FileType.RAW_DATA that is attached to the raw bioassay.

3. If the worker plug-in has more than one file parameter (for example the
   Illumina IBS platform), there must be the same number of files attached
   to the raw bioassay. In the Illumina case it doesn't matter which file we
   attach to which file parameter, and I don't know how we should be able to
   guess which file goes where if it does.
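
The three assumptions can be turned into a small check, sketched here
under simplifying assumptions: the worker's parameters are described by a
hypothetical Kind enum, files are plain names, and the list passed in
holds only the files usable as raw data. Requiring exactly one file when
there is a single file parameter is also a simplification made for this
sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ParameterGuess {
    // Hypothetical classification of a worker plug-in parameter.
    enum Kind { RAW_BIOASSAY, FILE, OTHER }

    // Maps each file parameter of the worker to one attached file, or throws
    // if the worker or the raw bioassay does not fit assumptions 1-3.
    static Map<String, String> bindFiles(Map<String, Kind> workerParams,
                                         List<String> rawDataFiles) {
        int rawBioAssayParams = 0;
        List<String> fileParams = new ArrayList<>();
        for (Map.Entry<String, Kind> entry : workerParams.entrySet()) {
            if (entry.getValue() == Kind.RAW_BIOASSAY) rawBioAssayParams++;
            if (entry.getValue() == Kind.FILE) fileParams.add(entry.getKey());
        }
        if (rawBioAssayParams != 1) {
            throw new IllegalArgumentException("Need exactly one raw bioassay parameter");
        }
        if (fileParams.isEmpty()) {
            throw new IllegalArgumentException("Need at least one file parameter");
        }
        if (fileParams.size() != rawDataFiles.size()) {
            throw new IllegalArgumentException("Expected " + fileParams.size()
                + " file(s) but the raw bioassay has " + rawDataFiles.size());
        }
        // The pairing order is arbitrary, as in the Illumina case above.
        Map<String, String> bound = new LinkedHashMap<>();
        for (int i = 0; i < fileParams.size(); i++) {
            bound.put(fileParams.get(i), rawDataFiles.get(i));
        }
        return bound;
    }
}
```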

Step 4.
When all parameters (error handling options, etc.) for the worker plug-in
have been selected, the job is queued like any other job.


Running the plug-in
-------------------
The actual running of the multi-importer plug-in follows this outline:

1. We loop over the selected raw bioassays.
2. An instance of the selected worker plug-in is created.
   The Plug-in API doesn't allow us to re-use a plug-in
   instance, so a new instance has to be created for each
   raw bioassay. There are a few things to be aware of:
   - The instance must be properly initialised in a way that
     is compatible with how the core initialises a plug-in.
   - Signalling must be set up if we want to support aborting
     the plug-in (and we do want that).
3. The job configuration sequence for the worker plug-in is
   started. The current raw bioassay and file are used as
   parameters according to the rules above. Other parameters,
   such as error handling options, are taken from the
   multi-importer job configuration.
4. The worker plug-in performs the import.
5. Steps 2 to 4 are repeated for each raw bioassay.

The result is reported as the number of raw bioassays that got
imported/failed. The multi-importer should support logging
more detailed information about the individual imports to a log file.
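
A rough sketch of the outline above, assuming a much simplified worker.
The Worker interface, the Supplier used to create instances and the
in-memory log are hypothetical stand-ins; initialisation by the core and
abort signalling are left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class MultiImportRun {
    // Hypothetical stand-in for a worker plug-in; NOT the real plug-in API.
    interface Worker {
        void importData(String rawBioAssay) throws Exception;
    }

    // Summary of a run: counts plus one log line per raw bioassay.
    static final class Result {
        final int imported;
        final int failed;
        final List<String> log;

        Result(int imported, int failed, List<String> log) {
            this.imported = imported;
            this.failed = failed;
            this.log = log;
        }
    }

    static Result run(List<String> rawBioAssays, Supplier<Worker> workerFactory) {
        int imported = 0;
        int failed = 0;
        List<String> log = new ArrayList<>();
        for (String rawBioAssay : rawBioAssays) {
            // Instances cannot be re-used, so create a new one every time.
            Worker worker = workerFactory.get();
            try {
                worker.importData(rawBioAssay);
                imported++;
                log.add(rawBioAssay + ": imported");
            } catch (Exception ex) {
                // One failed import does not stop the remaining ones.
                failed++;
                log.add(rawBioAssay + ": failed (" + ex.getMessage() + ")");
            }
        }
        return new Result(imported, failed, log);
    }
}
```

Continuing after a failure is a choice made for this sketch; in the real
plug-in this would presumably follow the configured error handling options.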