Ticket #1135: multi-rawdata-importer-1.txt

File multi-rawdata-importer-1.txt, 5.1 KB (added by Nicklas Nordborg, 16 years ago)
Implementation specifications v.1

Line
1
2	Generic multi-importer for raw data
3	===================================
4
5	What it is
6	----------
7	A plug-in that can import data to multiple raw bioassays in
8	one go. The plug-in doesn't do the data import itself, but uses
9	already existing plug-ins that import raw data to a single raw
10	bioassay at a time.
11
12	The plug-in can only import data of a single type at a time. This
13	means that the actual plug-in/file format that performs the data
14	import must be the same for all raw bioassays that are imported
15	in a single session. We believe that this limitation is not a
16	very severe one, since in the typical use case there is only a
17	limited number of data types/file formats present in an experiment.
18	Most experiments probably only have one, and a dye-swap experiment
19	two.
20
21	In the text below we will call the actual import plug-in the
22	"worker plug-in".
23
24
25	When and where to use the plug-in
26	---------------------------------
27	The importer should be an import-type plug-in that works from
28	the single-item view of an experiment.
29	E.g. GuiContext = EXPERIMENT, ITEM
30
31	It should only work for platforms that support importing raw
32	data into the database. E.g. isInContext() should check:
33	Experiment.getRawDataType().isStoredInDb() == true.
34
35	It should also check that at least one of the raw bioassays in the
36	experiment doesn't have imported data already, and that it has
37	files attached that can be used for the raw data import.
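
The checks above can be sketched roughly as follows. This is only an illustration: the classes ContextCheckSketch, Experiment and RawBioAssay are hypothetical stand-ins defined here, not the real core classes, and returning null for "usable" and a message otherwise is a convention chosen for this sketch.

```java
import java.util.List;

/** Sketch of the context check. All types are hypothetical stand-ins. */
final class ContextCheckSketch {

    /** Stand-in for a raw bioassay: has it got data, and how many files are attached? */
    static final class RawBioAssay {
        final String name;
        final boolean hasData;
        final int attachedFiles;

        RawBioAssay(String name, boolean hasData, int attachedFiles) {
            this.name = name;
            this.hasData = hasData;
            this.attachedFiles = attachedFiles;
        }
    }

    /** Stand-in for an experiment and the "stored in database" flag of its raw data type. */
    static final class Experiment {
        final boolean storedInDb;
        final List<RawBioAssay> rawBioAssays;

        Experiment(boolean storedInDb, List<RawBioAssay> rawBioAssays) {
            this.storedInDb = storedInDb;
            this.rawBioAssays = rawBioAssays;
        }
    }

    /** Returns null if the multi-importer can be used, otherwise a message saying why not. */
    static String checkContext(Experiment experiment) {
        if (!experiment.storedInDb) {
            return "The platform does not store raw data in the database";
        }
        for (RawBioAssay rba : experiment.rawBioAssays) {
            // One raw bioassay without data but with attached files is enough.
            if (!rba.hasData && rba.attachedFiles > 0) {
                return null;
            }
        }
        return "No raw bioassay is missing data and has files attached";
    }
}
```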
38
39
40
41	Parameter input
42	---------------
43
44	Step 1.
45	The first step is to select which of the raw bioassays we should
46	import raw data to. The requirement is that each selected raw
47	bioassay doesn't already have raw data and has files attached
48	to it.
49
50	In this step the user should also select whether a worker plug-in and/or
51	file format should be selected manually or by trying to auto-detect
52	a suitable file format.
53
54
55	Step 2.
56	In manual selection mode, the user is allowed to select a worker
57	plug-in/file format.
58
59	In auto-detection mode, the user should confirm the result of the
60	auto-detection or select a worker plug-in/file format if more than
61	one was found.
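
How step 2 could branch on the selection mode and the auto-detection result is sketched below. The class and enum names are made up for illustration. Falling back to manual selection when nothing was detected is an assumption of this sketch; the text above doesn't say what should happen in that case.

```java
import java.util.List;

/** Sketch of the step 2 decision. Names are hypothetical. */
final class FormatSelectionSketch {

    /** What the user is asked to do in step 2. */
    enum NextStep { CONFIRM_DETECTED, CHOOSE_AMONG_DETECTED, SELECT_MANUALLY }

    /**
     * @param autoDetect true if the user asked for auto-detection in step 1
     * @param detected the worker plug-ins/file formats found by auto-detection
     */
    static NextStep nextStep(boolean autoDetect, List<String> detected) {
        if (!autoDetect || detected.isEmpty()) {
            // ASSUMPTION: an empty detection result falls back to manual selection.
            return NextStep.SELECT_MANUALLY;
        }
        return detected.size() == 1
            ? NextStep.CONFIRM_DETECTED
            : NextStep.CHOOSE_AMONG_DETECTED;
    }
}
```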
62
63	Step 3.
64	This invokes the job configuration sequence for the selected worker plug-in.
65	This is not straightforward, since the worker plug-in most likely has
66	parameters that it is hard to provide values for. For example, the worker
67	plug-in most likely requires a single raw bioassay item and a single file
68	item. But we have many raw bioassays, each one with different files.
69	Another case is the Illumina IBS platform which may have more than one
70	file.
71
72	We need some kind of proxy wrapper so we can "fool" the job configuration
73	sequence to complete. Then, when the multi-import is executing we need to
74	replace the wrapper parameters with the real raw bioassay and file(s).
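
One possible shape of such a proxy is sketched below: placeholder markers are stored while the job is configured, and swapped for the real items just before each import. The class ProxyParameterSketch, the Placeholder enum and the use of plain strings for items are all assumptions made for illustration; a real wrapper would have to fit the parameter types that the core expects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of placeholder parameters. All names are hypothetical. */
final class ProxyParameterSketch {

    /** Markers stored during job configuration in place of real items. */
    enum Placeholder { RAW_BIOASSAY, RAW_DATA_FILE }

    /**
     * Returns a copy of the configured parameters in which every placeholder
     * is replaced with the item for the raw bioassay currently being imported.
     * Other values (for example error handling options) are copied unchanged.
     */
    static Map<String, Object> resolve(Map<String, Object> configured,
                                       Object rawBioAssay, Object file) {
        Map<String, Object> resolved = new LinkedHashMap<String, Object>();
        for (Map.Entry<String, Object> entry : configured.entrySet()) {
            Object value = entry.getValue();
            if (value == Placeholder.RAW_BIOASSAY) {
                value = rawBioAssay;
            } else if (value == Placeholder.RAW_DATA_FILE) {
                value = file;
            }
            resolved.put(entry.getKey(), value);
        }
        return resolved;
    }
}
```

The configured map is never modified, so the same job configuration can be resolved again for the next raw bioassay.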
75
76	This parameter wrapping can be designed in a couple of different ways.
77	The best would be if we could let the worker plug-in tell us which
78	parameters to wrap and how to provide values for them. This can be done by
79	defining an interface that the worker plug-ins can implement. But,
80	we also need to support existing plug-ins and this means that we somehow
81	need to guess/assume a few things about them. I can think of the following:
82
83	1. The worker plug-in must have a parameter asking for a single raw bioassay.
84	The value for this parameter is set to each of the selected raw bioassays.
85
86	2. The worker plug-in must have at least one parameter asking for a file.
87	The value for this parameter is set to the file that is attached to
88	the raw bioassay that has the generic type FileType.RAW_DATA.
89
90	3. If the worker plug-in has more than one file parameter (for example the
91	Illumina IBS platform), there must be the same number of files attached
92	to the raw bioassay. In the Illumina case it doesn't matter which file we
93	attach to which file parameter, and I don't know how we should be able to
94	guess which file goes where if it does.
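
The three rules for existing plug-ins can be sketched as a single check that either produces a file assignment or rejects the worker plug-in. Everything here (ParameterGuessSketch, Param, Kind, files as plain strings) is a hypothetical stand-in. With several file parameters the files are assigned in list order, which is arbitrary and only acceptable when the order doesn't matter, as in the Illumina case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of rules 1-3 for existing worker plug-ins. Names are hypothetical. */
final class ParameterGuessSketch {

    enum Kind { RAW_BIOASSAY, FILE, OTHER }

    /** Stand-in for a parameter definition of the worker plug-in. */
    static final class Param {
        final String name;
        final Kind kind;

        Param(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    /**
     * Maps each file parameter to a file for one raw bioassay.
     * @param rawDataFile the attached file with the generic raw data type, or null
     * @param attachedFiles all files attached to the raw bioassay
     * @throws IllegalArgumentException if one of the rules is not fulfilled
     */
    static Map<String, String> assignFiles(List<Param> params, String rawDataFile,
                                           List<String> attachedFiles) {
        boolean hasRawBioAssayParam = false;
        List<String> fileParams = new ArrayList<String>();
        for (Param p : params) {
            if (p.kind == Kind.RAW_BIOASSAY) hasRawBioAssayParam = true;
            if (p.kind == Kind.FILE) fileParams.add(p.name);
        }
        if (!hasRawBioAssayParam) {
            throw new IllegalArgumentException("Rule 1: no raw bioassay parameter");
        }
        if (fileParams.isEmpty()) {
            throw new IllegalArgumentException("Rule 2: no file parameter");
        }
        Map<String, String> assignment = new LinkedHashMap<String, String>();
        if (fileParams.size() == 1) {
            if (rawDataFile == null) {
                throw new IllegalArgumentException("Rule 2: no raw data file attached");
            }
            assignment.put(fileParams.get(0), rawDataFile);
        } else {
            if (attachedFiles.size() != fileParams.size()) {
                throw new IllegalArgumentException("Rule 3: file count mismatch");
            }
            // Arbitrary pairing by position; see the caveat in rule 3.
            for (int i = 0; i < fileParams.size(); i++) {
                assignment.put(fileParams.get(i), attachedFiles.get(i));
            }
        }
        return assignment;
    }
}
```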
95
96	Step 4.
97	When all parameters (error handling options, etc.) for the worker plug-in
98	have been selected, the job is queued like any other job.
99
100
101	Running the plug-in
102	-------------------
103	The actual running of the multi-importer plug-in follows this outline:
104
105	1. We loop over the selected raw bioassays.
106	2. An instance of the selected worker plug-in is created.
107	   The Plug-in API doesn't allow us to re-use a plug-in
108	   instance, so a new instance has to be created for each
109	   raw bioassay. There are a few things to be aware of:
110	   - The instance must be properly initialised in a way that
111	     is compatible with how the core initialises a plug-in.
112	   - Signalling must be set up if we want to support aborting
113	     the plug-in (and we do want that).
114	3. The job configuration sequence for the worker plug-in is
115	   started. The current raw bioassay and file are used as
116	   parameters according to the rules above. Other parameters,
117	   such as error handling options, are taken from the
118	   multi-importer job configuration.
119	4. The worker plug-in performs the import.
120	5. Steps 2 to 4 are repeated for each raw bioassay.
121
122	The result is reported as the number of raw bioassays that got
123	imported/failed. The multi-importer should support logging more
124	detailed information about the individual imports to a log file.
125
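
The outline above, including the imported/failed count and the per-import log, might look roughly like this. The Worker interface, the Supplier used as a factory and the AtomicBoolean used as an abort signal are stand-ins invented for this sketch; they are not the Plug-in API, and real initialisation and signalling would have to follow what the core does. Treating a failed import as non-fatal for the remaining raw bioassays is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Sketch of the multi-import loop. All names are hypothetical. */
final class MultiImportLoopSketch {

    /** Stand-in for a configured worker plug-in instance. */
    interface Worker {
        void importData(String rawBioAssay) throws Exception;
    }

    /** Counts plus one log line per raw bioassay. */
    static final class Result {
        int imported;
        int failed;
        final List<String> log = new ArrayList<String>();
    }

    static Result run(List<String> rawBioAssays, Supplier<Worker> workerFactory,
                      AtomicBoolean abortRequested) {
        Result result = new Result();
        for (String rba : rawBioAssays) {
            // Check the abort signal between imports.
            if (abortRequested.get()) {
                result.log.add("Aborted before " + rba);
                break;
            }
            // A new worker instance for every raw bioassay; instances are not re-used.
            Worker worker = workerFactory.get();
            try {
                worker.importData(rba);
                result.imported++;
                result.log.add(rba + ": imported");
            } catch (Exception ex) {
                // ASSUMPTION: one failure doesn't stop the remaining imports.
                result.failed++;
                result.log.add(rba + ": failed: " + ex.getMessage());
            }
        }
        return result;
    }
}
```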
