Ticket #1028: batchimport_userperspective-2.txt

File batchimport_userperspective-2.txt, 10.9 KB (added by Jari Häkkinen, 14 years ago)
There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

 - batch upload of files using zip files
 - batch creation of array slides
 - batch addition/deletion of reporters
 - import of annotations
 - list views offer an import button, but which views actually offer
   a plug-in that does anything?

In a setting with one or a few experiments the need for batch tools is
not urgent, but for a microarray facility where many experiments are
prepared by facility staff the need is pressing. At a facility site
many experiments are conducted by few people and all data upload is
done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab-separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather plug-ins that import, create, or modify items for a given
context and make the proper associations to parents. The word 'import'
is used in this document but it could just as well be 'create' or
'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are intended for a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of biomaterial. In BASE this follows the path biosource -
sample - extract - labeled extract, and then continues with
hybridization - scan - raw bioassay - bioassay - bioassay sets -
analysis.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples with their parents. Changing
sample properties follows a similar path, but the import files do not
require parent information. The import of sample data starts by
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been
added to this file, the file is imported into BASE. After this import,
the sample information is exported to a file again, and this file is
used as a template for the extract information. Again, the reason for
this is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back to BASE. This
procedure is performed for each level of data entry.
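
The export-then-fill-in step described above could be sketched roughly
as follows. This is only an illustration, not part of BASE: the
function name make_child_template, the parent column name 'Biosource',
and the use of the exported 'Name' column as the identifier are all
assumptions made for the example.

```python
import csv
import io


def make_child_template(parent_export, parent_column, child_columns):
    """Build a tab-separated template for child items from a parent export.

    Hypothetical helper: one template line is written per exported parent,
    with the parent reference prefilled and the child cells left empty.
    """
    reader = csv.DictReader(io.StringIO(parent_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[parent_column] + list(child_columns),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Assumption: the exported 'Name' is what identifies the parent.
        # Columns not given here are written as empty cells.
        writer.writerow({parent_column: row["Name"]})
    return out.getvalue()


# Example: a biosource export turned into a sample template.
biosource_export = (
    "Name\tDescription\tExternal id\n"
    "Patient A\t\tEXT-1\n"
    "Patient B\t\tEXT-2\n"
)
template = make_child_template(
    biosource_export, "Biosource", ["Name", "Original quantity (µg)"]
)
```

The user would then fill in the empty sample cells of the template and
import the file, and repeat the same step with the sample export to
obtain a template for extracts.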

The files optionally exported for use as templates above are simple
tab-separated files with a few columns of information about the
items. The exported columns have a two-fold purpose: i) to make sure
that BASE can make the proper associations when importing data, and
ii) to guide the users when adding information to the template file,
i.e., descriptive names for human interpretation.

Items that should be supported by the item importer(s) are:

 - Biosources: top level, currently no parent items to associate

 - Samples: biosource (or pooled samples) to associate. Create sample
   events for pools to decrease the amounts of the pooled samples.

 - Extracts: samples (or pooled extracts) and protocols to
   associate. Create sample events to decrease sample amounts and
   extract events to decrease extract amounts for pooled extracts.

 - Labeled extracts: extracts (or pooled labeled extracts) and
   protocols to associate. Create extract events to decrease extract
   amounts and labeled extract events to decrease pooled amounts.

 - Hybridizations: multiple labeled extracts, ... more to come
 - Scans: ...
 - Raw bioassays: ...
 - Experiments: ...
   + bioassay
   + bioassay sets
   + analysis
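
The parent rules in the list above can be written down as a small
lookup table. The sketch below covers only the four levels the list
specifies; the unfinished levels are left out. The names
ALLOWED_PARENTS and check_parent are made up for this illustration.

```python
# Allowed parent item types per item type, following the list above.
# An item whose parent has the same type is a pooled item.
ALLOWED_PARENTS = {
    "Biosource": set(),
    "Sample": {"Biosource", "Sample"},
    "Extract": {"Sample", "Extract"},
    "Labeled extract": {"Extract", "Labeled extract"},
}


def check_parent(child_type, parent_type):
    """Return 'pooled' or 'standard', or raise if the pairing is not allowed."""
    if parent_type not in ALLOWED_PARENTS[child_type]:
        raise ValueError(
            f"{child_type} cannot have a parent of type {parent_type}"
        )
    return "pooled" if child_type == parent_type else "standard"
```

For instance, check_parent("Sample", "Biosource") returns 'standard',
check_parent("Extract", "Extract") returns 'pooled', and
check_parent("Sample", "Extract") raises a ValueError.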


A detailed discussion of the different export/import steps and sample
files for the different item types are available as attachments to
ticket 1028 (http://base.thep.lu.se/ticket/1028) at the BASE web site.

Sample files based on trunk revision 4301 were exported and modified
for items from biosource level down to the labeled extracts level. An
OpenOffice.org spreadsheet (batchimport_sample.ods) that contains
format information with explanations in one document is also
available. A tentative aim is that the spreadsheet may be used by
laboratory staff to fill in information to be imported into BASE.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This will
allow the user to check that the import will behave as expected.
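
A dry run of this kind might look roughly like the sketch below, which
reads an import file and only reports what would happen. The function
name dry_run, the report wording, and the rule that an existing name
means "update" while a new name means "create" are assumptions for the
example, not behaviour specified for BASE.

```python
import csv
import io


def dry_run(import_text, existing_names):
    """Report what an import would do, without modifying anything."""
    reader = csv.DictReader(io.StringIO(import_text), delimiter="\t")
    if "Name" not in (reader.fieldnames or []):
        return ["ERROR: mandatory column 'Name' is missing"]
    report = []
    for row in reader:
        line_no = reader.line_num  # line number in the import file
        name = (row.get("Name") or "").strip()
        if not name:
            report.append(f"ERROR line {line_no}: empty 'Name'")
        elif name in existing_names:
            # Assumption: a known name means the item would be modified.
            report.append(f"line {line_no}: would update '{name}'")
        else:
            report.append(f"line {line_no}: would create '{name}'")
    return report
```

The report can be shown to the user before the real import is started,
so that errors are caught while nothing has been changed yet.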


Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward
tab-separated import to fill the item properties.

Fields to import are: 'Name', 'Description', 'External id'

Mandatory columns for imports: 'Name'

Biosource export file: biosource_out.txt
Biosource import file: biosource_in.txt
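
As an illustration of the column handling, the sketch below maps the
three biosource columns named above onto a simple record and rejects
files without the mandatory 'Name' column. The class and function
names are hypothetical and do not refer to the BASE API.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Biosource:
    name: str
    description: str = ""
    external_id: str = ""


def read_biosources(text):
    """Parse tab-separated biosource data; 'Name' is the only mandatory column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    if "Name" not in (reader.fieldnames or []):
        raise ValueError("mandatory column 'Name' is missing")
    return [
        Biosource(
            name=row["Name"],
            # Optional columns may be absent from the file entirely.
            description=row.get("Description") or "",
            external_id=row.get("External id") or "",
        )
        for row in reader
    ]
```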


Sample

The import of item properties is a straightforward tab-separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples), defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amounts.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Biosource, Protocol

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt
Sample import file: sample_in.txt
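
The "multiple parents on multiple lines" convention could be read as
in the sketch below, where lines sharing the same 'Name' are merged
into one item with several parents. The 'Parent' column name and the
encoding of 'Pooled' as '1' or '0' are assumptions made for the
example; the actual file layout is given by the sample files attached
to the ticket.

```python
import csv
import io


def collect_parents(text, parent_column="Parent"):
    """Group import lines by item name and collect each item's parents."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    items = {}
    for row in reader:
        item = items.setdefault(
            row["Name"],
            # Assumption: 'Pooled' is written as '1' for pooled items.
            {"pooled": row["Pooled"] == "1", "parents": []},
        )
        item["parents"].append(row[parent_column])
    for name, item in items.items():
        # A non-pooled sample has exactly one parent (a biosource).
        if not item["pooled"] and len(item["parents"]) != 1:
            raise ValueError(f"'{name}' is not pooled but has several parents")
    return items
```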


Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amounts.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Sample (also decrease quantity),
Protocol

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt
Extract import file: extract_in.txt
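
The effect of such an event on the parent can be illustrated with the
sketch below. It is a simplified model, not the BASE implementation:
the Item class, the use_from_parent function, and the idea that the
import supplies a "used quantity" per parent are assumptions for the
example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    name: str
    remaining: float  # remaining quantity in µg
    events: list = field(default_factory=list)


def use_from_parent(parent, child_name, used):
    """Record an event on the parent and decrease its remaining quantity."""
    if used > parent.remaining:
        raise ValueError(
            f"'{child_name}' needs {used} µg but '{parent.name}' "
            f"has only {parent.remaining} µg left"
        )
    parent.remaining -= used
    parent.events.append((child_name, used))


# Example: creating extract E1 uses 2.5 µg of sample S1.
sample = Item("S1", remaining=10.0)
use_from_parent(sample, "E1", 2.5)
```

A check like the one at the top of use_from_parent is also the kind of
potential error that a dry run could report before the import is made.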


Labeled Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts), defined using multiple lines.

Compared to the extract items there is one additional column, 'Label'.

Fields to import are: 'Name', 'Label', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Extract (also decrease quantity),
Protocol, Label

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt
Labeled extract import file: labeledextract_in.txt


Hybridization (To be written)
Below text is just copy/pasted.

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent only if the
parent is an extract, pooled labeled extracts may have multiple
parents (other labeled extracts) defined using multiple lines.

There is an additional column as compared to the extract items, Label.

Fields to import are: 'Name', 'Label', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Extract (also decrease quantity),
Protocol, Label

Mandatory columns for imports: 'Name'

Hybridization export file: hybridization_out.txt
Hybridization import file: hybridization_in.txt


Scan (To be written)
Below text is just copy/pasted.

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent only if the
parent is an extract, pooled labeled extracts may have multiple
parents (other labeled extracts) defined using multiple lines.

There is an additional column as compared to the extract items, Label.

Fields to import are: 'Name', 'Label', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Extract (also decrease quantity),
Protocol, Label

Mandatory columns for imports: 'Name'

Scan export file: scan_out.txt
Scan import file: scan_in.txt


Raw bioassay (To be written)
Below text is just copy/pasted.

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent only if the
parent is an extract, pooled labeled extracts may have multiple
parents (other labeled extracts) defined using multiple lines.

There is an additional column as compared to the extract items, Label.

Fields to import are: 'Name', 'Label', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Extract (also decrease quantity),
Protocol, Label

Mandatory columns for imports: 'Name'

rawbioassay export file: rawbioassay_out.txt
rawbioassay import file: rawbioassay_in.txt


Experiment (To be written)
Below text is just copy/pasted.

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent only if the
parent is an extract, pooled labeled extracts may have multiple
parents (other labeled extracts) defined using multiple lines.

There is an additional column as compared to the extract items, Label.

Fields to import are: 'Name', 'Label', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Items to make associations to: Extract (also decrease quantity),
Protocol, Label

Mandatory columns for imports: 'Name'

Experiment export file: experiment_out.txt
Experiment import file: experiment_in.txt