Ticket #1028: batchimport_userperspective-3.txt

File batchimport_userperspective-3.txt, 11.1 KB (added by Jari Häkkinen, 14 years ago)

There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

 - batch upload of files using zip files
 - batch creation of array slides
 - batch addition/deletion of reporters
 - import of annotations
 - list views offer an import button, but which views actually offer
   a plug-in that does anything?

For a setting with a single or a few experiments there is no urgent
need for batch tools, but for a microarray facility where many
experiments are prepared by facility staff the need is pressing. At a
facility site many experiments are conducted by few people and all
data upload is done by these staff members. To ease the upload of data
to BASE we suggest creating one or several plug-ins that can create or
modify several items in a batch by reading information from tab
separated files. The idea here is not to create one single monolithic
plug-in that imports a complete experiment and creates all necessary
items, but rather one that imports, creates, or modifies items for a
given context and makes the proper associations to parents. The word
'import' is used in this document but it could just as well be
'create' or 'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are to be used in a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of bio-material. In BASE this follows the path of
'biosource' - 'sample' - 'extract' - 'labeled extract', and then
continues with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay'
- 'bioassay sets' - 'analysis'. However, it is recommended that array
information is imported before hybridization import, following the
path 'array design' - 'array batch' - 'array slide' - 'hybridization'
- 'scan' ...

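The two paths above imply an order in which item types have to exist
before their children can be imported. A minimal sketch of that
ordering follows. It is only an illustration and not part of BASE: the
dictionary name PARENTS and the use of Python's graphlib module
(Python 3.9 or later) are assumptions made for this example.

```python
from graphlib import TopologicalSorter

# Item type -> the parent item types it is attached to, taken from the
# two paths described in the text. The names are illustrative only.
PARENTS = {
    "biosource": [],
    "sample": ["biosource"],
    "extract": ["sample"],
    "labeled extract": ["extract"],
    "array design": [],
    "array batch": ["array design"],
    "array slide": ["array batch"],
    "hybridization": ["labeled extract", "array slide"],
    "scan": ["hybridization"],
    "raw bioassay": ["scan"],
}

# static_order() yields every item type after all of its parents.
import_order = list(TopologicalSorter(PARENTS).static_order())
```

Any order produced this way places 'array slide' before
'hybridization', which matches the recommendation to import array
information first.
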
The proposed plug-in should be usable starting from the 'biosource'
and 'array design' views down to the raw bioassay step; beyond this,
manual import or use of other plug-ins is required. There is already a
batch raw data importer available at the BASE plug-in site
(http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for
import of raw data and experiment creation. The SCRI batch importer
should be adapted, if necessary, to create experiments and raw
bioassays in line with the batch importer plug-in we create.

The import of array design items will not create fully working designs
for two reasons: i) file upload is required for the design definition
and ii) file upload may be required for feature import. The proposed
plug-in does not support file upload. The items can still be created
and modified, but files and features must be fixed manually. This is
not a big issue in practice since new designs are not created very
often.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in BASE
context this means that sample information needs to be added. In this
example we want to associate the samples with their parents. Changing
sample properties follows a similar path, but the import files do not
require parent information. The import of sample data starts with
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without
the export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been
added to this file, the file is imported into BASE. After this import,
the sample information is exported to a file again, and this file is
used as a template for the extract information. Again, the reason for
this is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back into BASE. This
procedure is performed for each level of data entry.

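The export, edit, and import round trip described above could look
roughly like the sketch below. It is not BASE code: the helper name
make_sample_template, the choice of template columns, and the
in-memory file handling are assumptions made for illustration.

```python
import csv
import io


def make_sample_template(biosource_export: str) -> str:
    """Turn an exported biosource table into an empty sample template.

    Each biosource name is copied into a 'Biosource' parent column so
    that the person filling in sample rows reuses names BASE knows.
    """
    reader = csv.DictReader(io.StringIO(biosource_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["Name", "Biosource", "Description"],
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # The sample name is left empty for the user to fill in.
        writer.writerow({"Name": "", "Biosource": row["Name"], "Description": ""})
    return out.getvalue()


# Hypothetical export containing two biosources.
exported = "Name\tDescription\nPatient A\tbiopsy\nPatient B\tbiopsy\n"
template = make_sample_template(exported)
```

The same pattern would repeat one level down: the imported samples are
exported again and their names fill the parent column of the extract
template.
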
The files optionally exported for use as templates above are simple
tab separated files with a few columns of information about the items.
The exported columns have a two-fold purpose: i) to make sure that
BASE can make the proper associations when importing data, and ii) to
guide the users when adding information to the template file, i.e.,
descriptive names for human interpretation.

A dry run that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This
feature will allow the user to check that the import will behave as
expected.

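One possible shape for such a dry run is sketched below. The function
name dry_run, the wording of the messages, and the two checks shown
(missing 'Name', unknown parent) are assumptions for illustration; the
actual set of checks is not specified in this document.

```python
import csv
import io


def dry_run(tsv_text, known_parents, parent_column="Biosource"):
    """Report what an import would do without changing anything.

    Returns (actions, problems) as two lists of readable strings.
    """
    actions, problems = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        name = (row.get("Name") or "").strip()
        parent = (row.get(parent_column) or "").strip()
        if not name:
            problems.append(f"line {line_no}: missing mandatory 'Name'")
        elif parent and parent not in known_parents:
            problems.append(f"line {line_no}: unknown parent '{parent}'")
        else:
            actions.append(f"line {line_no}: would create '{name}'")
    return actions, problems


# Hypothetical sample file with one good row and two problem rows.
sample_file = (
    "Name\tBiosource\n"
    "Sample 1\tPatient A\n"
    "\tPatient A\n"
    "Sample 3\tPatient X\n"
)
actions, problems = dry_run(sample_file, known_parents={"Patient A"})
```

Nothing is written anywhere in this sketch; the two lists are all the
user would see before deciding to run the real import.
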
Below follows a short description of item types that should be
supported by the importer. An OpenOffice.org spreadsheet
(batchimport_sample.ods) that contains format information with
explanations in one document is maintained and made available as an
attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the
BASE web site. The spreadsheet is a work in progress and may change
depending on requirements until the batch import is finalized. Example
import files can be created from the spreadsheet.

A tentative aim is that the spreadsheet may be used by laboratory
staff to fill in information to be used in import to BASE.


A short description of the different item types to be imported by the
batch importer:

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward tab
separated import to fill the item properties.

Fields to import are: 'Name', 'Description', 'External id'

Mandatory columns for imports: 'Name'

Biosource export file: biosource_out.txt


Sample

The import of item properties is a straightforward tab separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples), defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amount. The
original quantity of a pooled sample is the sum of the pooled
components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Biosource', 'Protocol', 'Sample' for
pooled entries (also decrease quantity 'Sample Used')

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt

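The pooled case, with several parents on several lines, can be
illustrated with a small sketch. The file layout, the value '1' in the
'Pooled' column, and the starting amounts are made up for this
example; only the column names 'Name', 'Pooled', 'Sample', and
'Sample Used' come from the description above.

```python
import csv
import io
from collections import defaultdict

# Hypothetical import file: the pooled sample "Pool 1" has two parent
# samples, given on two lines that share the same 'Name'.
pooled_file = (
    "Name\tPooled\tSample\tSample Used\n"
    "Pool 1\t1\tSample A\t2.0\n"
    "Pool 1\t1\tSample B\t3.5\n"
)

# Amounts remaining on the parent samples (µg), invented for the example.
remaining = {"Sample A": 10.0, "Sample B": 5.0}

pooled_quantity = defaultdict(float)
for row in csv.DictReader(io.StringIO(pooled_file), delimiter="\t"):
    used = float(row["Sample Used"])
    # Each line corresponds to one event that decreases the parent amount.
    remaining[row["Sample"]] -= used
    # The pooled item's original quantity is the sum of its components.
    pooled_quantity[row["Name"]] += used
```

After the loop, "Pool 1" has an original quantity of 5.5 µg, and the
two parents are left with 8.0 µg and 1.5 µg.
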

Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount. The original quantity of a pooled extract is the sum of the
pooled components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Sample' (also decrease quantity
'Sample Used'), 'Protocol'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt


Labeled Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts), defined using multiple lines.

Labeled extracts and pooled labeled extracts create 'Event's that
decrease the parent amount. The original quantity of a pooled labeled
extract is the sum of the pooled components.

Compared to the extract items there is one additional column, 'Label'.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Extract' (also decrease quantity
'Extract Used'), 'Protocol', 'Label'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt


Array Design

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item. Note that the import of array design items will not create fully
working designs; see the explanation above.

Fields to import are: 'Name', 'Description', 'Arrays/slide'

Items to make associations to: 'Platform/Variant'

Mandatory columns for imports: 'Name', 'Platform/Variant',
'Arrays/slide'

Array batch export file: arraybatch_out.txt


Array Batch

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Array design', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name', 'Array design'

Array batch export file: arraybatch_out.txt


Array Slide

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'

Items to make associations to: 'Array batch'

Mandatory columns for imports: 'Name', 'Array batch'

Array slide export file: arrayslide_out.txt


Hybridization

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to parent items
and other items. There may be one or more parents (labeled extracts);
the number depends on how many arrays there are on each slide and on
the number of channels of the platform used. Multiple parents are
defined on multiple lines.

Compared to the labeled extract items there are additional columns:
'Hardware', 'Array slide', 'Arrays', and 'Array index'.

Fields to import are: 'Name', 'Description', 'Created', 'Arrays',
'Array index'

Items to make associations to: 'Labeled extract' (also decrease
quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'

Mandatory columns for imports: 'Name', 'Arrays'

Hybridization export file: hybridization_out.txt

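How several labeled extracts on several lines might be collected per
hybridization and array is sketched below. The file contents are
invented, and the assumption that 'Array index' counts from 1 up to
'Arrays' is made only for this example; the document does not state
how the index is numbered.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file: one hybridization on a slide with two arrays and
# two channels per array, so four labeled extracts on four lines.
hyb_file = (
    "Name\tArrays\tArray index\tLabeled extract\n"
    "Hyb 1\t2\t1\tLE cy3 a\n"
    "Hyb 1\t2\t1\tLE cy5 a\n"
    "Hyb 1\t2\t2\tLE cy3 b\n"
    "Hyb 1\t2\t2\tLE cy5 b\n"
)

# hybridization name -> array index -> list of parent labeled extracts
parents = defaultdict(lambda: defaultdict(list))
problems = []
for row in csv.DictReader(io.StringIO(hyb_file), delimiter="\t"):
    arrays, index = int(row["Arrays"]), int(row["Array index"])
    # Assumed rule: the index must lie between 1 and the array count.
    if not 1 <= index <= arrays:
        problems.append(f"{row['Name']}: array index {index} outside 1..{arrays}")
        continue
    parents[row["Name"]][index].append(row["Labeled extract"])
```

Grouping on the shared 'Name' keeps the file flat, with one parent per
line, while still recovering which extracts belong to which array.
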

Scan

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent hybridization only.

There are no additional columns compared with the previous items.
Image upload through the importer is not supported.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name'

Scan export file: scan_out.txt
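
The mandatory columns listed for each item type above can be gathered
into one lookup table and checked against the header line of an import
file. This is a sketch only: the names MANDATORY and missing_columns
are hypothetical, and the check assumes that column headers match the
names in this document exactly.

```python
# Mandatory columns per item type, as listed in the descriptions above.
MANDATORY = {
    "biosource": ["Name"],
    "sample": ["Name"],
    "extract": ["Name"],
    "labeled extract": ["Name"],
    "array design": ["Name", "Platform/Variant", "Arrays/slide"],
    "array batch": ["Name", "Array design"],
    "array slide": ["Name", "Array batch"],
    "hybridization": ["Name", "Arrays"],
    "scan": ["Name"],
}


def missing_columns(item_type, header_line):
    """Return the mandatory columns absent from a tab separated header."""
    present = {column.strip() for column in header_line.split("\t")}
    return [c for c in MANDATORY[item_type] if c not in present]


# An array design file lacking two of its three mandatory columns.
missing = missing_columns("array design", "Name\tDescription")
```

A check like this could run as the first step of a dry run, before any
row is inspected.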