Ticket #1028: batchimport_userperspective-3.txt

File batchimport_userperspective-3.txt, 11.1 KB (added by Jari Häkkinen, 16 years ago)

1	There is a need to facilitate batch upload, creation, and modification
2	of items in BASE. Some batch tools already exist, such as
3
4	- batch upload of files using zip files
5	- batch creation of array slides
6	- batch addition/deletion of reporters
7	- import of annotations
8	- list views offer an import button, but which views actually offer
9	a plug-in that does anything?
10
11	For a setting with a single or a few experiments, the need for batch
12	tools is not urgent, but for a microarray facility where many experiments
13	are prepared by facility staff the need is pressing. At a facility site
14	many experiments are conducted by few people and all data upload is
15	done by these staff members. To ease the upload of data to BASE we
16	suggest to create one or several plug-ins that can create or modify
17	several items in a batch by reading information from tab separated
18	files. The idea here is not to create one single monolithic plug-in
19	that imports a complete experiment and creates all necessary items,
20	but rather imports, creates, or modifies items for a given context and
21	makes the proper associations to parents. The word 'import' is used in
22	this document but it could just as well be create or modify depending
23	on user requirements.
24
25	There is ongoing work on a full experiment import from tab2mage
26	formatted files, see
27	http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.
28
29	The plug-in requirements outlined here are to be used in a context
30	where the user ideally works interactively with BASE in a step-by-step
31	procedure. The idea is that the interaction with BASE starts on some
32	level and data is added from this level down. Here a sample work
33	session is outlined where RNA is extracted and labeled starting from
34	some source of bio-material. In BASE this follows the path of
35	'biosource' - 'sample' - 'extract' - 'labeled extract', and then
36	continuing with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay'
37	- 'bioassay sets' - 'analysis'. However, it is recommended that array
38	information is imported before hybridization import following the path
39	'array design' - 'array batch' - 'array slide' - 'hybridization' -
40	'scan' ...
41
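The two paths above meet at 'hybridization', so an importer needs some order in which every parent type exists before its children. A minimal sketch of deriving such an order follows. The dictionary is an assumption that mirrors only the item types named in this document, not the BASE data model, and it relies on the standard library module graphlib (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each item type lists the parent types that
# must already exist. Derived from the two paths described in the text.
PARENTS = {
    "sample": {"biosource"},
    "extract": {"sample"},
    "labeled extract": {"extract"},
    "array batch": {"array design"},
    "array slide": {"array batch"},
    "hybridization": {"labeled extract", "array slide"},
    "scan": {"hybridization"},
    "raw bioassay": {"scan"},
}

# static_order() yields every node after all of its predecessors.
import_order = list(TopologicalSorter(PARENTS).static_order())
print(import_order)
```

Any order produced this way places 'array slide' before 'hybridization', which matches the recommendation to import array information first.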
42	The proposed plug-in should be usable starting from 'biosource' and
43	'array design' views down to the raw bioassay step; beyond this, manual
44	import or use of other plug-ins is required. There is already a batch
45	raw data importer available at the BASE plug-in site
46	(http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for
47	import of raw data and experiment creation. The SCRI batch importer
48	should be adapted, if necessary, to create experiments and raw
49	bioassays in line with the batch importer plug-in we create.
50
51	The import of array design items will not create fully working designs
52	for two reasons: i) file upload is required for the design
53	definition and ii) file upload may be required for feature import. The
54	proposed plug-in does not support file upload. The items can
55	be created and modified, but files and features must be fixed
56	manually. This is not a big issue in practice since new designs are not
57	created very often.
58
59	Starting at the biosource level, the user must make an initial import
60	of biosource information or use the BASE web interface for adding
61	biosource items. Samples are created from these biosources; in the BASE
62	context this means that sample information needs to be added. In this
63	example we want to associate the samples with their parents. Changing
64	sample properties follows a similar path, but the import files do not
65	require parent information. The import of sample data is started by
66	selecting the biosources associated with the samples in BASE, and then
67	exporting this information to a file. This file is used as a template
68	for entering sample data to be stored in BASE. The reason for using
69	this template is to ensure that the correct biosource identifiers are
70	used for the samples. (A user can of course create the file without the
71	export from BASE but has to make sure that items are properly
72	referenced.) The biosource identifiers are required for making
73	parent-child associations within BASE. When the samples have been added to
74	this file, the file is imported into BASE. After this import, the
75	sample information is exported to a file again, and this file is used
76	as a template for the extract information. Again, the reason for this
77	is to make sure that proper BASE identifiers are used. Extract
78	information is added to the template and imported back to BASE. This
79	procedure is performed for each level of data entry.
80
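The export, edit, import cycle described above can be sketched as a small template generator. This is only an illustration under stated assumptions: the 'Id' column in the export and the 'Biosource id' and 'Biosource' columns in the template are hypothetical names, and the real column layout is whatever batchimport_sample.ods specifies.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only.
PARENT_COLUMNS = ["Biosource id", "Biosource"]
CHILD_COLUMNS = ["Name", "Original quantity (µg)", "Description",
                 "External id", "Created"]

def make_sample_template(biosource_export: str) -> str:
    """Turn a tab separated biosource export into an empty sample template.

    Each exported biosource becomes one row with the parent reference
    pre-filled and the sample columns left blank for the user to fill in.
    """
    reader = csv.DictReader(io.StringIO(biosource_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=PARENT_COLUMNS + CHILD_COLUMNS,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        template_row = {column: "" for column in CHILD_COLUMNS}
        template_row["Biosource id"] = row["Id"]      # assumed export column
        template_row["Biosource"] = row["Name"]       # human-readable guide
        writer.writerow(template_row)
    return out.getvalue()

# Made-up export content, used only to exercise the function.
biosource_export = "Id\tName\n17\tPatient A\n18\tPatient B\n"
template = make_sample_template(biosource_export)
print(template)
```

The same function shape would apply at each level (sample to extract, extract to labeled extract) with different column lists.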
81	The files optionally exported for use as templates above are
82	simple tab separated files with a few columns of information about the
83	items. The exported columns have a two-fold purpose: i) to make sure that
84	BASE can make the proper associations when importing data, and ii) to guide
85	the users when adding information to the template file, i.e., to provide
86	descriptive names for human interpretation.
87
88	A dry-run mode that explains what will be done during import should be
89	supported. Potential dangers and errors should be reported. This
90	feature will allow the user to check that the import will behave as
91	expected.
92
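One way to picture the dry-run requirement is a function that walks the parsed rows and reports what would happen without touching any data. The sketch below is an assumption about how such a report could look. The function name, the message wording, and the idea of matching existing items by 'Name' are all hypothetical.

```python
def dry_run(rows, existing_names, mandatory=("Name",)):
    """Report what an import would do, without changing anything.

    rows: list of dicts, one per data line of the import file.
    existing_names: names of items assumed to exist already.
    """
    report = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        missing = [col for col in mandatory if not row.get(col, "").strip()]
        name = row.get("Name", "")
        if missing:
            report.append(f"line {line_no}: ERROR missing {', '.join(missing)}")
        elif name in existing_names:
            report.append(f"line {line_no}: would modify '{name}'")
        else:
            report.append(f"line {line_no}: would create '{name}'")
    return report

# Made-up rows: one new item, one existing item, one with no name.
rows = [
    {"Name": "Sample 1", "Description": "new item"},
    {"Name": "Sample 2", "Description": "already in BASE"},
    {"Name": "", "Description": "name left out"},
]
for message in dry_run(rows, existing_names={"Sample 2"}):
    print(message)
```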
93	Below follows a short description of item types that should be
94	supported by the importer. An OpenOffice.org spreadsheet
95	(batchimport_sample.ods) that contains format information with
96	explanations in one document is maintained and made available as an
97	attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the
98	BASE web site. The spreadsheet is a work in progress and may change
99	depending on requirements until the batch import is finalized. Example
100	import files can be created from the spreadsheet.
101
102	A tentative aim is that the spreadsheet may be used by laboratory
103	staff to fill in information to be used in import to BASE.
104
105
106	A short description of the different item types to be imported by the
107	batch importer:
108
109	Biosource
110
111	This is currently the top level of associations. No associations are
112	needed except for the optional reference to an external item (a
113	property of the biosource). The import is a straightforward tab
114	separated import to fill the item properties.
115
116	Fields to import are: 'Name', 'Description', 'External id'
117
118	Mandatory columns for imports: 'Name'
119
120	Biosource export file: biosource_out.txt
121
122
123	Sample
124
125	The import of item properties is a straightforward tab separated
126	import. Compared to biosource items there are additional columns for
127	associations to other items (the parent biosource and protocol). There
128	is only one parent if the parent is a biosource; pooled samples may
129	have multiple parents (other samples) defined using multiple lines.
130
131	Pooled samples create 'Event's that decrease the parent amount. The
132	original quantity of a pooled sample is the sum of the pooled
133	components.
134
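The bookkeeping for pooled items stated above (each parent amount decreases, and the pooled item's original quantity is the sum of the components) can be sketched as follows. This is an illustration only. The function name and the plain dictionaries are assumptions, and BASE records these changes as 'Event's rather than as direct updates.

```python
def pool(remaining, used):
    """Pool several parents into one item.

    remaining: dict mapping parent name to remaining quantity in µg.
    used: dict mapping parent name to the amount taken from that parent.
    Returns the original quantity of the pooled item.
    """
    # Validate first so a failure leaves every parent untouched.
    for name, amount in used.items():
        if amount > remaining[name]:
            raise ValueError(f"'{name}' has only {remaining[name]} µg left")
    for name, amount in used.items():
        remaining[name] -= amount
    return sum(used.values())

# Made-up quantities, used only to exercise the function.
remaining = {"Sample A": 10.0, "Sample B": 4.0}
pooled_quantity = pool(remaining, {"Sample A": 2.5, "Sample B": 1.5})
print(pooled_quantity, remaining)
```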
135	Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
136	'External id', 'Created', 'Pooled'
137
138	Items to make associations to: 'Biosource', 'Protocol', 'Sample' for
139	pooled entries (also decrease quantity 'Sample Used')
140
141	Mandatory columns for imports: 'Name'
142
143	The important difference compared with biosource items is the possible
144	associations to biosources and protocols.
145
146	Sample export file: sample_out.txt
147
148
149	Extract
150
151	The import of item properties is a straightforward tab separated
152	import. There are additional columns for associations to the parent
153	item and other items. There is only one parent if the parent is a
154	sample; pooled extracts may have multiple parents (other extracts)
155	defined using multiple lines.
156
157	Extracts and pooled extracts create 'Event's that decrease the parent
158	amount. The original quantity of a pooled extract is the sum of the
159	pooled components.
160
161	Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
162	'External id', 'Created', 'Pooled'
163
164	Items to make associations to: 'Sample' (also decrease quantity
165	'Sample Used'), 'Protocol'
166
167	Mandatory columns for imports: 'Name'
168
169	Extract export file: extract_out.txt
170
171
172	Labeled Extract
173
174	The import of item properties is a straightforward tab separated
175	import. There are additional columns for associations to the parent
176	item and other items. There is only one parent if the
177	parent is an extract; pooled labeled extracts may have multiple
178	parents (other labeled extracts) defined using multiple lines.
179
180	Labeled extracts and pooled labeled extracts create 'Event's that
181	decrease the parent amount. The original quantity of a pooled labeled
182	extract is the sum of the pooled components.
183
184	There is an additional column as compared to the extract items: 'Label'.
185
186	Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
187	'External id', 'Created', 'Pooled'
188
189	Items to make associations to: 'Extract' (also decrease quantity
190	'Extract Used'), 'Protocol', 'Label'
191
192	Mandatory columns for imports: 'Name'
193
194	Labeled extract export file: labeledextract_out.txt
195
196
197	Array Design
198
199	The import of item properties is a straightforward tab separated
200	import. There is one additional column for association to the parent
201	item. Note, the import of array design items will not create fully
202	working designs, see more information above.
203
204	Fields to import are: 'Name', 'Description', 'Arrays/slide'
205
206	Items to make associations to: 'Platform/Variant'
207
208	Mandatory columns for imports: 'Name', 'Platform/Variant',
209	'Arrays/slide'
210
211	Array batch export file: arraybatch_out.txt
212
213
214	Array Batch
215
216	The import of item properties is a straightforward tab separated
217	import. There is one additional column for association to the parent
218	item.
219
220	Fields to import are: 'Name', 'Description'
221
222	Items to make associations to: 'Array design', 'Protocol', 'Hardware'
223
224	Mandatory columns for imports: 'Name', 'Array design'
225
226	Array batch export file: arraybatch_out.txt
227
228
229	Array Slide
230
231	The import of item properties is a straightforward tab separated
232	import. There is one additional column for association to the parent
233	item.
234
235	Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'
236
237	Items to make associations to: 'Array batch'
238
239	Mandatory columns for imports: 'Name', 'Array batch'
240
241	Array slide export file: arrayslide_out.txt
242
243
244	Hybridization
245
246	The import of item properties is a straightforward tab separated
247	import. There are additional columns for associations to parent items
248	and other items. There may be one or more parents (labeled extracts);
249	the number depends on how many arrays there are on each slide and on
250	the number of channels of the platform used. Multiple parents are
251	defined on multiple lines.
252
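Since multiple parents are given on multiple lines, the importer has to collect all lines that share an item name before creating the item. A minimal sketch of that grouping follows, assuming that lines belonging to the same hybridization repeat its 'Name' and that the parent column is called 'Labeled extract'. Both are assumptions about the file layout, not confirmed by the specification.

```python
import csv
import io
from collections import defaultdict

def group_parents(tsv_text, parent_column="Labeled extract"):
    """Collect the parents of each item from a tab separated file.

    Rows that repeat the same 'Name' are treated as one item with
    several parents, kept in file order.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    parents = defaultdict(list)
    for row in reader:
        parents[row["Name"]].append(row[parent_column])
    return dict(parents)

# Made-up file content: one two-channel and one single-channel entry.
hybridization_file = (
    "Name\tLabeled extract\tArrays\n"
    "Hyb 1\tLE 1 (cy3)\t1\n"
    "Hyb 1\tLE 2 (cy5)\t1\n"
    "Hyb 2\tLE 3 (cy3)\t1\n"
)
grouped = group_parents(hybridization_file)
print(grouped)
```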
253	There are additional columns as compared to the labeled extract items,
254	'Hardware', 'Array slide', 'Arrays', and 'Array index'.
255
256	Fields to import are: 'Name', 'Description', 'Created', 'Arrays',
257	'Array index'
258
259	Items to make associations to: 'Labeled extract' (also decrease
260	quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'
261
262	Mandatory columns for imports: 'Name', 'Arrays'
263
264	Hybridization export file: hybridization_out.txt
265
266
267	Scan
268
269	The import of item properties is a straightforward tab separated
270	import. There are additional columns for associations to the parent
271	item and other items. There is one parent hybridization only.
272
273	There are no additional columns as compared with the previous
274	items. Image upload is not supported by the importer.
275
276	Fields to import are: 'Name', 'Description'
277
278	Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'
279
280	Mandatory columns for imports: 'Name'
281
282	Scan export file: scan_out.txt
