Ticket #1028: batchimport_userperspective.txt

File batchimport_userperspective.txt, 7.6 KB (added by Jari Häkkinen, 14 years ago)

First draft of a user's view

There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

 - batch upload of files using zip files
 - batch creation of array slides
 - batch addition/deletion of reporters
 - import of annotations
 - list views offer an import button, but which views actually offer
   a plug-in that does anything?

For a setting with a single or a few experiments there is no urgent
need for batch tools, but for a microarray facility where many
experiments are prepared by facility staff the need is pressing. At a
facility site many experiments are conducted by a few people, and all
data upload is done by these staff members. To ease the upload of data
to BASE we suggest creating one or several plug-ins that can create or
modify several items in a batch by reading information from
tab-separated files. The idea here is not to create one single
monolithic plug-in that imports a complete experiment and creates all
necessary items, but rather plug-ins that import, create, or modify
items for a given context and make the proper associations to parents.
The word 'import' is used in this document, but it could just as well
be 'create' or 'modify' depending on user requirements.

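As an illustration of the kind of file such plug-ins would read, the
following Python sketch parses tab-separated text with a header line
into one dictionary per item. The function name read_tab_separated and
the example values are assumptions made for this sketch; they are not
part of BASE or of any existing plug-in.

```python
import csv
import io


def read_tab_separated(text):
    """Parse tab-separated text with a header line into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical file content: one header line, then one line per item.
example = (
    "Name\tDescription\n"
    "biosource 1\tfirst example\n"
    "biosource 2\tsecond example\n"
)
rows = read_tab_separated(example)
```

Each row keeps the column names from the header, so later steps can
refer to values by name rather than by column position.
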
There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are to be used in a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of biomaterial. In BASE this follows the path of biosource
- sample - extract - labeled extract, and then continues with
hybridization - scan - raw bioassay - bioassay - bioassay sets -
analysis.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples with their parents. Changing
sample properties follows a similar path, but the import files do not
require parent information. The import of sample data starts with
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been
added to this file, the file is imported into BASE. After this import,
the sample information is exported to a file again, and this file is
used as a template for the extract information. Again, the reason for
this is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back to BASE. This
procedure is performed for each level of data entry.

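The export, edit, and import round trip described above can be
sketched in Python as follows. The sketch turns exported parent rows
into an empty child template with one line per parent. The function
make_child_template, the 'Parent' column name, and the use of 'Name'
as the identifier are assumptions made for illustration; the actual
columns are those of the sample files attached to the ticket.

```python
import csv
import io


def make_child_template(parent_export, parent_column="Parent"):
    """Build a child template (tab-separated text) from a parent export.

    Each parent row yields one child row with empty 'Name' and
    'Description' cells and the parent's name already filled in.
    """
    parents = csv.DictReader(io.StringIO(parent_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "Description", parent_column])
    for parent in parents:
        writer.writerow(["", "", parent["Name"]])
    return out.getvalue()


# Hypothetical biosource export used as the starting point for samples.
biosource_export = "Name\tExternal id\nbiosource 1\tE1\nbiosource 2\tE2\n"
sample_template = make_child_template(biosource_export)
```

The user would fill in the empty cells and import the file; because
the parent column is copied from the export, the references match
items that exist in BASE.
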
The information optionally exported to be used as templates above is
stored in simple tab-separated files with a few columns of information
about the items. The exported columns have a two-fold purpose: i) to
make sure that BASE can make the proper associations when importing
data, and ii) to guide the users when adding information to the
template file, i.e., descriptive names for human interpretation.

Items that should be supported by the item importer(s) are:

 - Biosources: top level, currently no parent items to associate

 - Samples: biosource (or pooled samples) to associate. Create sample
   events for pools to decrease the amounts of the pooled samples.

 - Extracts: samples (or pooled extracts) and protocols to
   associate. Create sample events to decrease sample amounts and
   extract events to decrease extract amounts for pooled extracts.

 - Labeled extracts: extracts (or pooled labeled extracts) and
   protocols to associate. Create extract events to decrease extract
   amounts and labeled extract events to decrease pooled amounts.

 - Hybridizations: multiple labeled extracts, ... more to come
 - Scans
 - Raw bioassays: ...
 - Experiments: ...


A detailed discussion of the different export/import steps and sample
files for the different item types are available as attachments to
ticket 1028 (http://base.thep.lu.se/ticket/1028) at the BASE web site.

Sample files based on trunk revision 4301 were exported and modified
for items from the biosource level down to the labeled extract level.
An OpenOffice.org spreadsheet (batchimport_sample.ods) that contains
format information with explanations in one document is also
available. A tentative aim is that the spreadsheet may be used by
laboratory staff to fill in information to be imported into BASE.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This will
allow the user to check that the import will behave as expected.

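A dry run of this kind could, for instance, produce a list of planned
actions and a list of problems without touching any stored items. The
Python sketch below shows one possible shape. The function
plan_import, the create/update distinction based on existing names,
and the duplicate check are all assumptions for illustration, not a
description of how BASE behaves.

```python
def plan_import(rows, existing_names, mandatory=("Name",)):
    """Return (actions, problems) for the given rows; nothing is modified."""
    actions, problems = [], []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Report rows where a mandatory column is empty or absent.
        missing = [c for c in mandatory if not (row.get(c) or "").strip()]
        if missing:
            problems.append(f"row {number}: missing {', '.join(missing)}")
            continue
        name = row["Name"].strip()
        # Report names that occur more than once in the same file.
        if name in seen:
            problems.append(f"row {number}: duplicate name '{name}' in file")
            continue
        seen.add(name)
        verb = "update" if name in existing_names else "create"
        actions.append(f"row {number}: {verb} '{name}'")
    return actions, problems


# Hypothetical input: four rows, one item already present in the database.
actions, problems = plan_import(
    [{"Name": "s1"}, {"Name": ""}, {"Name": "s1"}, {"Name": "s2"}],
    existing_names={"s2"},
)
```

The user can read both lists before deciding whether to run the real
import.
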

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward
tab-separated import to fill the item properties.

The available fields to import are: 'Name', 'Description', 'External
id'

Mandatory columns for imports: 'Name'

Biosource export file: biosource_out.txt
Biosource import file: biosource_in.txt

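The field list above lends itself to a simple header check before any
rows are read. The Python sketch below compares the header of an
import file with the available and mandatory biosource columns. The
function check_header and the shape of its result are assumptions made
for this sketch.

```python
# Column names taken from the field list above.
BIOSOURCE_FIELDS = {"Name", "Description", "External id"}
BIOSOURCE_MANDATORY = {"Name"}


def check_header(header, allowed=BIOSOURCE_FIELDS,
                 mandatory=BIOSOURCE_MANDATORY):
    """Report mandatory columns that are missing and columns not recognised."""
    columns = set(header)
    return {
        "missing": sorted(mandatory - columns),
        "unknown": sorted(columns - allowed),
    }


# Hypothetical headers: one with an unrecognised column, one without 'Name'.
with_extra = check_header(["Name", "Colour"])
without_name = check_header(["Description"])
```

The same function could serve the other item types by passing their
field lists as the allowed and mandatory arguments.
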

Sample

The import of item properties is a straightforward tab-separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples), defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amount.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt
Sample import file: sample_in.txt

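Because pooled samples list their parents on multiple lines, an
importer has to collect the lines that belong to the same item. The
Python sketch below groups parents by item name and flags items that
list several parents without being marked as pooled. The 'Parent'
column name and the value "1" for 'Pooled' are assumptions; the
attached sample files define the real layout.

```python
def group_parents(rows, parent_column="Parent"):
    """Collect the parents listed for each item across multiple lines.

    Returns (parents, errors), where parents maps an item name to its
    list of parent names in file order.
    """
    parents = {}
    pooled = {}
    for row in rows:
        name = row["Name"]
        parents.setdefault(name, []).append(row[parent_column])
        # Assumption: the value "1" in the 'Pooled' column marks a pool.
        pooled[name] = pooled.get(name, False) or row.get("Pooled") == "1"
    errors = [
        f"'{name}' is not pooled but lists {len(found)} parents"
        for name, found in parents.items()
        if len(found) > 1 and not pooled[name]
    ]
    return parents, errors


# Hypothetical rows: one plain sample, one pool, one inconsistent item.
parents, errors = group_parents([
    {"Name": "s1", "Parent": "bs1", "Pooled": "0"},
    {"Name": "pool1", "Parent": "s1", "Pooled": "1"},
    {"Name": "pool1", "Parent": "s2", "Pooled": "1"},
    {"Name": "s3", "Parent": "bs1", "Pooled": "0"},
    {"Name": "s3", "Parent": "bs2", "Pooled": "0"},
])
```
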

Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt
Extract import file: extract_in.txt

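The effect of such events on parent amounts can be illustrated with a
small Python sketch. It subtracts the quantity used by each event from
the remaining amount of the parent and reports events that cannot be
satisfied. The function apply_events, the (parent, used quantity)
representation, and the rule that an event may not use more than what
is left are assumptions for illustration; this document does not
specify how BASE treats such cases.

```python
def apply_events(remaining, events):
    """Subtract used quantities (µg) from the remaining parent amounts.

    remaining maps a parent name to its remaining quantity; events is a
    list of (parent name, used quantity) pairs. Returns
    (new_remaining, problems); the input mapping is left unchanged.
    """
    result = dict(remaining)
    problems = []
    for parent, used in events:
        if parent not in result:
            problems.append(f"unknown parent '{parent}'")
        elif used > result[parent]:
            problems.append(
                f"'{parent}': requested {used}, only {result[parent]} left"
            )
        else:
            result[parent] -= used
    return result, problems


# Hypothetical amounts and events for two samples.
left, problems = apply_events(
    {"s1": 10.0, "s2": 5.0},
    [("s1", 4.0), ("s2", 6.0), ("s1", 2.5), ("s9", 1.0)],
)
```

Reporting problems instead of raising an error fits the dry-run
requirement, since all questionable events can be listed in one pass.
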

Labeled Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts), defined using multiple lines.

There is one additional column compared to the extract items: 'Label'.

The available fields to import are: 'Name', 'Label', 'Original
quantity (µg)', 'Description', 'External id', 'Protocol', 'Created',
'Pooled'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt
Labeled extract import file: labeledextract_in.txt