There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

- batch upload of files using zip files
- batch creation of array slides
- batch addition/deletion of reporters
- import of annotations
- list views offer an import button, but which views actually offer a
  plug-in that does anything?

For a setting with one or a few experiments the need for batch tools
is not urgent, but for a microarray facility where many experiments
are prepared by facility staff the need is pressing. At a facility
site many experiments are conducted by few people, and all data upload
is done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab-separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather plug-ins that import, create, or modify items for a given
context and make the proper associations to parents. The word 'import'
is used in this document, but it could just as well be 'create' or
'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are intended for a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. A sample work session is
outlined here, in which RNA is extracted and labeled starting from
some source of bio-material. In BASE this follows the path
'biosource' - 'sample' - 'extract' - 'labeled extract', and then
continues with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay'
- 'bioassay sets' - 'analysis'. However, it is recommended that array
information is imported before the hybridization import, following the
path 'array design' - 'array batch' - 'array slide' - 'hybridization'
- 'scan' ...

The proposed plug-in should be usable starting from the 'biosource'
and 'array design' views down to the raw bioassay step; beyond this,
manual import or the use of other plug-ins is required. There is
already a batch raw data importer available at the BASE plug-in site
(http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for
import of raw data and experiment creation. The SCRI batch importer
should be adapted, if necessary, to create experiments and raw
bioassays in line with the batch importer plug-in we create.

The import of array design items will not create fully working designs
for two reasons: i) file upload is required for the design definition,
and ii) file upload may be required for feature import. The proposed
plug-in does not support file upload. The items can be created and
modified, but files and features must be handled manually. In practice
this is not a big issue since new designs are not created very often.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in the
BASE context this means that sample information needs to be added. In
this example we want to associate the samples with their parents.
Changing sample properties follows a similar path, but the import
files do not require parent information. The import of sample data
starts with selecting the biosources associated with the samples in
BASE, and then exporting this information to a file. This file is used
as a template for entering sample data to be stored in BASE. The
reason for using this template is to ensure that the correct biosource
identifiers are used for the samples. (A user can of course create the
file without the export from BASE but has to make sure that items are
properly referenced.) The biosource identifiers are required for
making parent-child associations within BASE. When the samples have
been added to this file, the file is imported into BASE. After this
import, the sample information is exported to a file again, and this
file is used as a template for the extract information. Again, the
reason for this is to make sure that proper BASE identifiers are used.
Extract information is added to the template and imported back into
BASE. This procedure is performed for each level of data entry.
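
The export-then-fill step described above could be sketched roughly as
follows. This is only an illustration in Python, not the plug-in
itself. The function name make_sample_template, the use of the
biosource 'Name' as the identifier, and the parent column heading
'Biosource' are all assumptions made for the example.

```python
import csv
import io


def make_sample_template(biosource_export, parent_column="Biosource"):
    """Turn an exported biosource file into an empty sample template.

    One row is written per exported biosource, with the parent column
    pre-filled so that the identifier is copied rather than retyped.
    Assumption: the biosource 'Name' column serves as the identifier.
    """
    reader = csv.DictReader(io.StringIO(biosource_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Name", "Description", parent_column])
    for row in reader:
        # Name and Description are left empty for the user to fill in.
        writer.writerow(["", "", row["Name"]])
    return out.getvalue()


# Hypothetical export of two biosources.
exported = (
    "Name\tDescription\tExternal id\n"
    "Patient A\tBiopsy donor\tEXT-1\n"
    "Patient B\tBiopsy donor\tEXT-2\n"
)
template = make_sample_template(exported)
print(template)
```

The user would then fill in the empty 'Name' and 'Description' cells
and import the file; the same pattern would repeat with the sample
export as the template for extracts.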

The files optionally exported for use as templates above are simple
tab-separated files with a few columns of information about the
items. The exported columns have a two-fold purpose: i) to make sure
that BASE can make the proper associations when importing data, and
ii) to guide the users when adding information to the template file,
i.e., descriptive names for human interpretation.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This
feature will allow the user to check that the import will behave as
expected.
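
A dry run of this kind might look roughly like the sketch below. It
only reports what would happen and changes nothing. The names
DryRunReport and dry_run, the message wording, and the rule that an
existing name means 'update' are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DryRunReport:
    actions: list = field(default_factory=list)  # what the import would do
    errors: list = field(default_factory=list)   # problems found in the file


def dry_run(rows, existing_names, mandatory=("Name",)):
    """Describe what an import would do without modifying anything.

    rows           -- list of dicts, one per data line of the import file
    existing_names -- names of items already present (assumed lookup key)
    mandatory      -- columns that must have a non-empty value
    """
    report = DryRunReport()
    for number, row in enumerate(rows, start=1):
        missing = [c for c in mandatory if not (row.get(c) or "").strip()]
        if missing:
            report.errors.append(
                f"line {number}: missing value for {', '.join(missing)}"
            )
            continue
        name = (row.get("Name") or "").strip()
        verb = "update" if name in existing_names else "create"
        report.actions.append(f"line {number}: would {verb} '{name}'")
    return report


rows = [{"Name": "S1"}, {"Name": ""}, {"Name": "S2"}]
report = dry_run(rows, existing_names={"S2"})
for message in report.actions + report.errors:
    print(message)
```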

Below follows a short description of the item types that should be
supported by the importer. An OpenOffice.org spreadsheet
(batchimport_sample.ods) that contains format information with
explanations in one document is maintained and made available as an
attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the
BASE web site. The spreadsheet is a work in progress and may change
depending on requirements until the batch import is finalized. Example
import files can be created from the spreadsheet.

A tentative aim is that the spreadsheet may be used by laboratory
staff to fill in information to be used in import to BASE.


A short description of the different item types to be imported by the
batch importer:

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward
tab-separated import to fill the item properties.

Fields to import are: 'Name', 'Description', 'External id'

Mandatory columns for imports: 'Name'

Biosource export file: biosource_out.txt


Sample

The import of item properties is a straightforward tab-separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). A
sample has a single parent if the parent is a biosource; pooled
samples may have multiple parents (other samples), defined using
multiple lines.

Pooled samples create 'Event's that decrease the parent amount. The
original quantity of a pooled sample is the sum of the pooled
components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Biosource', 'Protocol', 'Sample' for
pooled entries (also decrease quantity 'Sample Used')

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt
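
How pooled entries spread over multiple lines could be processed is
sketched below. The column names 'Sample' (the parent) and
'Sample Used' are taken from the list above, but how they appear in a
real import file is an assumption, as are the function name
plan_pooled_samples and the decision to reject a line that uses more
than the parent has left.

```python
from collections import defaultdict


def plan_pooled_samples(rows, remaining):
    """Compute pooled quantities and parent decreases from import rows.

    rows      -- dicts with 'Name', 'Sample' (parent) and 'Sample Used'
    remaining -- dict of parent sample name -> remaining quantity (µg)
    Returns (pooled_quantity, new_remaining); the input is not modified.
    """
    pooled_quantity = defaultdict(float)
    new_remaining = dict(remaining)
    for row in rows:
        parent = row["Sample"]
        used = float(row["Sample Used"])
        if used > new_remaining[parent]:
            raise ValueError(
                f"{parent}: {used} µg requested, "
                f"{new_remaining[parent]} µg left"
            )
        # Each line corresponds to one event that decreases the parent.
        new_remaining[parent] -= used
        # The pooled item's original quantity is the sum of its components.
        pooled_quantity[row["Name"]] += used
    return dict(pooled_quantity), new_remaining


# Hypothetical pooled sample defined on two lines, one per parent.
rows = [
    {"Name": "Pool 1", "Pooled": "1", "Sample": "S1", "Sample Used": "2.0"},
    {"Name": "Pool 1", "Pooled": "1", "Sample": "S2", "Sample Used": "3.0"},
]
quantities, left = plan_pooled_samples(rows, {"S1": 10.0, "S2": 5.0})
print(quantities, left)
```

The same bookkeeping would apply to pooled extracts and pooled labeled
extracts, with the parent and quantity columns renamed accordingly.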


Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. An extract has a single parent if the parent is
a sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount. The original quantity of a pooled extract is the sum of the
pooled components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Sample' (also decrease quantity
'Sample Used'), 'Protocol'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt


Labeled Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. A labeled extract has a single parent if the
parent is an extract; pooled labeled extracts may have multiple
parents (other labeled extracts), defined using multiple lines.

Labeled extracts and pooled labeled extracts create 'Event's that
decrease the parent amount. The original quantity of a pooled labeled
extract is the sum of the pooled components.

Compared to extract items there is one additional column, 'Label'.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Extract' (also decrease quantity
'Extract Used'), 'Protocol', 'Label'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt


Array Design

The import of item properties is a straightforward tab-separated
import. There is one additional column for association to the parent
item. Note that the import of array design items will not create fully
working designs; see the discussion of file upload above.

Fields to import are: 'Name', 'Description', 'Arrays/slide'

Items to make associations to: 'Platform/Variant'

Mandatory columns for imports: 'Name', 'Platform/Variant',
'Arrays/slide'

Array design export file: arraydesign_out.txt


Array Batch

The import of item properties is a straightforward tab-separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Array design', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name', 'Array design'

Array batch export file: arraybatch_out.txt


Array Slide

The import of item properties is a straightforward tab-separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'

Items to make associations to: 'Array batch'

Mandatory columns for imports: 'Name', 'Array batch'

Array slide export file: arrayslide_out.txt


Hybridization

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to parent items
and other items. There may be one or more parents (labeled extracts);
the number depends on how many arrays there are on each slide and on
the number of channels of the platform used. Multiple parents are
defined on multiple lines.

Compared to labeled extract items there are additional columns:
'Hardware', 'Array slide', 'Arrays', and 'Array index'.

Fields to import are: 'Name', 'Description', 'Created', 'Arrays',
'Array index'

Items to make associations to: 'Labeled extract' (also decrease
quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'

Mandatory columns for imports: 'Name', 'Arrays'

Hybridization export file: hybridization_out.txt
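
Collecting the parents of a hybridization from multiple lines could be
sketched as below. The function name group_hybridization_rows is
hypothetical, the parent column heading 'Labeled extract' is assumed
from the association list above, and the check that 'Array index' runs
from 1 up to 'Arrays' is an assumption about the numbering, not
something stated in this document.

```python
from collections import defaultdict


def group_hybridization_rows(rows):
    """Group labeled extracts per hybridization name and array index.

    Returns {hybridization name: {array index: [labeled extracts]}}.
    Assumption: 'Array index' is 1-based and at most 'Arrays'.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for number, row in enumerate(rows, start=1):
        arrays = int(row["Arrays"])
        index = int(row["Array index"])
        if not 1 <= index <= arrays:
            raise ValueError(
                f"line {number}: array index {index} outside 1..{arrays}"
            )
        grouped[row["Name"]][index].append(row["Labeled extract"])
    # Convert nested defaultdicts to plain dicts for the caller.
    return {name: dict(by_index) for name, by_index in grouped.items()}


# Hypothetical two-channel hybridization on a slide with a single array.
rows = [
    {"Name": "H1", "Arrays": "1", "Array index": "1",
     "Labeled extract": "LE1 (cy3)"},
    {"Name": "H1", "Arrays": "1", "Array index": "1",
     "Labeled extract": "LE2 (cy5)"},
]
print(group_hybridization_rows(rows))
```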


Scan

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is one parent hybridization only.

There are no additional columns compared with the previous
items. Image upload through the importer is not supported.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name'

Scan export file: scan_out.txt
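
The mandatory columns listed per item type above can be collected in
one table and checked against the header line of an import file before
anything is imported. The sketch below does this. The dictionary
contents are taken from this document, while the names
MANDATORY_COLUMNS and missing_columns, and the assumption that the
first line of the file is a header with exactly these column headings,
are illustrative only.

```python
# Mandatory columns per item type, as listed in this document.
MANDATORY_COLUMNS = {
    "Biosource": ["Name"],
    "Sample": ["Name"],
    "Extract": ["Name"],
    "Labeled extract": ["Name"],
    "Array design": ["Name", "Platform/Variant", "Arrays/slide"],
    "Array batch": ["Name", "Array design"],
    "Array slide": ["Name", "Array batch"],
    "Hybridization": ["Name", "Arrays"],
    "Scan": ["Name"],
}


def missing_columns(item_type, header_line):
    """Return the mandatory columns absent from a tab-separated header."""
    header = [c.strip() for c in header_line.rstrip("\n").split("\t")]
    return [c for c in MANDATORY_COLUMNS[item_type] if c not in header]


print(missing_columns("Array design", "Name\tDescription\n"))
print(missing_columns("Scan", "Name\tDescription"))
```

A check like this fits naturally into the dry run, so that a file with
a missing column is rejected before any item is created.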