There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

- batch upload of files using zip files
- batch creation of array slides
- batch addition/deletion of reporters
- import of annotations
- list views offer an import button, but which views actually offer
  a plug-in that does anything?

For a setting with one or a few experiments there is no urgent need
for batch tools, but for a microarray facility where many experiments
are prepared by facility staff the need is pressing. At a facility site
many experiments are conducted by few people and all data upload is
done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather one that imports, creates, or modifies items for a given
context and makes the proper associations to parents. The word 'import'
is used in this document but it could just as well be create or modify
depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are intended for a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of bio-material. In BASE this follows the path of
'biosource' - 'sample' - 'extract' - 'labeled extract', and then
continues with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay'
- 'bioassay sets' - 'analysis'. However, it is recommended that array
information is imported before hybridization import, following the path
'array design' - 'array batch' - 'array slide' - 'hybridization' -
'scan' ...

The proposed plug-in should be usable starting from the 'biosource'
and 'array design' views down to the raw bioassay step; beyond this,
manual import or use of other plug-ins is required. There is already a
batch raw data importer available at the BASE plug-in site
(http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for
import of raw data and experiment creation. The SCRI batch importer
should be adapted, if necessary, to create experiments and raw
bioassays in line with the batch importer plug-in we create.

The import of array design items will not create fully working designs
for two reasons: i) file upload is required for the design definition
and ii) file upload may be required for feature import. The proposed
plug-in does not support file upload. The items can still be created
and modified, but files and features must be fixed manually. This is
not a big issue in practice since new designs are not created very
often.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples to their parents. Changing
sample properties follows a similar path, but then the import files do
not require parent information. The import of sample data starts with
selecting the biosources associated with the samples in BASE and
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been added
to this file, the file is imported into BASE. After this import, the
sample information is exported to a file again, and this file is used
as a template for the extract information. Again, the reason for this
is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back to BASE. This
procedure is performed for each level of data entry.

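The export, fill in, and import round trip described above can be
sketched as follows. This is only an illustration in Python, not part
of BASE: the helper names, the use of 'Name' as the biosource
identifier, and the 'Biosource' parent column are assumptions made for
the example.

```python
import csv
import io


def read_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def write_tsv(rows, columns):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def sample_template(biosource_export):
    """Build a sample template from an exported biosource list.

    Each exported biosource yields one row where the (hypothetical)
    'Biosource' parent column is prefilled, so the user only has to
    enter the sample properties.
    """
    rows = [
        {"Name": "", "Biosource": biosource["Name"], "Description": ""}
        for biosource in read_tsv(biosource_export)
    ]
    return write_tsv(rows, ["Name", "Biosource", "Description"])


# Hypothetical export of two biosources selected in BASE.
biosource_export = "Name\tDescription\nPatient A\tbiopsy\nPatient B\tbiopsy\n"
template = sample_template(biosource_export)
print(template)
```

The same pattern would repeat one level down: the imported samples are
exported again and used to prefill the parent column of an extract
template.
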
The files optionally exported for use as templates above are simple
tab separated files with a few columns of information about the items.
The exported columns have a two-fold purpose: i) to make sure that
BASE can make the proper associations when importing data, and ii) to
guide the users when adding information to the template file, i.e.,
descriptive names for human interpretation.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This
feature will allow the user to check that the import will behave as
expected.

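A minimal sketch of what such a dry run could look like is given
below. It is an assumption-laden illustration, not the plug-in design:
the function name, the report wording, and the rule that an existing
name means "update" are all hypothetical. The point is only that the
file is read and a plan is reported without anything being changed.

```python
import csv
import io


def plan_import(tsv_text, existing_names):
    """Return (actions, problems) for an import file without changing anything.

    existing_names is the set of item names assumed to exist already.
    """
    actions, problems = [], []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if "Name" not in (reader.fieldnames or []):
        return actions, ["missing mandatory column: Name"]
    # Data starts on line 2, after the header line.
    for line_no, row in enumerate(reader, start=2):
        name = (row["Name"] or "").strip()
        if not name:
            problems.append(f"line {line_no}: empty 'Name'")
            continue
        verb = "update" if name in existing_names else "create"
        actions.append(f"{verb} '{name}'")
    return actions, problems


# Hypothetical import file; 'A' is assumed to exist already.
import_file = "Name\tDescription\nA\tx\n\ty\nB\tz\n"
actions, problems = plan_import(import_file, {"A"})
for line in actions + problems:
    print(line)
```
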
Below follows a short description of item types that should be
supported by the importer. An OpenOffice.org spreadsheet
(batchimport_sample.ods) that contains format information with
explanations in one document is maintained and made available as an
attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the
BASE web site. The spreadsheet is a work in progress and may change
depending on requirements until the batch import is finalized. Example
import files can be created from the spreadsheet.

A tentative aim is that the spreadsheet may be used by laboratory
staff to fill in information to be imported into BASE.


A short description of the different item types to be imported by the
batch importer:

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward tab
separated import to fill the item properties.

Fields to import are: 'Name', 'Description', 'External id'

Mandatory columns for imports: 'Name'

Sample export file: biosource_out.txt


Sample

The import of item properties is a straightforward tab separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples), defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amount. The
original quantity of a pooled sample is the sum of the pooled
components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Biosource', 'Protocol', 'Sample' for
pooled entries (also decrease quantity 'Sample Used')

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt


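The quantity bookkeeping for pooled items can be illustrated with a
small Python sketch. It assumes, hypothetically, that each line of a
pooled entry names the pooled item in 'Name', one parent in 'Sample',
and the amount taken in 'Sample Used'. The check that a parent cannot
be overdrawn is an added assumption, not a stated requirement.

```python
from collections import defaultdict


def apply_pooling(rows, remaining):
    """Compute pooled quantities and the decreased parent amounts.

    rows: dicts with 'Name', 'Sample' (parent) and 'Sample Used' (µg).
    remaining: parent name -> remaining quantity (µg); not modified.
    Returns (pooled quantity per pooled item, new remaining amounts).
    """
    remaining = dict(remaining)
    pooled_quantity = defaultdict(float)
    for row in rows:
        parent = row["Sample"]
        used = float(row["Sample Used"])
        if used > remaining[parent]:
            raise ValueError(f"not enough of '{parent}' left")
        remaining[parent] -= used               # the parent amount decreases
        pooled_quantity[row["Name"]] += used    # sum of the pooled components
    return dict(pooled_quantity), remaining


# Hypothetical pooled sample built from two parent samples.
rows = [
    {"Name": "Pool 1", "Sample": "S1", "Sample Used": "2.0"},
    {"Name": "Pool 1", "Sample": "S2", "Sample Used": "3.0"},
]
pooled, left = apply_pooling(rows, {"S1": 10.0, "S2": 5.0})
print(pooled, left)
```

The same arithmetic would apply to pooled extracts and pooled labeled
extracts, with the parent column changed accordingly.
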
Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount. The original quantity of a pooled extract is the sum of the
pooled components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Sample' (also decrease quantity
'Sample Used'), 'Protocol'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt


Labeled Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts), defined using multiple lines.

Labeled extracts and pooled labeled extracts create 'Event's that
decrease the parent amount. The original quantity of a pooled labeled
extract is the sum of the pooled components.

Compared to the extract items there is one additional column, 'Label'.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Extract' (also decrease quantity
'Extract Used'), 'Protocol', 'Label'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt


Array Design

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item. Note that the import of array design items will not create fully
working designs; see more information above.

Fields to import are: 'Name', 'Description', 'Arrays/slide'

Items to make associations to: 'Platform/Variant'

Mandatory columns for imports: 'Name', 'Platform/Variant',
'Arrays/slide'

Array batch export file: arraybatch_out.txt


Array Batch

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Array design', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name', 'Array design'

Array batch export file: arraybatch_out.txt


Array Slide

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'

Items to make associations to: 'Array batch'

Mandatory columns for imports: 'Name', 'Array batch'

Array slide export file: arrayslide_out.txt


Hybridization

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to parent items
and other items. There may be one or more parents (labeled extracts);
the number depends on how many arrays there are on each slide and on
the number of channels of the platform used. Multiple parents are
defined on multiple lines.

Compared to the labeled extract items there are additional columns:
'Hardware', 'Array slide', 'Arrays', and 'Array index'.

Fields to import are: 'Name', 'Description', 'Created', 'Arrays',
'Array index'

Items to make associations to: 'Labeled extract' (also decrease
quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'

Mandatory columns for imports: 'Name', 'Arrays'

Hybridization export file: hybridization_out.txt


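How multiple parent lines could be collected into one item is sketched
below in Python. The sketch assumes that lines sharing the same 'Name'
belong to the same hybridization and that the parent is given in a
'Labeled extract' column; neither rule is fixed by this document, and
the function name is hypothetical.

```python
import csv
import io


def group_parents(tsv_text, parent_column="Labeled extract"):
    """Collect the lines of an import file into one entry per item name.

    Returns a dict: item name -> {'Arrays': ..., 'parents': [...]},
    where each parent is a (parent name, array index) pair.
    """
    items = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # The first line for a name creates the item; later lines add parents.
        item = items.setdefault(row["Name"],
                                {"Arrays": row["Arrays"], "parents": []})
        item["parents"].append((row[parent_column], row["Array index"]))
    return items


# Hypothetical file: H1 is a two-channel hybridization with two parents.
hyb_file = (
    "Name\tArrays\tArray index\tLabeled extract\n"
    "H1\t1\t1\tLE1\n"
    "H1\t1\t1\tLE2\n"
    "H2\t1\t1\tLE3\n"
)
hybridizations = group_parents(hyb_file)
print(hybridizations)
```
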
Scan

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent hybridization only.

There are no additional columns compared with the previous items.
Image upload through the importer is not supported.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name'

Scan export file: scan_out.txt
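
The mandatory columns listed for each item type above can be gathered
in one table and checked against the header line of an import file.
The Python sketch below is illustrative only; the column lists are
copied from this document, while the function name and the way the
header is parsed are assumptions.

```python
# Mandatory columns per item type, as listed in the sections above.
MANDATORY_COLUMNS = {
    "Biosource": ["Name"],
    "Sample": ["Name"],
    "Extract": ["Name"],
    "Labeled extract": ["Name"],
    "Array design": ["Name", "Platform/Variant", "Arrays/slide"],
    "Array batch": ["Name", "Array design"],
    "Array slide": ["Name", "Array batch"],
    "Hybridization": ["Name", "Arrays"],
    "Scan": ["Name"],
}


def missing_columns(item_type, header_line):
    """Return the mandatory columns absent from a tab-separated header line."""
    header = header_line.rstrip("\n").split("\t")
    return [c for c in MANDATORY_COLUMNS[item_type] if c not in header]


# Hypothetical header of an array design import file.
print(missing_columns("Array design", "Name\tDescription\tArrays/slide\n"))
```
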