1 | There is a need to facilitate batch upload, creation, and modification |
---|
2 | of items in BASE. Some batch tools already exist, such as |
---|
3 | |
---|
4 | - batch upload of files using zip files |
---|
5 | - batch creation of array slides |
---|
6 | - batch addition/deletion of reporters |
---|
7 | - import of annotations |
---|
8 | - list views offer an import button, but which views actually offer |
---|
9 | a plug-in that does anything? |
---|
10 | |
---|
11 | For a setting with one or a few experiments the need for batch tools |
---|
12 | is not urgent, but for a microarray facility where many experiments are |
---|
13 | prepared by facility staff the need is pressing. At a facility site |
---|
14 | many experiments are conducted by few people and all data upload is |
---|
15 | done by these staff members. To ease the upload of data to BASE we |
---|
16 | suggest creating one or several plug-ins that can create or modify |
---|
17 | several items in a batch by reading information from tab separated |
---|
18 | files. The idea here is not to create one single monolithic plug-in |
---|
19 | that imports a complete experiment and creates all necessary items, |
---|
20 | but rather imports, creates, or modifies items for a given context and |
---|
21 | makes the proper associations to parents. The word 'import' is used in |
---|
22 | this document but it could just as well be create or modify depending |
---|
23 | on user requirements. |
---|
24 | |
---|
25 | There is ongoing work on a full experiment import from tab2mage |
---|
26 | formatted files, see |
---|
27 | http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter. |
---|
28 | |
---|
29 | The plug-in outlined here is intended to be used in a context |
---|
30 | where the user ideally works interactively with BASE in a step-by-step |
---|
31 | procedure. The idea is that the interaction with BASE starts on some |
---|
32 | level and data is added from this level down. Here a sample work |
---|
33 | session is outlined where RNA is extracted and labeled starting from |
---|
34 | some source of bio-material. In BASE this follows the path of |
---|
35 | 'biosource' - 'sample' - 'extract' - 'labeled extract', and then |
---|
36 | continuing with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay' |
---|
37 | - 'bioassay sets' - 'analysis'. However, it is recommended that array |
---|
38 | information is imported before hybridization import following the path |
---|
39 | 'array design' - 'array batch' - 'array slide' - 'hybridization' - |
---|
40 | 'scan' ... |
---|
41 | |
---|
42 | The proposed plug-in should be usable starting from 'biosource' and |
---|
43 | 'array design' views down to the raw bioassay step; beyond this, manual |
---|
44 | import or use of other plug-ins is required. There is already a batch |
---|
45 | raw data importer available at the BASE plug-in site |
---|
46 | (http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for |
---|
47 | import of raw data and experiment creation. The SCRI batch importer |
---|
48 | should be adapted, if necessary, to create experiments and raw |
---|
49 | bioassays in line with the batch importer plug-in we create. |
---|
50 | |
---|
51 | The import of array design items will not create fully working designs |
---|
52 | for two reasons: i) file upload is required for the design |
---|
53 | definition and ii) file upload may be required for feature import. The |
---|
54 | proposed plug-in does not support file upload. However, the items can |
---|
55 | be created and modified, but files and features must be manually |
---|
56 | fixed. This is not a big issue in practice since new designs are not |
---|
57 | created too often. |
---|
58 | |
---|
59 | Starting at the biosource level, the user must make an initial import |
---|
60 | of biosource information or use the BASE web interface for adding |
---|
61 | biosource items. Samples are created from these biosources; in the BASE |
---|
62 | context this means that sample information needs to be added. In this |
---|
63 | example we want to associate the samples with their parents. Changing |
---|
64 | sample properties follows a similar path, but the import files do not |
---|
65 | require parent information. The import of sample data is started with |
---|
66 | selecting the biosources associated with the samples in BASE, and then |
---|
67 | exporting this information to a file. This file is used as a template |
---|
68 | for entering sample data to be stored in BASE. The reason for using |
---|
69 | this template is to ensure that the correct biosource identifiers are |
---|
70 | used for the samples. (A user can of course create the file without the |
---|
71 | export from BASE but has to make sure that items are properly |
---|
72 | referenced.) The biosource identifiers are required for making |
---|
73 | parent-child associations within BASE. When the samples are added to |
---|
74 | this file, the file is imported into BASE. After this import, the |
---|
75 | sample information is exported to a file again, and this file is used |
---|
76 | as a template for the extracts information. Again, the reason for this |
---|
77 | is to make sure that proper BASE identifiers are used. Extract |
---|
78 | information is added to the template and imported back to BASE. This |
---|
79 | procedure is performed for each level of data entry. |
---|
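The export, edit, import cycle described above can be illustrated with a small sketch. It is not part of BASE: the function name `make_child_template`, the assumption that the exported file has a 'Name' header, and the parent column header 'Biosource' are all hypothetical choices for illustration.

```python
import csv
import io


def make_child_template(exported_text, parent_column, child_fields):
    """Build a tab-separated template for child items from an exported parent list.

    exported_text: contents of a tab-separated export with a 'Name' header
                   (the header name is an assumption of this sketch).
    parent_column: header under which each parent name is written.
    child_fields:  headers of the empty columns the user fills in.
    """
    parents = csv.DictReader(io.StringIO(exported_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow([parent_column] + list(child_fields))
    for row in parents:
        # Copy the parent name verbatim so a later import can resolve it.
        writer.writerow([row["Name"]] + [""] * len(child_fields))
    return out.getvalue()


# Hypothetical biosource export used as the basis for a sample template.
exported = "Name\tExternal id\nPatient A\tP-001\nPatient B\tP-002\n"
template = make_child_template(
    exported, "Biosource", ["Name", "Original quantity (µg)"]
)
```

The resulting template has one line per exported biosource with the parent column already filled in, so the user only adds sample data and cannot mistype a parent reference.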
80 | |
---|
81 | The files optionally exported for use as templates above are |
---|
82 | simple tab separated files with a few columns of information about the |
---|
83 | items. The columns exported have a two-fold purpose: i) to make sure that |
---|
84 | BASE can make the proper associations when importing data, and ii) to guide |
---|
85 | the users when adding information to the template file, i.e., |
---|
86 | descriptive names for human interpretation. |
---|
87 | |
---|
88 | A dry-run mode that explains what will be done during import should be |
---|
89 | supported. Potential dangers and errors should be reported. This |
---|
90 | feature will allow the user to check that the import will behave as |
---|
91 | expected. |
---|
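One possible shape for such a dry run is sketched below. The function `dry_run` and its arguments are hypothetical; in a real plug-in the sets of existing item names and known parents would be looked up in BASE, whereas here they are passed in so that nothing is created or modified.

```python
def dry_run(rows, existing_names, known_parents, parent_column=None):
    """Return messages describing what an import would do, without doing it.

    rows:           parsed import lines as dicts keyed by column header.
    existing_names: names of items already present (assumed to be given).
    known_parents:  names of items that may be referenced as parents.
    parent_column:  header of the parent column, or None for top-level items.
    """
    report = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        name = (row.get("Name") or "").strip()
        if not name:
            report.append(f"line {line_no}: ERROR missing mandatory 'Name'")
            continue
        parent = (row.get(parent_column) or "").strip() if parent_column else ""
        if parent and parent not in known_parents:
            report.append(f"line {line_no}: ERROR unknown parent '{parent}'")
            continue
        action = "update" if name in existing_names else "create"
        report.append(f"line {line_no}: would {action} '{name}'")
    return report


rows = [
    {"Name": "S1", "Biosource": "Patient A"},
    {"Name": "", "Biosource": "Patient A"},
    {"Name": "S3", "Biosource": "Patient X"},
]
report = dry_run(
    rows,
    existing_names={"S1"},
    known_parents={"Patient A"},
    parent_column="Biosource",
)
```

Each message refers to a line in the import file, which lets the user correct the file before the real import is run.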
92 | |
---|
93 | Below follows a short description of item types that should be |
---|
94 | supported by the importer. An OpenOffice.org spreadsheet |
---|
95 | (batchimport_sample.ods) that contains format information with |
---|
96 | explanations in one document is maintained and made available as an |
---|
97 | attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the |
---|
98 | BASE web site. The spreadsheet is a work in progress and may change |
---|
99 | depending on requirements until the batch import is finalized. Example |
---|
100 | import files can be created from the spreadsheet. |
---|
101 | |
---|
102 | A tentative aim is that the spreadsheet may be used by laboratory |
---|
103 | staff to fill in information to be imported into BASE. |
---|
104 | |
---|
105 | |
---|
106 | A short description of the different item types to be imported by the |
---|
107 | batch importer: |
---|
108 | |
---|
109 | Biosource |
---|
110 | |
---|
111 | This is currently the top level of associations. No associations are |
---|
112 | needed except for the optional reference to an external item (a |
---|
113 | property of the biosource). The import is a straightforward tab |
---|
114 | separated import to fill the item properties. |
---|
115 | |
---|
116 | Fields to import are: 'Name', 'Description', 'External id' |
---|
117 | |
---|
118 | Mandatory columns for imports: 'Name' |
---|
119 | |
---|
120 | Biosource export file: biosource_out.txt |
---|
121 | |
---|
122 | |
---|
123 | Sample |
---|
124 | |
---|
125 | The import of item properties is a straightforward tab separated |
---|
126 | import. Compared to biosource items there are additional columns for |
---|
127 | associations to other items (the parent biosource and protocol). There |
---|
128 | is one parent only if the parent is a biosource; pooled samples may |
---|
129 | have multiple parents (other samples) defined using multiple lines. |
---|
130 | |
---|
131 | Pooled samples create 'Event's that decrease the parent amount. The |
---|
132 | original quantity of a pooled sample is the sum of the pooled |
---|
133 | components. |
---|
134 | |
---|
135 | Fields to import are: 'Name', 'Original quantity (µg)', 'Description', |
---|
136 | 'External id', 'Created', 'Pooled' |
---|
137 | |
---|
138 | Items to make associations to: 'Biosource', 'Protocol', 'Sample' for |
---|
139 | pooled entries (also decrease quantity 'Sample Used') |
---|
140 | |
---|
141 | Mandatory columns for imports: 'Name' |
---|
142 | |
---|
143 | The important difference compared with biosource items is the possible |
---|
144 | associations to biosources and protocols. |
---|
145 | |
---|
146 | Sample export file: sample_out.txt |
---|
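The pooling rule above (one line per parent, original quantity equal to the sum of the pooled components) can be sketched as follows. The column layout is an assumption made for illustration: 'Sample' holds the parent sample name, 'Sample Used' the amount taken from it, and 'Pooled' is taken to be true for the values "1", "yes" or "true". The function `pooled_quantities` is hypothetical.

```python
import csv
import io
from collections import defaultdict


def pooled_quantities(tsv_text):
    """Sum the used parent amounts for each pooled sample.

    Assumes one line per parent, with the pooled sample's 'Name' repeated
    on each of those lines. Non-pooled lines are ignored.
    """
    totals = defaultdict(float)
    parents = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["Pooled"].strip().lower() not in ("1", "yes", "true"):
            continue
        totals[row["Name"]] += float(row["Sample Used"])
        parents[row["Name"]].append(row["Sample"])
    return dict(totals), dict(parents)


# Hypothetical import file: 'Pool 1' is pooled from samples S1 and S2.
text = (
    "Name\tPooled\tSample\tSample Used\n"
    "Pool 1\tyes\tS1\t2.5\n"
    "Pool 1\tyes\tS2\t1.5\n"
    "S3\tno\t\t\n"
)
totals, parents = pooled_quantities(text)
```

Here `totals` holds the derived original quantity of each pooled sample and `parents` lists the samples whose amounts would be decreased by the corresponding events.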
147 | |
---|
148 | |
---|
149 | Extract |
---|
150 | |
---|
151 | The import of item properties is a straightforward tab separated |
---|
152 | import. There are additional columns for associations to the parent |
---|
153 | item and other items. There is one parent only if the parent is a |
---|
154 | sample; pooled extracts may have multiple parents (other extracts) |
---|
155 | defined using multiple lines. |
---|
156 | |
---|
157 | Extracts and pooled extracts create 'Event's that decrease the parent |
---|
158 | amount. The original quantity of a pooled extract is the sum of the |
---|
159 | pooled components. |
---|
160 | |
---|
161 | Fields to import are: 'Name', 'Original quantity (µg)', 'Description', |
---|
162 | 'External id', 'Created', 'Pooled' |
---|
163 | |
---|
164 | Items to make associations to: 'Sample' (also decrease quantity |
---|
165 | 'Sample Used'), 'Protocol' |
---|
166 | |
---|
167 | Mandatory columns for imports: 'Name' |
---|
168 | |
---|
169 | Extract export file: extract_out.txt |
---|
170 | |
---|
171 | |
---|
172 | Labeled Extract |
---|
173 | |
---|
174 | The import of item properties is a straightforward tab separated |
---|
175 | import. There are additional columns for associations to the parent |
---|
176 | item and other items. There is one parent only if the |
---|
177 | parent is an extract; pooled labeled extracts may have multiple |
---|
178 | parents (other labeled extracts) defined using multiple lines. |
---|
179 | |
---|
180 | Labeled extracts and pooled labeled extracts create 'Event's that |
---|
181 | decrease the parent amount. The original quantity of a pooled labeled |
---|
182 | extract is the sum of the pooled components. |
---|
183 | |
---|
184 | There is an additional column as compared to the extract items: 'Label'. |
---|
185 | |
---|
186 | Fields to import are: 'Name', 'Original quantity (µg)', 'Description', |
---|
187 | 'External id', 'Created', 'Pooled' |
---|
188 | |
---|
189 | Items to make associations to: 'Extract' (also decrease quantity |
---|
190 | 'Extract Used'), 'Protocol', 'Label' |
---|
191 | |
---|
192 | Mandatory columns for imports: 'Name' |
---|
193 | |
---|
194 | Labeled extract export file: labeledextract_out.txt |
---|
195 | |
---|
196 | |
---|
197 | Array Design |
---|
198 | |
---|
199 | The import of item properties is a straightforward tab separated |
---|
200 | import. There is one additional column for association to the parent |
---|
201 | item. Note, the import of array design items will not create fully |
---|
202 | working designs, see more information above. |
---|
203 | |
---|
204 | Fields to import are: 'Name', 'Description', 'Arrays/slide' |
---|
205 | |
---|
206 | Items to make associations to: 'Platform/Variant' |
---|
207 | |
---|
208 | Mandatory columns for imports: 'Name', 'Platform/Variant', |
---|
209 | 'Arrays/slide' |
---|
210 | |
---|
211 | Array batch export file: arraybatch_out.txt |
---|
212 | |
---|
213 | |
---|
214 | Array Batch |
---|
215 | |
---|
216 | The import of item properties is a straightforward tab separated |
---|
217 | import. There is one additional column for association to the parent |
---|
218 | item. |
---|
219 | |
---|
220 | Fields to import are: 'Name', 'Description' |
---|
221 | |
---|
222 | Items to make associations to: 'Array design', 'Protocol', 'Hardware' |
---|
223 | |
---|
224 | Mandatory columns for imports: 'Name', 'Array design' |
---|
225 | |
---|
226 | Array batch export file: arraybatch_out.txt |
---|
227 | |
---|
228 | |
---|
229 | Array Slide |
---|
230 | |
---|
231 | The import of item properties is a straightforward tab separated |
---|
232 | import. There is one additional column for association to the parent |
---|
233 | item. |
---|
234 | |
---|
235 | Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed' |
---|
236 | |
---|
237 | Items to make associations to: 'Array batch' |
---|
238 | |
---|
239 | Mandatory columns for imports: 'Name', 'Array batch' |
---|
240 | |
---|
241 | Array slide export file: arrayslide_out.txt |
---|
242 | |
---|
243 | |
---|
244 | Hybridization |
---|
245 | |
---|
246 | The import of item properties is a straightforward tab separated |
---|
247 | import. There are additional columns for associations to parent items |
---|
248 | and other items. There may be one or more parents (labeled extracts), |
---|
249 | the number depends on how many arrays there are on each slide and on |
---|
250 | the number of channels of the platform used. Multiple parents are |
---|
251 | defined on multiple lines. |
---|
252 | |
---|
253 | There are additional columns as compared to the labeled extract items: |
---|
254 | 'Hardware', 'Array slide', 'Arrays', and 'Array index'. |
---|
255 | |
---|
256 | Fields to import are: 'Name', 'Description', 'Created', 'Arrays', |
---|
257 | 'Array index' |
---|
258 | |
---|
259 | Items to make associations to: 'Labeled extract' (also decrease |
---|
260 | quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware' |
---|
261 | |
---|
262 | Mandatory columns for imports: 'Name', 'Arrays' |
---|
263 | |
---|
264 | Hybridization export file: hybridization_out.txt |
---|
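As an illustration of how multiple parent lines for a hybridization might be collected, the sketch below groups labeled extracts by hybridization name and array index. It assumes that 'Array index' counts from 1 up to the value in 'Arrays'; the text above does not state the numbering, so this is a guess. The function `group_hybridization_parents` and the parent column header 'Labeled extract' are likewise hypothetical.

```python
from collections import defaultdict


def group_hybridization_parents(rows):
    """Collect labeled extracts per hybridization and array index.

    rows: parsed import lines as dicts, one line per parent labeled extract.
    Raises ValueError if an array index falls outside 1..Arrays
    (1-based numbering is an assumption of this sketch).
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        index = int(row["Array index"])
        arrays = int(row["Arrays"])
        if not 1 <= index <= arrays:
            raise ValueError(
                f"{row['Name']}: array index {index} outside 1..{arrays}"
            )
        grouped[row["Name"]][index].append(row["Labeled extract"])
    return {name: dict(by_index) for name, by_index in grouped.items()}


# Hypothetical two-channel hybridization on a slide with two arrays.
rows = [
    {"Name": "Hyb 1", "Arrays": "2", "Array index": "1", "Labeled extract": "LE1"},
    {"Name": "Hyb 1", "Arrays": "2", "Array index": "1", "Labeled extract": "LE2"},
    {"Name": "Hyb 1", "Arrays": "2", "Array index": "2", "Labeled extract": "LE3"},
    {"Name": "Hyb 1", "Arrays": "2", "Array index": "2", "Labeled extract": "LE4"},
]
hybridizations = group_hybridization_parents(rows)
```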
265 | |
---|
266 | |
---|
267 | Scan |
---|
268 | |
---|
269 | The import of item properties is a straightforward tab separated |
---|
270 | import. There are additional columns for associations to the parent |
---|
271 | item and other items. There is one parent hybridization only. |
---|
272 | |
---|
273 | There are no additional columns as compared with the previous |
---|
274 | items. Image upload is not supported by the importer. |
---|
275 | |
---|
276 | Fields to import are: 'Name', 'Description' |
---|
277 | |
---|
278 | Items to make associations to: 'Hybridization', 'Protocol', 'Hardware' |
---|
279 | |
---|
280 | Mandatory columns for imports: 'Name' |
---|
281 | |
---|
282 | Scan export file: scan_out.txt |
---|
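The mandatory columns listed for each item type above can be gathered in one table and checked against the header line of an import file. The sketch below is only an illustration; the names `MANDATORY_COLUMNS` and `missing_columns` are hypothetical, and the dictionary keys are just labels for the item types.

```python
# Mandatory columns per item type, as listed in the descriptions above.
MANDATORY_COLUMNS = {
    "Biosource": ["Name"],
    "Sample": ["Name"],
    "Extract": ["Name"],
    "Labeled extract": ["Name"],
    "Array design": ["Name", "Platform/Variant", "Arrays/slide"],
    "Array batch": ["Name", "Array design"],
    "Array slide": ["Name", "Array batch"],
    "Hybridization": ["Name", "Arrays"],
    "Scan": ["Name"],
}


def missing_columns(item_type, header_line):
    """Return the mandatory columns absent from a tab-separated header line."""
    headers = [h.strip() for h in header_line.rstrip("\n").split("\t")]
    return [c for c in MANDATORY_COLUMNS[item_type] if c not in headers]


# An array design file lacking the 'Platform/Variant' column.
missing = missing_columns("Array design", "Name\tDescription\tArrays/slide\n")
```

A check like this could run before any line is parsed, so that a file with an incomplete header is rejected as a whole rather than line by line.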