There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

- batch upload of files using zip files
- batch creation of array slides
- batch addition/deletion of reporters
- import of annotations
- list views offer an import button, but which views actually offer
  a plug-in that does anything?

In a setting with one or a few experiments there is no urgent need for
batch tools, but for a microarray facility where many experiments are
prepared by facility staff the need is pressing. At a facility site
many experiments are conducted by few people and all data upload is
done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab-separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather plug-ins that import, create, or modify items for a given
context and make the proper associations to parents. The word 'import'
is used in this document but it could just as well be 'create' or
'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are to be used in a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of biomaterial. In BASE this follows the path biosource -
sample - extract - labeled extract, and then continues with
hybridization - scan - raw bioassay - bioassay - bioassay sets -
analysis.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples with their parents. Changing
sample properties follows a similar path, but then the import files do
not require parent information. The import of sample data starts with
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been added
to this file, the file is imported into BASE. After this import, the
sample information is exported to a file again, and this file is used
as a template for the extract information. Again, the reason for this
is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back into BASE. This
procedure is repeated for each level of data entry.

The files optionally exported for use as templates above are simple
tab-separated files with a few columns of information about the
items. The exported columns have a twofold purpose: i) to make sure
that BASE can make the proper associations when importing data, and
ii) to guide the users when adding information to the template file,
i.e., descriptive names for human interpretation.

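The export-as-template step can be sketched in a few lines of Python.
This is only an illustration of the idea, not the plug-in itself: the
column names 'Id', 'Parent id' and 'Parent name' and the function
make_sample_template are assumptions made for this example, and the
real export format is the one in the sample files attached to the
ticket.

```python
import csv
import io


def make_sample_template(biosource_export):
    """Turn a tab-separated biosource export into an empty sample template.

    Assumes (hypothetically) that the export has 'Id' and 'Name' columns.
    The parent columns are pre-filled so that identifiers stay correct;
    the sample columns are left empty for the user to fill in.
    """
    reader = csv.DictReader(io.StringIO(biosource_export), delimiter="\t")
    fields = ["Parent id", "Parent name", "Name", "Description"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow({
            "Parent id": row["Id"],      # identifier used for the association
            "Parent name": row["Name"],  # descriptive name for the human reader
            "Name": "",
            "Description": "",
        })
    return out.getvalue()


if __name__ == "__main__":
    print(make_sample_template("Id\tName\n12\tPatient A\n"), end="")
```

The two pre-filled columns mirror the twofold purpose described above:
one serves the importer, the other serves the person editing the file.
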
Items that should be supported by the item importer(s) are:

- Biosources: top level, currently no parent items to associate

- Samples: biosource (or pooled samples) to associate. Create sample
  events for pools to decrease the amounts of the pooled samples.

- Extracts: samples (or pooled extracts) and protocols to
  associate. Create sample events to decrease sample amounts and
  extract events to decrease extract amounts for pooled extracts.

- Labeled extracts: extracts (or pooled labeled extracts) and
  protocols to associate. Create extract events to decrease extract
  amounts and labeled extract events to decrease pooled amounts.

- Hybridizations: multiple labeled extracts, ... more to come
- Scans: ...
- Raw bioassays: ...
- Experiments: ...

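The parent rules for the first four item types in the list above can be
written down as a small lookup table. The sketch below is one possible
encoding; the key strings, PARENT_TYPES and expected_parent_type are
names chosen for this example and are not BASE identifiers.

```python
# Parent type per item type, as (regular parent, parent when pooled).
# The keys and the tuple layout are illustrative assumptions.
PARENT_TYPES = {
    "biosource": (None, None),  # top level, nothing to associate
    "sample": ("biosource", "sample"),
    "extract": ("sample", "extract"),
    "labeled extract": ("extract", "labeled extract"),
}


def expected_parent_type(item_type, pooled=False):
    """Return the item type an importer should look up as parent."""
    regular, pooled_parent = PARENT_TYPES[item_type]
    return pooled_parent if pooled else regular


if __name__ == "__main__":
    for item_type in PARENT_TYPES:
        print(item_type, "->", expected_parent_type(item_type),
              "/ pooled ->", expected_parent_type(item_type, pooled=True))
```

The item types that are still marked '...' are left out on purpose,
since their associations are not specified yet.
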

A detailed discussion of the different export/import steps and sample
files for the different item types are available as attachments to
ticket 1028 (http://base.thep.lu.se/ticket/1028) at the BASE web site.

Sample files based on trunk revision 4301 were exported and modified
for items from the biosource level down to the labeled extract level. An
OpenOffice.org spreadsheet (batchimport_sample.ods) that contains
format information with explanations in one document is also
available. A tentative aim is that the spreadsheet may be used by
laboratory staff to fill in information to be used for import to BASE.

A dry run that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This will
allow the user to check that the import will behave as expected.

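A dry run could look roughly like the sketch below: every line of the
import file is classified and reported, and nothing is written. The
function dry_run and the message texts are invented for this example,
and matching existing items by name is an assumption; the matching key
is not decided in this document.

```python
def dry_run(rows, existing_names):
    """Report what an import would do, without changing anything.

    rows: list of dicts, one per data line of the import file.
    existing_names: names already present in BASE (assumed lookup key).
    """
    report = []
    # Data lines start at line 2 because line 1 holds the column headers.
    for line_no, row in enumerate(rows, start=2):
        name = (row.get("Name") or "").strip()
        if not name:
            report.append(f"line {line_no}: ERROR missing mandatory 'Name'")
        elif name in existing_names:
            report.append(f"line {line_no}: would modify existing item '{name}'")
        else:
            report.append(f"line {line_no}: would create new item '{name}'")
    return report


if __name__ == "__main__":
    rows = [{"Name": "A"}, {"Name": ""}, {"Name": "B"}]
    for message in dry_run(rows, existing_names={"A"}):
        print(message)
```

Running the same classification code in both dry-run and real mode is
one way to keep the report faithful to what the import actually does.
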

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward
tab-separated import to fill the item properties.

The available fields to import are: 'Name', 'Description', 'External
id'

Mandatory columns for imports: 'Name'

Example export file: biosource_out.txt
Example import file: biosource_in.txt

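Reading such a file needs little more than a tab-separated parser and a
check for the mandatory column. The following is a minimal sketch using
the field names listed above; the function read_biosources and the
inline test data are made up for this example and are not the content
of biosource_in.txt.

```python
import csv
import io

BIOSOURCE_FIELDS = ["Name", "Description", "External id"]
MANDATORY_COLUMNS = ["Name"]


def read_biosources(text):
    """Parse tab-separated biosource data into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in MANDATORY_COLUMNS if c not in header]
    if missing:
        raise ValueError(f"missing mandatory column(s): {missing}")
    # Optional columns that are absent or empty become empty strings.
    return [{f: row.get(f) or "" for f in BIOSOURCE_FIELDS} for row in reader]


if __name__ == "__main__":
    data = "Name\tDescription\nBS1\tliver biopsy\n"
    print(read_biosources(data))
```
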

Sample

The import of item properties is a straightforward tab-separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples), defined using multiple lines.

Pooled samples create 'Event' items that decrease the parent amounts.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Example export file: sample_out.txt
Example import file: sample_in.txt

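Defining several parents over multiple lines means that the importer
has to group lines belonging to the same item before creating it. The
sketch below shows one way to do that. The 'Parent' column name and
the 'yes'/'no' values of 'Pooled' are assumptions for this example; the
actual layout is given by the sample files.

```python
def collect_parents(rows):
    """Group the parents of each item; an item may span several lines."""
    parents = {}
    for row in rows:
        parents.setdefault(row["Name"], []).append(row["Parent"])
    return parents


def check_parents(rows):
    """Return error messages for non-pooled items with several parents."""
    pooled = {row["Name"]: row["Pooled"] == "yes" for row in rows}
    errors = []
    for name, parent_list in collect_parents(rows).items():
        if not pooled[name] and len(parent_list) > 1:
            errors.append(f"'{name}' is not pooled but has "
                          f"{len(parent_list)} parents")
    return errors


if __name__ == "__main__":
    rows = [
        {"Name": "S1", "Pooled": "no", "Parent": "BS1"},
        {"Name": "P1", "Pooled": "yes", "Parent": "S1"},
        {"Name": "P1", "Pooled": "yes", "Parent": "S2"},
    ]
    print(collect_parents(rows))
    print(check_parents(rows))
```
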

Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts),
defined using multiple lines.

Extracts and pooled extracts create 'Event' items that decrease the
parent amounts.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

Example export file: extract_out.txt
Example import file: extract_in.txt

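The bookkeeping behind events that decrease parent amounts can be
illustrated as follows. This is a sketch under assumptions: the
function apply_events is invented here, the used quantity per event is
taken as given (this document does not say which column carries it),
and reporting an overdraw as a problem is a choice made for the
example rather than documented BASE behaviour.

```python
def apply_events(remaining, events):
    """Subtract used quantities from parent amounts.

    remaining: dict mapping parent name to remaining quantity in µg.
    events: list of (parent name, used quantity in µg) tuples.
    Returns the updated amounts and a list of problem messages.
    """
    result = dict(remaining)  # leave the caller's data untouched
    problems = []
    for parent, used in events:
        if parent not in result:
            problems.append(f"unknown parent '{parent}'")
        elif used > result[parent]:
            problems.append(f"'{parent}': {used} µg requested, "
                            f"only {result[parent]} µg left")
        else:
            result[parent] -= used
    return result, problems


if __name__ == "__main__":
    amounts, problems = apply_events(
        {"S1": 10.0}, [("S1", 4.0), ("S1", 7.0), ("S2", 1.0)])
    print(amounts)
    print(problems)
```
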

Labeled Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts), defined using multiple lines.

There is one additional column compared to the extract items: 'Label'.

The available fields to import are: 'Name', 'Label', 'Original
quantity (µg)', 'Description', 'External id', 'Protocol', 'Created',
'Pooled'

Mandatory columns for imports: 'Name'

Example export file: labeledextract_out.txt
Example import file: labeledextract_in.txt

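The field lists of the four item types differ only slightly, so an
importer could keep them in one table and reject columns it does not
know. The sketch below uses the field names given in this document;
the table FIELDS, the item type keys and the function unknown_columns
are illustrative, and the association columns (parents) are left out
because their names are not specified here.

```python
COMMON = ["Name", "Original quantity (µg)", "Description", "External id",
          "Protocol", "Created", "Pooled"]

# Importable property fields per item type, as listed in this document.
FIELDS = {
    "biosource": ["Name", "Description", "External id"],
    "sample": COMMON,
    "extract": COMMON,
    "labeled extract": ["Name", "Label"] + COMMON[1:],
}


def unknown_columns(item_type, header):
    """Return the header columns that are not known fields of the item type."""
    known = set(FIELDS[item_type])
    return [column for column in header if column not in known]


if __name__ == "__main__":
    print(unknown_columns("extract", ["Name", "Label", "Protocol"]))
    print(set(FIELDS["labeled extract"]) - set(FIELDS["extract"]))
```
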