There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

- batch upload of files using zip files
- batch creation of array slides
- batch addition/deletion of reporters
- import of annotations
- import buttons in list views, although it is unclear which views
  actually offer a plug-in that does anything

For a setting with one or a few experiments there is no urgent need
for batch tools, but for a microarray facility where many experiments
are prepared by facility staff the need is pressing. At a facility
site many experiments are conducted by few people and all data upload
is done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab-separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather plug-ins that import, create, or modify items for a given
context and make the proper associations to parents. The word 'import'
is used in this document but it could just as well be 'create' or
'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are intended for a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of biomaterial. In BASE this follows the path of biosource
- sample - extract - labeled extract, and then continues with
hybridization - scan - raw bioassay - bioassay - bioassay sets -
analysis.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples with their parents. Changing
sample properties follows a similar path, but the import files do not
require parent information. The import of sample data starts with
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been
added to this file, the file is imported into BASE. After this import,
the sample information is exported to a file again, and this file is
used as a template for the extract information. Again, the reason for
this is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back to BASE. This
procedure is performed for each level of data entry.

The files optionally exported for use as templates above are simple
tab-separated files with a few columns of information about the
items. The exported columns have a two-fold purpose: i) to make sure
that BASE can make the proper associations when importing data, and
ii) to guide the users when adding information to the template file,
i.e., descriptive names for human interpretation.

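The template round trip described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not
the plug-in itself: the function build_sample_import and the column
names 'Id', 'Parent id' and 'Parent name' are hypothetical, and the
actual file layout is the one in the sample files attached to ticket
1028. The sketch reads an exported biosource template and writes sample
rows that carry the parent identifier copied from the template, so the
user never has to type an identifier by hand.

```python
import csv
import io


def build_sample_import(template_text, samples_by_biosource):
    """Return tab-separated sample rows that reference exported biosources.

    template_text: content of an exported biosource template; the columns
        'Id' and 'Name' are assumed here for illustration only.
    samples_by_biosource: maps a biosource name to the new sample names.
    """
    reader = csv.DictReader(io.StringIO(template_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    # 'Parent id' lets the importer make the association,
    # 'Parent name' is only there to guide the human reader.
    writer.writerow(["Name", "Parent id", "Parent name"])
    for row in reader:
        for sample_name in samples_by_biosource.get(row["Name"], []):
            writer.writerow([sample_name, row["Id"], row["Name"]])
    return out.getvalue()
```

Biosources without new samples simply produce no rows, so the same
exported template can be reused for several import rounds.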

Items that should be supported by the item importer(s) are:

- Biosources: top level, currently no parent items to associate

- Samples: biosource (or pooled samples) to associate. Create sample
  events for pools to decrease the amounts of the pooled samples.

- Extracts: samples (or pooled extracts) and protocols to
  associate. Create sample events to decrease sample amounts and
  extract events to decrease extract amounts for pooled extracts.

- Labeled extracts: extracts (or pooled labeled extracts) and
  protocols to associate. Create extract events to decrease extract
  amounts and labeled extract events to decrease pooled amounts.

- Hybridizations: multiple labeled extracts, ... more to come
- Scans
- Raw bioassays: ...
- Experiments: ...

A detailed discussion of the different export/import steps and sample
files for the different item types are available as attachments to
ticket 1028 (http://base.thep.lu.se/ticket/1028) at the BASE web site.

Sample files based on trunk revision 4301 were exported and modified
for items from the biosource level down to the labeled extract level.
An OpenOffice.org spreadsheet (batchimport_sample.ods) that contains
format information with explanations in one document is also
available. A tentative aim is that the spreadsheet may be used by
laboratory staff to fill in information to be imported into BASE.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This will
allow the user to check that the import will behave as expected.

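One possible shape of such a dry run is sketched below in Python. It is
a minimal illustration and not a description of any existing BASE
code: the function dry_run, its arguments and the wording of the
messages are assumptions. The point is only that the dry run walks the
same rows as the real import, collects a description of each planned
action and each problem, and changes nothing.

```python
def dry_run(rows, existing_names):
    """Describe what an import would do without changing anything.

    rows: list of dicts, one per data line in the import file.
    existing_names: names of items assumed to exist already.
    Returns (actions, problems) as two lists of strings.
    """
    actions, problems = [], []
    # Data starts on line 2 because line 1 is assumed to be the header.
    for line_no, row in enumerate(rows, start=2):
        name = (row.get("Name") or "").strip()
        if not name:
            problems.append(f"line {line_no}: missing mandatory 'Name'")
            continue
        verb = "update" if name in existing_names else "create"
        actions.append(f"line {line_no}: {verb} '{name}'")
    return actions, problems
```

Reporting line numbers alongside each message lets the user correct
the file and rerun the check before anything is stored.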

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward
tab-separated import to fill the item properties.

The available fields to import are: 'Name', 'Description', 'External
id'

Mandatory columns for imports: 'Name'

Sample export file: biosource_out.txt
Sample import file: biosource_in.txt

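The field list and the mandatory column above translate into a simple
header check. The Python sketch below assumes that the first line of
the import file is a header using the field names exactly as listed;
that assumption, the function name read_biosources and the decision to
reject unknown columns are choices made for this illustration only.

```python
import csv
import io

# Field names taken from the list above.
BIOSOURCE_FIELDS = {"Name", "Description", "External id"}
MANDATORY = {"Name"}


def read_biosources(text):
    """Parse tab-separated biosource rows after checking the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = set(reader.fieldnames or [])
    missing = MANDATORY - header
    if missing:
        raise ValueError(f"missing mandatory column(s): {sorted(missing)}")
    unknown = header - BIOSOURCE_FIELDS
    if unknown:
        raise ValueError(f"unknown column(s): {sorted(unknown)}")
    return [dict(row) for row in reader]
```

The optional columns may be left out entirely; only 'Name' has to be
present for the file to be accepted by this check.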

Sample

The import of item properties is a straightforward tab-separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples) defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amount.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt
Sample import file: sample_in.txt

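The rule that a pooled sample lists its parents on multiple lines can
be illustrated with a short Python sketch. The column name 'Parent',
the values 'yes' and 'no' in the 'Pooled' column and the function
group_parents are assumptions made for this example; the actual layout
is the one in sample_in.txt. Lines sharing a name are collected into
one item, and a non-pooled sample with more than one parent is
reported as an error.

```python
import csv
import io


def group_parents(text):
    """Collect the parents listed for each sample name, in file order."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    samples = {}
    for row in reader:
        entry = samples.setdefault(
            row["Name"], {"pooled": row["Pooled"] == "yes", "parents": []}
        )
        entry["parents"].append(row["Parent"])
    for name, entry in samples.items():
        # A sample made directly from a biosource has exactly one parent.
        if not entry["pooled"] and len(entry["parents"]) != 1:
            count = len(entry["parents"])
            raise ValueError(f"'{name}' is not pooled but lists {count} parents")
    return samples
```

The same grouping applies unchanged to pooled extracts and pooled
labeled extracts, since they use the same multiple-line convention.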

Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts)
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount.

The available fields to import are: 'Name', 'Original quantity (µg)',
'Description', 'External id', 'Protocol', 'Created', 'Pooled'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt
Extract import file: extract_in.txt

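How events decrease the parent amount can be shown with a small
bookkeeping sketch in Python. It is not the BASE event model: the
function apply_events, the tuple layout and the event dictionary are
hypothetical, and refusing an import that uses more than what remains
of a parent is an assumption of this sketch rather than documented
behavior.

```python
def apply_events(remaining, usages):
    """Return (events, updated amounts) for a list of usages.

    remaining: maps a parent name to its remaining quantity.
    usages: list of (child, parent, used_quantity) tuples.
    """
    updated = dict(remaining)  # leave the caller's amounts untouched
    events = []
    for child, parent, used in usages:
        if used > updated[parent]:
            raise ValueError(
                f"'{child}' needs {used} of '{parent}', "
                f"only {updated[parent]} left"
            )
        updated[parent] -= used
        # One event per parent-child line, recording the decrease.
        events.append({"item": parent, "used_by": child, "change": -used})
    return events, updated
```

Working on a copy of the amounts means the same function can serve a
dry run, where the resulting events are reported but never stored.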

Labeled Extract

The import of item properties is a straightforward tab-separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts) defined using multiple lines.

Compared to extract items there is one additional column, 'Label'.

The available fields to import are: 'Name', 'Label', 'Original
quantity (µg)', 'Description', 'External id', 'Protocol', 'Created',
'Pooled'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt
Labeled extract import file: labeledextract_in.txt