There is a need to facilitate batch upload, creation, and modification
of items in BASE. Some batch tools already exist, such as

 - batch upload of files using zip files
 - batch creation of array slides
 - batch addition/deletion of reporters
 - import of annotations
 - list views offer an import button, but which views actually offer
   a plug-in that does anything?

In a setting with one or a few experiments the need for batch tools
is not urgent, but for a microarray facility where many experiments
are prepared by facility staff the need is pressing. At a facility
site many experiments are conducted by few people and all data upload
is done by these staff members. To ease the upload of data to BASE we
suggest creating one or several plug-ins that can create or modify
several items in a batch by reading information from tab separated
files. The idea here is not to create one single monolithic plug-in
that imports a complete experiment and creates all necessary items,
but rather plug-ins that import, create, or modify items for a given
context and make the proper associations to parents. The word 'import'
is used in this document but it could just as well be 'create' or
'modify' depending on user requirements.

There is ongoing work on a full experiment import from tab2mage
formatted files, see
http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are intended for a context
where the user ideally works interactively with BASE in a step-by-step
procedure. The idea is that the interaction with BASE starts on some
level and data is added from this level down. Here a sample work
session is outlined where RNA is extracted and labeled starting from
some source of bio-material. In BASE this follows the path of
'biosource' - 'sample' - 'extract' - 'labeled extract', and then
continuing with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay'
- 'bioassay sets' - 'analysis'. However, it is recommended that array
information is imported before hybridization import following the path
'array design' - 'array batch' - 'array slide' - 'hybridization' -
'scan' ...
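
The two paths above can be written down as a parent table from which
an import order follows. The Python sketch below is only an
illustration of that ordering; the names PARENTS and import_order are
hypothetical, and pooled items (which have parents of their own type)
are left out to keep the table acyclic.

```python
# Parent item type(s) for each item type, taken from the two paths
# described in the text. Pooled items are deliberately omitted.
PARENTS = {
    "biosource": [],
    "sample": ["biosource"],
    "extract": ["sample"],
    "labeled extract": ["extract"],
    "array design": [],
    "array batch": ["array design"],
    "array slide": ["array batch"],
    "hybridization": ["labeled extract", "array slide"],
    "scan": ["hybridization"],
    "raw bioassay": ["scan"],
}


def import_order(parents):
    """Return item types ordered so that parents always come first."""
    ordered = []

    def visit(item):
        if item in ordered:
            return
        for parent in parents[item]:
            visit(parent)
        ordered.append(item)

    for item in parents:
        visit(item)
    return ordered


order = import_order(PARENTS)
```

With this table, 'array slide' is placed before 'hybridization', which
matches the recommendation to import array information first.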

The proposed plug-in should be usable starting from 'biosource' and
'array design' views down to the raw bioassay step; beyond this,
manual import or use of other plug-ins is required. There is already a
batch raw data importer available at the BASE plug-in site
(http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for
import of raw data and experiment creation. The SCRI batch importer
should be adapted, if necessary, to create experiments and raw
bioassays in line with the batch importer plug-in we create.

The import of array design items will not create fully working designs
for two reasons: i) file upload is required for the design definition
and ii) file upload may be required for feature import. The proposed
plug-in does not support file upload. The items can still be created
and modified, but files and features must be added manually. This is
not a big issue in practice since new designs are not created often.

Starting at the biosource level, the user must make an initial import
of biosource information or use the BASE web interface for adding
biosource items. Samples are created from these biosources; in a BASE
context this means that sample information needs to be added. In this
example we want to associate the samples to their parents. Changing
sample properties follows a similar path, but the import files do not
require parent information. The import of sample data starts by
selecting the biosources associated with the samples in BASE, and then
exporting this information to a file. This file is used as a template
for entering sample data to be stored in BASE. The reason for using
this template is to ensure that the correct biosource identifiers are
used for the samples. (A user can of course create the file without the
export from BASE but has to make sure that items are properly
referenced.) The biosource identifiers are required for making
parent-child associations within BASE. When the samples have been
added to this file, the file is imported into BASE. After this import, the
sample information is exported to a file again, and this file is used
as a template for the extracts information. Again, the reason for this
is to make sure that proper BASE identifiers are used. Extract
information is added to the template and imported back to BASE. This
procedure is performed for each level of data entry.
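
The export, edit, import round trip can be illustrated with a small
Python sketch. It is not the plug-in implementation: the helper
make_template is hypothetical, the real export format is decided by
BASE, and the sketch assumes that the exported 'Name' column is what
identifies the parent.

```python
import csv
import io


def make_template(parent_export, parent_column, child_columns):
    """Build a child template from an exported parent file.

    parent_export -- tab-separated text with at least a 'Name' column
    parent_column -- header used for the parent reference, e.g. 'Biosource'
    child_columns -- headers the user is expected to fill in
    """
    parents = csv.DictReader(io.StringIO(parent_export), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(child_columns + [parent_column])
    for parent in parents:
        # One pre-filled line per parent; child fields are left empty.
        writer.writerow([""] * len(child_columns) + [parent["Name"]])
    return out.getvalue()


# Hypothetical biosource export used as the starting point.
exported = "Name\tExternal id\nPatient 1\tP-001\nPatient 2\tP-002\n"
template = make_template(exported, "Biosource", ["Name", "Description"])
```

The resulting template has one line per biosource with the parent
reference already filled in, so the user only types the sample fields.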

The files optionally exported as templates above are simple tab
separated files with a few columns of information about the items. The
exported columns have a two-fold purpose: i) to make sure that BASE
can make the proper associations when importing data, and ii) to guide
the users when adding information to the template file, i.e., to
provide descriptive names for human interpretation.

A dry-run mode that explains what will be done during import should be
supported. Potential dangers and errors should be reported. This
feature will allow the user to check that the import will behave as
expected.
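
A dry-run could, for example, produce a per-line report without
touching the database. The Python sketch below is one possible shape
of such a report, not a specification: the function dry_run, the
action labels, and the rule that an already existing name means
'modify' are all assumptions made for illustration.

```python
def dry_run(rows, mandatory, existing_names):
    """Report what an import would do without changing anything.

    rows           -- list of dicts, one per data line of the import file
    mandatory      -- column names that must be non-empty
    existing_names -- names of items assumed to exist in BASE already
    """
    report = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        empty = [c for c in mandatory if not (row.get(c) or "").strip()]
        name = (row.get("Name") or "").strip()
        if empty:
            report.append((line_no, "ERROR", "missing " + ", ".join(empty)))
        elif name in existing_names:
            report.append((line_no, "MODIFY", name))
        else:
            report.append((line_no, "CREATE", name))
    return report


# Hypothetical sample import with one faulty line.
rows = [
    {"Name": "Sample A", "Biosource": "Patient 1"},
    {"Name": "", "Biosource": "Patient 2"},
    {"Name": "Sample C", "Biosource": "Patient 2"},
]
report = dry_run(rows, ["Name"], existing_names={"Sample C"})
```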

Below follows a short description of item types that should be
supported by the importer. An OpenOffice.org spreadsheet
(batchimport_sample.ods) that contains format information with
explanations in one document is maintained and made available as an
attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the
BASE web site. The spreadsheet is work in progress and may change
depending on requirements until the batch import is finalized. Example
import files can be created from the spreadsheet.

A tentative aim is that the spreadsheet may be used by laboratory
staff to fill in information to be imported into BASE.


A short description of the different item types to be imported by the
batch importer:

Biosource

This is currently the top level of associations. No associations are
needed except for the optional reference to an external item (a
property of the biosource). The import is a straightforward tab
separated import to fill the item properties.

Fields to import are: 'Name', 'Description', 'External id'

Mandatory columns for imports: 'Name'

Biosource export file: biosource_out.txt
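
As an illustration only, a biosource import file could be read and
checked along the following lines. This is a minimal Python sketch,
not the plug-in itself; the function read_biosources and the exact
header spelling are assumptions, and only the three fields and the
mandatory 'Name' column come from the description above.

```python
import csv
import io

# Column names from the biosource description; the exact header
# spelling in a real import file is an assumption.
BIOSOURCE_FIELDS = ["Name", "Description", "External id"]
MANDATORY = ["Name"]


def read_biosources(text):
    """Parse tab-separated biosource data into a list of dicts.

    Raises ValueError if a mandatory column is missing or empty.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [c for c in MANDATORY if c not in header]
    if missing:
        raise ValueError("missing mandatory column(s): " + ", ".join(missing))

    items = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column in MANDATORY:
            if not (row[column] or "").strip():
                raise ValueError(f"line {line_no}: empty '{column}'")
        # Keep only known fields; optional ones default to an empty string.
        items.append({f: (row.get(f) or "").strip() for f in BIOSOURCE_FIELDS})
    return items


# Hypothetical file content; the second item has only a name.
example = (
    "Name\tDescription\tExternal id\n"
    "Patient 1\tBiopsy donor\tP-001\n"
    "Patient 2\t\t\n"
)
biosources = read_biosources(example)
```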


Sample

The import of item properties is a straightforward tab separated
import. Compared to biosource items there are additional columns for
associations to other items (the parent biosource and protocol). There
is only one parent if the parent is a biosource; pooled samples may
have multiple parents (other samples) defined using multiple lines.

Pooled samples create 'Event's that decrease the parent amount. The
original quantity of a pooled sample is the sum of the pooled
components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Biosource', 'Protocol', 'Sample' for
pooled entries (also decrease quantity 'Sample Used')

Mandatory columns for imports: 'Name'

The important difference compared with biosource items is the possible
associations to biosources and protocols.

Sample export file: sample_out.txt
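
The handling of pooled samples spread over multiple lines can be
sketched as below. This is a hedged Python illustration rather than
the importer's actual logic: the function pool_samples is
hypothetical, and it assumes that the 'Sample Used' column holds the
amount (µg) taken from the parent named in the 'Sample' column.

```python
from collections import defaultdict


def pool_samples(rows, parent_remaining):
    """Group multi-line pooled entries and decrease parent quantities.

    rows             -- dicts with 'Name', 'Sample' (parent), 'Sample Used'
    parent_remaining -- dict of parent name -> remaining quantity (µg)
    Returns (pooled name -> original quantity, updated parent quantities).
    """
    pooled_quantity = defaultdict(float)
    remaining = dict(parent_remaining)  # leave the caller's dict untouched
    for row in rows:
        parent = row["Sample"]
        used = float(row["Sample Used"])
        if parent not in remaining:
            raise ValueError(f"unknown parent sample '{parent}'")
        if used > remaining[parent]:
            raise ValueError(f"{parent}: only {remaining[parent]} µg left")
        remaining[parent] -= used              # the 'Event' on the parent
        pooled_quantity[row["Name"]] += used   # sum of pooled components
    return dict(pooled_quantity), remaining


# Hypothetical pooled sample defined on two lines, one per parent.
rows = [
    {"Name": "Pool 1", "Sample": "Sample A", "Sample Used": "2.0"},
    {"Name": "Pool 1", "Sample": "Sample B", "Sample Used": "3.0"},
]
quantities, remaining = pool_samples(rows, {"Sample A": 10.0, "Sample B": 5.0})
```

The same grouping applies to pooled extracts and pooled labeled
extracts, with the parent and 'used' columns renamed accordingly.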


Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is a
sample; pooled extracts may have multiple parents (other extracts)
defined using multiple lines.

Extracts and pooled extracts create 'Event's that decrease the parent
amount. The original quantity of a pooled extract is the sum of the
pooled components.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Sample' (also decrease quantity
'Sample Used'), 'Protocol'

Mandatory columns for imports: 'Name'

Extract export file: extract_out.txt


Labeled Extract

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is only one parent if the parent is an
extract; pooled labeled extracts may have multiple parents (other
labeled extracts) defined using multiple lines.

Labeled extracts and pooled labeled extracts create 'Event's that
decrease the parent amount. The original quantity of a pooled labeled
extract is the sum of the pooled components.

Compared to the extract items there is one additional column, 'Label'.

Fields to import are: 'Name', 'Original quantity (µg)', 'Description',
'External id', 'Created', 'Pooled'

Items to make associations to: 'Extract' (also decrease quantity
'Extract Used'), 'Protocol', 'Label'

Mandatory columns for imports: 'Name'

Labeled extract export file: labeledextract_out.txt


Array Design

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item. Note, the import of array design items will not create fully
working designs, see more information above.

Fields to import are: 'Name', 'Description', 'Arrays/slide'

Items to make associations to: 'Platform/Variant'

Mandatory columns for imports: 'Name', 'Platform/Variant',
'Arrays/slide'

Array design export file: arraydesign_out.txt


Array Batch

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Array design', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name', 'Array design'

Array batch export file: arraybatch_out.txt


Array Slide

The import of item properties is a straightforward tab separated
import. There is one additional column for association to the parent
item.

Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'

Items to make associations to: 'Array batch'

Mandatory columns for imports: 'Name', 'Array batch'

Array slide export file: arrayslide_out.txt


Hybridization

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to parent items
and other items. There may be one or more parents (labeled extracts),
the number depends on how many arrays there are on each slide and on
the number of channels of the platform used. Multiple parents are
defined on multiple lines.

There are additional columns as compared to the labeled extract items,
'Hardware', 'Array slide', 'Arrays', and 'Array index'.

Fields to import are: 'Name', 'Description', 'Created', 'Arrays',
'Array index'

Items to make associations to: 'Labeled extract' (also decrease
quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'

Mandatory columns for imports: 'Name', 'Arrays'

Hybridization export file: hybridization_out.txt
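
How several lines combine into one hybridization can be sketched in
Python as follows. The function group_hybridizations is hypothetical,
the header spellings are assumptions, and the example assumes that
'Array index' is counted from 1, which the description above does not
state.

```python
def group_hybridizations(rows):
    """Collect the labeled extracts listed for each hybridization.

    rows -- dicts with 'Name', 'Arrays', 'Array index', 'Labeled extract'
    Returns name -> {'Arrays': int, 'parents': [(array index, extract)]}.
    """
    hybs = {}
    for row in rows:
        arrays = int(row["Arrays"])
        hyb = hybs.setdefault(row["Name"], {"Arrays": arrays, "parents": []})
        # All lines of one hybridization must agree on the array count.
        if hyb["Arrays"] != arrays:
            raise ValueError(f"{row['Name']}: conflicting 'Arrays' values")
        hyb["parents"].append((int(row["Array index"]), row["Labeled extract"]))
    return hybs


# Hypothetical two-channel hybridization on a slide with one array:
# two lines, one per labeled extract.
rows = [
    {"Name": "Hyb 1", "Arrays": "1", "Array index": "1",
     "Labeled extract": "Ref Cy3"},
    {"Name": "Hyb 1", "Arrays": "1", "Array index": "1",
     "Labeled extract": "Tumor Cy5"},
]
hybridizations = group_hybridizations(rows)
```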


Scan

The import of item properties is a straightforward tab separated
import. There are additional columns for associations to the parent
item and other items. There is one parent hybridization only.

There are no additional columns compared with the previous items.
Image upload is not supported by the importer.

Fields to import are: 'Name', 'Description'

Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'

Mandatory columns for imports: 'Name'

Scan export file: scan_out.txt