There is a need to facilitate batch upload, creation, and modification of items in BASE. Some batch tools already exist, such as

- batch upload of files using zip files
- batch creation of array slides
- batch addition/deletion of reporters
- import of annotations
- list views offer an import button, but which views actually offer a plug-in that does anything?

In a setting with a single experiment or a few experiments the need for batch tools is not urgent, but for a microarray facility where many experiments are prepared by facility staff the need is pressing. At a facility site many experiments are conducted by a few people, and all data upload is done by these staff members.

To ease the upload of data to BASE we suggest creating one or several plug-ins that can create or modify several items in a batch by reading information from tab separated files. The idea is not to create a single monolithic plug-in that imports a complete experiment and creates all necessary items, but rather plug-ins that import, create, or modify items for a given context and make the proper associations to parents. The word 'import' is used in this document, but it could just as well be 'create' or 'modify' depending on user requirements. There is ongoing work on a full experiment import from tab2mage formatted files, see http://baseplugins.thep.lu.se/wiki/uk.ac.ebi.Tab2MageImporter.

The plug-in requirements outlined here are to be used in a context where the user ideally works interactively with BASE in a step-by-step procedure. The idea is that the interaction with BASE starts at some level and data is added from this level down. Here a sample work session is outlined where RNA is extracted and labeled starting from some source of bio-material. In BASE this follows the path 'biosource' - 'sample' - 'extract' - 'labeled extract', and then continues with 'hybridization' - 'scan' - 'raw bioassay' - 'bioassay' - 'bioassay sets' - 'analysis'.
However, it is recommended that array information is imported before hybridization import, following the path 'array design' - 'array batch' - 'array slide' - 'hybridization' - 'scan' ... The proposed plug-in should be usable starting from the 'biosource' and 'array design' views down to the raw bioassay step; beyond this, manual import or use of other plug-ins is required. There is already a batch raw data importer available at the BASE plug-in site (http://baseplugins.thep.lu.se/wiki/uk.ac.scri.batchimporter) for import of raw data and experiment creation. The SCRI batch importer should be adapted, if necessary, to create experiments and raw bioassays in line with the batch importer plug-in we create.

The import of array design items will not create fully working designs, for two reasons: i) file upload is required for the design definition, and ii) file upload may be required for feature import. The proposed plug-in does not support file upload. The items can still be created and modified, but files and features must be fixed manually. This is not a big issue in practice since new designs are not created very often.

Starting at the biosource level, the user must make an initial import of biosource information or use the BASE web interface to add biosource items. Samples are created from these biosources; in the BASE context this means that sample information needs to be added. In this example we want to associate the samples with their parents. Changing sample properties follows a similar path, but the import files do not require parent information.

The import of sample data starts with selecting the biosources associated with the samples in BASE and exporting this information to a file. This file is used as a template for entering the sample data to be stored in BASE. The reason for using this template is to ensure that the correct biosource identifiers are used for the samples.
(A user can of course create the file without the export from BASE, but then has to make sure that items are properly referenced.) The biosource identifiers are required for making parent-child associations within BASE. When the samples have been added to this file, the file is imported into BASE. After this import, the sample information is exported to a file again, and this file is used as a template for the extract information. Again, the reason for this is to make sure that proper BASE identifiers are used. Extract information is added to the template and imported back into BASE. This procedure is repeated for each level of data entry.

The files optionally exported for use as templates above are simple tab separated files with a few columns of information about the items. The exported columns have a two-fold purpose: i) to make sure that BASE can make the proper associations when importing data, and ii) to guide the users when adding information to the template file, i.e., descriptive names for human interpretation.

A dry-run mode that explains what will be done during import should be supported. Potential dangers and errors should be reported. This feature will allow the user to check that the import will behave as expected.

Below follows a short description of the item types that should be supported by the importer. An OpenOffice.org spreadsheet (batchimport_sample.ods) that contains the format information with explanations in one document is maintained and made available as an attachment to ticket 1028 (http://base.thep.lu.se/ticket/1028) at the BASE web site. The spreadsheet is work in progress and may change depending on requirements until the batch import is finalized. Example import files can be created from the spreadsheet. A tentative aim is that the spreadsheet may be used by laboratory staff to fill in information to be imported into BASE.
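
The template round trip and the dry-run described above could look roughly like the following sketch. It is an illustration in Python only and not part of BASE or of the proposed plug-in: the function names, the use of the 'Name' and 'Biosource' columns as the parent reference, and the wording of the messages are all assumptions made for this example.

```python
import csv
import io


def read_tab_file(text):
    """Parse tab separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def dry_run_sample_import(biosource_export, sample_import):
    """Report what a sample import would do, without changing anything.

    Both arguments are tab separated text. The column names 'Name' and
    'Biosource' are assumptions made for this sketch.
    """
    known_parents = {row["Name"] for row in read_tab_file(biosource_export)}
    report = []
    # The header is line 1, so the first data line is line 2.
    for line_no, row in enumerate(read_tab_file(sample_import), start=2):
        name = (row.get("Name") or "").strip()
        parent = (row.get("Biosource") or "").strip()
        if not name:
            report.append(f"line {line_no}: ERROR missing mandatory 'Name'")
        elif parent and parent not in known_parents:
            report.append(f"line {line_no}: ERROR unknown biosource '{parent}'")
        elif parent:
            report.append(
                f"line {line_no}: would create sample '{name}' under '{parent}'"
            )
        else:
            report.append(f"line {line_no}: WARNING sample '{name}' has no parent")
    return report


# Hypothetical export from the biosource view, used as the template.
biosources = "Name\tDescription\nBS1\tliver\nBS2\tkidney\n"
# Hypothetical sample file filled in by the user.
samples = "Name\tBiosource\nS1\tBS1\nS2\tBS9\n\tBS2\n"

for message in dry_run_sample_import(biosources, samples):
    print(message)
```

Checking parent references against the exported template is what makes the export step useful: a mistyped identifier is reported before any item is created.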

A short description of the different item types to be imported by the batch importer:

Biosource

This is currently the top level of associations. No associations are needed except for the optional reference to an external item (a property of the biosource). The import is a straightforward tab separated import to fill the item properties.
Fields to import are: 'Name', 'Description', 'External id'
Mandatory columns for imports: 'Name'
Biosource export file: biosource_out.txt

Sample

The import of item properties is a straightforward tab separated import. Compared to biosource items there are additional columns for associations to other items (the parent biosource and protocol). If the parent is a biosource there is only one parent; pooled samples may have multiple parents (other samples), defined using multiple lines. Pooled samples create 'Event' items that decrease the parent amount. The original quantity of a pooled sample is the sum of the pooled components.
Fields to import are: 'Name', 'Original quantity (µg)', 'Description', 'External id', 'Created', 'Pooled'
Items to make associations to: 'Biosource', 'Protocol', 'Sample' for pooled entries (also decrease quantity 'Sample Used')
Mandatory columns for imports: 'Name'
The important difference compared with biosource items is the possible associations to biosources and protocols.
Sample export file: sample_out.txt

Extract

The import of item properties is a straightforward tab separated import. There are additional columns for associations to the parent item and other items. If the parent is a sample there is only one parent; pooled extracts may have multiple parents (other extracts), defined using multiple lines. Extracts and pooled extracts create 'Event' items that decrease the parent amount. The original quantity of a pooled extract is the sum of the pooled components.
Fields to import are: 'Name', 'Original quantity (µg)', 'Description', 'External id', 'Created', 'Pooled'
Items to make associations to: 'Sample' (also decrease quantity 'Sample Used'), 'Protocol'
Mandatory columns for imports: 'Name'
Extract export file: extract_out.txt

Labeled Extract

The import of item properties is a straightforward tab separated import. There are additional columns for associations to the parent item and other items. If the parent is an extract there is only one parent; pooled labeled extracts may have multiple parents (other labeled extracts), defined using multiple lines. Labeled extracts and pooled labeled extracts create 'Event' items that decrease the parent amount. The original quantity of a pooled labeled extract is the sum of the pooled components. Compared to the extract items there is one additional column, 'Label'.
Fields to import are: 'Name', 'Original quantity (µg)', 'Description', 'External id', 'Created', 'Pooled'
Items to make associations to: 'Extract' (also decrease quantity 'Extract Used'), 'Protocol', 'Label'
Mandatory columns for imports: 'Name'
Labeled extract export file: labeledextract_out.txt

Array Design

The import of item properties is a straightforward tab separated import. There is one additional column for association to the parent item. Note that the import of array design items will not create fully working designs; see the explanation above.
Fields to import are: 'Name', 'Description', 'Arrays/slide'
Items to make associations to: 'Platform/Variant'
Mandatory columns for imports: 'Name', 'Platform/Variant', 'Arrays/slide'
Array batch export file: arraybatch_out.txt

Array Batch

The import of item properties is a straightforward tab separated import. There is one additional column for association to the parent item.
Fields to import are: 'Name', 'Description'
Items to make associations to: 'Array design', 'Protocol', 'Hardware'
Mandatory columns for imports: 'Name', 'Array design'
Array batch export file: arraybatch_out.txt

Array Slide

The import of item properties is a straightforward tab separated import. There is one additional column for association to the parent item.
Fields to import are: 'Name', 'Description', 'Barcode', 'Destroyed'
Items to make associations to: 'Array batch'
Mandatory columns for imports: 'Name', 'Array batch'
Array slide export file: arrayslide_out.txt

Hybridization

The import of item properties is a straightforward tab separated import. There are additional columns for associations to parent items and other items. There may be one or more parents (labeled extracts); the number depends on how many arrays there are on each slide and on the number of channels of the platform used. Multiple parents are defined on multiple lines. Compared to the labeled extract items there are additional columns: 'Hardware', 'Array slide', 'Arrays', and 'Array index'.
Fields to import are: 'Name', 'Description', 'Created', 'Arrays', 'Array index'
Items to make associations to: 'Labeled extract' (also decrease quantity 'Parent Used'), 'Protocol', 'Array slide', 'Hardware'
Mandatory columns for imports: 'Name', 'Arrays'
Hybridization export file: hybridization_out.txt

Scan

The import of item properties is a straightforward tab separated import. There are additional columns for associations to the parent item and other items. There is only one parent hybridization. There are no additional columns compared with the previous items. Image upload through the importer is not supported.
Fields to import are: 'Name', 'Description'
Items to make associations to: 'Hybridization', 'Protocol', 'Hardware'
Mandatory columns for imports: 'Name'
Scan export file: scan_out.txt
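
As an illustration of the column rules listed for the item types above, the following Python sketch checks a file header against the mandatory columns and collects parents that are spread over multiple lines. It is not an implementation of the importer. The mandatory columns are copied from the descriptions above, but the function names and the rule that lines sharing the same 'Name' belong to one pooled item are assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Mandatory columns per item type, copied from the descriptions above.
MANDATORY_COLUMNS = {
    "biosource": ["Name"],
    "sample": ["Name"],
    "extract": ["Name"],
    "labeled extract": ["Name"],
    "array design": ["Name", "Platform/Variant", "Arrays/slide"],
    "array batch": ["Name", "Array design"],
    "array slide": ["Name", "Array batch"],
    "hybridization": ["Name", "Arrays"],
    "scan": ["Name"],
}


def missing_columns(item_type, header):
    """Return the mandatory columns that are absent from a file header."""
    return [c for c in MANDATORY_COLUMNS[item_type] if c not in header]


def group_pooled_parents(text, parent_column, used_column):
    """Collect the parents of each item when they span several lines.

    Assumption: lines with the same 'Name' describe one pooled item. Its
    original quantity is the sum of the amounts used from its parents.
    """
    parents = defaultdict(list)
    quantity = defaultdict(float)
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        parents[row["Name"]].append(row[parent_column])
        quantity[row["Name"]] += float(row[used_column])
    return dict(parents), dict(quantity)


# A header lacking two of the mandatory array design columns.
print(missing_columns("array design", ["Name", "Description"]))

# Hypothetical pooled sample defined on two lines, one per parent sample.
pooled = (
    "Name\tPooled\tSample\tSample Used\n"
    "P1\t1\tS1\t2.5\n"
    "P1\t1\tS2\t1.5\n"
)
parents, quantity = group_pooled_parents(pooled, "Sample", "Sample Used")
print(parents, quantity)
```

Keeping the mandatory columns in one table would let the same check serve every item type, and it gives a dry-run an early, clear failure when a template is missing a column.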