This document describes how the BFS format is used with bioassay spot
data when communicating with plug-ins.

A typical plug-in execution sequence is:
 1. Export current data to BFS
 2. Execute the plug-in which processes the data
 3. Import the transformed data to BASE


This document discusses the import part of the procedure.

The import takes place after a plug-in has taken some kind of action on
the exported data and generated one or more output files. BASE can import
the following types of information:

 * Intensity values (logged or non-logged). One value for each channel is
   required.
 * Extra values. As many as the plug-in generates.
 * Reporter lists. As many as the plug-in generates.

The number of files needed and where to place the information depends on
the subtype (matrix or serial) that is used. A plug-in should output the
same subtype as it received as input.

A plug-in can also generate any other type of file, for example, images,
PDF files, etc. These files are simply uploaded to BASE and attached to the
new bioassay set.

The plug-in must create a metadata file so that the importer knows what
to look for.

The metadata file (import)
==========================

There are two BFS subtypes:

* matrix: One data file is required for each value/formula to
  import. The columns in the data files represent assays.

* serial: One data file is required for each assay. The columns
  in the data files represent values/formulas.

Files
-----

The [files] section is used to name the data files. The following
entries are recognised and required:

 * rdata: The filename of the file containing new reporter information. The
   ID column is always the position number, which must be a unique positive
   integer. Additional columns may be required depending on the import
   settings.
 * pdata: The filename of the file containing new assay information.
   The ID column is in most cases the ID of the parent assay, but
   if the 'multi-assay-parents' setting has been enabled, the ID can be any
   unique positive integer, and the Parent ID column holds a list of
   parent IDs.
 * sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
   of the files containing spot data. If the 'serial' subtype is used there
   should be one file for each assay in the bioassay set. If the 'matrix' 
   subtype is used there should be one file for each entry in the [sdata]
   section.

Additionally, all entries starting with 'x-' are considered to be extra files
that should be uploaded to BASE and attached to the new bioassay set.
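
As an illustration, the entries above could be generated by a plug-in along
the following lines. This is only a sketch: the function name and all
filenames are hypothetical, and it assumes that key and value are separated
by a tab, as in the [sdata] example further down.

```python
def files_section(rdata, pdata, sdata, extras=None):
    """Return the lines of a [files] section as a list of strings."""
    # Assumption: key and value are separated by a single tab.
    lines = ["[files]", f"rdata\t{rdata}", f"pdata\t{pdata}"]
    # The sdata entries are numbered from 1 to N without gaps.
    for n, name in enumerate(sdata, start=1):
        lines.append(f"sdata{n}\t{name}")
    # Extra files to attach to the bioassay set use keys starting with 'x-'.
    for key, name in (extras or {}).items():
        lines.append(f"x-{key}\t{name}")
    return lines


# Hypothetical filenames, for illustration only.
section = files_section(
    "reporters.txt", "assays.txt",
    ["spots1.txt", "spots2.txt"],
    {"plot": "ma-plot.png"},
)
print("\n".join(section))
```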

Settings
--------

The [settings] section is used to control some aspects of the import. The
following settings have been defined:

 * new-data-cube: If a value of '1' is specified, the data is imported into a
   new data cube. A new data cube is needed whenever the position/reporter
   mapping has been changed or when parent assays have been merged. When a
   new data cube is used, the 'rdata' file needs at least one of the
   'Internal ID' and 'External ID' columns so that the importer can map each
   position to a reporter.
 * multi-assay-parents: If a value of '1' is specified, child assays may have
   more than one parent assay (e.g. due to a merge). This requires a new data
   cube, so the setting is ignored unless the 'new-data-cube' setting has
   also been enabled. The 'pdata' file must have a 'Parent ID' column that
   holds a comma-separated list of the IDs of the parent assays.
 * transform: The intensity transform used by the child spot data. The values
   to choose from are: NONE, LOG2, LOG10. If not specified, the child spot
   data is assumed to use the same intensity transform as the parent data.
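
A plug-in author may want to sanity-check the settings before writing the
metadata file. The sketch below is not part of BASE; the function name and
the messages are hypothetical, and the settings are assumed to be available
as a plain dictionary of strings.

```python
ALLOWED_TRANSFORMS = {"NONE", "LOG2", "LOG10"}


def check_settings(settings):
    """Return a list of problems found in a dict of [settings] entries."""
    problems = []
    new_cube = settings.get("new-data-cube") == "1"
    multi = settings.get("multi-assay-parents") == "1"
    # multi-assay-parents has no effect unless new-data-cube is also enabled.
    if multi and not new_cube:
        problems.append("multi-assay-parents is ignored without new-data-cube")
    # transform is optional; when given it must be one of the listed values.
    transform = settings.get("transform")
    if transform is not None and transform not in ALLOWED_TRANSFORMS:
        problems.append(f"unknown transform: {transform}")
    return problems


print(check_settings({"multi-assay-parents": "1", "transform": "LOG2"}))
```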

Spot data
---------

The [sdata] section contains metadata about the spot data (intensity values
and spot extra values) that the plug-in generated. The order in this section
is important.

If the 'matrix' subtype is used, the order must correspond to the 'sdataX'
entries in the [files] section. E.g. the file named by the 'sdata1' key holds
the data for the first entry in this section.

If the 'serial' subtype is used, the order must correspond to the column
order inside each of the 'sdataX' files. E.g. the first column holds the data
for the first entry in this section.

Entries with keys like 'Ch 1', 'Ch 2', etc. are reserved and correspond to
channel intensities. There must be exactly one entry for each channel in the
experiment.

Data values are always float values but they may be logged. This is controlled
by the 'transform' setting. All intensities must use the same intensity
transform.

Entries starting with 'x-' are extra values. The values are either in separate
files (matrix subtype) or in their own columns (serial subtype). The value of
each entry is the data type of the extra value. Allowed data types are: 'text',
'float' and 'int'. The part of the key after 'x-' should be the name or
external id of an already existing extra value type.

Example:

[sdata]
ch1	float
ch2	float
x-abc	float
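
The ordering rules can be made concrete with a small sketch that parses the
entries of the example above and reports where the data for each entry is
expected. The function names are hypothetical and the parsing assumes
tab-separated key/type pairs; this is not the importer's actual code.

```python
ALLOWED_TYPES = {"text", "float", "int"}


def parse_sdata(lines):
    """Parse 'key<TAB>type' lines into an ordered list of (key, type)."""
    entries = []
    for line in lines:
        key, data_type = line.split("\t")
        # Extra values must use one of the allowed data types.
        if key.startswith("x-") and data_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type for {key}: {data_type}")
        entries.append((key, data_type))
    return entries


def locate(entries, subtype):
    """Map each entry key to the place where its data is expected."""
    if subtype == "matrix":
        # The n:th entry is stored in the file named by the 'sdataN' key.
        return {key: f"sdata{n}" for n, (key, _) in enumerate(entries, start=1)}
    if subtype == "serial":
        # The n:th entry is stored in the n:th column of every sdata file.
        return {key: f"column {n}" for n, (key, _) in enumerate(entries, start=1)}
    raise ValueError(f"unknown subtype: {subtype}")


entries = parse_sdata(["ch1\tfloat", "ch2\tfloat", "x-abc\tfloat"])
print(locate(entries, "matrix"))
print(locate(entries, "serial"))
```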


Reporter annotation file (import)
=================================

This file is used to link spot data with the correct positions in the bioassay
set. The required columns depend on whether the data is imported to the same
data cube as the parent or not.

 * ID: The position numbers. This column is always needed. Values must be
   positive integers and duplicates are not allowed. The order doesn't matter.
   Since the position number has no specific meaning, we recommend that
   plug-ins that generate data for a new data cube simply start at 1 and then
   increment the value for each line.
 * Internal ID or External ID: Either the internal or external IDs of the
   reporter that is assigned to the given position. At least one of those
   columns is needed when importing data to a new data cube. The same reporter
   may be assigned to more than one position and the reporter must already
   exist in BASE.

All sdata files should have the same number of rows (not counting the header
line) as this file.
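
The rules for the ID column and the row counts could be checked before the
files are handed over to BASE, for example as in the sketch below. The
function name is hypothetical, and the caller is assumed to have already
read the ID column and counted the data rows (header excluded) of each
sdata file.

```python
def check_positions(ids, sdata_row_counts):
    """Return a list of problems with the ID column of the rdata file.

    ids: the raw ID column values, one string per data line.
    sdata_row_counts: dict mapping an sdata filename to its number of rows.
    """
    problems = []
    seen = set()
    for raw in ids:
        try:
            position = int(raw)
        except ValueError:
            problems.append(f"not an integer: {raw!r}")
            continue
        if position < 1:
            problems.append(f"not a positive integer: {position}")
        elif position in seen:
            problems.append(f"duplicate position: {position}")
        seen.add(position)
    # Every sdata file should have as many rows as the rdata file.
    for name, count in sdata_row_counts.items():
        if count != len(ids):
            problems.append(f"{name} has {count} rows, expected {len(ids)}")
    return problems


print(check_positions(["1", "2", "2"], {"spots1.txt": 3}))
```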

Assay annotation file (import)
==============================

This file is used to link spot data with the correct child assay. This file
should have one entry for each child bioassay that should be created. 

 * ID: Either the ID of a parent assay or a unique positive integer. This 
   column is always needed. If the 'multi-assay-parents' option is enabled 
	there is no special meaning to the value, otherwise the ID must be the 
	ID of the parent assay.
 * Name: An optional column. If present, the child assay will be given the
   specified name. Otherwise a name is automatically generated, typically
   the same as the name of the parent assay.
 * Parent ID: Required if 'multi-assay-parents' is enabled. The value is a
   comma-separated list of parent assay IDs.

If the 'serial' subtype is used, the number of lines in this file should match
the number of 'sdataX' entries in the [files] section. Data for the assay on
the first line is found in the file specified by sdata1 and so on.

If the 'matrix' subtype is used, the number of lines in this file should match
the number of columns in each of the 'sdataX' files. Data for the assay on the
first line is found in the first column in each data file and so on.
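
The relation between the number of assays and the layout of the spot data
files can be summarised in a short sketch. The function names are
hypothetical, and 'columns_per_file' is assumed to count only the data
columns of an sdata file.

```python
def parse_parent_ids(value):
    """Split a 'Parent ID' value into a list of integer assay IDs."""
    return [int(part) for part in value.split(",")]


def expected_assay_count(subtype, sdata_files, columns_per_file):
    """Return the number of lines the assay annotation file should have."""
    if subtype == "serial":
        # One sdata file per assay: line n is paired with 'sdataN'.
        return sdata_files
    if subtype == "matrix":
        # One column per assay: line n is paired with column n of each file.
        return columns_per_file
    raise ValueError(f"unknown subtype: {subtype}")


print(parse_parent_ids("12,15,17"))
print(expected_assay_count("serial", sdata_files=3, columns_per_file=2))
print(expected_assay_count("matrix", sdata_files=3, columns_per_file=2))
```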

Data files (import)
===================

Data files should follow the same rules as exported data files.