This document describes how the BFS format is used with bioassay spot data
when communicating with plug-ins. A typical plug-in execution sequence is:

1. Export current data to BFS
2. Execute the plug-in, which processes the data
3. Import the transformed data to BASE

This document discusses the import part of the procedure. The import takes
place after a plug-in has acted on the exported data and generated one or
more output files. BASE can import the following types of information:

* Intensity values (logged or non-logged). One value for each channel is
  required.
* Extra values. As many as the plug-in generates.
* Reporter lists. As many as the plug-in generates.

The number of files needed and where to place the information depend on
the subtype (matrix or serial) that is used. A plug-in should output the
same subtype as it received as input.

A plug-in can also generate any other type of file, for example, images,
PDF files, etc. These files are only uploaded to BASE and attached to the
new bioassay set.

The plug-in must create a metadata file so that the importer knows what to
look for.

The metadata file (import)
==========================
There are two BFS subtypes:

* matrix: One data file is required for each value/formula to import. The
  columns in the data files represent assays.
* serial: One data file is required for each assay. The columns in the
  data files represent values/formulas.

Files
-----
The [files] section is used to name the data files. The following entries
are recognised and required:

* rdata: The filename of the file containing new reporter information.
  The ID column is always the position number, which must be a unique
  positive integer. Additional columns may be required depending on the
  import settings.
* pdata: The filename of the file containing new assay information.
  The ID column is in most cases the ID of the parent assay, but if the
  'multi-assay-parents' setting has been enabled, the ID can be any unique
  positive integer, and the Parent ID column holds a list of parent IDs.
* sdata1,...,sdataN: N entries numbered from 1 to N with the filenames of
  the files containing spot data. If the 'serial' subtype is used there
  should be one file for each assay in the bioassay set. If the 'matrix'
  subtype is used there should be one file for each entry in the [sdata]
  section.

Additionally, all entries starting with 'x-' are considered to be extra
files that should be uploaded to BASE and attached to the new bioassay set.

Settings
--------
The [settings] section is used to control some aspects of the import. The
following settings have been defined:

* new-data-cube: If a value of '1' is specified, the data is imported into
  a new data cube. A new data cube is needed whenever the position/reporter
  mapping has been changed or when parent assays have been merged. When a
  new data cube is used, the 'rdata' file needs one of the 'Internal ID' or
  'External ID' columns so that the importer can map each position to a
  reporter.
* multi-assay-parents: If a value of '1' is specified, it indicates that
  child assays may have more than one parent assay (e.g. due to a merge).
  This requires a new data cube, so the setting is ignored unless the
  'new-data-cube' setting has also been enabled. The 'pdata' file must have
  a 'Parent ID' column that holds a comma-separated list with the IDs of
  the parent assays.
* transform: The intensity transform used by the child spot data. The
  values to choose from are: NONE, LOG2, LOG10. If not specified, the child
  spot data is assumed to use the same intensity transform as the parent
  data.

Spot data
---------
The [sdata] section contains metadata about the spot data (intensity values
and spot extra values) that the plug-in generated. The order in this
section is important. If the 'matrix' subtype is used the order must
correspond to the 'sdataX' entries in the [files] section. Example:
The file named for key 'sdata1' is data for the first entry in this
section. If the 'serial' subtype is used the order must correspond to the
column order inside each of the 'sdataX' files. E.g. the first column is
data for the first entry in this section.

Entries with keys like 'Ch 1', 'Ch 2', etc. are reserved and correspond to
channel intensities. There must be exactly one entry for each channel in
the experiment. Data values are always float values but they may be
logged. This is controlled by the 'transform' setting. All intensities
must use the same intensity transform.

Entries starting with 'x-' are extra values. The values are either in
separate files (matrix subtype) or in their own columns (serial subtype).
The value of the entry is the data type of the extra value. Allowed values
are: 'text', 'float' and 'int'. The part of the key after 'x-' should be
the name or external id of an already existing extra value type.

Example:

  [sdata]
  ch1    float
  ch2    float
  x-abc  float

Reporter annotation file (import)
=================================
This file is used to link spot data with the correct positions in the
bioassay set. The required columns depend on whether the data is imported
to the same data cube as the parent or to a new one.

* ID: The position numbers. This column is always needed. Values must be
  positive integers and duplicates are not allowed. The order doesn't
  matter. Since the position number has no specific meaning, we recommend
  that plug-ins that generate data for a new data cube simply start at 1
  and then increment the value for each line.
* Internal ID or External ID: Either the internal or external IDs of the
  reporter that is assigned to the given position. At least one of these
  columns is needed when importing data to a new data cube. The same
  reporter may be assigned to more than one position, and the reporter
  must already exist in BASE.

All sdata files should have the same number of rows (not counting the
header line) as this file.
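
The rules for the reporter annotation file can be checked before the files
are handed back to BASE. The following is only a minimal sketch, not part
of BASE: the function name check_reporter_rows and the representation of
the file as a list of dicts keyed by column name are assumptions made for
illustration, as is the per-row check of the reporter id columns.

```python
def check_reporter_rows(rows, new_data_cube, n_sdata_rows):
    """Return a list of problems found in the rdata rows (empty if none).

    rows          -- hypothetical in-memory form of the rdata file:
                     one dict per line, keyed by column name
    new_data_cube -- True if the 'new-data-cube' setting is '1'
    n_sdata_rows  -- number of data rows in the sdata files
    """
    problems = []
    seen = set()
    for line_no, row in enumerate(rows, start=1):
        pos = row.get("ID")
        # ID must be a positive integer (bool is rejected explicitly).
        if isinstance(pos, bool) or not isinstance(pos, int) or pos <= 0:
            problems.append(f"row {line_no}: ID must be a positive integer")
        elif pos in seen:
            problems.append(f"row {line_no}: duplicate ID {pos}")
        else:
            seen.add(pos)
        # Assumption: with a new data cube, every row carries a reporter id.
        if (new_data_cube
                and row.get("Internal ID") is None
                and row.get("External ID") is None):
            problems.append(f"row {line_no}: no Internal ID or External ID")
    # The sdata files should have as many data rows as the rdata file.
    if len(rows) != n_sdata_rows:
        problems.append(
            f"rdata has {len(rows)} rows but sdata has {n_sdata_rows}")
    return problems
```

Returning a list of messages rather than raising on the first error lets a
plug-in report every problem in one pass.
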
Assay annotation file (import)
==============================
This file is used to link spot data with the correct child assay. It
should have one entry for each child bioassay that should be created.

* ID: Either the ID of a parent assay or a unique positive integer. This
  column is always needed. If the 'multi-assay-parents' option is enabled
  there is no special meaning to the value, otherwise the ID must be the
  ID of the parent assay.
* Name: An optional column. If present, the child assay will be given the
  specified name. Otherwise a name is generated automatically, typically
  the same as the name of the parent assay.
* Parent ID: Required if 'multi-assay-parents' is enabled. The value is a
  comma-separated list of parent assay IDs.

If the 'serial' subtype is used, the number of lines in this file should
match the number of 'sdataX' entries in the [files] section. Data for the
assay on the first line is found in the file specified by sdata1 and so on.

If the 'matrix' subtype is used, the number of lines in this file should
match the number of columns in each of the 'sdataX' files. Data for the
assay on the first line is found in the first column in each data file and
so on.

Data files (import)
===================
Data files should follow the same rules as exported data files.
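
The way the two subtypes map an assay and an [sdata] entry to a file and a
column can be summarised in a few lines of code. This is only an
illustrative sketch: the function name locate_value is hypothetical, and
the returned column number counts data columns only, on the assumption that
any other columns required by the data file rules are handled separately.

```python
def locate_value(subtype, assay_line, sdata_entry):
    """Return (files-section key, data column) holding one value.

    assay_line  -- 1-based line number of the assay in the pdata file
    sdata_entry -- 1-based position of the entry in the [sdata] section
    """
    if assay_line < 1 or sdata_entry < 1:
        raise ValueError("indexes are 1-based")
    if subtype == "serial":
        # One file per assay; columns follow the [sdata] order.
        return (f"sdata{assay_line}", sdata_entry)
    if subtype == "matrix":
        # One file per [sdata] entry; columns follow the pdata order.
        return (f"sdata{sdata_entry}", assay_line)
    raise ValueError(f"unknown subtype: {subtype!r}")
```

For example, with three assays and the entries ch1, ch2 and x-abc, the
x-abc value for the second assay is in the third data column of the file
named by sdata2 in the serial case, and in the second data column of the
file named by sdata3 in the matrix case.
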