This document describes how the BFS format is used with bioassay spot
data when communicating with plug-ins.

A typical plug-in execution sequence is:
  1. Export current data to BFS
  2. Execute the plug-in which processes the data
  3. Import the transformed data to BASE

This document discusses the import part of the procedure.

The import takes place after a plug-in has taken some kind of action on
the exported data and generated one or more output files. BASE can import
the following types of information:

 * Intensity values (logged or non-logged). One value for each channel is
   required.
 * Extra values. As many as the plug-in generates.
 * Reporter lists. As many as the plug-in generates.

The number of files needed and where to place the information depend on
the subtype (matrix or serial) that is used. A plug-in should output the
same subtype as it received as input.

A plug-in can also generate any other type of file, for example, images,
PDF files, etc. These files are simply uploaded to BASE and attached to the
new bioassay set.

The plug-in must create a metadata file so that the importer knows what
to look for.

The metadata file (import)
==========================

There are two BFS subtypes:

 * matrix: One data file is required for each value/formula to
   import. The columns in the data files represent assays.

 * serial: One data file is required for each assay. The columns
   in the data files represent values/formulas.

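The difference between the two subtypes can be sketched in Python. The
function name and its arguments are hypothetical and not part of BASE; the
sketch only counts data columns and assumes nothing about any other columns
a data file may contain.

```python
def data_file_layout(subtype, n_assays, n_values):
    """Return (number of data files, data columns per file) for a subtype.

    Hypothetical helper for illustration only; not part of BASE.
    """
    if subtype == "matrix":
        # One file per value/formula; the columns represent assays.
        return n_values, n_assays
    if subtype == "serial":
        # One file per assay; the columns represent values/formulas.
        return n_assays, n_values
    raise ValueError(f"unknown BFS subtype: {subtype!r}")
```

For example, four assays with three values each would need three files of
four columns with the matrix subtype, and four files of three columns with
the serial subtype.
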
Files
-----

The [files] section is used to name the data files. The following
entries are recognised and required:

 * rdata: The filename of the file containing new reporter information. The
   ID column is always the position number, which must be a unique positive
   integer. Additional columns may be required depending on the import
   settings.
 * pdata: The filename of the file containing new assay information.
   The ID column is in most cases the ID of the parent assay, but
   if the 'multi-assay-parents' setting has been enabled, the ID can be any
   unique positive integer, and the Parent ID column holds a list of
   parent IDs.
 * sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
   of the files containing spot data. If the 'serial' subtype is used there
   should be one file for each assay in the bioassay set. If the 'matrix'
   subtype is used there should be one file for each entry in the [sdata]
   section.

Additionally, all entries starting with 'x-' are considered to be extra files
that should be uploaded to BASE and attached to the new bioassay set.

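A plug-in author may want to check the [files] entries before handing the
result back to BASE. The following is a minimal sketch in Python, assuming
the entries have already been read into a dictionary that maps each key to
its filename. The function name is hypothetical and this is not the check
that BASE itself performs.

```python
import re


def check_files_section(files):
    """Return (spot data filenames in order, extra filenames).

    `files` maps each key of the [files] section to its filename.
    Hypothetical helper for illustration only; not part of BASE.
    """
    for key in ("rdata", "pdata"):
        if key not in files:
            raise ValueError(f"missing required entry: {key}")

    # Collect the numbers used by the 'sdataN' keys.
    matches = (re.fullmatch(r"sdata([1-9]\d*)", key) for key in files)
    numbers = sorted(int(m.group(1)) for m in matches if m)

    # The entries must be numbered 1..N without gaps (assuming N >= 1).
    if not numbers or numbers != list(range(1, len(numbers) + 1)):
        raise ValueError("sdata entries must be numbered from 1 to N")

    sdata = [files[f"sdata{n}"] for n in numbers]
    extras = [name for key, name in files.items() if key.startswith("x-")]
    return sdata, extras
```

The returned list keeps the spot data files in 'sdataN' order, which is the
order that the rest of the import relies on.
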
Settings
--------

The [settings] section is used to control some aspects of the import. The
following settings have been defined:

 * new-data-cube: If a value of '1' is specified, the data is imported into a
   new data cube. A new data cube is needed whenever the position/reporter
   mapping has been changed or when parent assays have been merged. When a
   new data cube is used, the 'rdata' file needs an 'Internal ID' or
   'External ID' column so that the importer can map each position to a
   reporter.
 * multi-assay-parents: If a value of '1' is specified, it indicates that
   child assays may have more than one parent assay (e.g. due to a merge).
   This requires a new data cube, so the setting is ignored unless the
   'new-data-cube' setting has also been enabled. The 'pdata' file must have
   a 'Parent ID' column that holds a comma-separated list with the IDs of the
   parent assays.
 * transform: If not specified, the child spot data is assumed to use the
   same intensity transform as the parent data. The values to choose from
   are: NONE, LOG2, LOG10.

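The way the settings interact can be sketched as follows. This is only an
illustration in Python under the assumption that the [settings] entries are
available as a dictionary of strings. The function and the names in the
returned dictionary are hypothetical, not part of BASE.

```python
ALLOWED_TRANSFORMS = {"NONE", "LOG2", "LOG10"}


def effective_settings(settings, parent_transform):
    """Resolve the [settings] entries into the values that take effect.

    Hypothetical helper for illustration only; not part of BASE.
    """
    new_data_cube = settings.get("new-data-cube") == "1"

    # 'multi-assay-parents' is ignored unless 'new-data-cube' is enabled.
    multi_parents = new_data_cube and settings.get("multi-assay-parents") == "1"

    # Without an explicit transform the parent's transform is assumed.
    transform = settings.get("transform", parent_transform)
    if transform not in ALLOWED_TRANSFORMS:
        raise ValueError(f"unknown transform: {transform!r}")

    return {
        "new_data_cube": new_data_cube,
        "multi_assay_parents": multi_parents,
        "transform": transform,
    }
```
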
Spot data
---------

The [sdata] section contains metadata about the spot data (intensity values
and spot extra values) that the plug-in generated. The order in this section
is important.

If the 'matrix' subtype is used, the order must correspond to the 'sdataX'
entries in the [files] section. E.g. the file named by the key 'sdata1' holds
the data for the first entry in this section.

If the 'serial' subtype is used, the order must correspond to the column
order inside each of the 'sdataX' files. E.g. the first column holds the data
for the first entry in this section.

Entries with keys like 'Ch 1', 'Ch 2', etc. are reserved and correspond to
channel intensities. There must be exactly one entry for each channel in the
experiment.

Data values are always float values but they may be logged. This is
controlled by the 'transform' setting. All intensities must use the same
intensity transform.

Entries starting with 'x-' are extra values. The values are either in
separate files (matrix subtype) or in their own columns (serial subtype). The
value of the entry is the data type of the extra value. Allowed values are:
'text', 'float' and 'int'. The part of the key after 'x-' should be the name
or external ID of an already existing extra value type.

Example:

  [sdata]
  ch1 float
  ch2 float
  x-abc float

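Because the order of the [sdata] entries decides where each value is read
from, it can help to spell the mapping out. The sketch below is written in
Python and assumes the entries are given as an ordered list of (key, type)
pairs. The function name is hypothetical and the code is not taken from
BASE.

```python
EXTRA_VALUE_TYPES = {"text", "float", "int"}


def locate_spot_data(entries, subtype):
    """Pair each [sdata] key with the place its values are read from.

    Hypothetical helper for illustration only; not part of BASE.
    """
    if subtype not in ("matrix", "serial"):
        raise ValueError(f"unknown BFS subtype: {subtype!r}")

    located = []
    for position, (key, value_type) in enumerate(entries, start=1):
        if key.startswith("x-") and value_type not in EXTRA_VALUE_TYPES:
            raise ValueError(f"bad data type for {key}: {value_type!r}")
        if subtype == "matrix":
            # Entry number N is stored in the file named by 'sdataN'.
            where = f"sdata{position}"
        else:
            # Entry number N is column N inside every 'sdataX' file.
            where = position
        located.append((key, where))
    return located
```

With the three entries from the example, the matrix subtype expects the
files named by 'sdata1', 'sdata2' and 'sdata3', while the serial subtype
expects three columns in each spot data file.
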
Reporter annotation file (import)
=================================

This file is used to link spot data with the correct positions in the
bioassay set. The required columns depend on whether the data is imported to
the same data cube as the parent or not.

 * ID: The position numbers. This column is always needed. Values must be
   positive integers and duplicates are not allowed. The order doesn't matter.
   Since the position number has no specific meaning, we recommend that
   plug-ins that generate data for a new data cube simply start at 1 and then
   increment the value for each line.
 * Internal ID or External ID: Either the internal or external IDs of the
   reporter that is assigned to the given position. At least one of those
   columns is needed when importing data to a new data cube. The same reporter
   may be assigned to more than one position and the reporter must already
   exist in BASE.

All sdata files should have the same number of rows (not counting the header
line) as this file.

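The rules for this file can be checked with a few lines of code. The
following Python sketch assumes that the file has already been parsed into
one dictionary per row, keyed by column name, and that the number of data
rows in each sdata file is known. The function name and arguments are
hypothetical.

```python
def check_rdata(rows, sdata_row_counts, new_data_cube=False):
    """Validate parsed reporter annotation rows; return the row count.

    Hypothetical helper for illustration only; not part of BASE.
    """
    seen = set()
    for row in rows:
        position = int(row["ID"])  # raises ValueError for non-integers
        if position <= 0:
            raise ValueError(f"position must be positive: {position}")
        if position in seen:
            raise ValueError(f"duplicate position: {position}")
        seen.add(position)

        # A new data cube needs a column that identifies the reporter.
        if new_data_cube and not ("Internal ID" in row or "External ID" in row):
            raise ValueError("an 'Internal ID' or 'External ID' column is needed")

    # Every sdata file should have as many data rows as this file.
    for count in sdata_row_counts:
        if count != len(rows):
            raise ValueError(f"expected {len(rows)} rows, found {count}")
    return len(rows)
```
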
Assay annotation file (import)
==============================

This file is used to link spot data with the correct child assay. This file
should have one entry for each child bioassay that should be created.

 * ID: Either the ID of a parent assay or a unique positive integer. This
   column is always needed. If the 'multi-assay-parents' option is enabled
   there is no special meaning to the value, otherwise the ID must be the
   ID of the parent assay.
 * Name: An optional column. If present, the child assay will be given the
   specified name. Otherwise a name is automatically generated, typically
   the same as that of the parent assay.
 * Parent ID: Required if 'multi-assay-parents' is enabled. The value is a
   comma-separated list of parent assay IDs.

If the 'serial' subtype is used, the number of lines in this file should match
the number of 'sdataX' entries in the [files] section. Data for the assay on
the first line is found in the file specified by sdata1 and so on.

If the 'matrix' subtype is used, the number of lines in this file should match
the number of columns in each of the 'sdataX' files. Data for the assay on the
first line is found in the first column in each data file and so on.

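The link between the lines of this file and the spot data can be
illustrated as follows. This is a sketch in Python that assumes the file has
been parsed into one dictionary per line, keyed by column name. The function
name and the "column N" labels are made up for the illustration.

```python
def assay_data_sources(pdata_rows, subtype, multi_assay_parents=False):
    """Return (child ID, parent IDs, data location) for each pdata line.

    Hypothetical helper for illustration only; not part of BASE.
    """
    if subtype not in ("matrix", "serial"):
        raise ValueError(f"unknown BFS subtype: {subtype!r}")

    sources = []
    for line_no, row in enumerate(pdata_rows, start=1):
        assay_id = int(row["ID"])
        if multi_assay_parents:
            # 'Parent ID' holds a comma-separated list of parent IDs.
            parents = [int(part) for part in row["Parent ID"].split(",")]
        else:
            # Without multiple parents the ID is the parent assay's ID.
            parents = [assay_id]

        if subtype == "serial":
            where = f"sdata{line_no}"    # line N uses the file 'sdataN'
        else:
            where = f"column {line_no}"  # line N uses column N of each file
        sources.append((assay_id, parents, where))
    return sources
```
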
Data files (import)
===================

Data files should follow the same rules as exported data files.