This document describes how the BFS format is used with bioassay spot
data when communicating with plug-ins.

A typical plug-in execution sequence is:
  1. Export current data to BFS
  2. Execute the plug-in which processes the data
  3. Import the transformed data to BASE

This document discusses the import part of the procedure.

The import takes place after a plug-in has taken some kind of action on
the exported data and generated one or more output files. BASE can import
the following types of information:

* Intensity values (logged or non-logged). One value for each channel is
  required.
* Extra values. As many as the plug-in generates.
* Reporter lists. As many as the plug-in generates.

The number of files needed and where to place the information depend on
the subtype (matrix or serial) that is used. A plug-in should output the
same subtype as it got for input.

A plug-in can also generate any other type of file, for example, images,
PDF files, etc. These files are only uploaded to BASE and attached to the
new bioassay set.

The plug-in must create a metadata file so that the importer knows what
to look for.

The metadata file (import)
==========================

There are two BFS subtypes:

* matrix: One data file is required for each value/formula to
  import. The columns in the data files represent assays.

* serial: One data file is required for each assay. The columns
  in the data files represent values/formulas.

Files
-----

The [files] section is used to name the data files. The following
entries are recognised and required:

* rdata: The filename of the file containing new reporter information. The ID
  column is always the position number, which must be a unique positive
  integer. Additional columns may be required depending on the import
  settings.
* pdata: The filename of the file containing new assay information.
  The ID column is in most cases the ID of the parent assay, but
  if the 'multi-assay-parents' setting has been enabled, the ID can be any
  unique positive integer, and the Parent ID column holds a list of
  parent IDs.
* sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
  of the files containing spot data. If the 'serial' subtype is used there
  should be one file for each assay in the bioassay set. If the 'matrix'
  subtype is used there should be one file for each entry in the [sdata]
  section.

Additionally, all entries starting with 'x-' are considered to be extra files
that should be uploaded to BASE and attached to the new bioassay set.
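
The rule for how many 'sdataN' entries to list can be sketched in Python as
follows. This is only an illustration: the function name and its parameters
are hypothetical and not part of BASE.

```python
def sdata_keys(subtype, assay_count, sdata_entry_count):
    """Return the 'sdataN' keys expected in the [files] section.

    serial: one file for each assay in the bioassay set.
    matrix: one file for each entry in the [sdata] section.
    """
    if subtype == "serial":
        count = assay_count
    elif subtype == "matrix":
        count = sdata_entry_count
    else:
        raise ValueError(f"unknown BFS subtype: {subtype!r}")
    # Keys are numbered from 1 to N.
    return [f"sdata{i}" for i in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical bioassay set: 4 assays, 3 entries in [sdata].
    print(sdata_keys("serial", 4, 3))
    print(sdata_keys("matrix", 4, 3))
```

With four assays and three [sdata] entries, the serial subtype needs
sdata1 to sdata4 while the matrix subtype needs sdata1 to sdata3.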

Settings
--------

The [settings] section is used to control some aspects of the import. The
following settings have been defined:

* new-data-cube: If a value of '1' is specified the data is imported into a
  new data cube. A new data cube is needed whenever the position/reporter
  mapping has been changed or when parent assays have been merged. When a
  new data cube is used the 'rdata' file needs one of the 'Internal ID' or
  'External ID' columns so that the importer can map each position to a
  reporter.
* multi-assay-parents: If a value of '1' is specified, it indicates that child
  assays may have more than one parent assay (e.g. due to a merge). This
  requires a new data cube, so the setting is ignored unless the
  'new-data-cube' setting has also been enabled. The 'pdata' file must have
  a 'Parent ID' column that holds a comma-separated list with the IDs of the
  parent assays.
* transform: If not specified, the child spot data is assumed to use the same
  intensity transform as the parent data. The values to choose from are: NONE,
  LOG2, LOG10.
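
How these settings interact can be sketched in Python as follows. This is a
reading of the rules above, not the importer's actual code; the function name
and the returned dictionary keys are hypothetical, and the settings are
assumed to have been read into a dict of strings.

```python
ALLOWED_TRANSFORMS = {"NONE", "LOG2", "LOG10"}


def effective_settings(settings):
    """Interpret raw [settings] values (a dict of strings)."""
    new_cube = settings.get("new-data-cube") == "1"
    # multi-assay-parents is ignored unless new-data-cube is also enabled.
    multi_parents = new_cube and settings.get("multi-assay-parents") == "1"
    # None means: use the same intensity transform as the parent data.
    transform = settings.get("transform")
    if transform is not None and transform not in ALLOWED_TRANSFORMS:
        raise ValueError(f"unsupported transform: {transform!r}")
    return {
        "new_data_cube": new_cube,
        "multi_assay_parents": multi_parents,
        "transform": transform,
    }


if __name__ == "__main__":
    # multi-assay-parents alone has no effect.
    print(effective_settings({"multi-assay-parents": "1"}))
    print(effective_settings({"new-data-cube": "1",
                              "multi-assay-parents": "1",
                              "transform": "LOG2"}))
```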

Spot data
---------

The [sdata] section contains metadata about the spot data (intensity values
and spot extra values) that the plug-in generated. The order in this section
is important.

If the 'matrix' subtype is used the order must correspond to the 'sdataX'
entries in the [files] section. E.g. the file named by key 'sdata1' holds
the data for the first entry in this section.

If the 'serial' subtype is used the order must correspond to the column
order inside each of the 'sdataX' files. E.g. the first column holds the
data for the first entry in this section.

Entries with keys like 'Ch 1', 'Ch 2', etc. are reserved and correspond to
channel intensities. There must be exactly one entry for each channel in the
experiment.

Data values are always float values but they may be logged. This is controlled
by the 'transform' setting. All intensities must use the same intensity
transform.

Entries starting with 'x-' are extra values. The values are either in separate
files (matrix subtype) or in their own columns (serial subtype). The value of
an entry is the data type of the extra value. Allowed values are: 'text',
'float' and 'int'. The part of the key after 'x-' should be the name or
external ID of an already existing extra value type.

Example:

  [sdata]
  ch1 float
  ch2 float
  x-abc float
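
The ordering rule can be sketched in Python as follows, using the entries
from the example. The function name and the shape of its return value are
hypothetical; the sketch only illustrates which file or column each entry
refers to.

```python
EXTRA_VALUE_TYPES = {"text", "float", "int"}


def locate_sdata(subtype, entries):
    """Map each [sdata] entry, in order, to where its values are found.

    entries: list of (key, value) pairs in the order they appear in [sdata].
    Returns a dict: key -> ("file", "sdataN") for the matrix subtype,
    or key -> ("column", N) for the serial subtype (N counts from 1).
    """
    if subtype not in ("matrix", "serial"):
        raise ValueError(f"unknown BFS subtype: {subtype!r}")
    locations = {}
    for index, (key, value) in enumerate(entries, start=1):
        # For extra values the entry value names the data type.
        if key.startswith("x-") and value not in EXTRA_VALUE_TYPES:
            raise ValueError(f"bad data type for {key}: {value!r}")
        if subtype == "matrix":
            locations[key] = ("file", f"sdata{index}")
        else:
            locations[key] = ("column", index)
    return locations


if __name__ == "__main__":
    entries = [("ch1", "float"), ("ch2", "float"), ("x-abc", "float")]
    print(locate_sdata("matrix", entries))
    print(locate_sdata("serial", entries))
```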

Reporter annotation file (import)
=================================

This file is used to link spot data with the correct positions in the bioassay
set. The required columns depend on whether data is imported to the same data
cube as the parent or not.

* ID: The position numbers. This column is always needed. Values must be
  positive integers and duplicates are not allowed. The order doesn't matter.
  Since the position number has no specific meaning, we recommend that
  plug-ins that generate data for a new data cube simply start at 1 and then
  increment the value for each line.
* Internal ID or External ID: Either the internal or external IDs of the
  reporter that is assigned to the given position. At least one of those
  columns is needed when importing data to a new data cube. The same reporter
  may be assigned to more than one position and the reporter must already
  exist in BASE.

All sdata files should have the same number of rows (not counting the header
line) as this file.
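
A plug-in that generates data for a new data cube could write this file
roughly as in the Python sketch below. It assumes a tab-separated file with a
single header line, which is not specified in this document (the data file
rules are those of the exported files). The function name and the reporter
IDs are made up for illustration.

```python
import csv
import io


def write_rdata(external_ids, out):
    """Write an rdata file with positions numbered 1, 2, 3, ...

    Assumption: tab-separated columns with one header line.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["ID", "External ID"])
    # The same reporter may be assigned to more than one position.
    for position, external_id in enumerate(external_ids, start=1):
        writer.writerow([position, external_id])


if __name__ == "__main__":
    buffer = io.StringIO()
    write_rdata(["rep-A", "rep-B", "rep-A"], buffer)
    print(buffer.getvalue(), end="")
```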

Assay annotation file (import)
==============================

This file is used to link spot data with the correct child assay. This file
should have one entry for each child bioassay that should be created.

* ID: Either the ID of a parent assay or a unique positive integer. This
  column is always needed. If the 'multi-assay-parents' option is enabled
  there is no special meaning to the value, otherwise the ID must be the
  ID of the parent assay.
* Name: An optional column. If present, the child assay will be given the
  specified name. Otherwise a name is generated automatically, typically
  the same as the name of the parent assay.
* Parent ID: Required if 'multi-assay-parents' is enabled. The value is a
  comma-separated list of parent assay IDs.

If the 'serial' subtype is used, the number of lines in this file should match
the number of 'sdataX' entries in the [files] section. Data for the assay on
the first line is found in the file specified by sdata1 and so on.

If the 'matrix' subtype is used, the number of lines in this file should match
the number of columns in each of the 'sdataX' files. Data for the assay on the
first line is found in the first column in each data file and so on.
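
The 'Parent ID' list and the line-count rules can be sketched in Python as
follows. Both function names and all parameters are hypothetical;
'data_columns_per_file' is assumed to count only the columns that hold assay
data.

```python
def parse_parent_ids(value):
    """Split a 'Parent ID' cell such as '12,15' into a list of ints."""
    return [int(part) for part in value.split(",") if part.strip()]


def assay_count_matches(subtype, pdata_lines, sdata_file_count,
                        data_columns_per_file):
    """Check the number of assay lines against the spot data layout.

    serial: one line per 'sdataX' entry in [files].
    matrix: one line per column in each 'sdataX' file.
    """
    if subtype == "serial":
        expected = sdata_file_count
    elif subtype == "matrix":
        expected = data_columns_per_file
    else:
        raise ValueError(f"unknown BFS subtype: {subtype!r}")
    return pdata_lines == expected


if __name__ == "__main__":
    print(parse_parent_ids("12,15"))
    # Hypothetical import: 4 child assays, 4 serial files, 3 columns each.
    print(assay_count_matches("serial", 4, 4, 3))
```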

Data files (import)
===================

Data files should follow the same rules as exported data files.