This document describes how the BFS format is used with bioassay spot
data when communicating with plug-ins.

A typical plug-in execution sequence is:
1. Export current data to BFS
2. Execute the plug-in which processes the data
3. Import the transformed data to BASE


The export generates at least two files: one metadata file and one data
file. Row and column annotation files can be created if the plug-in needs
them, and additional data files can also be created if needed. This
document only discusses the export part of the procedure. Note that
reporter and assay annotation files are always needed if new spot data
is going to be imported.

The metadata file (export)
==========================

There are two BFS subtypes:

* matrix: One data file is required for each value/formula to
  export. The columns in the data files represent assays.

* serial: One data file is required for each assay. The columns
  in the data files represent values/formulas.

Files
-----

For both subtypes a [files] section is used to name the files holding
data and annotations. The following keys should be used:

* rdata: The filename of the file containing reporter annotations
* pdata: The filename of the file containing assay annotations
* sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
  of the files containing spot data. If the 'serial' subtype is used, there
  should be one file for each assay in the bioassay set. If the 'matrix'
  subtype is used, there should be one file for each entry in the [sdata]
  section (see the 'Spot data' section below).

The rdata and pdata files are optional. Other custom files may be included.
It is recommended that custom file entries use 'x-' as a prefix to avoid
key clashes in future versions.

Example:

[files]
rdata rdata.txt
pdata pdata.txt
sdata1 sdata1.txt
sdata2 sdata2.txt
x-custom custom.txt
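
As an illustration only, a plug-in could read the [files] section with a
small parser along the following lines. This is a minimal sketch, not the
BASE implementation. It assumes that each entry is a key followed by its
value(s), separated by tab characters, and that a section name stands alone
on a line inside square brackets. The function names parse_sections and
sdata_files are hypothetical.

```python
def parse_sections(text):
    """Split metadata text into sections of ordered (key, values) entries.

    Assumption: key and values on a line are separated by tab characters.
    """
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        if stripped.startswith("[") and stripped.endswith("]"):
            # A new section starts; entries keep their file order.
            current = sections.setdefault(stripped[1:-1], [])
        elif current is not None:
            key, *values = line.split("\t")
            current.append((key, values))
    return sections


def sdata_files(sections):
    """Return the filenames of sdata1..sdataN in numeric order."""
    files = {key: values[0] for key, values in sections.get("files", []) if values}
    names = []
    n = 1
    while f"sdata{n}" in files:
        names.append(files[f"sdata{n}"])
        n += 1
    return names


example = (
    "[files]\n"
    "rdata\trdata.txt\n"
    "pdata\tpdata.txt\n"
    "sdata1\tsdata1.txt\n"
    "sdata2\tsdata2.txt\n"
    "x-custom\tcustom.txt\n"
)
print(sdata_files(parse_sections(example)))  # ['sdata1.txt', 'sdata2.txt']
```

Keeping the entries as an ordered list rather than a plain mapping matters
for sections such as [sdata], where the order of the lines is significant.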


Spot data
---------

The [sdata] section contains metadata about the spot data that has been
exported. The order of the entries in this section is significant.

If the 'matrix' subtype is used, the order must match the 'sdataX'
entries in the [files] section. E.g. the data that corresponds to the
first line in this section is found in the 'sdata1' file. The number
of entries in this section must be the same as the number of 'sdataX'
entries in the [files] section.

If the 'serial' subtype is used, the order must match the column order
inside each of the 'sdataX' files. E.g. the data that corresponds to the
first line in this section is found in the first column in all 'sdata'
files. The number of entries in this section must match the number of
columns in the 'sdata' files.

The key of each line is the name or title of the data that is exported.
The value describes the data type and must be one of 'text', 'float' or
'int'.

Example:

[sdata]
Ch 1 float
Ch 2 float
Weight float
Flag int
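
The counting rules above can be expressed as a small consistency check.
The sketch below is hypothetical (the name check_sdata and its arguments
are not part of BASE). It takes the [sdata] entries as ordered (name, type)
pairs and assumes that n_columns counts only the spot data columns of a
'serial' data file.

```python
VALID_TYPES = {"text", "float", "int"}


def check_sdata(subtype, sdata_entries, n_sdata_files, n_columns=None):
    """Return a list of problems; an empty list means the counts agree.

    sdata_entries: ordered (name, type) pairs from the [sdata] section.
    n_sdata_files: number of sdataN entries in the [files] section.
    n_columns: spot data columns per sdata file (checked for 'serial' only).
    """
    problems = []
    for name, dtype in sdata_entries:
        if dtype not in VALID_TYPES:
            problems.append(f"{name}: unknown type '{dtype}'")
    if subtype == "matrix":
        # One sdata file per [sdata] entry.
        if len(sdata_entries) != n_sdata_files:
            problems.append("number of [sdata] entries differs from number of sdata files")
    elif subtype == "serial":
        # One column per [sdata] entry in every sdata file.
        if n_columns is not None and len(sdata_entries) != n_columns:
            problems.append("number of [sdata] entries differs from number of columns")
    else:
        problems.append(f"unknown subtype '{subtype}'")
    return problems


entries = [("Ch 1", "float"), ("Ch 2", "float"), ("Weight", "float"), ("Flag", "int")]
print(check_sdata("matrix", entries, n_sdata_files=4))  # []
print(check_sdata("serial", entries, n_sdata_files=3, n_columns=4))  # []
```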


Plug-in parameters
------------------

The [parameters] section contains extra parameters needed by the plug-in.
Keys and values are defined by the plug-in/job configuration. Duplicate
keys are not allowed, and order is not important. Multiple values for the
same parameter are separated with a tab character.

Example:

[parameters]
beta 0.5
length 100
vector 10 10.3 23
median true
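
A plug-in might read these entries roughly as follows. This is a sketch
under stated assumptions: the key is taken to be separated from its first
value by a tab character, just as multiple values are separated from each
other, and parse_parameters is a hypothetical helper name. Values are kept
as strings because their interpretation is defined by the plug-in.

```python
def parse_parameters(lines):
    """Map each parameter key to its list of string values.

    Assumption: key and values are tab-separated. Duplicate keys are
    rejected because the format does not allow them.
    """
    params = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore blank lines
        key, *values = line.split("\t")
        if key in params:
            raise ValueError(f"duplicate parameter key: {key}")
        params[key] = values
    return params


params = parse_parameters([
    "beta\t0.5",
    "length\t100",
    "vector\t10\t10.3\t23",
    "median\ttrue",
])
# The plug-in decides how each value is interpreted.
vector = [float(v) for v in params["vector"]]
print(vector)  # [10.0, 10.3, 23.0]
```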


Reporter annotation file (export)
=================================

The file used for reporter annotations is given by the 'rdata' entry in the
[files] section. This file is optional. If it is used, the only required
column is the ID column. In this case, the ID column holds the internal
bioassay set 'position' value. All sdata files should have the same number
of rows as this file (not counting the header line) and data should be sorted
in the same order.

Additional columns may be included in the export.

Note that the same underlying reporter may be assigned to more than one
position. If the plug-in needs to operate on merged-per-reporter data,
the export should include either the internal or external reporter id in
an additional column, and the plug-in can use this information to determine
what should be merged.
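
One possible way for a plug-in to find the positions that belong to the
same reporter is sketched below. It assumes that the annotation file is
tab-separated with a single header line, and that the export added a
column named 'Reporter ID'. Both the column name and the function name
rows_by_reporter are hypothetical.

```python
import csv
import io
from collections import defaultdict


def rows_by_reporter(rdata_text, reporter_column="Reporter ID"):
    """Map each reporter id to the 0-based data row indexes that share it.

    Because sdata files are sorted in the same order as the reporter
    annotation file, the same indexes select the rows to merge there.
    """
    reader = csv.DictReader(io.StringIO(rdata_text), delimiter="\t")
    groups = defaultdict(list)
    for row_index, row in enumerate(reader):
        groups[row[reporter_column]].append(row_index)
    return dict(groups)


# Positions 1 and 3 carry the same reporter.
example = "ID\tReporter ID\n1\tR1\n2\tR2\n3\tR1\n"
print(rows_by_reporter(example))  # {'R1': [0, 2], 'R2': [1]}
```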


Assay annotation file (export)
==============================

The file used for assay annotations is given by the 'pdata' entry in the
[files] section. This file is optional. If it is used, the only required
column is the ID column. In this case, the ID column holds the internal
bioassay 'id' value.

If the 'matrix' subtype is used, the columns in the sdata files must be in
the same order as the assays appear in this file. The number of columns in
the sdata files must be the same as the number of rows in this file (not
counting the header line).

If the 'serial' subtype is used, the 'sdata1' file has data for the assay
that is described in the first line in this file, the 'sdata2' file has data
for the second assay, etc. The number of data files must match the number of
lines in this file (not counting the header line).

Additional columns may be included in the export.

Data files (export)
===================

Data files contain data in matrix format. More than one data file may be
required. The organisation of the data depends on the format subtype. In
both subtypes the number and order of the rows must match the information
in the reporter annotation file.

If the 'matrix' subtype is used, the columns correspond to assays. The
number of columns and their order must match the lines in the assay
annotation file. The number of sdata files and their content is defined
by the entries in the [sdata] section.

If the 'serial' subtype is used, the number of columns and their order
must match the entries in the [sdata] section. Each sdata file has data from
one assay. The number of sdata files in the [files] section must match the
number of lines in the assay annotation file. E.g. spot data for the assay
on the first line is found in the file referenced in the 'sdata1' entry.
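
The layout rules for the two subtypes can be summarised in a few lines of
code. The function below is a hypothetical sketch (expected_layout is not
a BASE function). It only restates the counts described above, where rows
and columns refer to data rows and spot data columns.

```python
def expected_layout(subtype, n_positions, n_assays, n_sdata_entries):
    """Return the expected number of sdata files, rows and columns.

    n_positions: data rows in the reporter annotation file.
    n_assays: data rows in the assay annotation file.
    n_sdata_entries: entries in the [sdata] section.
    """
    if subtype == "matrix":
        # One file per value/formula, one column per assay.
        return {"files": n_sdata_entries, "rows": n_positions, "columns": n_assays}
    if subtype == "serial":
        # One file per assay, one column per value/formula.
        return {"files": n_assays, "rows": n_positions, "columns": n_sdata_entries}
    raise ValueError(f"unknown subtype: {subtype}")


# 1000 positions, 3 assays and 4 exported values.
print(expected_layout("matrix", 1000, 3, 4))
# {'files': 4, 'rows': 1000, 'columns': 3}
print(expected_layout("serial", 1000, 3, 4))
# {'files': 3, 'rows': 1000, 'columns': 4}
```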