Ticket #1440: bfs-spotdata-export-1.txt

File bfs-spotdata-export-1.txt, 5.7 KB (added by Nicklas Nordborg, 14 years ago)

Specification for BFS with spotdata that is exported from BASE

This document describes how the BFS format is used with bioassay spot
data when communicating with plug-ins.

A typical plug-in execution sequence is:
 1. Export current data to BFS
 2. Execute the plug-in which processes the data
 3. Import the transformed data to BASE


The export will generate at least two files: one metadata file
and one data file. Row and column annotation files can be created
if the plug-in needs them. Additional data files can also be created
if needed. This document only discusses the export part of the
procedure. Note that reporter and assay annotation files are always
needed if new spot data is going to be imported.

The metadata file (export)
==========================

There are two BFS subtypes:

* matrix: One data file is required for each value/formula to
  export. The columns in the data files represent assays.

* serial: One data file is required for each assay. The columns
  in the data files represent values/formulas.

Files
-----

For both subtypes a [files] section is used to name the files holding
data and annotations. The following keys should be used:

 * rdata: The filename of the file containing reporter annotations
 * pdata: The filename of the file containing assay annotations
 * sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
   of the files containing spot data. If the 'serial' subtype is used there
   should be one file for each assay in the bioassay set. If the 'matrix'
   subtype is used there should be one file for each entry in the [sdata]
   section (see the 'Spot data' section below).

The rdata and pdata files are optional. Other custom files may be included.
It is recommended that custom file entries use 'x-' as a prefix to avoid
key clashes in future versions.

Example:

[files]
rdata rdata.txt
pdata pdata.txt
sdata1 sdata1.txt
sdata2 sdata2.txt
x-custom custom.txt

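As an illustration only, the section layout used in the metadata file could
be read with a few lines of Python. This is a minimal sketch, not part of
BASE: the function name parse_bfs_metadata is hypothetical, and it assumes
that the key and the value(s) on each line are separated by tab characters
(this document states the tab separator only for multiple parameter values).
Entries are kept in file order because the order matters in the [sdata]
section.

```python
def parse_bfs_metadata(text):
    """Parse metadata text into {section name: [(key, [values...]), ...]}.

    Assumption (not confirmed by the specification text): keys and values
    are separated by tab characters. Entry order is preserved.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip("\r\n")
        if not line.strip():
            continue  # skip blank lines between sections
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Start of a new section such as [files] or [sdata]
            current = sections.setdefault(stripped[1:-1], [])
            continue
        if current is None:
            raise ValueError(f"entry outside of a section: {line!r}")
        key, *values = line.split("\t")
        current.append((key, values))
    return sections


# Hypothetical sample input mirroring the examples in this document.
EXAMPLE = (
    "[files]\n"
    "rdata\trdata.txt\n"
    "pdata\tpdata.txt\n"
    "sdata1\tsdata1.txt\n"
    "sdata2\tsdata2.txt\n"
    "x-custom\tcustom.txt\n"
    "\n"
    "[sdata]\n"
    "Ch 1\tfloat\n"
    "Ch 2\tfloat\n"
)

meta = parse_bfs_metadata(EXAMPLE)
files = {key: values[0] for key, values in meta["files"]}
sdata_titles = [key for key, _ in meta["sdata"]]
```

Keeping each section as a list of (key, values) pairs instead of a plain
mapping makes it possible to check ordering rules later on.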

Spot data
---------

The [sdata] section contains metadata about the spot data that has been
exported. The order in this section is important.

If the 'matrix' subtype is used, the order must match the 'sdataX'
entries in the [files] section. E.g. the data that corresponds to the
first line in this section is found in the 'sdata1' file. The number
of entries in this section must be the same as the number of 'sdataX'
entries in the [files] section.

If the 'serial' subtype is used, the order must match the column order
inside each of the 'sdataX' files. E.g. the data that corresponds to the
first line in this section is found in the first column in all 'sdata'
files. The number of entries in this section must match the number of
columns in the 'sdata' files.

The key of each line is the name or title of the data that is exported.
The value describes the data type and must be one of 'text', 'float' or
'int'.

Example:

[sdata]
Ch 1 float
Ch 2 float
Weight float
Flag int

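The ordering rule for the 'matrix' subtype can be sketched as a small
check that pairs each [sdata] entry with its 'sdataX' file. The helper
names below are hypothetical, and the inputs are assumed to be already
parsed: a list of (title, type) pairs in file order and a mapping of
[files] keys to filenames.

```python
VALID_TYPES = {"text", "float", "int"}


def numbered_sdata_files(files):
    """Return the filenames for sdata1, sdata2, ... in numeric order."""
    names = []
    n = 1
    while f"sdata{n}" in files:
        names.append(files[f"sdata{n}"])
        n += 1
    return names


def pair_matrix_entries(sdata_entries, files):
    """Pair each [sdata] entry with its data file ('matrix' subtype)."""
    for title, dtype in sdata_entries:
        if dtype not in VALID_TYPES:
            raise ValueError(f"unknown data type {dtype!r} for {title!r}")
    names = numbered_sdata_files(files)
    if len(names) != len(sdata_entries):
        raise ValueError(
            f"{len(sdata_entries)} [sdata] entries but {len(names)} sdata files"
        )
    # The first entry belongs to sdata1, the second to sdata2, and so on.
    return [
        (title, dtype, name)
        for (title, dtype), name in zip(sdata_entries, names)
    ]


# Hypothetical input: the four entries from the example, one file each.
sdata_entries = [
    ("Ch 1", "float"),
    ("Ch 2", "float"),
    ("Weight", "float"),
    ("Flag", "int"),
]
files = {f"sdata{i}": f"sdata{i}.txt" for i in range(1, 5)}
pairs = pair_matrix_entries(sdata_entries, files)
```

For the 'serial' subtype the same entry list would instead be compared
with the number of columns in each data file.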

Plug-in parameters
------------------

The [parameters] section contains extra parameters needed by the plug-in.
Keys and values are defined by the plug-in/job configuration. Duplicate
keys are not allowed, and order is not important. Multiple values for the
same parameter are separated with a tab character.

Example:

[parameters]
beta 0.5
length 100
vector 10 10.3 23
median true

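A plug-in could read the [parameters] entries along the following lines.
This is a sketch under assumptions: the function name parse_parameters is
hypothetical, the key is assumed to be separated from the first value by a
tab (the text above states the tab separator only between multiple values),
and values are kept as strings because the format does not declare
parameter types.

```python
def parse_parameters(lines):
    """Map each parameter key to its list of string values."""
    params = {}
    for line in lines:
        if not line.strip():
            continue
        key, *values = line.rstrip("\r\n").split("\t")
        if key in params:
            # Duplicate keys are not allowed in [parameters].
            raise ValueError(f"duplicate parameter key: {key!r}")
        params[key] = values
    return params


# The example section, with tabs written out explicitly.
params = parse_parameters([
    "beta\t0.5",
    "length\t100",
    "vector\t10\t10.3\t23",
    "median\ttrue",
])
vector = [float(v) for v in params["vector"]]
```

Converting values (as done for 'vector' here) is left to the plug-in, which
knows what each of its parameters means.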

Reporter annotation file (export)
=================================

The file used for reporter annotations is given by the 'rdata' entry in the
[files] section. This file is optional. If it is used, the only required
column is the ID column. In this case, the ID column holds the internal
bioassay set 'position' value. All sdata files should have the same number
of rows as this file (not counting the header line) and data should be sorted
in the same order.

Additional columns may be included in the export.

Note that the same underlying reporter may be assigned to more than one
position. If the plug-in needs to operate on merged-per-reporter data,
the export should include either the internal or external reporter id in
an additional column, and the plug-in should use this information to
determine what should be merged.

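Grouping positions by reporter could look like the sketch below. It
assumes that the annotation file is tab-separated with one header line, and
that the extra column is called 'Reporter ID'; both the column name and the
function name rows_by_reporter are hypothetical. Because the sdata files
are sorted in the same order as this file, the returned row indexes
(0-based, header excluded) can be used directly on the sdata rows.

```python
import csv
import io
from collections import defaultdict


def rows_by_reporter(rdata_text, reporter_column):
    """Return {reporter id: [row indexes]} for an rdata file's content."""
    reader = csv.DictReader(
        io.StringIO(rdata_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    groups = defaultdict(list)
    for row_index, row in enumerate(reader):
        groups[row[reporter_column]].append(row_index)
    return dict(groups)


# Hypothetical file content: positions 1 and 3 share reporter R1.
RDATA = "ID\tReporter ID\n1\tR1\n2\tR2\n3\tR1\n"
groups = rows_by_reporter(RDATA, "Reporter ID")
```

How the grouped rows are then merged (mean, median, or something else) is
up to the plug-in.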

Assay annotation file (export)
==============================

The file used for assay annotations is given by the 'pdata' entry in the
[files] section. This file is optional. If it is used, the only required
column is the ID column. In this case, the ID column holds the internal
bioassay 'id' value.

If the 'matrix' subtype is used, the columns in the sdata files must be in
the same order as the assays appear in this file. The number of columns in
the sdata files must be the same as the number of rows in this file (not
counting the header line).

If the 'serial' subtype is used, the 'sdata1' file has data for the assay
that is described in the first line in this file, the 'sdata2' file has data
for the second assay, etc. The number of data files must match the number of
lines in this file.

Additional columns may be included in the export.

Data files (export)
===================

Data files contain data in matrix format. More than one data file may be
required. The organisation of the data depends on the format subtype. In
both subtypes the number and order of the rows must match the information
in the reporter annotation file.

If the 'matrix' subtype is used, the columns correspond to assays. The
number of columns and their order must match the lines in the assay
annotation file. The number of sdata files and their content is defined
by the entries in the [sdata] section.

If the 'serial' subtype is used, the number of columns and their order
must match the entries in the [sdata] section. Each sdata file has data from
one assay. The number of sdata files in the [files] section must match the
number of lines in the assay annotation file. E.g. spot data for the assay
on the first line is found in the file referenced in the 'sdata1' entry.

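The size rules for the two subtypes can be summarised in a small
consistency check. This is a sketch only: the function name check_layout is
hypothetical, and each data file is assumed to be already loaded as a list
of rows containing data only (any header line removed by the caller). The
reporter and assay counts would come from the annotation files when they
are present.

```python
def check_layout(subtype, n_reporters, n_assays, n_entries, tables):
    """Return a list of problems found; an empty list means consistent.

    n_entries is the number of lines in the [sdata] section and tables
    holds one list of rows per sdata file, in sdata1..sdataN order.
    """
    if subtype == "matrix":
        # One file per [sdata] entry, one column per assay.
        expected_files, expected_cols = n_entries, n_assays
    elif subtype == "serial":
        # One file per assay, one column per [sdata] entry.
        expected_files, expected_cols = n_assays, n_entries
    else:
        raise ValueError(f"unknown subtype: {subtype!r}")

    problems = []
    if len(tables) != expected_files:
        problems.append(
            f"expected {expected_files} sdata files, found {len(tables)}"
        )
    for number, table in enumerate(tables, start=1):
        if len(table) != n_reporters:
            problems.append(
                f"sdata{number}: expected {n_reporters} rows, found {len(table)}"
            )
        for row in table:
            if len(row) != expected_cols:
                problems.append(
                    f"sdata{number}: expected {expected_cols} columns, "
                    f"found {len(row)}"
                )
                break  # one report per file is enough
    return problems


# Hypothetical export: 2 reporters, 3 assays, 2 [sdata] entries.
tables = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
]
matrix_problems = check_layout("matrix", 2, 3, 2, tables)
serial_problems = check_layout("serial", 2, 3, 2, tables)
```

The same tables pass as a 'matrix' export but fail as a 'serial' export,
since the serial layout would need three files with two columns each.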