Ticket #1440: bfs-spotdata-import-1.txt

File bfs-spotdata-import-1.txt, 7.1 KB (added by Nicklas Nordborg, 14 years ago)
Specification for BFS with spotdata that is imported to BASE

1
2	This document describes how the BFS format is used with bioassay spot
3	data when communicating with plug-ins.
4
5	A typical plug-in execution sequence is:
6	1. Export current data to BFS
7	2. Execute the plug-in which processes the data
8	3. Import the transformed data to BASE
9
10
11	This document discusses the import part of the procedure.
12
13	The import takes place after a plug-in has taken some kind of action on
14	the exported data and generated one or more output files. BASE can import
15	the following types of information:
16
17	* Intensity values (logged or non-logged). One value for each channel is
18	required.
19	* Extra values. As many as the plug-in generates.
20	* Reporter lists. As many as the plug-in generates.
21
22	The number of files needed and where to place the information depends on
23	the subtype (matrix or serial) that is used. A plug-in should output the
24	same subtype as it got for input.
25
26	A plug-in can also generate any other type of file, for example, images,
27	pdf files, etc. These files are only uploaded to BASE and attached to the
28	new bioassay set.
29
30	The plug-in must create a metadata file so that the importer knows what
31	to look for.
32
33	The metadata file (import)
34	==========================
35
36	There are two BFS subtypes:
37
38	* matrix: One data file is required for each value/formula to
39	import. The columns in the data files represent assays.
40
41	* serial: One data file is required for each assay. The columns
42	in the data files represent values/formulas.
43
44	Files
45	-----
46
47	The [files] section is used to name the data files. The following
48	entries are recognised and required:
49
50	* rdata: The filename of a file containing new reporter information. The ID
51	column is always the position number which must be a unique positive
52	integer. Additional columns may be required depending on the import
53	settings.
54	* pdata: The filename of the file containing new assay information.
55	The ID column is in most cases the ID of the parent assay, but
56	if the 'multi-assay-parents' setting has been enabled, the ID can be any
57	positive unique integer, and the Parent ID column holds a list of
58	parent ID:s.
59	* sdata1,...,sdataN: N entries numbered from 1 to N with the filenames
60	of the files containing spot data. If the 'serial' subtype is used there
61	should be one file for each assay in the bioassay set. If the 'matrix'
62	subtype is used there should be one file for each entry in the [sdata]
63	section.
64
65	Additionally, all entries starting with 'x-' are considered to be extra files
66	that should be uploaded to BASE and attached to the new bioassay set.
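
As a rough illustration of the entries above, the following Python sketch
assembles a [files] section. It assumes that each entry is written as a key
and a value separated by a tab, which this document does not state. The
function name and the file names are made up for the example.

```python
def build_files_section(rdata, pdata, sdata_files, extra_files=None):
    """Return the lines of a [files] section.

    Assumption: entries are written as key<TAB>value. The separator is not
    defined in this document; adjust it to match the BFS metadata format.
    """
    if not sdata_files:
        raise ValueError("at least one spot data file is required")
    lines = ["[files]", f"rdata\t{rdata}", f"pdata\t{pdata}"]
    # sdata entries are numbered from 1 to N without gaps.
    for n, name in enumerate(sdata_files, start=1):
        lines.append(f"sdata{n}\t{name}")
    # Keys starting with 'x-' name extra files that are only uploaded and
    # attached to the new bioassay set.
    for key, name in (extra_files or {}).items():
        if not key.startswith("x-"):
            raise ValueError(f"extra file key must start with 'x-': {key}")
        lines.append(f"{key}\t{name}")
    return lines
```

With the 'serial' subtype the list of spot data files would hold one name per
assay; with the 'matrix' subtype it would hold one name per [sdata] entry.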
67
68	Settings
69	--------
70
71	The [settings] section is used to control some aspects of the import. The following
72	settings have been defined:
73
74	* new-data-cube: If a value of '1' is specified the data is imported into a
75	new data cube. A new data cube is needed whenever the position/reporter
76	mapping has been changed or when parent assays have been merged. When a
77	new data cube is used the 'rdata' file needs one of 'Internal ID' or
78	'External ID' columns so that the importer can map that position to a
79	reporter.
80	* multi-assay-parents: If a value of '1' is specified, it indicates that child
81	assays may have more than one parent assay (eg. due to a merge). This
82	requires a new data cube, so the setting is ignored unless the
83	'new-data-cube' setting has also been enabled. The 'pdata' file must have a
84	'Parent ID' column that holds a comma-separated list with the ID:s of the
85	parent assays.
86	* transform: If not specified, the child spot data is assumed to use the same
87	intensity transform as the parent data. The values to choose from are: NONE,
88	LOG2, LOG10.
89
90	Spot data
91	---------
92
93	The [sdata] section contains metadata about the spot data (intensity values
94	and spot extra values) that the plug-in generated. The order in this section
95	is important.
96
97	If the 'matrix' subtype is used the order must correspond to the 'sdataX'
98	entries in the [files] section. Eg. the file named by key 'sdata1' holds the
99	data for the first entry in this section.
100
101	If the 'serial' subtype is used the order must correspond to the column
102	order inside each of the 'sdataX' files. Eg. the first column is data for
103	the first entry in this section.
104
105	Entries with keys like 'Ch 1', 'Ch 2', etc. are reserved and correspond to
106	channel intensities. There must be exactly one entry for each channel in the
107	experiment.
108
109	Data values are always float values but they may be logged. This is controlled
110	by the 'transform' setting. All intensities must use the same intensity
111	transform.
112
113	Entries starting with 'x-' are extra values. The values are either in separate
114	files (matrix subtype) or in their own columns (serial subtype). The value is
115	the data type of the extra value. Allowed values are: 'text', 'float' and 'int'.
116	The part of the key after 'x-' should be the name or external id of an already
117	existing extra value type.
118
119	Example:
120
121	[sdata]
122	ch1 float
123	ch2 float
124	x-abc float
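
The ordering rules for this section can be illustrated with a small Python
sketch that reports where the values for each entry are expected. The
function name and the wording of the returned descriptions are made up, and
the sketch assumes that "column" means a data column in the order described
above.

```python
EXTRA_VALUE_TYPES = ("text", "float", "int")


def locate_sdata(entries, subtype):
    """Map each [sdata] key to the place where its values are found.

    entries: list of (key, data_type) pairs in the order they appear in
             the [sdata] section.
    subtype: 'matrix' or 'serial'.
    """
    locations = {}
    for index, (key, data_type) in enumerate(entries, start=1):
        if key.startswith("x-") and data_type not in EXTRA_VALUE_TYPES:
            raise ValueError(f"invalid type for {key}: {data_type}")
        if subtype == "matrix":
            # One file per entry, in the same order as sdata1..sdataN.
            locations[key] = f"file sdata{index}"
        elif subtype == "serial":
            # One column per entry, in the same order in every file.
            locations[key] = f"data column {index} of each sdata file"
        else:
            raise ValueError(f"unknown subtype: {subtype}")
    return locations
```

For the example section above and the 'matrix' subtype, the values for
'x-abc' would be looked up in the file named by 'sdata3'.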
125
126
127	Reporter annotation file (import)
128	=================================
129
130	This file is used to link spot data with the correct positions in the bioassay
131	set. The required columns depend on whether data is imported to the same
132	data cube as the parent or not.
133
134	* ID: The position numbers. This column is always needed. Values must be
135	positive integers and duplicates are not allowed. The order doesn't matter.
136	Since the position number has no specific meaning, we recommend that
137	plug-ins that generate data for a new data cube simply start at 1 and then
138	increment the value for each line.
139	* Internal ID or External ID: Either the internal or external id:s of the
140	reporter that is assigned to the given position. At least one of those
141	columns is needed when importing data to a new data cube. The same reporter
142	may be assigned to more than one position and the reporter must already
143	exist in BASE.
144
145	All sdata files should have the same number of rows (not counting the header
146	line) as this file.
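
A plug-in author may want to check the reporter annotation rows before
handing the file to BASE. The following Python sketch applies the column
rules above to rows that have already been parsed into dicts keyed by column
name. The function name and the row representation are assumptions made for
this example.

```python
def check_reporter_rows(rows, new_data_cube):
    """Validate parsed reporter annotation rows and return their count.

    rows: list of dicts keyed by column name, one per data line.
    new_data_cube: True if the 'new-data-cube' setting is enabled.
    """
    seen = set()
    for row in rows:
        position = int(row["ID"])
        if position <= 0:
            raise ValueError(f"position must be positive: {position}")
        if position in seen:
            raise ValueError(f"duplicate position: {position}")
        seen.add(position)
        # A new data cube needs a reporter reference for every position.
        if new_data_cube and not (row.get("Internal ID") or row.get("External ID")):
            raise ValueError(f"position {position} has no reporter id")
    return len(seen)
```

The returned count can then be compared with the number of data rows in each
sdata file.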
147
148	Assay annotation file (import)
149	==============================
150
151	This file is used to link spot data with the correct child assay. This file
152	should have one entry for each child bioassay that should be created.
153
154	* ID: Either the ID of a parent assay or a unique positive integer. This
155	column is always needed. If the 'multi-assay-parents' option is enabled
156	there is no special meaning to the value, otherwise the ID must be the
157	ID of the parent assay.
158	* Name: An optional column. If present, the child assay will be given the
159	specified name. Otherwise a name is generated automatically, typically
160	the same as the parent assay's name.
161	* Parent ID: Required if 'multi-assay-parents' is enabled. The value is a
162	comma-separated list of parent assay ID:s.
163
164	If the 'serial' subtype is used, the number of lines in this file should match
165	the number of 'sdataX' entries in the [files] section. Data for the assay on
166	the first line is found in the file specified by sdata1 and so on.
167
168	If the 'matrix' subtype is used, the number of lines in this file should match
169	the number of columns in each of the 'sdataX' files. Data for the assay on the
170	first line is found in the first column in each data file and so on.
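
The rules for parent ID:s and for the expected number of lines can be
summarised in a short Python sketch. Both function names are hypothetical,
rows are assumed to be dicts keyed by column name, and 'columns_per_file' is
assumed to count only the columns that hold assay data.

```python
def parents_of(row, multi_assay_parents):
    """Return the parent assay ID:s for one assay annotation row."""
    if multi_assay_parents:
        # 'Parent ID' holds a comma-separated list of parent assay ID:s.
        return [int(part.strip()) for part in row["Parent ID"].split(",")]
    # Otherwise the ID column is the ID of the single parent assay.
    return [int(row["ID"])]


def expected_assay_lines(subtype, sdata_file_count, columns_per_file):
    """Return how many lines the assay annotation file should have."""
    if subtype == "serial":
        # One sdata file per assay.
        return sdata_file_count
    if subtype == "matrix":
        # One column per assay in each sdata file.
        return columns_per_file
    raise ValueError(f"unknown subtype: {subtype}")
```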
171
172	Data files (import)
173	===================
174
175	Data files should follow the same rules as exported data files.
