This document discusses some generic rules and guidelines for
formatting and parsing files in the BFS format. Specific use
cases for BFS files are likely to define additional rules,
particularly with regard to the metadata file. The only use case
we currently have in mind is passing data to and from external
plug-ins. In the future, the BFS format may be used for other
use cases.

We define three different file types in BFS:

* Metadata files
* Annotation files
* Data files


Common to all files
===================

All files are text-based and use the UTF-8 character encoding.

A newline character (\n) is used as a record separator and
a tab character (\t) is used as a column separator.

Escape sequences
----------------

Data that contains tabs or newlines needs to be escaped. We use
a backslash (\) to indicate the start of an escape sequence. This means
that a backslash must also be escaped. Since some editors include a
carriage return in line breaks, we also escape the carriage
return (\r).

Here is the very simple escape table:

  <backslash>       --> \\
  <newline>         --> \n
  <carriage return> --> \r
  <tab>             --> \t

It is recommended that parsers are forgiving: if an invalid escape
sequence is found, e.g. a backslash followed by anything other than
\, n, r or t, the input is taken literally. Strict parsers may throw
exceptions or log warning messages.

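The escape table and the forgiving/strict distinction above can be
sketched in a few lines of Python. This is only an illustration: the
function names and the 'strict' flag are assumptions made for this
example and are not part of the BFS specification.

```python
# Hypothetical helper names; only the escape table comes from the text above.
_ESCAPES = {"\\": "\\\\", "\n": "\\n", "\r": "\\r", "\t": "\\t"}
_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def escape(value: str) -> str:
    """Escape backslash, newline, carriage return and tab."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def unescape(value: str, strict: bool = False) -> str:
    """Reverse escape(). Invalid sequences are kept literally unless strict."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value) and value[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[value[i + 1]])
            i += 2
            continue
        if ch == "\\" and strict:
            raise ValueError(f"invalid escape sequence at position {i}")
        out.append(ch)  # ordinary character, or a literal backslash (forgiving mode)
        i += 1
    return "".join(out)
```

In forgiving mode a lone backslash, or a backslash followed by any
character not in the table, is copied to the output unchanged.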

Numerical values
----------------

Numeric values should use a dot (.) as the decimal point. Scientific
notation is accepted. Null, NaN, Infinity, and other special values
should all be represented by empty strings. It is recommended that
parsers are forgiving if invalid numerical data is found.

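One possible reading of these rules, as a minimal Python sketch. The
function names and the 'strict' flag are hypothetical, and mapping
invalid data to a missing value is just one way of being "forgiving";
the specification does not prescribe a particular behaviour.

```python
import math
from typing import Optional


def parse_number(text: str, strict: bool = False) -> Optional[float]:
    """Parse a numeric field; return None for empty or (if forgiving) invalid input."""
    text = text.strip()
    if text == "":
        return None  # Null, NaN, Infinity, ... are all written as empty strings
    try:
        value = float(text)  # accepts '1.5', '-2', '6.02e23'
    except ValueError:
        value = math.nan
    if math.isfinite(value):
        return value
    if strict:
        raise ValueError(f"invalid numeric value: {text!r}")
    return None  # assumption: forgiving mode treats invalid data as missing


def format_number(value: Optional[float]) -> str:
    """Write special values as an empty string; repr() always uses a dot."""
    if value is None or not math.isfinite(value):
        return ""
    return repr(float(value))
```

Note that the literal strings 'NaN' and 'Infinity' are treated as
invalid input here, since the text says such values should be written
as empty strings.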

Comments, etc.
--------------
Lines starting with '#' are comment lines and should be ignored.

Empty lines (i.e. lines containing only white-space) should be ignored.

White-space means spaces, tabs and other characters that match '\s'
in regular expressions.

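A small Python sketch of a line filter that follows these two rules.
The function name is made up for this example. Annotation and data
files state that comment lines are not supported, so a filter like
this is mainly relevant for files that allow comments.

```python
import re
from typing import Iterable, Iterator

_BLANK = re.compile(r"\s*")


def significant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that are neither comments nor empty/white-space only."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            continue  # comment line
        if _BLANK.fullmatch(line):
            continue  # empty line, or only characters matching \s
        yield line
```

A line such as ' #x' is kept, because it does not start with '#'.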


Metadata file
=============

The metadata file contains information about the other files
in the file-set. It can also contain information that is specific
to each use case. The file contains key-value pairs organized in
multiple sections.


Beginning-of-file (BOF) marker
------------------------------
A BFS metadata file should start with the string 'BFSformat',
optionally followed by a tab and a value. This must be the
first line in the file. The value indicates the sub-type of
the file.

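Checking the BOF marker could look roughly like this in Python. The
function name and the sub-type 'my-subtype' used in the usage note are
invented for illustration, and the value is returned without
unescaping to keep the sketch short.

```python
from typing import Optional


def parse_bof(first_line: str) -> Optional[str]:
    """Return the sub-type value, or None when the marker has no value."""
    line = first_line.rstrip("\n")
    marker, sep, value = line.partition("\t")
    if marker != "BFSformat":
        raise ValueError("not a BFS metadata file: missing 'BFSformat' marker")
    return value if sep else None
```

With this sketch, parse_bof("BFSformat\tmy-subtype\n") returns
'my-subtype' and parse_bof("BFSformat\n") returns None.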

Sections
--------
A section is started by surrounding a value in brackets,
e.g. [my section]

There is no restriction on the name of the section as long as it is
escaped using the normal rules. Note that there is no need to escape
brackets in the name, e.g. [[a,b]] is a valid section with the name
'[a,b]'. Trailing white-space should be ignored.

Multiple sections may have the same name and the order of sections
should not matter. However, specific use cases may restrict this and
require that section names are unique or come in a specific order.

Generic parsers are recommended to provide access to sections by name
and by ordinal number, starting at 0. Generic writers are recommended
to write sections in the order they are added.

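The header syntax and the "by name and by ordinal number"
recommendation might be sketched as follows. Both parse_section_header
and SectionList are hypothetical names, and the section name is
returned without unescaping.

```python
from typing import List, Optional


def parse_section_header(line: str) -> Optional[str]:
    """Return the section name if the line is a section header, else None."""
    line = line.rstrip()  # trailing white-space is ignored
    if len(line) >= 2 and line.startswith("[") and line.endswith("]"):
        return line[1:-1]  # only the outermost brackets are removed
    return None


class SectionList:
    """Keeps section names in file order; names may repeat."""

    def __init__(self) -> None:
        self._names: List[str] = []

    def add(self, name: str) -> int:
        self._names.append(name)
        return len(self._names) - 1  # ordinal number, starting at 0

    def by_ordinal(self, index: int) -> str:
        return self._names[index]

    def ordinals(self, name: str) -> List[int]:
        """All ordinal positions of sections with the given name."""
        return [i for i, n in enumerate(self._names) if n == name]
```

Because names may repeat, a lookup by name returns a list of ordinals
rather than a single section.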

Section entries
---------------
Each section contains data in the form of tab-separated key-value
pairs. Keys may not start with # or [, since this would interfere
with comments and sections. Otherwise, the normal escape rules are
used. Values should also use the normal escape rules, except that
non-escaped tab characters are allowed. This makes it possible to
use vector-type values.

A key doesn't have to be unique within a section, but specific use
cases may restrict this globally or on a section-by-section basis.
The order of the keys is usually not important, but some use
cases may need to preserve the order.

Generic reader implementations are recommended to provide access to
keys by name and by ordinal number, starting at 0. Generic writer
implementations are recommended to write keys and values in the order
they are added to each section.

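The following Python sketch shows how an entry line could be split
into a key and a vector of values. The function names and the example
keys are assumptions. Splitting on every non-escaped tab and returning
an empty list for a key without a value are choices made for this
sketch, not rules from the specification.

```python
from typing import List, Tuple

_UNESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def unescape(text: str) -> str:
    """Forgiving unescape: unknown sequences are kept literally."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in _UNESCAPES:
            out.append(_UNESCAPES[text[i + 1]])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def parse_entry(line: str) -> Tuple[str, List[str]]:
    """Split an entry line into its key and a list of values."""
    fields = line.rstrip("\n").split("\t")  # split on real tabs first
    key = unescape(fields[0])
    values = [unescape(f) for f in fields[1:]]  # then unescape each field
    return key, values
```

Splitting before unescaping matters: an escaped tab (\t) inside a
value must not be mistaken for a column separator.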

Pre-defined sections and keys
-----------------------------

If the file-set includes more files than the metadata file, a 'files'
section is required that lists the other files. Keys may have any
name and it is recommended that each key is unique. The value is the
filename. In the example below, each key is separated from its value
by a tab character.

  [files]
  file-1 abc123.txt
  file-2 def456.txt
  file-3 ghi789.txt

The files are expected to be located in the same 'directory' as the
metadata file. A directory may be a folder in the file system, a zip-file,
or a similar container. Metadata about the file types and file content is
not part of the generic specification. Specific use cases may define
additional sections for holding metadata about the file content.

Note! The files don't have to be BFS files. They can be image files,
PDF files, etc.

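For the simplest case, where the 'directory' is a folder in the file
system, resolving the entries of the 'files' section could look like
the Python sketch below. The function name and the metadata file name
in the usage note are hypothetical. The sketch also treats the
recommendation of unique keys as a requirement, which is stricter than
the text above, and it does not cover zip-files or other containers.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def resolve_files(metadata_path: str, entries: List[Tuple[str, str]]) -> Dict[str, Path]:
    """Map each key in the 'files' section to a path next to the metadata file."""
    base = Path(metadata_path).parent
    resolved: Dict[str, Path] = {}
    for key, filename in entries:
        if key in resolved:
            # Assumption: duplicates are rejected; the text only recommends unique keys.
            raise ValueError(f"duplicate key in 'files' section: {key!r}")
        resolved[key] = base / filename
    return resolved
```

For example, with a metadata file at 'set1/metadata.txt' the entry
('file-1', 'abc123.txt') resolves to 'set1/abc123.txt'.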

Annotation files
================

The first line is a header line containing the name of each column.
The first column is required and must be 'ID'. Other columns are optional,
but must have unique names. Column names are separated by tabs and
are escaped using the normal rules.

All other lines are data lines. Each line must have exactly the same number
of columns as the header line.

Comment lines are not supported.

The ID column holds a unique identifier used internally by BASE. A given ID
should only be used once and may not be repeated later in the file. The ID
is a positive integer value. Zero and negative values are not allowed.
There is no special ordering (unless a specific use case requires this). Note
that the ID values are not coordinates. They don't have to start at 1 and there
may be "holes" in the range of values used. Some use cases may use ID values
with some specific meaning, other use cases may simply enumerate the rows using
a counter.

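The annotation file rules can be turned into a small validating reader.
The Python sketch below is one possible shape for it. The function
name and the example column 'name' are assumptions, fields are not
unescaped, and every rule violation raises an error, which is a strict
choice made for this sketch.

```python
from typing import Dict, Iterable, List, Tuple


def parse_annotations(lines: Iterable[str]) -> Tuple[List[str], Dict[int, List[str]]]:
    """Return (column names, rows keyed by ID) from the lines of an annotation file."""
    rows = [line.rstrip("\n").split("\t") for line in lines]
    if not rows:
        raise ValueError("missing header line")
    header = rows[0]
    if header[0] != "ID":
        raise ValueError("first column must be 'ID'")
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique")
    by_id: Dict[int, List[str]] = {}
    for number, fields in enumerate(rows[1:], start=2):
        if len(fields) != len(header):
            raise ValueError(f"line {number}: expected {len(header)} columns")
        raw = fields[0]
        if not (raw.isascii() and raw.isdigit()) or int(raw) < 1:
            raise ValueError(f"line {number}: ID must be a positive integer")
        row_id = int(raw)
        if row_id in by_id:
            raise ValueError(f"line {number}: duplicate ID {row_id}")
        by_id[row_id] = fields[1:]  # remaining columns, in header order
    return header, by_id
```

IDs are used as dictionary keys, so rows may appear in any order and
the ID range may contain holes.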

Data files
==========

A single data file is a matrix containing one data value for each row-column
element.

Data starts on the first line. There is no header line.

All data lines should have the same number of columns. The number of rows and
columns and their order are defined by other, use-case specific, information in
the metadata file or in annotation file(s).

Comment lines are not supported.

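Reading a data file into a matrix could be sketched like this in
Python. The function name is hypothetical. The sketch assumes that all
data values are numeric, which the text above does not state, and it
maps empty cells to None following the rule for special numerical
values.

```python
from typing import Iterable, List, Optional


def parse_matrix(lines: Iterable[str]) -> List[List[Optional[float]]]:
    """Read a data file into a list of rows; empty cells become None."""
    matrix: List[List[Optional[float]]] = []
    for number, line in enumerate(lines, start=1):
        cells = line.rstrip("\n").split("\t")
        if matrix and len(cells) != len(matrix[0]):
            raise ValueError(f"line {number}: expected {len(matrix[0])} columns")
        # Assumption: every non-empty cell is a number; float() raises otherwise.
        matrix.append([float(c) if c.strip() else None for c in cells])
    return matrix
```

The meaning of each row and column is not checked here, since it is
defined by the metadata file or the annotation file(s).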