This document discusses some generic rules and guidelines for formatting and parsing files using the BFS format. Specific use cases for BFS files are likely to define additional rules, particularly with regard to the metadata file.

The only current use case we have in mind is to use BFS for passing data to and from external plug-ins. In the future, the BFS format may be used for other use cases.

We define three different file types in BFS:

* Metadata files
* Annotation files
* Data files

Common to all files
====================

All files are text-based and use the UTF-8 character encoding. A newline character (\n) is used as the record separator and a tab character (\t) is used as the column separator.

Escape sequences
----------------

Data that contains tabs or newlines needs to be escaped. A backslash (\) indicates the start of an escape sequence, which means that a backslash in the data must itself be escaped. Since some editors include a carriage return in line breaks, carriage returns (\r) are escaped as well. Here is the very simple escape table:

    backslash (\)     --> \\
    newline           --> \n
    carriage return   --> \r
    tab               --> \t

It is recommended that parsers are forgiving: if an invalid escape sequence is found, e.g. a backslash followed by anything other than \, n, r or t, the input is taken literally. Strict parsers may throw exceptions or log warning messages.

Numerical values
----------------

Numeric values should use a dot (.) as the decimal point. Scientific notation is accepted. Null, NaN, Infinity and other special values should all be represented by empty string values. It is recommended that parsers are forgiving if invalid numerical data is found.

Comments, etc.
--------------

Lines starting with '#' are comment lines and should be ignored. Empty lines (i.e. lines containing only white-space) should also be ignored. White-space here means spaces, tabs and any other characters matched by '\s' in regular expressions.

Metadata file
=============

The metadata file contains information about the other files in the file-set.
It can also contain information that is specific to each use case. This file contains key-value pairs in multiple sections.

Beginning-of-file (BOF) marker
------------------------------

A BFS metadata file should start with the string 'BFSformat', optionally followed by a tab and a value. This must be the first line in the file. The value is used as an indication of the sub-type of the file.

Sections
--------

A section is started by a line containing a value surrounded by brackets, for example: [my section]. There is no restriction on the name of the section as long as it is escaped using the normal rules. Note that there is no need to escape brackets in the name; e.g. [[a,b]] is a valid section with the name '[a,b]'. Trailing white-space should be ignored.

Multiple sections may have the same name, and the order of sections should not matter. However, this may be restricted in specific use cases, which may require that section names are unique or come in a specific order. Generic parsers are recommended to provide access to sections by name and by ordinal number, starting at 0. Generic writers are recommended to write sections in the order they are added.

Section entries
---------------

Each section contains data in the form of tab-separated key-value pairs. Keys may not start with # or [, since this would interfere with comments and sections. Otherwise, the normal escape rules are used. Values should also use the normal escape rules, except that non-escaped tab characters are allowed. This makes it possible to use vector-type values.

A key doesn't have to be unique within a section, but this may be restricted by specific use cases, either globally or on a section-by-section basis. The order of the keys is usually not important, but some use cases may need to preserve the order. Generic reader implementations are recommended to provide access to keys by name and by ordinal number, starting at 0.
Generic writer implementations are recommended to write keys and values in the order they are added to each section.

Pre-defined sections and keys
-----------------------------

If the file-set includes more files than the metadata file, a 'files' section that specifies the other files is required. Keys may have any name, and it is recommended that each key is unique. The value is the filename. In the example below, each key is separated from its filename by a tab:

    [files]
    file-1    abc123.txt
    file-2    def456.txt
    file-3    ghi789.txt

The files are expected to be located in the same 'directory' as the metadata file. A directory may be a folder in the file system, a zip-file, or a similar container.

Metadata about the file types and file content is not part of the generic specification. Specific use cases may define additional sections for holding metadata about the file content.

Note! The files don't have to be BFS files. They can be image files, PDF files, etc.

Annotation files
================

The first line is a header line containing the column names. The first column is required and must be 'ID'. Other columns are optional, but must have unique names. Column names are separated by tabs and are encoded using the normal escape rules.

All other lines are data lines. Each line must have exactly the same number of columns as the header line. Comment lines are not supported.

The ID column holds a unique identifier used internally by BASE. A given ID should only be used once and may not be repeated later in the file. The ID is a positive integer; zero and negative values are not allowed. There is no special ordering (unless a specific use case requires it).

Note that the ID values are not coordinates. They don't have to start at 1, and there may be "holes" in the range of values used. Some use cases may give ID values a specific meaning; other use cases may simply enumerate the rows using a counter.
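The following Python sketch shows one way a generic reader might apply the escape, comment, metadata and annotation rules described above. It is illustrative only, not a reference implementation: the function names (escape, unescape, parse_metadata, parse_annotation) are hypothetical, it raises ValueError where a strict parser might, and it assumes that unescaped tabs in a metadata value separate the elements of a vector-type value.

```python
import re

# Escape table from the "Escape sequences" section.
_ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}


def escape(s: str) -> str:
    """Encode a value; backslash goes first so later escapes are not doubled."""
    return (s.replace("\\", "\\\\").replace("\n", "\\n")
             .replace("\r", "\\r").replace("\t", "\\t"))


def unescape(s: str) -> str:
    """Decode escapes; invalid sequences such as a backslash + 'x' stay literal."""
    return re.sub(r"\\([\\nrt])", lambda m: _ESCAPES[m.group(1)], s)


def _lines(text: str) -> list[str]:
    # '\n' is the record separator; drop a stray '\r' left by some editors.
    return [line.rstrip("\r") for line in text.split("\n")]


def parse_metadata(text: str):
    """Return (subtype or None, [(section_name, [(key, [values...]), ...]), ...])."""
    lines = _lines(text)
    marker, has_value, subtype = lines[0].partition("\t")
    if marker != "BFSformat":
        raise ValueError("first line must start with 'BFSformat'")
    sections = []  # kept in file order, so sections can be accessed by ordinal
    for line in lines[1:]:
        if line.startswith("#") or not line.strip():
            continue  # comment or white-space-only line
        if line.startswith("["):
            header = line.rstrip()  # trailing white-space is ignored
            if not header.endswith("]"):
                raise ValueError(f"malformed section line: {line!r}")
            sections.append((unescape(header[1:-1]), []))
            continue
        if not sections:
            raise ValueError("key-value entry found before the first section")
        key, has_tab, rest = line.partition("\t")
        # Assumption: unescaped tabs in the value separate vector elements.
        values = [unescape(v) for v in rest.split("\t")] if has_tab else []
        sections[-1][1].append((unescape(key), values))
    return (unescape(subtype) if has_value else None), sections


def parse_annotation(text: str):
    """Return (column_names, {id: [other column values]})."""
    lines = [line for line in _lines(text) if line.strip()]
    header = [unescape(c) for c in lines[0].split("\t")]
    if header[0] != "ID":
        raise ValueError("the first column must be 'ID'")
    if len(set(header)) != len(header):
        raise ValueError("column names must be unique")
    rows = {}
    for line in lines[1:]:
        cells = line.split("\t")
        if len(cells) != len(header):
            raise ValueError(f"expected {len(header)} columns: {line!r}")
        row_id = int(cells[0])  # raises ValueError for non-integer IDs
        if row_id <= 0 or row_id in rows:
            raise ValueError(f"ID must be a new positive integer: {row_id}")
        rows[row_id] = [unescape(c) for c in cells[1:]]
    return header, rows
```

Because sections are returned as an ordered list, access by ordinal number is plain indexing, and access by name (for example the 'files' section) is a filter on the section name. A writer would call escape() on every key, value and column name before joining them with tabs.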

Data files
==========

A single data file is a matrix containing one data value for each row-column element. Data starts on the first line; there is no header line. All data lines should have the same number of columns. The number of rows and columns, and their order, are defined by other, use-case specific, information in the metadata file or in annotation file(s). Comment lines are not supported.
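A data-file reader can be sketched in the same way. The hypothetical parse_data and parse_number functions below (Python, illustrative only) apply the rules from the 'Numerical values' section: a dot as the decimal point, optional scientific notation, and empty strings for null or special values. Following the recommendation to be forgiving, anything else (including a spelled-out 'NaN' or a decimal comma) is read as a missing value instead of raising an error. The sketch checks that all lines have the same number of columns, but it cannot check the expected shape, since that is defined by use-case specific information.

```python
import re

_ESCAPES = {"\\": "\\", "n": "\n", "r": "\r", "t": "\t"}
# Dot as decimal point, optional exponent; no NaN/Infinity spellings.
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")


def unescape(s: str) -> str:
    """Decode BFS escapes; invalid sequences are kept literally."""
    return re.sub(r"\\([\\nrt])", lambda m: _ESCAPES[m.group(1)], s)


def parse_number(cell: str):
    """Return a float, or None for empty (null/NaN/Infinity) or invalid values."""
    if _NUMBER.fullmatch(cell):
        return float(cell)
    return None  # empty string, or forgiving handling of invalid data


def parse_data(text: str, convert=parse_number):
    """Return the data matrix as a list of rows of converted values."""
    rows = []
    for line in text.split("\n"):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # common rule: white-space-only lines are ignored
        rows.append([convert(unescape(cell)) for cell in line.split("\t")])
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError(f"data lines have different column counts: {sorted(widths)}")
    return rows
```

Passing convert=str keeps the values as (unescaped) text. One consequence of skipping white-space-only lines is that a row whose values are all empty (a line containing only tabs) is dropped; the sketch does not attempt to work around this.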