Raw data can be stored as files attached to items, in the database, or both.
The Platform item holds information about which of these applies.
For more information see
Section 29.3.9, “Using files to store data”.
Platform name | Platform ID | Variant name | Variant ID | Item | Data file type name | Data file type ID |
---|---|---|---|---|---|---|
Generic | generic | - | - | Array design | Reporter map | generic.reportermap |
Generic | generic | - | - | Array design | Print map | generic.printmap |
Generic | generic | - | - | Raw bioassay | Generic raw data | generic.rawdata |
Affymetrix | affymetrix | - | - | Array design | CDF file | affymetrix.cdf |
Affymetrix | affymetrix | - | - | Raw bioassay | CEL file | affymetrix.cel |
Sequencing | sequencing | Expression-like | sequencing.expression | Array design | GTF ref-seq file | refseq.gtf |
Sequencing | sequencing | Expression-like | sequencing.expression | Raw bioassay | FPKM tracking file | sequencing.fpkm_tracking |
A given platform either supports importing data to the database or it doesn't. If it supports import, it may be locked to a specific raw data type or it may use any raw data type. Among the default platforms installed with BASE, the Affymetrix platform doesn't support importing data, while the Generic platform supports importing to any raw data type.
Raw data types are defined in the raw-data-types.xml
file. This file is located in the <basedir>/www/WEB-INF/classes
directory and contains information about the database tables and columns to
use for storing raw data. BASE ships with default raw data types for many
different microarray platforms, including Genepix, Agilent and Illumina.
Tip | |
---|---|
It is also possible to put additional raw data type definitions in the
|
If you want your BASE installation to be configured differently, we recommend that you do so before the first initialisation of the database. It is possible to change the configuration of an existing BASE installation, but this requires manual updates to the database. The following procedure describes how to update:
1. Shut down the BASE web server. If you have installed job agents, shut them down as well.
2. Modify the raw-data-types.xml file or create a new file in the raw-data-types subdirectory. If you have installed job agents, make sure they all have the same version as the web server.
3. Run the updatedb.sh script. Tables for new raw data types and new columns for existing raw data types are created automatically, but the script can't delete tables or columns that have been removed, or modify columns whose datatype has changed. You will have to make these kinds of changes by manually executing SQL against your database. Check your database documentation for information about SQL syntax.
Create a parallel installation | |
---|---|
You can always create a new temporary parallel installation to check what the table generated by the installation script looks like. Compare the new table to the existing one and make sure they match. |
4. Start the BASE web server and the job agents, if any, again.
Start with few columns | |
---|---|
It is better to start with too few columns, since it is easier to add more columns than it is to remove columns that are not needed. |
The following example illustrates the format used in
raw-data-types.xml:
<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="raw-data-types.xsl"?>
<!DOCTYPE raw-data-types SYSTEM "raw-data-types.dtd" >
<raw-data-types>
  <raw-data-type
    id="genepix"
    name="GenePix"
    channels="2"
    table="RawDataGenePix"
    >
    <property
      name="diameter"
      title="Spot diameter"
      description="The diameter of the spot in µm"
      column="diameter"
      type="float"
    />
    <property
      name="ch1FgMedian"
      title="Channel 1 foreground median"
      description="The median of the foreground intensity in channel 1"
      column="ch1_fg_median"
      type="float"
      channel="1"
    />
    <!-- skipped a lot of properties -->
    <intensity-formula
      name="mean"
      title="Mean FG - Mean BG"
      description="Subtract mean background from mean foreground"
      >
      <formula
        channel="1"
        expression="raw('ch1FgMean') - raw('ch1BgMean')"
      />
      <formula
        channel="2"
        expression="raw('ch2FgMean') - raw('ch2BgMean')"
      />
    </intensity-formula>
    <!-- and a few more... -->
  </raw-data-type>
</raw-data-types>
Each raw data type is represented by a <raw-data-type>
tag. The following attributes can be used:
Table D.1. Attributes for the <raw-data-type>
tag
Attribute | Required | Comment |
---|---|---|
id | yes | A unique ID of the raw data type. It should contain only letters, numbers and underscores and the first character must be a letter. |
name | yes | A unique name of the raw data type. The name is usually used by client applications for display. |
table | yes | The name of the database table to store data in. The table name must be unique and can only contain letters, numbers and underscores. The first character must be a letter. |
channels | yes | The number of channels used by this raw data type. It must be a number > 0. |
description | no | An optional (longer) description of the raw data type. |
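As a minimal sketch of the attributes in Table D.1, a single-channel raw data type could be declared roughly as follows. The id, name, table, description and property values are hypothetical and chosen only for illustration; they are not among the raw data types shipped with BASE. The nested property follows the pattern of the GenePix example above.

```xml
<!-- Hypothetical raw data type: all values below are invented for illustration -->
<raw-data-type
  id="my_scanner"
  name="My scanner"
  table="RawDataMyScanner"
  channels="1"
  description="Illustrative single-channel raw data type"
  >
  <!-- At least one property is expected inside the raw data type -->
  <property
    name="intensity"
    title="Intensity"
    description="The measured spot intensity"
    column="intensity"
    type="float"
    channel="1"
  />
</raw-data-type>
```

Note that the id starts with a letter and uses only letters, numbers and underscores, and that the table name follows the same rule.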
The <raw-data-type> tag contains one or more <property> tags.
Each one defines a column in the database that is designed to hold
data values of a particular type. The following attributes can be used
on this tag:
Table D.2. Attributes for the <property>
tag
Attribute | Required | Comment |
---|---|---|
* | | All attributes defined by the <property> tag in extended-properties.xml. See Table C.1, “Attributes for the <property> tag”. |
channel | no | The channel number the property belongs to. Allowed values are 0 to the number of channels specified for the raw data type. If the property doesn't belong to any channel, set the value to 0 or leave it unspecified. |
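The channel assignment can be sketched with a few properties from a two-channel raw data type. The attribute is spelled channel here, following the GenePix example above. The property names, titles and column names are hypothetical.

```xml
<!-- Hypothetical properties for a raw data type declared with channels="2" -->

<!-- No channel attribute: the property does not belong to any channel -->
<property
  name="area"
  title="Spot area"
  description="The area of the spot"
  column="area"
  type="float"
/>

<!-- channel="0" is the explicit way of saying the same thing -->
<property
  name="circularity"
  title="Circularity"
  description="A measure of how circular the spot is"
  column="circularity"
  type="float"
  channel="0"
/>

<!-- One property per channel; values 1 and 2 are allowed since channels="2" -->
<property
  name="ch1Signal"
  title="Channel 1 signal"
  description="The signal intensity in channel 1"
  column="ch1_signal"
  type="float"
  channel="1"
/>
<property
  name="ch2Signal"
  title="Channel 2 signal"
  description="The signal intensity in channel 2"
  column="ch2_signal"
  type="float"
  channel="2"
/>
```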
Following the <property> tags come zero or more <intensity-formula> tags.
Each one defines mathematical formulas that can be used to
calculate the intensity values from the raw data. In the GenePix case,
there are several formulas, which differ in the way background is
subtracted from foreground intensity values. For other raw data
types, the intensity formula may just copy one of the raw data values.
The intensity formulas are installed as Formula
items in the database. This
means that you can manually add, change or remove intensity formulas directly
from the web interface. The intensity formulas in the raw-data-types.xml
file are only used at installation time.
The <intensity-formula>
tag has the following
attributes:
Table D.3. Attributes for the <intensity-formula>
tag
Attribute | Required | Comment |
---|---|---|
name | yes | A unique name for the formula. This is only used during installation. |
title | yes | The title of the formula. This is used by client applications for display. |
description | no | An optional, longer, description of the formula. |
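For the simple case mentioned earlier, where the intensity formula just copies one of the raw data values, a sketch for a single-channel raw data type might look like this. The formula name, title and the property name intensity are hypothetical; the raw('...') expression syntax and the nested <formula> tag are taken from the GenePix example above.

```xml
<!-- Hypothetical formula for a raw data type declared with channels="1" -->
<intensity-formula
  name="copy"
  title="Raw intensity"
  description="Use the raw intensity value without background correction"
  >
  <!-- 'intensity' is assumed to be the name of a property of this raw data type -->
  <formula
    channel="1"
    expression="raw('intensity')"
  />
</intensity-formula>
```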
The <intensity-formula>
must contain
one <formula>
tag for each channel
of the raw data type. The attributes of this tag are:
Table D.4. Attributes for the <formula>
tag