Appendix D. Platforms and raw-data-types.xml reference

Table of Contents

D.1. Default platforms and variants installed with BASE
D.2. raw-data-types.xml reference

Raw data can be stored as files attached to items, in the database, or both. The Platform item holds this information. For more information, see Section 29.3.9, “Using files to store data”.

D.1. Default platforms and variants installed with BASE

Platform                Variants                                Data file types
Name        ID          Name             ID                     Item          Name                ID
----------  ----------  ---------------  ---------------------  ------------  ------------------  ------------------------
Generic     generic     -                -                      Array design  Reporter map        generic.reportermap
                                                                Array design  Print map           generic.printmap
                                                                Raw bioassay  Generic raw data    generic.rawdata
Affymetrix  affymetrix  -                -                      Array design  CDF file            affymetrix.cdf
                                                                Raw bioassay  CEL file            affymetrix.cel
Sequencing  sequencing  Expression-like  sequencing.expression  Array design  GTF ref-seq file    refseq.gtf
                                                                Raw bioassay  FPKM tracking file  sequencing.fpkm_tracking

D.2. raw-data-types.xml reference

A given platform either supports importing data to the database or it doesn't. If it supports import, it may be locked to a specific raw data type or it may use any raw data type. Among the default platforms installed with BASE, the Affymetrix platform doesn't support importing data, while the Generic platform supports importing to any raw data type.

Raw data types are defined in the raw-data-types.xml file. This file is located in the <basedir>/www/WEB-INF/classes directory and contains information about the database tables and columns to use for storing raw data. BASE ships with default raw data types for many different microarray platforms, including GenePix, Agilent and Illumina.

[Tip] Tip

It is also possible to put additional raw data type definitions in the <basedir>/www/WEB-INF/classes/raw-data-types subdirectory. BASE will merge all *.xml files it finds there with the main raw-data-types.xml file. The extra configuration files should have the same format as the main raw-data-types.xml file. Duplicate raw data types are not supported, and it is not possible to add extra columns to existing types using this approach.
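
As an illustration, an extra definitions file placed in the raw-data-types subdirectory might look like the sketch below. The file name my-scanner.xml, the my_scanner ID, the RawDataMyScanner table and the signal property are all hypothetical; only the element and attribute names are taken from the file format described in this appendix. The DOCTYPE line is copied from the main file, and whether its relative DTD reference resolves from the subdirectory is an assumption you should verify.

```xml
<?xml version="1.0" ?>
<!-- Hypothetical file: <basedir>/www/WEB-INF/classes/raw-data-types/my-scanner.xml -->
<!-- DOCTYPE copied from the main raw-data-types.xml (assumption: it also resolves here) -->
<!DOCTYPE raw-data-types SYSTEM "raw-data-types.dtd" >
<raw-data-types>
   <!-- The id, name and table must not clash with any existing raw data type -->
   <raw-data-type
      id="my_scanner"
      name="My scanner"
      channels="1"
      table="RawDataMyScanner"
      description="Hypothetical single-channel raw data type"
      >
      <!-- A single illustrative property stored in the "signal" column -->
      <property
         name="signal"
         title="Signal"
         description="The measured signal of the spot"
         column="signal"
         type="float"
         channel="1"
      />
   </raw-data-type>
</raw-data-types>
```

Because duplicates are not supported, a file like this can only introduce new raw data types; it cannot extend one that is already defined in the main file.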

If you want your BASE installation to be configured differently, we recommend that you do it before the first initialisation of the database. It is possible to change the configuration of an existing BASE installation, but it requires manual updates to the database. The following procedure describes how to update an existing installation:

  1. Shut down the BASE web server. If you have installed job agents, you should shut them down as well.

  2. Modify the raw-data-types.xml file or create a new file in the raw-data-types subdirectory. If you have installed job agents, make sure they all have the same version of the configuration files as the web server.

  3. Run the updatedb.sh script. Tables for new raw data types and new columns for existing raw data types are created automatically, but the script can't delete tables or columns that have been removed, or modify columns that have changed datatype. You will have to make these kinds of changes by manually executing SQL against your database. Check your database documentation for information about SQL syntax.

    [Tip] Create a parallel installation

    You can always create a new, temporary parallel installation to check what the table generated by the installation script looks like. Compare the new table to the existing one and make sure they match.

  4. Start up the BASE web server and job agents, if any, again.

[Tip] Start with few columns

It is better to start with too few columns, since it is easier to add more columns than it is to remove columns that are not needed.

Format of the raw-data-types.xml file

The following example will serve as a description of the format used in raw-data-types.xml:


<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="raw-data-types.xsl"?>
<!DOCTYPE raw-data-types SYSTEM "raw-data-types.dtd" >
<raw-data-types>
   <raw-data-type
      id="genepix"
      name="GenePix"
      channels="2"
      table="RawDataGenePix"
      >
      <property
         name="diameter"
         title="Spot diameter"
         description="The diameter of the spot in µm"
         column="diameter"
         type="float"
      />
      <property
         name="ch1FgMedian"
         title="Channel 1 foreground median"
         description="The median of the foreground intensity in channel 1"
         column="ch1_fg_median"
         type="float"
         channel="1"
      />
      <!-- skipped a lot of properties -->
      <intensity-formula
         name="mean"
         title="Mean FG - Mean BG"
         description="Subtract mean background from mean foreground"
         >
         <formula 
            channel="1"
            expression="raw('ch1FgMean') - raw('ch1BgMean')"
         />
         <formula 
            channel="2"
            expression="raw('ch2FgMean') - raw('ch2BgMean')"
         />
      </intensity-formula>
      <!-- and a few more... -->
   </raw-data-type>
</raw-data-types>
	

Each raw data type is represented by a <raw-data-type> tag. The following attributes can be used:

Table D.1. Attributes for the <raw-data-type> tag

Attribute Required Comment
id yes A unique ID of the raw data type. It should contain only letters, numbers and underscores and the first character must be a letter.
name yes A unique name of the raw data type. The name is usually used by client applications for display.
table yes The name of the database table to store data in. The table name must be unique and can only contain letters, numbers and underscores. The first character must be a letter.
channels yes The number of channels used by this raw data type. It must be an integer greater than 0.
description no An optional (longer) description of the raw data type.

Inside the <raw-data-type> tag are one or more <property> tags. Each one defines a column in the database that is designed to hold data values of a particular type. The following attributes can be used on this tag:

Table D.2. Attributes for the <property> tag

Attribute Required Comment
*   All attributes defined by the <property> tag in extended-properties.xml. See Table C.1, “Attributes for the <property> tag”.
channel no The channel number the property belongs to. Allowed values are 0 to the number of channels specified for the raw data type. If the property doesn't belong to any channel, set the value to 0 or leave it unspecified.
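
The fragment below is a small sketch of how the channel number might be set, using the channel attribute as spelled in the example above. It assumes a two-channel raw data type; the property names x and ch2FgMedian and their columns are hypothetical and chosen only for illustration.

```xml
<!-- Fragment: these tags belong inside a <raw-data-type> with channels="2" -->

<!-- Not tied to any channel: the channel attribute is left unspecified -->
<property
   name="x"
   title="X"
   description="The X coordinate of the spot"
   column="x"
   type="float"
/>

<!-- Belongs to channel 2: the value must not exceed the number of channels -->
<property
   name="ch2FgMedian"
   title="Channel 2 foreground median"
   description="The median of the foreground intensity in channel 2"
   column="ch2_fg_median"
   type="float"
   channel="2"
/>
```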

Following the <property> tags come zero or more <intensity-formula> tags. Each one defines mathematical formulas that can be used to calculate the intensity values from the raw data. In the GenePix case, there are several formulas that differ in the way background is subtracted from foreground intensity values. For other raw data types, the intensity formula may just copy one of the raw data values.
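
For a raw data type where the intensity is simply one of the raw values, the formula could be as small as the following sketch. It assumes a hypothetical single-channel raw data type that has a property named signal; the formula name and title are also made up for the example.

```xml
<!-- Fragment: placed after the <property> tags of a single-channel raw data type -->
<intensity-formula
   name="signal"
   title="Signal"
   description="Use the signal value as the intensity without any correction"
   >
   <!-- One <formula> per channel; this type has only channel 1 -->
   <formula
      channel="1"
      expression="raw('signal')"
   />
</intensity-formula>
```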

The intensity formulas are installed as Formula items in the database. This means that you can manually add, change or remove intensity formulas directly from the web interface. The intensity formulas in the raw-data-types.xml file are only used at installation time.

The <intensity-formula> tag has the following attributes:

Table D.3. Attributes for the <intensity-formula> tag

Attribute Required Comment
name yes A unique name for the formula. This is only used during installation.
title yes The title of the formula. This is used by client applications for display.
description no An optional, longer, description of the formula.

The <intensity-formula> tag must contain one <formula> tag for each channel of the raw data type. The attributes of this tag are:

Table D.4. Attributes for the <formula> tag

Attribute Required Comment
channel yes The channel number. One tag for each channel must be specified. No duplicates are allowed.
expression yes The mathematical expression used to calculate the intensities. The expression is parsed with the Jep parser, which supports the common mathematical operators (+, -, *, /) and mathematical functions such as log2(), ln() and sqrt(). See the API documentation for Jep for more information. You can also use two special functions developed specifically for this case:
  • raw(name): Get the value from the raw data property with the given name, for example: raw('ch1FgMedian').

  • mean(name): Get the mean value of the raw data property with the given name, for example: mean('ch1BgMean'). The mean is calculated from all raw data spots in the raw bioassay.
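
As a sketch of how the two special functions can be combined, the following formula subtracts the mean background of the whole raw bioassay, rather than the per-spot background, from each spot's foreground. The formula name and title are hypothetical. The property names ch1FgMean, ch1BgMean, ch2FgMean and ch2BgMean are the ones referenced in the example earlier in this section and are assumed to be defined for the raw data type.

```xml
<!-- Fragment: placed after the <property> tags of a two-channel raw data type -->
<intensity-formula
   name="meanGlobalBg"
   title="Mean FG - Global mean BG"
   description="Subtract the bioassay-wide mean background from mean foreground"
   >
   <!-- raw() reads the value of the current spot; mean() averages over all spots -->
   <formula
      channel="1"
      expression="raw('ch1FgMean') - mean('ch1BgMean')"
   />
   <formula
      channel="2"
      expression="raw('ch2FgMean') - mean('ch2BgMean')"
   />
</intensity-formula>
```

Since intensity formulas in raw-data-types.xml are only used at installation time, a formula like this would have to be added through the web interface on an installation that already exists.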