Appendix E. Platforms and raw-data-types.xml reference


Table of Contents

E.1. Default platforms/variants installed with BASE
E.2. raw-data-types.xml reference

Raw data can be stored as files attached to items, in the database, or both. The Platform item records which of these options apply. For more information, see Section 29.3.1, “Using files to store data”.

E.1. Default platforms/variants installed with BASE

Platform name	Platform ID	Variant name	Variant ID	Item	Data file type name	Data file type ID
Generic	generic	-	-	Array design	Reporter map	generic.reportermap
Generic	generic	-	-	Array design	Print map	generic.printmap
Generic	generic	-	-	Raw bioassay	Generic raw data	generic.rawdata
Affymetrix	affymetrix	-	-	Array design	CDF file	affymetrix.cdf
Affymetrix	affymetrix	-	-	Raw bioassay	CEL file	affymetrix.cel

E.2. raw-data-types.xml reference

A given platform either supports importing data to the database or it doesn't. If it supports import, it may be locked to a specific raw data type or it may use any raw data type. Among the default platforms installed with BASE, the Affymetrix platform doesn't support importing data, while the Generic platform supports importing to any raw data type.

Raw data types are defined in the raw-data-types.xml file. This file is located in the <basedir>/www/WEB-INF/classes directory and contains information about the database tables and columns to use for storing raw data. BASE ships with default raw data types for many different microarray platforms, including GenePix, Agilent and Illumina.

If you want your BASE installation to be configured differently, we recommend that you make the changes before the first initialisation of the database. It is possible to change the configuration of an existing BASE installation, but it requires manual updates to the database. The following procedure describes how to update an existing installation:

Shut down the BASE web server. If you have installed job agents, shut them down as well.
Modify the raw-data-types.xml file. If you have installed job agents, make sure they all use the same version of the file as the web server.

Run the updatedb.sh script. Tables for new raw data types and new columns for existing raw data types are created automatically, but the script can't delete tables or columns that have been removed, or modify columns whose datatype has changed. You will have to make these kinds of changes by manually executing SQL against your database. Check your database documentation for information about SQL syntax.

	Create a parallel installation
	You can always create a new temporary parallel installation to check what the table generated by the installation script looks like. Compare the new table to the existing one and make sure they match.

Start the BASE web server and the job agents, if any, again.

	Start with few columns
	It is better to start with too few columns, since it is easier to add more columns than it is to remove columns that are not needed.
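
As an illustration of a change that the update script handles on its own, the sketch below adds one property to an existing raw data type. After the file has been modified, running updatedb.sh would be expected to create the corresponding column. The property name `ch1FgSd`, its title and description, and the column name `ch1_fg_sd` are made up for this example and are not taken from the file shipped with BASE. The file format itself is described in the next section.

```xml
<!-- Hypothetical addition, placed inside an existing <raw-data-type> element -->
<property
   name="ch1FgSd"
   title="Channel 1 foreground standard deviation"
   description="The standard deviation of the foreground intensity in channel 1"
   column="ch1_fg_sd"
   type="float"
   channel="1"
/>
```

Removing this property again later, or changing its type, falls into the category of changes that need manual SQL.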

Format of the raw-data-types.xml file

The following example will serve as a description of the format used in raw-data-types.xml:


<?xml version="1.0" ?>
<?xml-stylesheet type="text/xsl" href="raw-data-types.xsl"?>
<!DOCTYPE raw-data-types SYSTEM "raw-data-types.dtd" >
<raw-data-types>
   <raw-data-type
      id="genepix"
      name="GenePix"
      channels="2"
      table="RawDataGenePix"
      >
      <property
         name="diameter"
         title="Spot diameter"
         description="The diameter of the spot in µm"
         column="diameter"
         type="float"
      />
      <property
         name="ch1FgMedian"
         title="Channel 1 foreground median"
         description="The median of the foreground intensity in channel 1"
         column="ch1_fg_median"
         type="float"
         channel="1"
      />
      <!-- skipped a lot of properties -->
      <intensity-formula
         name="mean"
         title="Mean FG - Mean BG"
         description="Subtract mean background from mean foreground"
         >
         <formula 
            channel="1"
            expression="raw('ch1FgMean') - raw('ch1BgMean')"
         />
         <formula 
            channel="2"
            expression="raw('ch2FgMean') - raw('ch2BgMean')"
         />
      </intensity-formula>
      <!-- and a few more... -->
   </raw-data-type>
</raw-data-types>

Each raw data type is represented by a <raw-data-type> tag. The following attributes can be used:

Table E.1. Attributes for the <raw-data-type> tag

Attribute	Required	Comment
id	yes	A unique ID of the raw data type. It should contain only letters, numbers and underscores and the first character must be a letter.
name	yes	A unique name of the raw data type. The name is usually used by client applications for display.
table	yes	The name of the database table to store data in. The table name must be unique and can only contain letters, numbers and underscores. The first character must be a letter.
channels	yes	The number of channels used by this raw data type. It must be a number > 0.
description	no	An optional (longer) description of the raw data type.

Inside the <raw-data-type> tag are one or more <property> tags. Each one defines a database column that holds data values of a particular type. The following attributes can be used on this tag:

Table E.2. Attributes for the <property> tag

Attribute	Required	Comment
*		All attributes defined by the `<property>` tag in `extended-properties.xml`. See Table D.1, “Attributes for the `<property>` tag”.
channel	no	The channel number the property belongs to. Allowed values are 0 to the number of channels specified for the raw data type. If the property doesn't belong to any channel, set the value to 0 or leave it unspecified.

After the <property> tags come zero or more <intensity-formula> tags. Each one defines the mathematical formulas that can be used to calculate intensity values from the raw data. In the GenePix case, there are several formulas that differ in how background is subtracted from the foreground intensity values. For other raw data types, the intensity formula may just copy one of the raw data values.
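
The last case can be sketched with a single-channel raw data type whose only formula copies a raw value. Everything in this fragment is hypothetical: the id, name, table, property and formula names are invented for illustration and do not correspond to a type shipped with BASE.

```xml
<!-- Hypothetical single-channel type, placed inside <raw-data-types> -->
<raw-data-type
   id="example_single"
   name="Example single-channel"
   channels="1"
   table="RawDataExampleSingle"
   >
   <property
      name="signal"
      title="Signal"
      description="The measured signal of the spot"
      column="signal"
      type="float"
      channel="1"
   />
   <intensity-formula
      name="signal"
      title="Signal"
      description="Use the signal value as the intensity"
      >
      <!-- One formula for the single channel; it copies the raw value unchanged -->
      <formula
         channel="1"
         expression="raw('signal')"
      />
   </intensity-formula>
</raw-data-type>
```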

The intensity formulas are installed as Formula items in the database. This means that you can manually add, change or remove intensity formulas directly from the web interface. The intensity formulas in the raw-data-types.xml file are only used at installation time.

The <intensity-formula> tag has the following attributes:

Table E.3. Attributes for the <intensity-formula> tag

Attribute	Required	Comment
name	yes	A unique name for the formula. This is only used during installation.
title	yes	The title of the formula. This is used by client applications for display.
description	no	An optional, longer description of the formula.

The <intensity-formula> must contain one <formula> tag for each channel of the raw data type. The attributes of this tag are:

Table E.4. Attributes for the <formula> tag

Attribute	Required	Comment
channel	yes	The channel number. One tag for each channel must be specified. No duplicates are allowed.
expression	yes	The mathematical expression used to calculate the intensities. The expression is parsed with the `Jep` parser, which supports common mathematical operators such as +, -, * and /, and mathematical functions such as log2(), ln() and sqrt(). See the API documentation for Jep for more information. Two special functions are also available: raw(name) gets the value of the raw data property with the given name, for example `raw('ch1FgMedian')`; mean(name) gets the mean value of the raw data property with the given name, calculated from all raw data spots in the raw bioassay, for example `mean('ch1BgMean')`.
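
As a sketch of how raw() and mean() might be combined, the formula below subtracts the bioassay-wide mean of the background from each spot's foreground median. The formula name, title and description are invented for this example. The property `ch2FgMedian` is assumed to exist by analogy with `ch1FgMedian`, since the example file above leaves out most properties.

```xml
<!-- Hypothetical formula, placed inside a two-channel <raw-data-type> element -->
<intensity-formula
   name="medianMinusGlobalBg"
   title="Median FG - global mean BG"
   description="Subtract the mean background of the whole raw bioassay from the median foreground"
   >
   <!-- raw() reads the value for the current spot; mean() averages over all spots -->
   <formula
      channel="1"
      expression="raw('ch1FgMedian') - mean('ch1BgMean')"
   />
   <formula
      channel="2"
      expression="raw('ch2FgMedian') - mean('ch2BgMean')"
   />
</intensity-formula>
```

Because intensity formulas from the file are only used at installation time, a formula like this would have to be added through the web interface on an installation that already exists.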
