Chapter 2. BASE features

Table of Contents

2.1. Web interface
2.2. Information and annotation management
2.3. Data sharing and privacy
2.4. File and directory structure
2.5. Plugin and extension infrastructure
2.6. Batch upload and download of data
2.7. Supported array platforms and raw data formats
2.7.1. Vendor specific and custom printing array platforms
2.7.2. Available raw data types
2.8. Supported sequencing applications
2.9. Repository and standards

The BASE application features many components: MIAME compliance, multi-user support, data sharing, data access management, array and biomaterial LIMS, multiple array platforms, RNAseq sequencing support, extensibility, configurable plug-ins, annotation customisation, streamlined access to analysis tools, integration of MultiExperiment Viewer (MeV), and more. To support all components, the underlying relational database has grown very large and complex, especially since BASE itself works with objects, which requires additional database tables to keep track of objects stored in a relational database. Thus, rather than trying to describe every feature in detail here, we highlight some of the more important features.

2.1. Web interface

The entire system is accessed through a web interface over the Internet using a standard web browser, such as Firefox, Safari, Opera, or Internet Explorer. Access privileges to a particular BASE installation are managed by personal accounts through the web interface. A local administrator creates new user accounts with specific roles and access privileges and has overall managerial responsibility for an individual BASE installation. With the exception of the administrator, who has global data access, individual users have sole access to, and control of, the data they enter. Users can share data they own (or have share credentials for) with other users of the same BASE installation.

2.2. Information and annotation management

BASE features a biomaterial LIMS that tracks biological material from its source to hybridisation/sequencing and ultimately to raw data and analysis. All events throughout sample handling are tracked, and information on used and remaining quantities, physical sample locations, quality control, and sample relations is stored in BASE. Racks or boxes holding biomaterials can be created as BioPlates, and plate events are easily performed for extraction or labelling. Although it is becoming less commonly used, the array production LIMS of previous BASE versions is retained to support researchers with spotting facilities, e.g., for protein array production and BAC array printing that may not be commercially available.

Events in the biomaterial and array LIMS can be annotated with protocols and event dates, and most items can be annotated with customisable annotation types such as floats, integers, dates, and Boolean flags. Change history for biomaterial items is available if configured and can be used to track modifications in the database. Annotations are either free form or chosen from a preset list of values, and can be marked as required for MIAME compliance. The annotation system is searchable, and the user can select any annotation to be an experimental factor in analysis, whereby it becomes available to analysis plug-ins and plot tools.
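
As an illustration only, the annotation model described above can be sketched in Python. The class and field names (AnnotationType, allowed_values, required_for_miame) and the example annotation types are hypothetical; they do not reflect the BASE API or its database schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class AnnotationType:
    """Hypothetical model of an annotation type; not the BASE API."""
    name: str
    value_type: type                         # float, int, date, bool or str
    allowed_values: Optional[Tuple] = None   # None means free-form values
    required_for_miame: bool = False

    def accepts(self, value) -> bool:
        """True if the value has the declared type and is in the preset list, if any."""
        # bool is a subclass of int in Python, so reject it for non-Boolean types
        if isinstance(value, bool) and self.value_type is not bool:
            return False
        if not isinstance(value, self.value_type):
            return False
        return self.allowed_values is None or value in self.allowed_values


# Example annotation types (names invented for this sketch)
tissue = AnnotationType("tissue", str, allowed_values=("liver", "kidney"),
                        required_for_miame=True)
dose = AnnotationType("dose_mg", float)
sampled = AnnotationType("sampling_date", date)
```

With these definitions, tissue.accepts("liver") holds while tissue.accepts("brain") does not, which mirrors the difference between a preset list and a free-form annotation.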

2.3. Data sharing and privacy

One of the important features of BASE is its capability as a local data repository. The repository functionality is complemented by data grouping, sharing, and privacy policies. A BASE project is used to group items (biomaterial, raw data, and experiments) into a logical entity, and a BASE experiment is a collection of bioassays, e.g., array data, grouped logically together for further analysis. All items can co-exist in several projects and experiments without any unnecessary copying of information.

Data privacy is guarded by the data owner, and BASE allows the owner to set data access rules. To this end, each item in BASE is owned by a user, who can share it with colleagues. The grouping of data in projects allows the data owner to simply include other users in a project in order to share data. Each item can have different access levels even within a project, and project members can have different privileges. The data access rules are very flexible and can be overwhelming, since access levels on almost any item can be set individually. However, by using projects, the proper access levels can be set at a single point of interaction.
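
The interplay between item ownership, individually set access levels, and project membership can be pictured with a small Python sketch. Everything here is an assumption made for illustration: the Permission values, the Project class, and in particular the rule that a user's effective access is the union of direct grants and project grants. BASE may resolve access differently.

```python
from enum import Flag


class Permission(Flag):
    """Hypothetical permission bits; BASE's real access levels may differ."""
    NONE = 0
    READ = 1
    WRITE = 2
    DELETE = 4


class Project:
    """Hypothetical project: a set of items plus per-member permissions."""

    def __init__(self, name):
        self.name = name
        self.members = {}    # user -> Permission granted through the project
        self.items = set()   # identifiers of items grouped in the project

    def add_member(self, user, permission):
        self.members[user] = permission


def effective_permission(user, item, owner, projects, direct_grants=None):
    """Combine ownership, direct grants and project grants (assumed to be a union)."""
    if user == owner:
        return Permission.READ | Permission.WRITE | Permission.DELETE
    granted = (direct_grants or {}).get((user, item), Permission.NONE)
    for project in projects:
        if item in project.items:
            granted |= project.members.get(user, Permission.NONE)
    return granted
```

In this sketch, adding a colleague to a project with READ permission is the single point of interaction that gives them read access to every item in that project.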

2.4. File and directory structure

BASE has an integrated file system that lets researchers collect all data files related to a project in a single storage location. Data files are uploaded using a web browser or an ftp client. The file storage is an integral part of a strategy to store all experiment-relevant data in BASE, even data types not yet supported in analysis. Collecting all data allows future reuse of the data as more data are produced and new analysis tools become available.

2.5. Plugin and extension infrastructure

BASE features a hierarchically organised analysis interface that allows data filtering, normalisation, transformation, and other analyses. Parameters and settings are automatically stored for each step in the analysis. The selection of analysis tools depends on array type and available plug-ins; a wide range of tools is pre-installed with BASE, and optional plug-ins can be downloaded from the BASE plug-in site. BASE capitalises on other software tools, such as MeV, by integrating them into the user interface. Such integration provides streamlined access to analysis modules in external tools. BASE even features a rudimentary manual transform creator that enables researchers to add, within the hierarchical overview, analysis steps performed independently of BASE. The transform creator enables storage of result files and parameter information for archival, tracking, and sharing purposes.

The analysis of genomics data is continuously evolving with new methods and techniques. To this end, BASE provides extension and plug-in application programming interfaces (APIs) to enable straightforward addition of new analysis tools. The use of the APIs is well documented and there are numerous examples of how to create extensions. The MeV and ftp-server integrations both utilise the extension mechanism, and the automatically generated overview plots available in the experimental analysis view are also extensions. The plug-in API is used for all data imports and exports, and for most analysis tools, giving new developers a lot of example code to examine when they create BASE plug-ins.

2.6. Batch upload and download of data

File, annotation, and item upload can be done asynchronously as data are generated or information becomes available. To relieve researchers of the tedious task of entering data one item at a time, a set of batch importers was created; the information generated throughout the experimental work is uploaded to BASE in plain tab-separated files. These files are supplied to batch importer plug-ins that parse the files and create items and associations according to the information in the files. The same plug-ins can be used to batch update many items. Similarly, items are annotated by creating tab-separated files with annotation information, uploading these to BASE, and loading the file content into the database using annotation importers. If needed, annotations are easily updated with the same mechanism.
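
A tab-separated file of the kind described above can be produced with any spreadsheet or script. The following Python sketch writes one with the standard csv module. The column headers (Name, Parent, Description) and the sample values are hypothetical; the headers a batch importer actually expects depend on how it is configured in a given BASE installation.

```python
import csv
import io

# Hypothetical column headers; the real ones depend on the importer configuration.
COLUMNS = ["Name", "Parent", "Description"]


def write_batch_file(rows, stream):
    """Write a header row followed by one tab-separated line per item."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", extrasaction="raise")
    writer.writeheader()
    writer.writerows(rows)


# Invented example data: two samples derived from the same parent item
samples = [
    {"Name": "S1", "Parent": "Patient A", "Description": "biopsy"},
    {"Name": "S2", "Parent": "Patient A", "Description": "resection"},
]
buffer = io.StringIO()
write_batch_file(samples, buffer)
```

Passing an open file instead of the in-memory buffer yields a file that can be uploaded to BASE and handed to a suitably configured importer.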

Files uploaded to BASE are stored in the directory structure within BASE and multiple files are easily transferred to BASE either packaged in compressed files with a single upload action, or by using an ftp client supporting transfer of file structures. Similarly, downloading multiple files is straightforward either using an ftp client or by a single click in the BASE web interface. Download of items is done through item listing views enabling users to filter and select what information should be downloaded.
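
Packaging a directory tree for a single upload action might look like the Python sketch below. It assumes that zip is among the compressed formats accepted by the installation, which is not stated here; the function name package_directory is invented for this example.

```python
import zipfile
from pathlib import Path


def package_directory(source_dir, archive_path):
    """Zip every file under source_dir, storing paths relative to source_dir.

    The archive should be written outside source_dir so that it does not
    end up inside itself.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                # Forward slashes keep the directory structure portable
                archive.write(path, arcname=path.relative_to(source).as_posix())
    return archive_path
```

Storing relative paths preserves the sub-directory layout, so the file structure inside the archive matches the one on the researcher's disk.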

2.7. Supported array platforms and raw data formats

There are many types of microarrays, techniques, and brands available to researchers: one- or two-channel hybridisations, spotted cDNA/oligo arrays, Affymetrix (GeneChip), Illumina (SNP, DASL, WGEX, microRNA), aCGH, SNP, tiling arrays, and many more. In addition, expression data can be derived from sequencing data, i.e., RNAseq. Data are produced in different file formats that must be treated differently depending on type.

Many platforms and experimental setups are supported in downstream analysis, but some microarray techniques cannot currently be analysed within BASE simply because of a lack of support in the available plug-ins. The problem is resolved by creating new plug-ins, or extending available ones, to add analysis capabilities for platforms and techniques not readily supported in analysis. Extending analysis capabilities to new technologies is only a matter of local needs and resources. We add support for platforms in use at the Lund University microarray facility and make our tools freely available to the community.

For two-channel array platforms it is straightforward to customise BASE for a specific array platform: the platform simply needs to be adapted to the (BASE) Generic platform. The adaptation consists of creating a raw data format definition and configuring raw data importers, or making use of already available raw data formats. However, it is not always possible to make a natural mapping of a platform to the Generic platform. Platforms such as Affymetrix and Illumina cannot naturally be mapped onto the Generic two-channel platform. For Affymetrix, BASE comes with a specific Affymetrix platform, and Illumina can be supported by customising BASE (go to the Illumina package web site for more information on adding Illumina support to BASE).

How to adapt new array platforms to the Generic platform format, or how to create a new platform type in BASE, is described elsewhere in this document. Here we list the different array platforms used in BASE and the raw data types supported by BASE. However, not all platforms and raw data types listed below are available out of the box, and a BASE administrator must customise the local BASE installation for its specific needs. What comes pre-configured when BASE is installed is indicated in the lists below.

2.7.1. Vendor specific and custom printing array platforms

Not all array platforms listed below are available by default. The comments to specific platforms explain how to enable the use of the array platform in BASE. In some cases there is no confirmed usage of a platform but we believe it has been tested by anonymous users.

Affymetrix

The Affymetrix platform comes pre-configured with a new BASE installation. In this context, the Affymetrix platform means the Affymetrix expression arrays. So far there has been no reason to expand the array platform to other chip types. In principle any Affymetrix chip type can be stored in BASE, but current plug-ins will always assume that expression data is stored and analysed. This can be resolved by adding variants of the Affymetrix platform, but the Lund BASE team currently has no plans to create Affymetrix variants.

Agilent

Custom printing

The array layout options are endless and imagination is the only limitation ... almost. BASE can import many in-house array designs and platforms. The custom arrays usually fall back on one of the raw data types already available such as GenePix.

Illumina

There are several variants of the Illumina platform. Using several variants allows BASE to adapt its handling of different Illumina chip types. Illumina platform support is not included in a standard BASE installation, but there is an Illumina package available for seamless integration of the Illumina array platform into BASE.

ImaGene

No successful use confirmed but ImaGene raw data is available in BASE.

Sequencing

This platform covers expression data from sequencing experiments. A Cufflinks raw data type is available for expression values from sequencing experiments.

Unlisted

In principle any platform generating a matrix of data can be imported into BASE. Simply utilise the available raw data formats and data importers.

2.7.2. Available raw data types

Raw data comes in many different formats. These formats are usually defined by scanner software vendors, and BASE must keep track of the different formats for analysis and plotting. BASE supports many formats out of the box, but some formats need to be added manually by the BASE administrator (indicated in the list below).

Affymetrix

AIDA

Agilent

BZScan

ChipSkipper

Cufflinks

GenePix

GeneTAC

Illumina

For the Illumina array platform, we recommend using the Illumina Bead Summary (IBS) raw data format below.

Illumina Bead Summary (IBS)

Not available in BASE directly but it is added with the Illumina plug-in that adds Illumina array platform support to BASE.

ImaGene

QuantArray Biotin

QuantArray Cy

SpotFinder

2.8. Supported sequencing applications

BASE was originally developed for management and analysis of array-based data. Recent versions, starting with version 3, have been adapted to support sequencing-based data. Being a newly developed feature, sequencing support is not as mature as the array part of BASE.

For sequencing data in general, BASE can be used for data management and sharing. BASE currently has extended support for sequencing applications such as RNAseq, where data is transformed to gene expression measurements. For such applications, array designs can be created based on gene structure defined in GTF formatted files, for example GTF files for all RefSeqs or known genes.
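
For readers unfamiliar with the GTF format, the Python sketch below shows how gene structure can be read from such a file: each record has nine tab-separated fields, the last of which holds attributes such as gene_id. This only illustrates the file format; the function names are invented here, and it says nothing about how BASE itself builds array designs from GTF files.

```python
import re
from collections import defaultdict

# Matches attribute pairs such as: gene_id "G1";
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def parse_gtf_line(line):
    """Split one GTF record into fields; return None for comments and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attributes = fields
    return {
        "seqname": seqname,
        "feature": feature,
        "start": int(start),   # GTF coordinates are 1-based and inclusive
        "end": int(end),
        "strand": strand,
        "attributes": dict(ATTRIBUTE.findall(attributes)),
    }


def exons_by_gene(lines):
    """Group exon intervals (start, end) by gene_id, in file order."""
    genes = defaultdict(list)
    for line in lines:
        record = parse_gtf_line(line)
        if record and record["feature"] == "exon":
            genes[record["attributes"]["gene_id"]].append((record["start"], record["end"]))
    return dict(genes)
```

Calling exons_by_gene on an open GTF file returns a dictionary that maps each gene identifier to its list of exon intervals.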

2.9. Repository and standards

The Microarray Gene Expression Data Society (MGED) develops and maintains standards for data acquisition, representation, and interchange, such as the MIAME guidelines, the MAGE-TAB interchange format, and the MGED Ontology for microarray experiments. BASE does not enforce the use of the MGED standards but supports storage of the information required by MIAME. BASE has an experiment item overview functionality that is useful for validating information related to experiments. The validation level is user selectable; the option regarding MIAME compliance is the most relevant here. When users or server administrators create annotation types in BASE, these annotation types can be marked as required by MIAME and optionally defined to take their values from a list of pre-defined values from a controlled vocabulary. Validation will check for inconsistencies and report errors, and give the user an opportunity to fix issues immediately or later. After resolving the issues raised by the validation, data can be exported for submission to public repositories such as ArrayExpress, Gene Expression Omnibus (GEO), and CIBEX.
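
The kind of check involved in validating required annotations can be pictured with the following Python sketch. It is a simplified assumption of what such a check might do, namely report items that lack a value for an annotation type marked as required; the function name and data layout are hypothetical and are not taken from BASE.

```python
def missing_required(required_types, items):
    """Return {item name: [names of required annotations without a value]}.

    required_types: names of annotation types marked as required
    items: mapping of item name -> {annotation name: value}
    """
    problems = {}
    for name, annotations in items.items():
        # A missing key, None, or an empty string all count as "no value"
        missing = [t for t in required_types if annotations.get(t) in (None, "")]
        if missing:
            problems[name] = missing
    return problems
```

An empty result means every item carries all required annotations; otherwise the returned mapping lists what remains to be fixed before export.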