Chapter 1. Why use BASE

Table of Contents

1.1. Case I: The SCAN-B BASE installation at Department of Oncology, Lund University
1.2. Case II: The BASE installation at SCIBLU, Department of Oncology, Lund University

BASE was initially developed to manage array-based data but has since been extended to support storage and analysis of sequencing data. The first sequencing application is RNAseq.

We outline two different uses of BASE to give a flavour of why you should consider using it. The first example describes a research project involving sequencing-based gene expression analysis, and the second describes how a microarray service facility uses BASE.

1.1. Case I: The SCAN-B BASE installation at Department of Oncology, Lund University

SCAN-B is a project and network of researchers and clinicians that was initiated in the fall of 2009. The project is centred on a prospective study in which all new breast cancer patients in southern Sweden are asked to enrol. Within the covered region, approximately 1500 patients are diagnosed with breast cancer annually. The overall aim is to continuously collect and analyse this consecutive, population-based breast cancer cohort. Analyses include generation of gene expression and sequencing data, with the ultimate goal of building an infrastructure for future real-time clinical implementation.

SCAN-B uses BASE to store and manage all information related to enrolled patients and collected sample material, including clinical information and experimental data. Standard analysis pipelines for sequencing data will also be executed through BASE.

The SCAN-B BASE installation consists of three main parts: first, the hardware on which the system runs; second, the BASE software and database, together with the configured analysis plugins; third, an external file system for storing sequencing data that is referenced from BASE. In addition, the hardware and the configured database and software require maintenance. The server hardware comprises one main computer and a RAID hard drive system. It also includes a backup solution configured to back up the entire database twice per week. Computational nodes are connected to the main computer and are used to run the configured analysis plugins in a seamlessly integrated fashion. Maintenance includes managing the backup schedule, updating the main BASE software, developing, configuring, and maintaining analysis plugins, and maintaining the underlying database and external storage file systems.

Whereas the BASE software itself is freely available to anyone, a particular BASE installation at a research site is in general not freely accessible. Although anyone can download BASE and install it on a regular off-the-shelf personal computer with relative ease, considerable effort and cost are associated with maintaining a BASE installation of the size and scope of the SCAN-B BASE installation. A pristine BASE installation includes generic features and functionality that provide a framework of procedures for managing data collection in large projects. Within SCAN-B, considerable effort is spent on defining the required procedures so that laboratory work is mirrored in BASE. This implies an interplay between adapting the BASE software (the Reggie extension is an example of adapting BASE to specific needs in SCAN-B) and adapting the laboratory work to achieve efficient data collection. To achieve high-quality data production, measures for continuous quality assurance and for collecting data associated with patients, samples, and laboratory processing must also be implemented.

1.2. Case II: The BASE installation at SCIBLU, Department of Oncology, Lund University

In the spring of 2004, Lund University created the Swegene Centre for Integrative Biology at Lund University (SCIBLU), which merged five of the most successful Swegene resource and development platforms into one unit, located in the Lund University Biomedical Centre (BMC). SCIBLU offers integrated services within the main -omics areas. The DNA microarray technology within SCIBLU was initially established in 2000 as a cancer research resource at the Department of Oncology, and the development of BASE was initiated in conjunction with this.

At SCIBLU, a BASE installation is maintained and used as a production installation that manages information surrounding array fabrication (array LIMS) as well as array data generated by the services SCIBLU provides. This particular BASE installation was initially set up in 2003 and to date manages array data from more than 13 000 hybridisations covering a variety of technical platforms such as cDNA, oligo, and BeadChip expression arrays, as well as BAC and oligo aCGH arrays.

The SCIBLU BASE installation consists of two main parts: first, the hardware on which the system runs; second, the BASE software and database, together with the configured analysis plugins. Regular maintenance of the hardware and the configured database and software is also required. The hardware comprises one main computer and a RAID hard drive system. It also includes a backup solution configured to back up the entire database twice per week. Finally, the hardware includes two computational servers connected to the main computer and used to run the configured analysis plugins in a seamlessly integrated fashion. The software used for the SCIBLU BASE installation is freely available from the BASE project site. Maintenance includes managing the backup schedule, updating the main BASE software, updating and managing probe annotations, managing user accounts, configuring and maintaining analysis plugins, and maintaining the underlying database.

Users of the microarray services offered by SCIBLU, e.g., expression analysis or aCGH, are given access to the SCIBLU production BASE installation as part of the included services. This access comprises a user account, access to the array LIMS (when in-house produced arrays are used), and enough hard drive space to store the data generated through the SCIBLU service. Additional disk space can be acquired at an additional cost to the user. Additional disk space is needed, for example, when users want to perform extensive data analysis within BASE and store the analysis results there, e.g., many parallel analysis branches or extensive generation of data plots and figures. Another example is when users want to import data from third-party providers (public data repositories or alternative array data providers) to perform meta-analysis together with their data generated within SCIBLU.