Schematic overview

NOTE! This document is outdated and has been replaced with newer documentation. See Developer overview of BASE

This document gives a brief overview of the Base application.

Contents
  1. Overview
  2. Fixed database vs. dynamic database
  3. Hibernate
  4. Batch API
  5. Data classes vs. item classes
  6. Query API
  7. Controller API
  8. Plugin system
  9. Client applications

Last updated: 2009-04-24

1. Overview

2. Fixed database vs. dynamic database

Base 2 stores most of its data in a database. The database is divided into two parts: one fixed and one dynamic.

The fixed part contains tables that correspond to the various items found in Base. There is, for example, one table for users, one table for groups and one table for reporters. Some items share the same table: biosources, samples, extracts and labeled extracts are all biomaterials and share the BioMaterials table. Access to the fixed part of the database goes through Hibernate or, in some cases, through the Batch API.

The dynamic part of the database contains tables for storing analyzed data. Each experiment has its own set of tables and it is not possible to mix data from two experiments. The dynamic part of the database can only be accessed through the Batch API and the Query API, using SQL and JDBC.

NOTE! The actual location of the two parts depends on the database that is used. MySQL uses two separate databases, while Postgres uses one database with two schemas.
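
As a rough illustration, the sketch below shows what the split looks like from plain JDBC. It is not BASE code; the connection URL, schema name and table names are made up for the example, and the fixed part is normally reached through Hibernate rather than hand-written SQL.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class TwoPartsSketch
  {
      public static void main(String[] args) throws Exception
      {
          Connection con = DriverManager.getConnection(
              "jdbc:postgresql://localhost/base2", "base", "secret");
          Statement st = con.createStatement();

          // Fixed part: one well-known table per item type.
          ResultSet users = st.executeQuery("SELECT id, name FROM \"Users\"");
          users.close();

          // Dynamic part: per-experiment tables in a separate schema (or a
          // separate database on MySQL), reached only through SQL and JDBC.
          ResultSet spots = st.executeQuery(
              "SELECT position, ch1, ch2 FROM dynamic.\"ExperimentSpots\"");
          spots.close();

          st.close();
          con.close();
      }
  }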

More information

3. Hibernate

Hibernate (www.hibernate.org) is an object/relational mapping package. It takes plain Java objects and stores them in a database. All we have to do is set the properties on the objects (for example: user.setName("A name")); Hibernate takes care of the SQL generation and database communication for us. This is, however, not a magic or automatic process. We have to provide mapping information about which objects go into which tables and which properties go into which columns, as well as other settings such as caching and proxying. This is done by annotating the code with Javadoc comments. The classes that are mapped to the database are found in the net.sf.basedb.core.data package, which is shown as the Data classes box in the image above.
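
A minimal sketch of what such a mapped data class might look like is shown below. It is not an actual BASE class; the tags follow the common XDoclet-style Hibernate Javadoc tags and the table and column names are made up, so see the real classes in net.sf.basedb.core.data for the mappings BASE actually uses.

  /**
   * Not an actual BASE class; illustrates the Javadoc-based mapping.
   *
   * @hibernate.class table="`ExampleUsers`"
   */
  public class ExampleUserData
  {
      private int id;
      private String name;

      /**
       * @hibernate.id column="`id`" generator-class="native"
       */
      public int getId()
      {
          return id;
      }
      public void setId(int id)
      {
          this.id = id;
      }

      /**
       * @hibernate.property column="`name`" type="string" not-null="true"
       */
      public String getName()
      {
          return name;
      }
      public void setName(String name)
      {
          this.name = name;
      }
  }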

Hibernate supports many different database systems. In theory, this means that Base 2 should work with all of those databases, but in practice we have found that this is not the case. For example, Oracle converts empty strings to null values, which breaks some parts of our code that expect non-null values. Our Batch and Query APIs also generate some SQL of their own. They try to use database dialect information from Hibernate, but that is not always possible. The net.sf.basedb.core.dbengine package contains code for generating the SQL that Hibernate can't help us with. There is a generic ANSI driver and special drivers for MySQL and Postgres. We don't expect Base 2 to work with other databases without modifications.
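
To show why per-database drivers are needed at all, the sketch below renders the same logical operation, string concatenation, differently for ANSI SQL and MySQL. The interface and class names are made up for the example and are not the actual net.sf.basedb.core.dbengine API.

  // Illustrative only: the same operation needs different SQL per database.
  interface ExampleDbEngine
  {
      /** Build an SQL expression that concatenates two string expressions. */
      String concat(String a, String b);
  }

  class AnsiEngine implements ExampleDbEngine
  {
      public String concat(String a, String b)
      {
          return a + " || " + b;                   // ANSI standard operator
      }
  }

  class MySqlEngine implements ExampleDbEngine
  {
      public String concat(String a, String b)
      {
          return "CONCAT(" + a + ", " + b + ")";   // MySQL function syntax
      }
  }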

More information

4. Batch API

Hibernate comes at a price: it affects performance and uses a lot of memory. This means that those parts of Base 2 that often handle lots of items at the same time, for example reporters, array design features and raw data, don't work well with Hibernate. We have created the Batch API to solve these problems.

The Batch API uses JDBC and SQL directly against the database. However, we still use metadata and database dialect information available from Hibernate to generate the SQL we need. This should make the Batch API just as database-independent as Hibernate. The Batch API can be used for any BatchableData class in the fixed part of the database and is the only way to add data to the dynamic part.
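
The sketch below shows the kind of work the Batch API does internally: plain JDBC with batched inserts instead of one Hibernate-managed object per row. The table and column names are made up; in BASE the SQL is generated from Hibernate metadata and the active database dialect.

  import java.sql.Connection;
  import java.sql.PreparedStatement;
  import java.sql.SQLException;

  public class BatchInsertSketch
  {
      public static void insertReporters(Connection con, String[] externalIds)
          throws SQLException
      {
          PreparedStatement ps = con.prepareStatement(
              "INSERT INTO \"Reporters\" (external_id) VALUES (?)");
          try
          {
              for (int i = 0; i < externalIds.length; i++)
              {
                  ps.setString(1, externalIds[i]);
                  ps.addBatch();
                  if ((i + 1) % 1000 == 0)
                  {
                      ps.executeBatch();   // flush in chunks to keep memory low
                  }
              }
              ps.executeBatch();           // flush the remaining rows
          }
          finally
          {
              ps.close();
          }
      }
  }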

NOTE! The main reason for the Batch API is to avoid the internal caching of Hibernate, which eats lots of memory when handling thousands of items. Hibernate 3.1 introduced a new stateless API which, among other things, doesn't do any caching. This version was released after we had created the Batch API. We made a few tests to check if it would be better for us to switch back to Hibernate, but found that it didn't perform as well as our own Batch API (it was about two times slower). Future versions of Hibernate may perform better, so the Batch API may have to be revisited.

More information

5. Data classes vs. item classes

The data classes are, with few exceptions, for internal use only. These are the classes that are mapped to the database with Hibernate mapping files. They are very simple and contain no logic at all. They don't do any permission checks or any data validation.

Most of the data classes have a corresponding item class, for example UserData and User, or GroupData and Group (the exceptions are described below). The item classes are what the client applications can see and use. They contain logic for permission checking (for example, checking that the logged in user has WRITE permission) and data validation (for example, preventing a required property from being set to null).
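
The sketch below illustrates the split. These are not the real BASE classes; the method names and the permission check are simplified assumptions, made only to show where the logic lives.

  // Data class: a plain mapped bean with no logic at all.
  class SketchUserData
  {
      private String name;
      public String getName() { return name; }
      public void setName(String name) { this.name = name; }
  }

  // Item class: wraps the data class and adds permission checks and
  // data validation before anything is written to the data object.
  class SketchUser
  {
      private final SketchUserData data;

      SketchUser(SketchUserData data)
      {
          this.data = data;
      }

      public void setName(String name)
      {
          checkWritePermission();   // may the logged in user modify this item?
          if (name == null || name.length() == 0)
          {
              throw new IllegalArgumentException("Name is required");
          }
          data.setName(name);
      }

      private void checkWritePermission()
      {
          // In BASE this would consult the logged in user's permissions;
          // left empty here since it is outside the scope of the sketch.
      }
  }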

The only exception to this setup is that batchable data classes don't have a corresponding item class. The reason is that the data/item class relation depends on the caching system, but in the Batch API we want to cache as little as possible, so it doesn't make sense to have an item class. This creates another problem, since we still need to do permission checking and data validation. It was solved by moving that part of the code to the batcher classes (e.g. ReporterBatcher).

More information

6. Query API

The Query API is used to build and execute queries against the data in the database. It builds a query from objects that represent certain operations. For example, there is an EqRestriction object which tests whether two expressions are equal, and an AddExpression object which adds two expressions. In this way it is possible to build very complex queries without writing SQL or HQL.
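
The self-contained sketch below illustrates the object-based idea. It does not use the real Query API classes; the mini classes are made up for the example, but they mirror how restrictions and expressions are combined into a query without writing SQL or HQL by hand.

  interface Expression
  {
      String toQl();
  }

  class PropertyExpression implements Expression
  {
      private final String name;
      PropertyExpression(String name) { this.name = name; }
      public String toQl() { return name; }
  }

  class AddExpression implements Expression
  {
      private final Expression a, b;
      AddExpression(Expression a, Expression b) { this.a = a; this.b = b; }
      public String toQl() { return "(" + a.toQl() + " + " + b.toQl() + ")"; }
  }

  class EqRestriction
  {
      private final Expression a, b;
      EqRestriction(Expression a, Expression b) { this.a = a; this.b = b; }
      public String toQl() { return a.toQl() + " = " + b.toQl(); }
  }

  public class QuerySketch
  {
      public static void main(String[] args)
      {
          // Builds the restriction: (ch1 + ch2) = total
          EqRestriction where = new EqRestriction(
              new AddExpression(new PropertyExpression("ch1"),
                                new PropertyExpression("ch2")),
              new PropertyExpression("total"));
          System.out.println(where.toQl());   // prints: (ch1 + ch2) = total
      }
  }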

The Query API can work both via Hibernate and via SQL. In the first case it generates HQL (Hibernate Query Language) statements, which Hibernate then translates into SQL. In the second case SQL is generated directly. In most cases the HQL and SQL are identical, but not always. Some differences are handled by having the Query API generate slightly different query strings; some query elements can only be used with one of the query types.

NOTE! The object-based approach makes it a bit difficult to store a query for later reuse. The net.sf.basedb.util.jep package contains an expression parser that can be used to convert a string into Restriction and Expression objects for the Query API. While it doesn't cover 100% of the cases, it should be useful for the WHERE part of a query.

More information

7. Controller API

The Controller API is the very heart of the Base 2 system. This part of the core handles boring but essential details such as user authentication, database connection management, transaction management, data validation, and more. We won't write more about it here, but recommend reading the documents below.

More information

8. Plugin system

From the core code's point of view, a plugin is just another client application. A plugin doesn't have any more power, and doesn't have access to some special API that lets it do things other clients can't.

However, the core must be able to control when and where a plugin is executed. Some plugins may take a long time to do their calculations and may use a lot of memory. It would be bad if 100+ users started to execute a resource-demanding plugin at the same time. This problem is solved with a job queue. Each plugin that should be executed is registered as a Job in the database. A job controller checks the job queue at regular intervals and can then choose whether to execute the plugin or wait, depending on the current load on the server.

NOTE! This part of the core is not very well developed yet. The only existing job controller is the internal job controller, which just starts another thread in the same process. This means that a badly written plugin may crash the entire web server; for example, calling System.exit() in plugin code shuts down Tomcat as well. This part of the core must be changed in the near future.
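
The sketch below is a stripped down illustration of the job controller idea: jobs wait in a queue and a controller decides, at regular intervals, whether to start the next one or wait. The Job interface, the load policy and the thread handling are assumptions made for the example; they are not the actual BASE implementation.

  import java.util.Queue;
  import java.util.concurrent.ConcurrentLinkedQueue;

  public class JobControllerSketch
  {
      /** The work a registered job ultimately performs. */
      interface Job
      {
          void executePlugin();
      }

      private final Queue<Job> queue = new ConcurrentLinkedQueue<Job>();
      private static final int MAX_CONCURRENT_JOBS = 4;   // assumed load policy
      private int running = 0;

      /** Register a plugin execution request (in BASE, a Job in the database). */
      public synchronized void addJob(Job job)
      {
          queue.add(job);
      }

      /** Called at regular intervals, for example by a timer thread. */
      public synchronized void checkQueue()
      {
          while (running < MAX_CONCURRENT_JOBS && !queue.isEmpty())
          {
              final Job job = queue.poll();
              running++;
              // The internal controller just starts another thread in the
              // same process, as described in the note above.
              new Thread(new Runnable()
              {
                  public void run()
                  {
                      try
                      {
                          job.executePlugin();
                      }
                      finally
                      {
                          jobFinished();
                      }
                  }
              }).start();
          }
      }

      private synchronized void jobFinished()
      {
          running--;
      }
  }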

More information

9. Client applications

Client applications are applications that use the Base 2 core API. This document doesn't have much to say about them. The current web application is built with JavaServer Pages (JSP). It should run on several application servers, but we have only tested it with Tomcat.

Another client application is the migration tool that migrates Base 1.2.x data to Base 2.