This document gives a brief overview of the Base application.
Base 2 stores most of its data in a database. The database is divided into two parts: one fixed and one dynamic.
The fixed part contains tables that correspond to the various items found in Base. There is, for example, one table for users, one table for groups and one table for reporters. Some items share the same table: biosources, samples, extracts and labeled extracts are all biomaterials and share the BioMaterials table. Access to the fixed part of the database goes through Hibernate or, in some cases, through the Batch API.
The dynamic part of the database contains tables for storing analyzed data. Each experiment has its own set of tables, and it is not possible to mix data from two experiments. The dynamic part of the database can only be accessed by the Batch API and the Query API, using SQL and JDBC.
NOTE! The actual location of the two parts depends on the database that is used: MySQL uses two separate databases, while Postgres uses one database with two schemas.
More information
Hibernate (www.hibernate.org) is an object/relational mapping software package. It takes plain Java objects and stores them in a database. All we have to do is set the properties on the objects (for example: user.setName("A name")). Hibernate takes care of the SQL generation and database communication for us. But this is not a magic or automatic process: we have to provide mapping information about which objects go into which tables and which properties go into which columns, as well as other settings such as caching and proxying. This is done by annotating the code with Javadoc comments. The classes that are mapped to the database are found in the net.sf.basedb.core.data package, which is shown as the Data classes box in the image above.
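To illustrate the idea, here is a minimal sketch of what such a data class might look like. It is a hypothetical example, not the actual net.sf.basedb.core.data.UserData class: Hibernate persists plain Java beans like this one, and the Javadoc-style tags in the comments stand in for the mapping metadata described above.

```java
/**
 * Hypothetical data class in the style the text describes.
 * The Javadoc tags sketch where table/column mapping metadata would go.
 *
 * @hibernate.class table="Users"
 */
class UserData {
    private int id;
    private String name;

    /**
     * @hibernate.id column="id" generator-class="native"
     */
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    /**
     * @hibernate.property column="name" not-null="true"
     */
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

Client code only sets properties (user.setName("A name")); the SQL for INSERT and UPDATE statements is generated by Hibernate from the mapping metadata.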
Hibernate supports many different database systems. In theory, this means that Base 2 should work with all those databases. In practice, however, we have found that this is not the case. For example, Oracle converts empty strings to null values, which breaks some parts of our code that expect non-null values. Our Batch and Query APIs also generate some SQL themselves. They try to use database dialect information from Hibernate, but that is not always possible. The net.sf.basedb.core.dbengine package contains code for generating the SQL that Hibernate can't help us with. There is a generic ANSI driver and special drivers for MySQL and Postgres. We don't expect Base 2 to work with other databases without modifications.
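The driver idea can be sketched as follows. This is a hypothetical, simplified example (the interface and method are illustrative, not the real net.sf.basedb.core.dbengine API): a generic ANSI driver provides standard SQL, and database-specific drivers override the constructs that differ, such as paging syntax.

```java
// Hypothetical sketch of the dbengine driver idea: generic ANSI SQL
// by default, database-specific syntax where needed.
interface DbEngine {
    // Wrap a SELECT so that only 'limit' rows starting at 'offset' are returned.
    String paged(String select, int limit, int offset);
}

class AnsiDbEngine implements DbEngine {
    public String paged(String select, int limit, int offset) {
        // SQL:2008 standard paging, used when no special driver applies
        return select + " OFFSET " + offset + " ROWS FETCH NEXT "
            + limit + " ROWS ONLY";
    }
}

class MySqlDbEngine implements DbEngine {
    public String paged(String select, int limit, int offset) {
        // MySQL's non-standard LIMIT syntax
        return select + " LIMIT " + offset + ", " + limit;
    }
}
```

Code that needs paging asks the current DbEngine for the SQL instead of hard-coding one dialect.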
More information
Hibernate comes at a price: it affects performance and uses a lot of memory. This means that the parts of Base 2 that often handle many items at the same time, for example reporters, array design features and raw data, don't work well with Hibernate. We created the Batch API to solve these problems.
The Batch API uses JDBC and SQL directly against the database. However, we
still use metadata and database dialect information available from Hibernate
to generate the SQL we need. This should make the Batch API just as
database-independent as Hibernate is. The Batch API can be used for any
BatchableData
class in the fixed part of the database and is the
only way for adding data to the dynamic part.
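The batcher pattern can be sketched like this. It is a hypothetical, simplified stand-in for classes such as ReporterBatcher: items are buffered in memory and flushed in chunks, avoiding per-item caching. The real batchers issue JDBC batch inserts on flush; here the flush target is abstracted as a callback so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a batcher: buffer items, flush in chunks.
class Batcher<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. a JDBC batch INSERT
    private int flushed = 0;

    Batcher(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Queue one item; flush automatically when the buffer is full.
    void insert(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    // Send all buffered items to the database in one round trip.
    void flush() {
        if (buffer.isEmpty()) return;
        flushAction.accept(new ArrayList<>(buffer));
        flushed += buffer.size();
        buffer.clear();
    }

    int totalFlushed() { return flushed; }
}
```

The caller inserts thousands of items and calls flush() once at the end; memory use stays bounded by the batch size instead of growing with the number of items.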
NOTE! The main reason for the Batch API is to avoid Hibernate's internal caching, which eats a lot of memory when handling thousands of items. Hibernate 3.1 introduced a new stateless API which, among other things, doesn't do any caching. This version was released after we had created the Batch API. We ran a few tests to check whether it would be better for us to switch back to Hibernate, but found that it didn't perform as well as our own Batch API (it was about two times slower). Future versions of Hibernate may perform better, so the Batch API may have to be revisited.
More information
The data classes are, with few exceptions, for internal use. These are the classes that are mapped to the database with Hibernate mapping files. They are very simple and contain no logic at all: they don't do any permission checks or any data validation.
Most of the data classes have a corresponding item class, for example UserData and User, or GroupData and Group. The item classes are what the client applications can see and use. They contain logic for permission checking (for example, whether the logged-in user has WRITE permission) and data validation (for example, rejecting an attempt to set a required property to null).
The only exception to this setup is that batchable data classes don't have a corresponding item class. The reason is that the data/item class relation needs the caching system, but in the Batch API we want to cache as little as possible. Hence, it doesn't make sense to have an item class. This creates another problem, since we still need to do permission checking and data validation. This was solved by moving that part of the code to the batcher classes (e.g. ReporterBatcher).
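The data/item split can be sketched as follows. This is a hypothetical, simplified example (the names, the Permission values and the exception type are stand-ins for the real core classes): the data class is a dumb holder mapped by Hibernate, and the item class wraps it, adding permission checks and validation.

```java
// Hypothetical stand-ins for the real core permission machinery.
enum Permission { READ, WRITE }

class PermissionDeniedException extends RuntimeException {
    PermissionDeniedException(String msg) { super(msg); }
}

// Data class: mapped to the database, no logic at all.
class GroupData {
    private String name;
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Item class: what client applications see and use.
class Group {
    private final GroupData data;
    private final boolean hasWrite; // stands in for the session's permission check

    Group(GroupData data, boolean hasWrite) {
        this.data = data;
        this.hasWrite = hasWrite;
    }

    String getName() { return data.getName(); }

    void setName(String name) {
        // Permission check: the logged-in user must have WRITE permission.
        if (!hasWrite) throw new PermissionDeniedException("WRITE required");
        // Data validation: a required property may not be set to null.
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("name is required");
        data.setName(name);
    }
}
```

Batchable data classes skip the wrapper entirely; the equivalent checks live in the batcher instead.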
More information
The Query API is used to build and execute queries against the data in the database. It builds a query using objects that represent certain operations. For example, there is an EqRestriction object which tests if two expressions are equal, and an AddExpression object which adds two expressions. In this way it is possible to build very complex queries without using SQL or HQL.
The Query API can work both via Hibernate and via SQL. In the first case it generates HQL (Hibernate Query Language) statements, which Hibernate then translates into SQL. In the second case SQL is generated directly. In most cases HQL and SQL are identical, but not always. Some differences are handled by having the Query API generate slightly different query strings; some query elements can only be used with one of the query types.
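The object-based approach can be sketched like this. The class names mirror the Query API's style (Expression, Restriction, AddExpression, EqRestriction), but these are simplified, hypothetical stand-ins: each node just knows how to render itself, here as SQL-like text rather than real HQL or SQL.

```java
// Hypothetical sketch of object-based query building.
interface Expression { String toSql(); }
interface Restriction { String toSql(); }

class ColumnExpression implements Expression {
    private final String name;
    ColumnExpression(String name) { this.name = name; }
    public String toSql() { return name; }
}

class IntegerExpression implements Expression {
    private final int value;
    IntegerExpression(int value) { this.value = value; }
    public String toSql() { return Integer.toString(value); }
}

// Adds two expressions: a + b
class AddExpression implements Expression {
    private final Expression a, b;
    AddExpression(Expression a, Expression b) { this.a = a; this.b = b; }
    public String toSql() { return "(" + a.toSql() + " + " + b.toSql() + ")"; }
}

// Tests if two expressions are equal: a = b
class EqRestriction implements Restriction {
    private final Expression a, b;
    EqRestriction(Expression a, Expression b) { this.a = a; this.b = b; }
    public String toSql() { return a.toSql() + " = " + b.toSql(); }
}
```

Because the query is a tree of objects, the same tree could in principle be rendered to either HQL or SQL, which is how one query definition can serve both query types.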
NOTE! The object-based approach makes it a bit difficult to store a query for later reuse. The net.sf.basedb.util.jep package contains an expression parser that can be used to convert a string into Restriction and Expression objects for the Query API. While it doesn't cover 100% of the cases, it should be useful for the WHERE part of a query.
More information
The Controller API is the very heart of the Base 2 system. This part of the core handles boring but essential details, such as user authentication, database connection management, transaction management, data validation, and more. We won't write more about this part here, but recommend reading the documents below.
More information
From the core code's point of view, a plugin is just another client application. A plugin doesn't have more powers, and doesn't have access to some special API that lets it do things other clients can't.
However, the core must be able to control when and where a plugin is executed. Some plugins may take a long time doing their calculations and may use a lot of memory. It would be bad if 100+ users started to execute a resource-demanding plugin at the same time. This problem is solved by adding a job queue. Each plugin that should be executed is registered as a Job in the database. A job controller checks the job queue at regular intervals. The job controller can then choose whether to execute the plugin or wait, depending on the current load on the server.
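The job-queue idea can be sketched as follows. Everything here (the Job class, the in-memory queue, the per-poll limit standing in for a real load check) is a hypothetical simplification of the core's actual job machinery.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a job controller polling a job queue.
class JobController {
    static class Job {
        final String name;
        final Runnable work; // the plugin's execute step
        Job(String name, Runnable work) { this.name = name; this.work = work; }
    }

    private final Queue<Job> queue = new ArrayDeque<>();
    private final int maxPerPoll; // stands in for a real server-load check
    private int completed = 0;

    JobController(int maxPerPoll) { this.maxPerPoll = maxPerPoll; }

    // Registering a plugin execution adds a Job to the queue.
    void register(Job job) { queue.add(job); }

    // One polling pass: run at most maxPerPoll queued jobs in this process,
    // leaving the rest for a later pass when the load allows it.
    void poll() {
        int started = 0;
        while (started < maxPerPoll && !queue.isEmpty()) {
            queue.remove().work.run();
            started++;
            completed++;
        }
    }

    int completedJobs() { return completed; }
}
```

A real controller would read Job rows from the database and consult actual load metrics; the structure of register, poll and a throttling decision is the point here.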
NOTE! This part of the core is not yet fully developed. The only existing job controller is the internal job controller, which just starts another thread in the same process. This means that a badly written plugin may crash the entire web server. For example, do not call System.exit() in the plugin code, since it shuts down Tomcat as well. This part of the core must be changed in the near future.
More information
Client applications are applications that use the Base 2 core API. This document doesn't have much to say about them. The current web application is built with Java Server Pages (JSP). JSP is supported by several application servers, but we have only tested with Tomcat.
Another client application is the migration tool that migrates Base 1.2.x data to Base 2.