This document gives a brief overview of the Base application.
Base 2 stores most of its data in a database. The database is divided into two parts: one fixed and one dynamic.
The fixed part contains tables that correspond to the various items found in Base. There is, for example, one table for users, one table for groups and one table for reporters. Some items share the same table: biosources, samples, extracts and labeled extracts are all biomaterials and share the BioMaterials table. Access to the fixed part of the database goes through Hibernate or, in some cases, through the Batch API.
The dynamic part of the database contains tables for storing analyzed data. Each experiment has its own set of tables, and it is not possible to mix data from two experiments. The dynamic part of the database can only be accessed by the Batch API and the Query API, using SQL and JDBC.
NOTE! The actual location of the two parts depends on the database that is used: MySQL uses two separate databases, while Postgres uses one database with two schemas.
More information
Hibernate (www.hibernate.org) is an object/relational mapping software package. It takes plain Java objects and stores them in a database. All we have to do is set the properties on the objects (for example: user.setName("A name")). Hibernate takes care of the SQL generation and database communication for us. But this is not a magic or automatic process: we have to provide mapping information about which objects go into which tables and which properties go into which columns, as well as other settings such as caching and proxying. This is done by annotating the code with Javadoc comments. The classes that are mapped to the database are found in the net.sf.basedb.core.data package, which is shown as the Data classes box in the image above.
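To illustrate the idea, here is a minimal sketch of what such a data class might look like. It is a hypothetical example, not the actual net.sf.basedb.core.data.UserData class: Hibernate persists plain Java beans like this one, and the Javadoc-style tags in the comments stand in for the mapping metadata described above.

```java
/**
 * Hypothetical data class in the style the text describes.
 * The Javadoc tags sketch where table/column mapping metadata would go.
 *
 * @hibernate.class table="Users"
 */
class UserData {
    private int id;
    private String name;

    /**
     * @hibernate.id column="id" generator-class="native"
     */
    public int getId() { return id; }
    public void setId(int id) { this.id = id; }

    /**
     * @hibernate.property column="name" not-null="true"
     */
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

Client code only sets properties (user.setName("A name")); the SQL for INSERT and UPDATE statements is generated by Hibernate from the mapping metadata.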
Hibernate supports many different database systems. In theory, this means that Base 2 should work with all those databases. In practice, however, we have found that this is not the case. For example, Oracle converts empty strings to null values, which breaks some parts of our code that expect non-null values. Our Batch and Query APIs also generate some SQL themselves. They try to use database dialect information from Hibernate, but that is not always possible. The net.sf.basedb.core.dbengine package contains code for generating the SQL that Hibernate can't help us with. There is a generic ANSI driver and special drivers for MySQL and Postgres. We don't expect Base 2 to work with other databases without modifications.
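The driver idea can be sketched as follows. This is a hypothetical, simplified example (the interface and method are illustrative, not the real net.sf.basedb.core.dbengine API): a generic ANSI driver provides standard SQL, and database-specific drivers override the constructs that differ, such as paging syntax.

```java
// Hypothetical sketch of the dbengine driver idea: generic ANSI SQL
// by default, database-specific syntax where needed.
interface DbEngine {
    // Wrap a SELECT so that only 'limit' rows starting at 'offset' are returned.
    String paged(String select, int limit, int offset);
}

class AnsiDbEngine implements DbEngine {
    public String paged(String select, int limit, int offset) {
        // SQL:2008 standard paging, used when no special driver applies
        return select + " OFFSET " + offset + " ROWS FETCH NEXT "
            + limit + " ROWS ONLY";
    }
}

class MySqlDbEngine implements DbEngine {
    public String paged(String select, int limit, int offset) {
        // MySQL's non-standard LIMIT syntax
        return select + " LIMIT " + offset + ", " + limit;
    }
}
```

Code that needs paging asks the current DbEngine for the SQL instead of hard-coding one dialect.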
More information
Hibernate comes at a price: it affects performance and uses a lot of memory. This means that the parts of Base 2 that often handle many items at the same time, for example reporters, array design features and raw data, don't work well with Hibernate. We created the Batch API to solve these problems.
The Batch API uses JDBC and SQL directly against the database. However, we
still use metadata and database dialect information available from Hibernate
to generate the SQL we need. This should make the Batch API just as
database-independent as Hibernate is. The Batch API can be used for any
BatchableData
class in the fixed part of the database and is the
only way for adding data to the dynamic part.
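The batcher pattern can be sketched like this. It is a hypothetical, simplified stand-in for classes such as ReporterBatcher: items are buffered in memory and flushed in chunks, avoiding per-item caching. The real batchers issue JDBC batch inserts on flush; here the flush target is abstracted as a callback so the sketch runs without a database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch of a batcher: buffer items, flush in chunks.
class Batcher<T> {
    private final List<T> buffer = new ArrayList<>();
    private final int batchSize;
    private final Consumer<List<T>> flushAction; // e.g. a JDBC batch INSERT
    private int flushed = 0;

    Batcher(int batchSize, Consumer<List<T>> flushAction) {
        this.batchSize = batchSize;
        this.flushAction = flushAction;
    }

    // Queue one item; flush automatically when the buffer is full.
    void insert(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) flush();
    }

    // Send all buffered items to the database in one round trip.
    void flush() {
        if (buffer.isEmpty()) return;
        flushAction.accept(new ArrayList<>(buffer));
        flushed += buffer.size();
        buffer.clear();
    }

    int totalFlushed() { return flushed; }
}
```

The caller inserts thousands of items and calls flush() once at the end; memory use stays bounded by the batch size instead of growing with the number of items.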
NOTE! The main reason for the Batch API is to avoid Hibernate's internal caching, which eats a lot of memory when handling thousands of items. Hibernate 3.1 introduced a new stateless API which, among other things, doesn't do any caching. This version was released after we had created the Batch API. We ran a few tests to check whether it would be better for us to switch back to Hibernate, but found that it didn't perform as well as our own Batch API (it was about two times slower). Future versions of Hibernate may perform better, so the Batch API may have to be revisited.
More information
The data classes are, with few exceptions, for internal use. These are the classes that are mapped to the database with Hibernate mapping files. They are very simple and contain no logic at all: they don't do any permission checks or any data validation.
Most of the data classes have a corresponding item class, for example UserData and User, or GroupData and Group. The item classes are what the client applications can see and use. They contain logic for permission checking (for example, whether the logged-in user has WRITE permission) and data validation (for example, rejecting an attempt to set a required property to null).
The only exception to this setup is that batchable data classes don't have a corresponding item class. The reason is that the data/item class relation needs the caching system, but in the Batch API we want to cache as little as possible. Hence, it doesn't make sense to have an item class. This creates another problem, since we still need to do permission checking and data validation. This was solved by moving that part of the code to the batcher classes (e.g. ReporterBatcher).
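The data/item split can be sketched as follows. This is a hypothetical, simplified example (the names, the Permission values and the exception type are stand-ins for the real core classes): the data class is a dumb holder mapped by Hibernate, and the item class wraps it, adding permission checks and validation.

```java
// Hypothetical stand-ins for the real core permission machinery.
enum Permission { READ, WRITE }

class PermissionDeniedException extends RuntimeException {
    PermissionDeniedException(String msg) { super(msg); }
}

// Data class: mapped to the database, no logic at all.
class GroupData {
    private String name;
    String getName() { return name; }
    void setName(String name) { this.name = name; }
}

// Item class: what client applications see and use.
class Group {
    private final GroupData data;
    private final boolean hasWrite; // stands in for the session's permission check

    Group(GroupData data, boolean hasWrite) {
        this.data = data;
        this.hasWrite = hasWrite;
    }

    String getName() { return data.getName(); }

    void setName(String name) {
        // Permission check: the logged-in user must have WRITE permission.
        if (!hasWrite) throw new PermissionDeniedException("WRITE required");
        // Data validation: a required property may not be set to null.
        if (name == null || name.isEmpty())
            throw new IllegalArgumentException("name is required");
        data.setName(name);
    }
}
```

Batchable data classes skip the wrapper entirely; the equivalent checks live in the batcher instead.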
More information
The Query API is used to build and execute queries against the data in the database. It builds a query using objects that represent certain operations. For example, there is an EqRestriction object which tests if two expressions are equal, and an AddExpression object which adds two expressions. In this way it is possible to build very complex queries without using SQL or HQL.
The Query API can work both via Hibernate and via SQL. In the first case it generates HQL (Hibernate Query Language) statements, which Hibernate then translates into SQL. In the second case SQL is generated directly. In most cases HQL and SQL are identical, but not always. Some differences are handled by having the Query API generate slightly different query strings; some query elements can only be used with one of the query types.
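The object-based approach can be sketched like this. The class names mirror the Query API's style (Expression, Restriction, AddExpression, EqRestriction), but these are simplified, hypothetical stand-ins: each node just knows how to render itself, here as SQL-like text rather than real HQL or SQL.

```java
// Hypothetical sketch of object-based query building.
interface Expression { String toSql(); }
interface Restriction { String toSql(); }

class ColumnExpression implements Expression {
    private final String name;
    ColumnExpression(String name) { this.name = name; }
    public String toSql() { return name; }
}

class IntegerExpression implements Expression {
    private final int value;
    IntegerExpression(int value) { this.value = value; }
    public String toSql() { return Integer.toString(value); }
}

// Adds two expressions: a + b
class AddExpression implements Expression {
    private final Expression a, b;
    AddExpression(Expression a, Expression b) { this.a = a; this.b = b; }
    public String toSql() { return "(" + a.toSql() + " + " + b.toSql() + ")"; }
}

// Tests if two expressions are equal: a = b
class EqRestriction implements Restriction {
    private final Expression a, b;
    EqRestriction(Expression a, Expression b) { this.a = a; this.b = b; }
    public String toSql() { return a.toSql() + " = " + b.toSql(); }
}
```

Because the query is a tree of objects, the same tree could in principle be rendered to either HQL or SQL, which is how one query definition can serve both query types.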
NOTE! The object-based approach makes it a bit difficult to store a query for later reuse. The net.sf.basedb.util.jep package contains an expression parser that can be used to convert a string into Restriction and Expression objects for the Query API. While it doesn't cover 100% of the cases, it should be useful for the WHERE part of a query.
More information
The Controller API is the very heart of the Base 2 system. This part of the core handles boring but essential details, such as user authentication, database connection management, transaction management, data validation, and more. We won't write more about this part here, but recommend reading the documents below.
More information
From the core code's point of view, a plugin is just another client application. A plugin doesn't have more powers, and doesn't have access to some special API that lets it do things other clients can't.
However, the core must be able to control when and where a plugin is executed. Some plugins may take a long time doing their calculations and may use a lot of memory. It would be bad if 100+ users started to execute a resource-demanding plugin at the same time. This problem is solved by adding a job queue. Each plugin that should be executed is registered as a Job in the database. A job controller checks the job queue at regular intervals. The job controller can then choose whether to execute the plugin or wait, depending on the current load on the server.
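The job-queue idea can be sketched as follows. Everything here (the Job class, the in-memory queue, the per-poll limit standing in for a real load check) is a hypothetical simplification of the core's actual job machinery.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a job controller polling a job queue.
class JobController {
    static class Job {
        final String name;
        final Runnable work; // the plugin's execute step
        Job(String name, Runnable work) { this.name = name; this.work = work; }
    }

    private final Queue<Job> queue = new ArrayDeque<>();
    private final int maxPerPoll; // stands in for a real server-load check
    private int completed = 0;

    JobController(int maxPerPoll) { this.maxPerPoll = maxPerPoll; }

    // Registering a plugin execution adds a Job to the queue.
    void register(Job job) { queue.add(job); }

    // One polling pass: run at most maxPerPoll queued jobs in this process,
    // leaving the rest for a later pass when the load allows it.
    void poll() {
        int started = 0;
        while (started < maxPerPoll && !queue.isEmpty()) {
            queue.remove().work.run();
            started++;
            completed++;
        }
    }

    int completedJobs() { return completed; }
}
```

A real controller would read Job rows from the database and consult actual load metrics; the structure of register, poll and a throttling decision is the point here.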
NOTE! This part of the core is not yet fully developed. The only existing job controller is the internal job controller, which just starts another thread in the same process. This means that a badly written plugin may crash the entire web server. For example, do not call System.exit() in the plugin code, since it shuts down Tomcat as well. This part of the core must be changed in the near future.
More information
Client applications are applications that use the Base 2 core API. This document doesn't have much to say about them. The current web application is built with Java Server Pages (JSP). JSP is supported by several application servers, but we have only tested with Tomcat.
Another client application is the migration tool that migrates Base 1.2.x data to Base 2.