The batch system is initialised by the Application class when the application is started.
The init() method of BatchUtil checks every class registered with Hibernate to see whether
it implements the BatchableData interface. For each class that implements that interface,
SQL statements for insert, update and delete are generated. We use the information
about database tables, columns and dialect found in the Hibernate configuration.
BatchUtil also keeps track of the coupling between property names and parameter order
so that we can fill the prepared statements with the correct properties.
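The statement generation described above can be sketched roughly as follows. This is only an illustration and not the actual BatchUtil code: the class name StatementSketch, the table and column names, the assumption that the primary key column is called id, and the lack of dialect-specific quoting are all simplifications made for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: builds the three SQL statements for one table and
// remembers which 1-based JDBC parameter index each property occupies.
final class StatementSketch {
    final String insertSql;
    final String updateSql;
    final String deleteSql;
    // property name -> 1-based parameter index in insertSql
    final Map<String, Integer> insertParameterOrder = new LinkedHashMap<>();

    // propertyToColumn must iterate in a stable order (e.g. a LinkedHashMap).
    // The primary key column is assumed to be named "id".
    StatementSketch(String table, Map<String, String> propertyToColumn) {
        List<String> columns = new ArrayList<>();
        List<String> assignments = new ArrayList<>();
        int index = 1;
        for (Map.Entry<String, String> e : propertyToColumn.entrySet()) {
            columns.add(e.getValue());
            assignments.add(e.getValue() + " = ?");
            insertParameterOrder.put(e.getKey(), index++);
        }
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        insertSql = "INSERT INTO " + table + " (" + String.join(", ", columns)
            + ") VALUES (" + placeholders + ")";
        updateSql = "UPDATE " + table + " SET " + String.join(", ", assignments)
            + " WHERE id = ?";
        deleteSql = "DELETE FROM " + table + " WHERE id = ?";
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("externalId", "external_id");
        mapping.put("name", "name");
        StatementSketch s = new StatementSketch("Reporters", mapping);
        System.out.println(s.insertSql);
        System.out.println(s.insertParameterOrder);
    }
}
```

With the parameter order recorded once at start-up, filling a prepared statement is a matter of looking up the index for each property name.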
There are some restrictions on which classes can be batchable. A class must be
mapped to only one table, and each property of the class must be a simple one, i.e. the
property can't be divided into several columns. Many-to-one and one-to-one associations
are supported, but not collections or arrays. If the class is within those restrictions,
it only has to implement BatchableData to be batchable.
The batching system is used for inserting, updating and/or deleting large amounts of data
efficiently. The disadvantage of this system is that it won't synchronize the
objects with the database. For example, at insert the ID of the object won't
be set. Only classes that implement the BatchableData tagging interface can be batched.
A single type of batchable data object usually has three classes: a data class, a
batcher class and a utility class. The data class is the actual class holding the data and
is mapped with Hibernate. The batcher class is used for doing the insert, update and
delete to the database. The utility class is used to get new data objects and to
get access to otherwise protected properties. We need the utility class to bridge the gap
between classes in the net.sf.basedb.core.data package and their corresponding classes
in the net.sf.basedb.core package. For example, a reporter may have a reporter type.
The reporter type is not a batchable class, and we must always work with the
net.sf.basedb.core.ReporterType class and not the net.sf.basedb.core.data.ReporterTypeData
class. But ReporterData can only return ReporterTypeData objects. Therefore we need the
Reporter class to convert the ReporterTypeData object into a ReporterType.
Here is an example of how to insert reporters:
// Get a DbControl
DbControl dc = ...;
// Create batcher for reporters
ReporterBatcher rb = ReporterBatcher.getNew(dc);
// Create a new reporter with external id
ReporterData rd = Reporter.getNew("reporter1");
// Adds the object to the batch insertion queue
rb.insert(rd);
// Execute all the inserts
dc.commit();
insert() adds the object to a batch that is sent to the database when the
batcher is flushed. The batcher is flushed when DbControl.commit() or Batcher.flush()
is called. You can also set a batch size that makes the batcher flush automatically
when the batch reaches that size. A default batch size is configured in the
base.config file.
rb.setBatchSize(500); // Will flush when 500 objects have been added
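The flushing behaviour described above can be illustrated with a small stand-alone sketch. The class BatchQueueSketch and its sink callback are hypothetical and only mimic the idea; in the real batcher the queued objects end up in a prepared statement batch rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a batcher's insert queue: objects are collected
// and handed to a sink when the queue is flushed, either explicitly or
// because the batch size limit was reached.
final class BatchQueueSketch<T> {
    private final List<T> pending = new ArrayList<>();
    private final Consumer<List<T>> sink;
    private int batchSize;

    BatchQueueSketch(int batchSize, Consumer<List<T>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    void setBatchSize(int batchSize) {
        this.batchSize = batchSize;
    }

    // Queue one object; flush automatically once the size limit is reached.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send everything queued so far to the sink and empty the queue.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    public static void main(String[] args) {
        BatchQueueSketch<String> queue =
            new BatchQueueSketch<>(2, batch -> System.out.println("Flushing " + batch));
        queue.add("reporter1");
        queue.add("reporter2"); // size limit reached, flushes automatically
        queue.add("reporter3");
        queue.flush();          // explicit flush of the remainder
    }
}
```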
update() and delete() work the same way as insert(), each with its own batch.
ReporterData rd = Reporter.getByExternalId(dc, "reporter1");
rd.setExtended("species", "Mus musculus");
rb.update(rd);
dc.commit();
dc.close();

ReporterData rd = Reporter.getByExternalId(dc, "reporter1");
rb.delete(rd);
dc.commit();
dc.close();
Insert, update and delete each have their own batch, and the batches can be flushed separately.
rb.flushInsert();
rb.flushUpdate();
rb.flushDelete();
rb.flush(); // Will flush all batches
A class that stays within the restrictions described above only has to implement the
BatchableData interface to be batchable.
public class Foo implements BatchableData
{
    // ...
}
Many-to-one or one-to-one associations to non-batchable classes must have package-private get/set methods, since we don't want to expose the data layer objects of those classes. The utility class is used to get/set those properties using Hibernate metadata methods.
private ReporterTypeData reporterType;
/**
Get the {@link ReporterTypeData} of this reporter. Package private since
we cannot expose the data object to client applications.
@return The ReporterTypeData item
@hibernate.many-to-one column="`reportertype_id`" not-null="false" outer-join="false"
@see Reporter#getReporterType(net.sf.basedb.core.DbControl, ReporterData)
*/
ReporterTypeData getReporterType()
{
    return reporterType;
}
void setReporterType(ReporterTypeData reporterType)
{
    this.reporterType = reporterType;
}
Associations to other batchable classes may have public get/set methods.
In this case we must instead map the association with a cascade="evict" attribute.
This makes sure that once the object reaches the client application it is not
associated with any session, so changes to it will not propagate to the database
and bypass the regular permission checks.
There is one catch, however. The version of XDoclet we currently use doesn't support
the cascade="evict" attribute. Therefore we must skip the Hibernate mapping
for such properties and add it in an external XML file. For example, to use an external
mapping for the AbstractFeatureData class, the name of the external file should be
hibernate-properties-AbstractFeatureData.xml.
<many-to-one name="reporter" class="net.sf.basedb.core.data.ReporterData" cascade="evict" fetch="select" update="false" insert="true" access="property" column="`reporter_id`" not-null="false" />
Every batchable class must have its own batcher, because some things are
class-specific. A batcher always inherits from AbstractBatcher or BasicBatcher,
which provide the core services. The BasicBatcher is used for all items mapped
with Hibernate. The Dynamic API has batchers that inherit from the AbstractBatcher class.
When a batcher is created, the initPermissions() method must be called, or no permissions
will be set, resulting in a PermissionDeniedException when trying to use
the batcher. The BasicBatcher fetches the role-based permissions for the
logged-in user. This means that batchable items don't have individual permissions;
they are always treated as a group.
// ReporterBatcher.java
public static ReporterBatcher getNew(DbControl dc)
    throws BaseException
{
    ReporterBatcher rb = new ReporterBatcher(dc);
    rb.initPermissions(0, 0);
    return rb;
}
getType() usually just returns a constant from the Item enumeration.
The returned value is used for role-based permission checking.
The validate() method should validate the properties of a data object. It is called
by the BasicBatcher before an object is inserted or updated. The validation
should follow the rules for case 2 validation as discussed in the
data validation document and the coding rules and guidelines document.
The onBeforeCommit() method is called after validation, just
before the object is added to the insert or update batch. A batcher can
override this method if it needs to modify some property values,
for example to set the last updated date. The BasicBatcher automatically
increments the version property. The onBeforeCommit() method is not called when an
object is deleted, since an object can be deleted using only its id value.
// ReporterBatcher.java
void onBeforeCommit(ReporterData data, Transactional.Action action)
    throws BaseException
{
    setPropertyValue(data, "lastUpdate", new Date());
}
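The order in which the hooks run can be sketched with a simplified, stand-alone example. The classes HookedBatcherSketch and RecordingBatcherSketch are hypothetical and do not reflect the real BasicBatcher signatures; they only show that validation comes first, that onBeforeCommit() runs just before the object is queued, and that neither hook runs for a delete.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the hook ordering; not the real BasicBatcher.
abstract class HookedBatcherSketch<T> {
    final List<T> insertBatch = new ArrayList<>();
    final List<Integer> deleteBatch = new ArrayList<>();

    // Subclasses reject invalid objects by throwing an exception.
    abstract void validate(T data);

    // Optional hook; the default implementation does nothing.
    void onBeforeCommit(T data) {
    }

    // Validate first, then let the subclass adjust the object, then queue it.
    final void insert(T data) {
        validate(data);
        onBeforeCommit(data);
        insertBatch.add(data);
    }

    // Deleting needs only the id, so neither hook is called.
    final void delete(int id) {
        deleteBatch.add(id);
    }
}

// Example subclass that records the order in which the hooks are invoked.
final class RecordingBatcherSketch extends HookedBatcherSketch<String> {
    final List<String> calls = new ArrayList<>();

    @Override
    void validate(String data) {
        if (data == null || data.isEmpty()) {
            throw new IllegalArgumentException("data must not be empty");
        }
        calls.add("validate");
    }

    @Override
    void onBeforeCommit(String data) {
        calls.add("onBeforeCommit");
    }

    public static void main(String[] args) {
        RecordingBatcherSketch batcher = new RecordingBatcherSketch();
        batcher.insert("reporter1");
        batcher.delete(42);
        System.out.println(batcher.calls);
    }
}
```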
The onBeforeClose() method is defined in the AbstractBatcher class and is called
by the close() method after it has called flush() but
before the connection to the database has been closed. This allows a subclass
to clean up any open resources and (more importantly) update properties on
parent items. For example, the RawDataBatcher updates the
spot count and disk usage on the raw bioassay. This method is only called once.
Once the batcher has been closed, it cannot be used again.
// RawDataBatcher.java
void onBeforeClose()
    throws BaseException
{
    rawBioAssayData.setSpots(rawBioAssayData.getSpots() + getTotalInsertCount());
    rawBioAssayData.setBytes(rawBioAssayData.getBytes() + bytes);
}
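The close sequence can be sketched as follows. ClosableBatcherSketch is a hypothetical class written for this example, not the real AbstractBatcher. Treating a second call to close() as a no-op is an assumption of the sketch; the text above only states that the hook is called once and that a closed batcher cannot be used again.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the close sequence; not the real AbstractBatcher.
class ClosableBatcherSketch {
    final List<String> events = new ArrayList<>();
    private boolean closed = false;

    // Flushing is only allowed while the batcher is open.
    void flush() {
        if (closed) {
            throw new IllegalStateException("Batcher is closed");
        }
        events.add("flush");
    }

    // Hook for subclasses, e.g. to update totals on a parent item.
    void onBeforeClose() {
        events.add("onBeforeClose");
    }

    // Flush first, then call the hook, then release the connection.
    // Assumption: closing an already closed batcher does nothing.
    void close() {
        if (closed) {
            return;
        }
        flush();
        onBeforeClose();
        closed = true;
        events.add("connection closed");
    }

    public static void main(String[] args) {
        ClosableBatcherSketch batcher = new ClosableBatcherSketch();
        batcher.close();
        batcher.close(); // the hook is not called a second time
        System.out.println(batcher.events);
    }
}
```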
initPermissions() sets the permissions for the entire batcher and
is called once, directly after the batcher is created. The permissions apply to all
objects handled by the batcher. It is not possible to have different permissions for
different objects. For reporters this is not a problem, since the permissions are
given by role permissions only. But for raw data, the permissions depend on the
raw bioassay they belong to. This is solved by letting a single batcher only
handle raw data for a single raw bioassay. By giving a raw bioassay as a parameter
when creating the batcher, the permissions are known.
// RawDataBatcher.java
RawDataBatcher(DbControl dc, RawBioAssay rawBioAssay)
    throws BaseException
{
    super(dc);
    this.rawBioAssay = rawBioAssay;
    ...
}
...
void initPermissions(int granted, int denied)
    throws BaseException
{
    if (rawBioAssay.hasPermission(Permission.READ))
    {
        granted |= Permission.grant(Permission.READ);
    }
    if (rawBioAssay.hasPermission(Permission.WRITE))
    {
        granted |= Permission.grant(Permission.WRITE, Permission.DELETE, Permission.CREATE);
    }
    super.initPermissions(granted, denied);
}
Note! Do not forget to call super.initPermissions().
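The granted and denied parameters are combined with bitwise OR, which suggests that each permission corresponds to one bit in an integer mask. The sketch below illustrates that idea with a hypothetical PermSketch enum; the actual bit values and methods of the real Permission class are not shown in this document and may differ.

```java
// Hypothetical permission flags; the real Permission class may differ.
enum PermSketch {
    READ, USE, WRITE, DELETE, CREATE;

    // Build a bit mask with one bit set for each given permission.
    static int grant(PermSketch... permissions) {
        int mask = 0;
        for (PermSketch p : permissions) {
            mask |= 1 << p.ordinal();
        }
        return mask;
    }

    // Check whether the bit for a single permission is set in the mask.
    static boolean has(int mask, PermSketch p) {
        return (mask & (1 << p.ordinal())) != 0;
    }

    public static void main(String[] args) {
        int granted = 0;
        granted |= grant(READ);
        granted |= grant(WRITE, DELETE, CREATE);
        System.out.println("READ granted: " + has(granted, READ));
        System.out.println("USE granted: " + has(granted, USE));
    }
}
```

Because the masks are merged with OR, a subclass can add permissions to whatever was passed in before handing the result to the superclass.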
The utility class is used to get/set properties that don't have public methods in the data layer. The usual reason the methods are not public is that we don't want the data object to be exposed to client applications.
// Reporter.java
public static ReporterType getReporterType(DbControl dc, ReporterData reporter)
    throws PermissionDeniedException, BaseException
{
    ReporterTypeData rtd =
        (ReporterTypeData)metaData.getPropertyValue(reporter, "reporterType", EntityMode.POJO);
    return (ReporterType)dc.getItem(ReporterType.class, rtd);
}
public static void setReporterType(ReporterData reporter, ReporterType reporterType)
    throws PermissionDeniedException, BaseException
{
    if (reporterType != null) reporterType.checkPermission(Permission.USE);
    metaData.setPropertyValue(reporter, "reporterType",
        reporterType == null ? null : reporterType.getData(), EntityMode.POJO);
}
The utility class can also be used as a place for static getNew(), getById()
and getQuery() methods if there is no other natural place for those methods.
For example, the Reporter class has all those methods as well as a
getByExternalId() method. On the other hand, the RawDataUtil class doesn't have
those methods, since the natural place is the RawBioAssay class.
public static ReporterData getNew(String externalId)
    throws InvalidDataException, BaseException
{
    if (externalId == null)
    {
        throw new InvalidUseOfNullException("externalId");
    }
    ReporterData rd = new ReporterData();
    rd.setExternalId(externalId);
    rd.setName("New reporter");
    return rd;
}
public static DataQuery<ReporterData> getQuery()
{
    return new DataQuery<ReporterData>(ReporterData.class);
}