Opened 15 years ago

Closed 15 years ago

Last modified 15 years ago

#1374 closed enhancement (fixed)

Caching of experimental factors

Reported by: Jari Häkkinen Owned by: Nicklas Nordborg
Priority: critical Milestone: BASE 2.14
Component: core Version: 2.13
Keywords: Cc:

Description

The experiment properties view takes a long time to appear when there are many experimental factors associated with the experiment. This is due to the costly check of all inherited annotations. If the experimental factors were cached with each experiment, both performance and the user experience would improve.

I suggest that we add a caching mechanism for experimental factors and the user has to actively update the experimental factors when needed.

In the experiment properties tab, the experimental factors listing should show a caching status (i.e., when the cache was last refreshed) and allow the user to update a single annotation or all annotations at once.

Could such caching also improve the performance of the experiment explorer? Using the experiment explorer is very slow and it might be a consequence of expensive annotation queries.

Change History (20)

comment:1 by Nicklas Nordborg, 15 years ago

Owner: changed from everyone to Nicklas Nordborg
Priority: major → critical
Status: new → assigned

Here is the idea:

The offending method is probably the AnnotationSet.findAnnotations() method. It will look through all primary and inherited annotations. The result of this method should be cached. It is not possible to cache the returned list as it is since the Annotation object is not serializable. Instead, we should create AnnotationProxy objects that hold the most important information, e.g., various IDs, the date, the actual annotation values, etc. This object should be serializable.

Then we should overload the findAnnotations method with a new method that first looks in the cache. If the result is in the cache and is not too old, we use this, otherwise the regular findAnnotations method is called. The result is proxied and written to the cache before returning the information to the caller.
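The cache-first lookup described above could look roughly like the sketch below. It is only an illustration: the fields of AnnotationProxy, the ProxyCache class, the maximum-age parameter and the injected slow lookup are assumptions made for the example, not the BASE API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical serializable stand-in for an Annotation (field set is assumed). */
final class AnnotationProxy implements Serializable {
    private static final long serialVersionUID = 1L;
    final int annotationId;
    final int annotationTypeId;
    final List<String> values;

    AnnotationProxy(int annotationId, int annotationTypeId, List<String> values) {
        this.annotationId = annotationId;
        this.annotationTypeId = annotationTypeId;
        this.values = new ArrayList<>(values); // defensive copy
    }
}

/** Cache-first wrapper around an expensive lookup, keyed by item id. */
final class ProxyCache {
    private static final class Entry {
        final List<AnnotationProxy> proxies;
        final long createdMillis;

        Entry(List<AnnotationProxy> proxies, long createdMillis) {
            this.proxies = proxies;
            this.createdMillis = createdMillis;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();
    private final long maxAgeMillis;
    // Stands in for the regular (slow) findAnnotations call plus proxying.
    private final Function<Integer, List<AnnotationProxy>> slowLookup;

    ProxyCache(long maxAgeMillis, Function<Integer, List<AnnotationProxy>> slowLookup) {
        this.maxAgeMillis = maxAgeMillis;
        this.slowLookup = slowLookup;
    }

    /** Returns the cached proxies if they are fresh enough, otherwise reloads and caches them. */
    List<AnnotationProxy> findAnnotations(int itemId, long nowMillis) {
        Entry cached = entries.get(itemId);
        if (cached != null && nowMillis - cached.createdMillis <= maxAgeMillis) {
            return cached.proxies;
        }
        List<AnnotationProxy> fresh = slowLookup.apply(itemId);
        entries.put(itemId, new Entry(fresh, nowMillis));
        return fresh;
    }
}
```

Passing the current time in as a parameter is only a convenience for testing the sketch; a real implementation would more likely read the clock itself and write the proxies to disk rather than keep them in a map.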

comment:2 by Johan Vallon-Christersson, 15 years ago

Would it be possible to present the cache date to the user in the web interface, e.g., in conjunction with the experimental factors table and/or the rawbioassays table in an experiment?

Would it be possible to have a button or link that allows the user to manually initialize a call to the findAnnotations method, i.e., caching of information?

Would the findAnnotations method include the option to automatically try to (for rawbioassays) inherit annotations from parents for all (or a selection) of experimental factors? Would this be a separate ticket/enhancement (somewhat related to ticket:1394)?

comment:3 by Nicklas Nordborg, 15 years ago

Would it be possible to present the cache date...

The date will be available when holding the mouse over an icon that indicates that the values have been taken from the cache. Note that since the caching is done per raw bioassay and annotation type, there will be a lot of those icons/dates (e.g., one for each cell in the raw bioassays table).

Would it be possible to have a button or link that allows the user to manually initialize a call to the findAnnotations method, i.e., caching of information?

The cache can automatically be updated by removing the cache entry and reloading the page. Clicking on the "cached value" icon will do this.

Would the findAnnotations method include the option to automatically try to (for rawbioassays) inherit annotations from parents

The findAnnotations method can't be used for inheriting annotations. This is done in a different method. This function already exists so there is no need for a ticket.

in reply to:  3 comment:4 by Johan Vallon-Christersson, 15 years ago

Replying to nicklas:

Would it be possible to have a button or link that allows the user to manually initialize a call to the findAnnotations method, i.e., caching of information?

The cache can automatically be updated by removing the cache entry and reloading the page. Clicking on the "cached value" icon will do this.

Would it be possible to have a button that initiates an update of the cached information for all annotations for all rawbioassays in the experiment (possibly by a method that in turn calls findAnnotations)? I can imagine that a user would like to update everything in one go instead of doing one at a time (in addition to having the option to just update a particular experimental factor).

Would the findAnnotations method include the option to automatically try to (for rawbioassays) inherit annotations from parents

The findAnnotations method can't be used for inheriting annotations. This is done in a different method. This function already exists so there is no need for a ticket.

The function that exists (unless I've missed something) is for the user to inherit annotations for one experimental factor at a time. Is it possible for the user to click to inherit annotations for all experimental factors in the experimental factors table? If not, this would be desirable and I could write a ticket with a use case.

comment:5 by Nicklas Nordborg, 15 years ago

Changes to the 'inherit annotations' functionality should go into a new ticket. This ticket is about the caching.

Initial test results indicate that the performance can maybe be increased by a factor of 10. The loading time of the "Raw bioassays" table with the test data set (4 raw bioassays, 3 experimental factors) goes from around 2 seconds to 0.2 seconds.

I also found another interesting thing. By optimizing some of the other item loading I think it is possible to have the caching system automatically detect if an annotation has been modified since it was cached. This would make the caching completely transparent, and modified annotations would be updated automatically without the help of the user. The performance factor may still be as high as 6-8. The icons for updating the cached values are really cluttering the web interface and it would be nice to not have to worry about this. It will also make it easier to use the caching system in other places (e.g., experiment explorer) since we don't need to have the 'update cache' functionality everywhere.
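The ticket does not say how the modification check would work. One common way to get this kind of transparent validation is to store a version stamp with each cached entry and compare it against a cheap "current version" query on every read. The sketch below assumes that approach; VersionCheckedCache and both injected functions are hypothetical names, not BASE code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;
import java.util.function.IntUnaryOperator;

/** Cache that revalidates each entry against a cheap version lookup (assumed design). */
final class VersionCheckedCache {
    private static final class Entry {
        final int version;
        final String value;

        Entry(int version, String value) {
            this.version = version;
            this.value = value;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();
    private final IntUnaryOperator currentVersion; // cheap: e.g. a single-column query
    private final IntFunction<String> loadValue;   // expensive: the full annotation load

    VersionCheckedCache(IntUnaryOperator currentVersion, IntFunction<String> loadValue) {
        this.currentVersion = currentVersion;
        this.loadValue = loadValue;
    }

    /** Serves the cached value only while its version stamp still matches. */
    String get(int annotationId) {
        int version = currentVersion.applyAsInt(annotationId);
        Entry cached = entries.get(annotationId);
        if (cached != null && cached.version == version) {
            return cached.value;
        }
        String value = loadValue.apply(annotationId);
        entries.put(annotationId, new Entry(version, value));
        return value;
    }
}
```

The extra version lookup on every read is one plausible reason the expected gain drops from a factor of 10 to 6-8, though the ticket does not state the cause.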

in reply to:  5 comment:6 by Johan Vallon-Christersson, 15 years ago

Replying to nicklas:

Initial test results indicate that the performance can maybe be increased by a factor of 10. The loading time of the "Raw bioassays" table with the test data set (4 raw bioassays, 3 experimental factors) goes from around 2 seconds to 0.2 seconds.

Sounds promising! Some tests on the production installation; loading time for the Properties tab in an experiment.

  • 577 rawbioassays and 13 experimental factors (3 min 30 sec).
  • 190 rawbioassays and 70 experimental factors (6 min 20 sec).

Enhanced performance is critical here and a factor 10 would make a big difference.

I also found another interesting thing. By optimizing some of the other item loading I think it is possible to have the caching system automatically detect if an annotation has been modified since it was cached. This would make the caching completely transparent, and modified annotations would be updated automatically without the help of the user. The performance factor may still be as high as 6-8. The icons for updating the cached values are really cluttering the web interface and it would be nice to not have to worry about this. It will also make it easier to use the caching system in other places (e.g., experiment explorer) since we don't need to have the 'update cache' functionality everywhere.

This is also very promising. I completely agree that it would be better to have caching done automatically. I suppose there will always be a trade-off for performance; there is a big difference between a factor of 6 and a factor of 10 when cutting loading times as long as 6-7 minutes.

The point about being able to reuse the caching system in experiment explorer is a very valid one. EExplorer is a place where enhanced performance is important. For example, loading time for EExplorer:

  • 190 assays (49576 spots/assay) and 70 experimental factors (3 min 40 sec).


Also, the loading of Reporter search (within EExplorer) needs performance optimization (perhaps by caching reporter data) although this might be a different ticket? An example: loading the Reporter search for the bioassayset above takes 5 min.

in reply to:  5 comment:7 by Johan Vallon-Christersson, 15 years ago

Replying to nicklas:

Changes to the 'inherit annotations' functionality should go into a new ticket.

Created a new ticket:1397 regarding the 'inherit annotations' functionality.

comment:8 by Nicklas Nordborg, 15 years ago

The reporter search was already optimised in #903. The remaining issue here is that the cached reporter/position information needs to be updated when things like filters, sort order, etc. are changed in the reporter search. I don't think we can cut the cache update time anymore. But this is also not part of this ticket.

comment:9 by Nicklas Nordborg, 15 years ago

Arrrggghhh... there are some annoying problems... The auto-detection for changes only works when the actual annotation values are modified. It doesn't work with structural changes, eg. when inheriting new or different annotations, or if an annotation is removed. I think that we need to cache at a different level than the findAnnotations method. I'll try to see if we can manage the cache at AnnotationSet level which would make it easier to validate cached results.

comment:10 by Nicklas Nordborg, 15 years ago

(In [5120]) References #1374: Caching of experimental factors

The cache is now working. #1400, #1401 and some other enhancements indicate that the initial caching can be 2-3 times quicker than the old code. The performance gain with the cache seems to be about 5-7 times. The end result is about 15-20 times quicker than the old code. Note that this was for a very simple experiment with 4 raw data sets and 5 experimental factors and almost no other data in the database. This should be tested in a real-world scenario.

IMPORTANT NOTE!! Cache invalidation has not yet been implemented. What goes into the cache stays in the cache. Updates and changes to annotations will not be visible!

comment:11 by Nicklas Nordborg, 15 years ago

(In [5121]) References #1374: Caching of experimental factors

This should invalidate (delete) the cached snapshots when annotation values or the inheritance structure is changed.
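A minimal sketch of delete-on-change invalidation, assuming snapshots are keyed by annotation set id. SnapshotStore and its loader are hypothetical names, and the snapshot is reduced to a String to keep the example short; the real snapshots are files on disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Snapshot cache where every write path must call invalidate() (illustrative only). */
final class SnapshotStore {
    private final Map<Integer, String> snapshots = new HashMap<>();
    private final IntFunction<String> loader; // stands in for the expensive annotation load

    SnapshotStore(IntFunction<String> loader) {
        this.loader = loader;
    }

    /** Returns the cached snapshot, creating it on first access. */
    String get(int annotationSetId) {
        return snapshots.computeIfAbsent(annotationSetId, id -> loader.apply(id));
    }

    /** Call whenever annotation values or the inheritance structure change. */
    void invalidate(int annotationSetId) {
        snapshots.remove(annotationSetId);
    }
}
```

Without the invalidate() call the store behaves like the state described in comment:10: whatever goes in stays in, and later changes are not visible.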

comment:12 by Nicklas Nordborg, 15 years ago

(In [5123]) References #1374: Caching of experimental factors

The 'Annotations & parameters' tab on single-item view pages is now using the cached snapshots as well. The main reason is that this will re-create the snapshot immediately after a change has been made and it should improve performance on pages that need to list lots of annotations.

NOTE! This change also affects the change made in [5102] for ticket #1373 (due to be released in BASE 2.13.1). The changes are incompatible, and the changes to list_annotations.jsp that were made in [5102] SHOULD NOT be merged back to the trunk.

comment:13 by Nicklas Nordborg, 15 years ago

(In [5124]) References #1374: Caching of experimental factors

Implements our own serialization code. The file sizes shrink considerably and so does the loading time. I hope we can get another factor 2 in loading time :). The important thing now is that we must remember to synchronize everything if the number/types of internal variables change in the future.
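To illustrate why hand-written serialization is smaller and faster than default Java serialization, and why the read and write sides must be kept in sync, here is a sketch using DataOutputStream and DataInputStream. The field layout and the SnapshotCodec class are assumptions for the example and do not reflect the actual BASE snapshot format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled binary format: version, annotation id, value count, then each value. */
final class SnapshotCodec {
    static final int FORMAT_VERSION = 1; // bump whenever the field layout changes

    static byte[] write(int annotationId, List<String> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(FORMAT_VERSION);
            out.writeInt(annotationId);
            out.writeInt(values.size());
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the fields back in exactly the order write() emitted them. */
    static List<String> readValues(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int version = in.readInt();
            if (version != FORMAT_VERSION) {
                throw new IOException("Unsupported snapshot format: " + version);
            }
            in.readInt(); // annotation id, not needed by this reader
            int count = in.readInt();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                values.add(in.readUTF());
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The explicit format version is one way to guard against the synchronization problem mentioned above: a reader that sees an unknown version can discard the snapshot and rebuild it instead of misreading the bytes.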

comment:14 by Nicklas Nordborg, 15 years ago

We have started with performance tests of this change. Here are some figures for the original BASE 2.13 code. Time is measured by calls to System.currentTimeMillis() before and after the part of code that generates the Raw bioassays. Here are the figures (in minutes and seconds):

Experiment             A       B
Raw bioassays          577     190
Experimental factors   14      80
Total annotations      8078    13300
Time 1                 2.38    3.18
Time 2                 2.27    3.18
Time 3                 2.27    3.14
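The measurement approach described above (calls to System.currentTimeMillis() before and after the code under test) can be sketched as follows. LoadTimer and its method names are made up for the illustration; the actual timing code was removed again in [5128].

```java
/** Wall-clock timing helper in the style described in this comment (illustrative). */
final class LoadTimer {
    /** Runs the task and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    /** Formats milliseconds as minutes and whole seconds, e.g. 147000 as "2m27s". */
    static String format(long millis) {
        long totalSeconds = millis / 1000;
        return (totalSeconds / 60) + "m" + (totalSeconds % 60) + "s";
    }

    public static void main(String[] args) {
        // The lambda is a placeholder for the code that generates the table.
        long elapsed = timeMillis(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        });
        System.out.println("Elapsed: " + format(elapsed));
    }
}
```

Note that this only measures server-side generation time; browser rendering time is outside the measurement.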

comment:15 by Nicklas Nordborg, 15 years ago

Initial caching times: A = 1m9s; B = 8.9s Using the cached data: A = 3.7s; B = 1.5s

It's nice, isn't it!

Still... loading times may feel longer. On my computer it takes around 10-20 seconds for Firefox to render the page... This time is not included in any of the measurements (before or after).
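The speed-up implied by the figures in comment:14 and comment:15 can be worked out as below. This assumes that a table entry such as "2.27" means 2 minutes 27 seconds and uses the fastest of the three BASE 2.13 runs as the baseline; SpeedupCalc is a hypothetical helper written for this illustration.

```java
/** Works out speed-up factors from the timings reported in this ticket. */
final class SpeedupCalc {
    static int toSeconds(int minutes, int seconds) {
        return minutes * 60 + seconds;
    }

    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        int baselineA = toSeconds(2, 27); // fastest BASE 2.13 run, experiment A
        int baselineB = toSeconds(3, 14); // fastest BASE 2.13 run, experiment B

        // Initial caching: A = 1m9s (69 s), B = 8.9 s
        System.out.printf("A first load: %.1fx%n", speedup(baselineA, 69));
        System.out.printf("B first load: %.1fx%n", speedup(baselineB, 8.9));

        // Using the cached data: A = 3.7 s, B = 1.5 s
        System.out.printf("A cached:     %.1fx%n", speedup(baselineA, 3.7));
        System.out.printf("B cached:     %.1fx%n", speedup(baselineB, 1.5));
    }
}
```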

in reply to:  15 comment:16 by Johan Vallon-Christersson, 15 years ago

Replying to nicklas:

Initial caching times: A = 1m9s; B = 8.9s Using the cached data: A = 3.7s; B = 1.5s

Major enhancement!!

comment:17 by Nicklas Nordborg, 15 years ago

Resolution: fixed
Status: assigned → closed

(In [5128]) Fixes #1374: Caching of experimental factors

Removed timing code.

comment:18 by Nicklas Nordborg, 15 years ago

(In [5131]) References #1374 and #1403: Caching of annotations, etc...

Made some changes to the API to make it more useful in other contexts. For example, the ability to find annotation snapshots by annotation id is needed by #1404.

comment:19 by Jari Häkkinen, 15 years ago

We have encountered problems on my test setup. When we add an annotation to a raw bioassay it does not appear in experiment explorer. The annotation is visible in item overview for the raw bioassay. If I restart tomcat the annotation will also show up in EE. I run on trunk revision 5143.

comment:20 by Nicklas Nordborg, 15 years ago

This is a side-effect of #1389 that introduced in-memory caching of annotation information when selecting an experimental factor. The cache should be cleared when a user logs out.

In any case, the caching done in #1389 is maybe not needed anymore now that this ticket and #1403 have been fixed. Or... the cache should at least be designed so that it is cleared when an experimental factor is deselected. I'll re-open #1389.
