Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#1616 closed task (fixed)

Clone reporter information to per-experiment tables in the dynamic database

Reported by: nicklas Owned by: nicklas
Priority: critical Milestone: BASE 3.1
Component: core Version:
Keywords: Cc:

Description (last modified by nicklas)

The idea is to reduce the performance problems that arise from a large reporter table when it is joined with analyzed data in the dynamic database. Since the size of the reporter table does not seem to be the cause of those problems, a separate ticket was added for them (#1618).

The cloned table should contain only the reporters that have been used in the experiment. It should be possible to manually synchronize the cloned table with the master table to update the annotations or to add more reporters after adding raw bioassays (that may reference reporters that weren't referenced before).

This ticket is related to #1280 (Versioned annotations of reporters).

An advanced feature would be to clone only a selected set of reporter annotations. For simplicity and to avoid user confusion, an administrator should be responsible for setting up presets that regular users can choose between when setting up the experiment. It should be possible to select a different preset when synchronizing (which means that the existing table needs to be dropped and re-created with the new columns).

Change History (16)

comment:1 Changed 6 years ago by nicklas

Since we had test code that was developed for #903, I decided to run some tests and found that the "reporter search" in "experiment explorer" is very slow to use.

I created a new BASE instance on my computer and imported 500 raw bioassays (2 different in 250 copies each). They all used the same array design so the number of reporters ended up around ~50K.

I created an experiment and root bioassay sets with 10, 50 and 100 bioassays. Approximate times for re-generating the "reporter search" page after changing a filter or sort order are:

Assays   Time (seconds)
------   --------------
10       ~3-5
50       ~10-20
100      ~30-40

This was not promising since the reporter table is small... So I added a lot more reporters to the table (~2.5M) and repeated the tests. The result was a bit of a surprise since there was almost no difference in the times. It took only a few seconds longer than with 50K reporters.

So, it seems like it is not the number of reporters in the reporter table that is the limiting factor.

After carefully reading through #903 this was less of a surprise. The first SQL query listed in comment:23:ticket:903 is similar to the query generated by the "reporter search" page. And it has already been stated that this query is slow due to the join between the spot and position mapping tables. The fix mentioned in comment 24 solves the issue by splitting the query in two, but this only works when we are interested in the reporter/position mapping.

To speed up the "reporter search" a different fix is needed that also works when we are looking for other reporter information.

In other words, I am not certain that cloning the reporter table will make much difference performance-wise. The cloning would still solve #1280 and this may be reason enough to implement it.

comment:2 Changed 6 years ago by nicklas

  • Description modified (diff)

comment:3 Changed 6 years ago by nicklas

  • Milestone changed from BASE 3.0 to BASE 3.1

comment:4 Changed 6 years ago by nicklas

  • Owner changed from everyone to nicklas
  • Status changed from new to assigned

comment:5 Changed 6 years ago by nicklas

(In [5876]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added ReporterCloneTemplate as a main item type in BASE. The template can hold a list of reporter properties (ClonedProperty) which defines the properties that should be cloned. The id, version and externalId are mandatory.

When a template is used to clone reporter information into an experiment, a locked copy of the template is created. This is needed to avoid problems if the template is changed.
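The locked-copy idea can be illustrated with a minimal Python sketch. This is not the BASE implementation (which is Java); the class `CloneTemplate` and its methods are hypothetical names chosen for illustration. The only details taken from this comment are that id, version and externalId are mandatory and that the copy used by an experiment is locked.

```python
# Hypothetical sketch; names do not come from the BASE API.
MANDATORY = ("id", "version", "externalId")


class CloneTemplate:
    """A named list of reporter properties to clone."""

    def __init__(self, name, properties, locked=False):
        self.name = name
        self.locked = locked
        # Mandatory properties always come first; duplicates are dropped.
        merged = list(MANDATORY) + [p for p in properties if p not in MANDATORY]
        self._properties = list(dict.fromkeys(merged))

    @property
    def properties(self):
        return tuple(self._properties)

    def add_property(self, prop):
        if self.locked:
            raise PermissionError("a locked template cannot be modified")
        if prop not in self._properties:
            self._properties.append(prop)

    def locked_copy(self):
        # The copy owns its own property list, so later edits to the
        # original template do not affect an experiment already cloned.
        return CloneTemplate(self.name, self._properties, locked=True)


template = CloneTemplate("default", ["name", "symbol"])
snapshot = template.locked_copy()
template.add_property("chromosome")  # only the editable original changes
```

After the last line the snapshot still lists only the properties that were present when it was taken.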

There is no gui for defining templates, and the cloned reporter information can't be used in queries or table listings yet.

comment:6 Changed 6 years ago by nicklas

(In [5877]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added gui for defining reporter clone templates. Changes to the batcher API. Added an API (TransactionalAction) for hooking into transaction commit/rollback.
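The ticket does not show the TransactionalAction interface, so the following Python sketch only illustrates the general pattern of hooking into commit/rollback. All names (`Transaction`, `DropTableOnRollback`, `on_commit`, `on_rollback`) are assumptions. The example use, undoing the creation of a table when the transaction is rolled back, is also an assumed scenario.

```python
# Hypothetical sketch of commit/rollback hooks; not the BASE API.
class Transaction:
    def __init__(self):
        self._actions = []

    def add_action(self, action):
        self._actions.append(action)

    def commit(self):
        for action in self._actions:
            action.on_commit()
        self._actions.clear()

    def rollback(self):
        for action in self._actions:
            action.on_rollback()
        self._actions.clear()


class DropTableOnRollback:
    """Removes a newly created table if the transaction fails."""

    def __init__(self, tables, name):
        self.tables = tables  # stand-in for the dynamic database
        self.name = name

    def on_commit(self):
        pass  # the table stays

    def on_rollback(self):
        self.tables.discard(self.name)


tables = {"cloned_reporters_1"}
tx = Transaction()
tx.add_action(DropTableOnRollback(tables, "cloned_reporters_1"))
tx.rollback()  # the registered action removes the table again
```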

comment:7 Changed 6 years ago by nicklas

(In [5878]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added a reporter cloning plug-in and buttons in the gui for starting it and for removing the cloned table. The gui probably needs some improvements and it should be possible to get more information about the cloned reporters...

Added permission checks in the ReporterCloneBatcher.

comment:8 Changed 6 years ago by nicklas

(In [5879]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

DynamicSpotQuery and DynamicPosQuery now join to the cloned reporter table if it exists and not told otherwise. It is possible to force the queries to use the master table if desired and also to use both.

The changes should automatically be used by any code that executes dynamic queries. In some cases this may be a problem if the calling code is not aware that only a subset of all reporter properties may be available. The "Spot data" listing and "Expression builder" have been updated to only use the cloned reporter information.
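The table selection described in this comment (use the cloned table if it exists, unless the caller asks for the master table or for both) can be sketched as follows. This is an illustration in Python, not the DynamicSpotQuery code; the enum `ReporterSource`, the function `reporter_tables` and both table names are made up for the example.

```python
# Hypothetical sketch; table names and function names are assumptions.
from enum import Enum


class ReporterSource(Enum):
    AUTO = "auto"      # cloned table if it exists, else master
    MASTER = "master"  # force the master reporter table
    CLONED = "cloned"  # require the cloned table
    BOTH = "both"      # join both tables


def reporter_tables(experiment_id, cloned_exists, source=ReporterSource.AUTO):
    """Return the reporter tables a dynamic query should join."""
    master = "Reporters"
    cloned = f"ClonedReporters_{experiment_id}"
    if source is ReporterSource.AUTO:
        source = ReporterSource.CLONED if cloned_exists else ReporterSource.MASTER
    needs_clone = source in (ReporterSource.CLONED, ReporterSource.BOTH)
    if needs_clone and not cloned_exists:
        raise LookupError(f"experiment {experiment_id} has no cloned reporters")
    if source is ReporterSource.MASTER:
        return [master]
    if source is ReporterSource.CLONED:
        return [cloned]
    return [cloned, master]
```

Calling code that relies on the default has to expect that only the cloned subset of properties is available, which is the compatibility concern raised above.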

Experiment explorer is more of a mess. The main problem is that the fixes for #1618 involved making some queries against the main reporter table without joins to the dynamic database. Now we'll need an alternate branch that does the same queries against the cloned reporter table if it exists. The difficult part is that we need to support both variants and that the APIs used to get the information we need are not very similar...

comment:9 Changed 6 years ago by nicklas

(In [5885]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Updated Experiment explorer to use cloned information when possible. Renamed ClonedProperty to ClonableProperty.

comment:10 Changed 6 years ago by nicklas

(In [5886]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added possibility to clone reporters based on all reporters that are referenced by raw data in the experiment.

Allow a "null" template to be used which means that all reporter properties are cloned.

Added some more tests to the test programs.

comment:11 Changed 6 years ago by nicklas

(In [5887]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Removed deprecation on some methods since that was never intended.

comment:12 Changed 6 years ago by nicklas

(In [5892]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Fixes issues in Experiment explorer when a default context (filters, sort order, etc.) uses reporter properties that are not cloned. The fix is a filter that can be added to the ItemContext class so that non-existing properties are excluded.
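A rough Python sketch of such a filter is shown below. The real fix lives in the Java ItemContext class, whose API is not shown in this ticket, so the dictionary layout of the context and the function name `restrict_context` are assumptions made only for illustration.

```python
# Hypothetical sketch; the context layout is an assumption.
def restrict_context(context, available):
    """Drop filters and sort keys on properties that were not cloned."""
    available = set(available)
    return {
        "filters": {p: v for p, v in context["filters"].items() if p in available},
        "sort": [p for p in context["sort"] if p in available],
    }


default_context = {
    "filters": {"name": "BRCA%", "chromosome": "17"},
    "sort": ["chromosome", "externalId"],
}
cloned_properties = ["id", "version", "externalId", "name"]
safe_context = restrict_context(default_context, cloned_properties)
```

Here the `chromosome` filter and sort key are removed because that property is not in the cloned table, while the `name` filter and the `externalId` sort key are kept.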

comment:13 Changed 6 years ago by nicklas

(In [5897]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added a tab on the experiment view page that lists cloned and master reporter information that can be used to find out if the master reporter information has been modified. It is then possible to update the cloned information.
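One way to detect modified master information is to compare the version value stored with each cloned reporter against the current master version. Whether BASE does exactly this is not stated in the ticket; the sketch below is an assumption built on the fact that version is a mandatory cloned property, and `find_outdated` is a hypothetical helper.

```python
# Hypothetical sketch; both arguments map reporter id -> version.
def find_outdated(master_versions, cloned_versions):
    """Return (modified, missing) reporter ids, each sorted."""
    modified = sorted(
        rid for rid, version in cloned_versions.items()
        if rid in master_versions and master_versions[rid] != version
    )
    # Cloned reporters that no longer exist in the master table.
    missing = sorted(rid for rid in cloned_versions if rid not in master_versions)
    return modified, missing


master = {1: 3, 2: 1, 4: 2}
cloned = {1: 2, 2: 1, 3: 1}
modified, missing = find_outdated(master, cloned)
```

In this example reporter 1 has a newer version in the master table and reporter 3 is no longer present there, so only reporter 2 is up to date.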

This more or less completes the intended functionality. I'll keep the ticket open for some more testing.

comment:14 Changed 6 years ago by nicklas

(In [5969]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Added note about incompatibility to the documentation.

comment:15 Changed 6 years ago by nicklas

  • Resolution set to fixed
  • Status changed from assigned to closed

comment:16 Changed 6 years ago by nicklas

(In [5974]) References #1616: Clone reporter information to per-experiment tables in the dynamic database

Run test program from TestAll.
