Opened 10 years ago

Last modified 10 years ago

#1616 closed task

Clone reporter information to per-experiment tables in the dynamic database — at Version 2

Reported by: Nicklas Nordborg Owned by: everyone
Priority: critical Milestone: BASE 3.1
Component: core Version:
Keywords: Cc:

Description (last modified by Nicklas Nordborg)

The idea is to increase performance issues arising from a large reporter table when joining this with analyzed data in the dynamic database. Since this seems to not be the case a separate ticket was added for this (#1618).

The cloned table should contain only the reporters that have been used in the experiment. It should be possible to manually synchronize the cloned table with the master table to update the annotations or to add more reporters after adding raw bioassays (that may reference reporters that wasn't referenced before).

This ticket is related to #1280 (Versioned annotations of reporters).

An advanced feature would be to clone only a selected set of reporter annotations. For simplicity reasons and to avoid user confusion an administrator should be responsible for setting up presets that regular users can choose between when setting up the experiment. It should be possible to select a different preset when synchronizing (which means that the existing table needs to be dropped and re-created with the new columns).

Change History (2)

comment:1 Changed 10 years ago by Nicklas Nordborg

Since we had test code that was developed for #903, I decided to make some test and found that the "reporter serch" in "experiment explorer" is very slow to use.

I created a new BASE instance on my computer and imported 500 raw bioassays (2 different in 250 copies each). They all used the same array design so the number of reporters ended up around ~50K.

I created an experiment and root bioassay sets with 10, 50 and 100 bioassays. Approximate times for re-generating the "reporter serch" page after changing a filter or sort order is:

Assays   Time (seconds)
------   --------------
10       ~3-5
50       ~10-20
100      ~30-40

This was not promising since the reporter table is small... So I added a lot more reporters to the table (~2.5M) and repeated the tests. The result was a bit of surprise since there was almost no difference in the times. It took only a few seconds longer than with 50K reporters.

So, it seems like it is not the number of reporters in the reporter table that is the limiting factor.

After carefully reading through #903 this was less of a surprise. The first SQL query listed in comment:23:ticket:903 is similar to the query generated by the "reporter search" page. And it has already been stated that this query is slow due to the join between the spot and position mapping tables. The fix mentioned in comment 24 is solving the issue by splitting the query in 2 but this only works when we are interested in the reporter/position mapping.

To speed up the "reporter search" a different fix is needed that also works when we are looking for other reporter information.

In other words, I am not certain that cloning the reporter table will make much difference performance-wise. The cloning would still solve #1280 and this may be reason enough to implement it.

comment:2 Changed 10 years ago by Nicklas Nordborg

Description: modified (diff)
Note: See TracTickets for help on using tickets.