Opened 11 years ago
Closed 11 years ago
#790 closed (fixed)
Quantitative hits comparison optimisation
Reported by: | Fredrik Levander | Owned by: | olle |
---|---|---|---|
Milestone: | Proteios SE 2.18.0 | Keywords: | |
Cc: |
Description
The quantitative hits comparison (#736) needs optimisation to support large datasets.
Change History (11)
comment:1 Changed 11 years ago by
Owner: | set to olle |
---|
comment:3 Changed 11 years ago by
Traceability note:
- Quantitative hits comparison was introduced in Ticket #736 (quantitative comparison report).
comment:4 Changed 11 years ago by
Design discussion.
- Currently, quantitative hits comparison is only defined for non-gel-based data, so the design will be limited to this case.
- Quantitative hits comparison is managed by class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/.
- For convenience, data is copied to an instance of inner class HitsComparisonData, making it possible to pass a single argument of this type to methods that need access to different parts of the data. However, this is currently only used for two doExport(...) method calls in sequence; one of them only performs file checks and can be replaced by a simpler method. To avoid requiring more memory than necessary, use of inner class HitsComparisonData should be eliminated, at the expense of a longer argument list in the remaining doExport(...) method.
- The generation of the comparison list contains a double loop: for every external id (for proteins) or sequence value (for peptides), all hits in each of the two hit selections are checked. Both lists can be large, but only a few hits in the hit selections contribute to the result for a specific external id or sequence value. It is therefore desirable to make this process more efficient. The hit selections are already partly sorted by the values of the outer-loop variable. If the hit selections could be fully sorted by those values, efficient search algorithms could be used to find the hits of interest, eliminating the need to check all hits in the hit selections.
comment:5 Changed 11 years ago by
Design update:
- Class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/ should be updated to no longer use inner class HitsComparisonData, in order to not require more memory than necessary, and to reduce the code size.
- Each hit selection list should be created by a single database query, where the results are sorted by external id for protein comparison and by sequence value for peptide comparison. In the main report generation loop, an algorithm exploiting the sort order will be used to find the first list item (if any) with a specified value of the outer-loop variable, instead of checking every item in the hit list. Since the list is sorted, all items with the specified value can be found by checking subsequent items in the list, until an item with a different value is found or the end of the list is reached. Hopefully, this will reduce the report generation time for large data sets.
- As a minor issue, Boolean variable modificationsIncluded should be renamed includeModifications, to clarify that it represents a logical value, and not a collection like a list or a set.
comment:6 Changed 11 years ago by
(In [4405]) Refs #790. Refs #736. Class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/ updated:
- Inner class HitsComparisonData will no longer be used as a temporary copy of the data, in order to not require more memory than necessary, and to reduce the code size.
- Public method void doExport(DbControl dc, HitsComparisonData hcd, Directory outCoreDir, String filename, ProgressReporter progress) is removed, and the output file is now obtained by new public method File fetchOutCoreFile(DbControl dc, Directory outCoreDir, String filename).
comment:7 Changed 11 years ago by
comment:8 Changed 11 years ago by
(In [4407]) Refs #790. Refs #736. Class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/ updated to have comparison report generation optimized for large data sets. Each hit selection list is now created by a single database query, where the results are sorted by external id for protein comparison and by sequence value for peptide comparison. In the main report generation loop, an algorithm exploiting the sort order is used to find the first list item (if any) with a specified value of the outer-loop variable, instead of checking every item in the hit list. Since the list is sorted, all items with the specified value can be found by checking subsequent items in the list, until an item with a different value is found or the end of the list is reached. Hopefully, this will reduce the report generation time for large data sets.
- Private method List<Hit> fetchHitSelectionHitList(...) updated with two new arguments, boolean gelBasedComparison and Boolean includeModifications. The hit list for each selection will now be created by a single database query obtained from updated private method ItemQuery<Hit> createBasicHitQuery(...).
- Private method ItemQuery<Hit> createBasicHitQuery(...) updated to take lists of gel external id, local sample id, and fraction id as arguments, instead of specific values for the variables. The lists are used directly to create the query.
- Private methods void createQuantitativeReportPeptideTable(...) and void createQuantitativeReportProteinTable(...) updated to call new private method List<Hit> updateHitList(...) to create the hit list used for the report.
- New private method List<Hit> updateHitList(List<Hit> hitList, List<Hit> sortedHitList, boolean gelBasedComparison, String comparisonType, Boolean includeModifications, String comparisonValue, Integer charge) added. It updates the hit list with hits from the sorted hit list. The hits to process are found by calling new private method int firstSortItemIndexInSortedHitList(...) to find the first index in the sorted hit list for a given value.
- New private method int firstSortItemIndexInSortedHitList(List<Hit> sortedHitList, boolean gelBasedComparison, String comparisonType, Boolean includeModifications, String comparisonValue) added. It uses a binary search algorithm to find the first index in the hit list for an item having a given value.
- New private method void createHitDebugTable(PrintWriter writer, List<Hit> hitList, String comparisonType, String quantityVariable) added for debug purposes. It is not used in production code, but left in the code base for convenience.
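A minimal sketch of the lookup idea described in this comment, not the committed code. It works on plain strings instead of Hit objects, and the class and method names (SortedRunLookup, firstIndexOf, indicesOf) are hypothetical. It assumes the list is sorted in the same order that String.compareTo uses; a database collation may order values differently, in which case the comparison would have to match the query's ordering.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of the Proteios code base. */
final class SortedRunLookup {

    /**
     * Returns the index of the first element equal to key in a list sorted
     * in ascending String.compareTo order, or -1 if the key is absent.
     */
    static int firstIndexOf(List<String> sortedKeys, String key) {
        int low = 0;
        int high = sortedKeys.size(); // search the half-open range [low, high)
        while (low < high) {
            int mid = (low + high) >>> 1;
            if (sortedKeys.get(mid).compareTo(key) < 0) {
                low = mid + 1; // elements up to and including mid are smaller than key
            } else {
                high = mid;    // mid may be the first match, so keep it in range
            }
        }
        boolean found = low < sortedKeys.size() && sortedKeys.get(low).equals(key);
        return found ? low : -1;
    }

    /** Collects the indices of all elements equal to key by scanning forward from the first match. */
    static List<Integer> indicesOf(List<String> sortedKeys, String key) {
        List<Integer> result = new ArrayList<>();
        int first = firstIndexOf(sortedKeys, key);
        if (first < 0) {
            return result; // key not present
        }
        // The list is sorted, so all matches are contiguous; stop at the first different value.
        for (int i = first; i < sortedKeys.size() && sortedKeys.get(i).equals(key); i++) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("P01", "P02", "P02", "P02", "P07");
        System.out.println(indicesOf(keys, "P02")); // [1, 2, 3]
        System.out.println(indicesOf(keys, "P05")); // []
    }
}
```

With n items in a hit selection and k matching items, each lookup costs on the order of log n + k comparisons, instead of n for a full scan.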
comment:9 Changed 11 years ago by
(In [4408]) Refs #790. Refs #736. Class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/ updated in public method void run(Request request, Response response, ProgressReporter progress) to set the value of the outfile string used in the job completion comment. It was originally set in public method void doExport(DbControl dc, HitsComparisonData hcd, Directory outCoreDir, String filename, ProgressReporter progress), which was later removed.
comment:10 Changed 11 years ago by
(In [4409]) Refs #790. Refs #736. Class/file plugins/HitsComparisonQuantitativeReportPlugin.java in plugin/ updated to report progress percentage when generating the comparison report:
- Private methods void createQuantitativeReportPeptideTable(...) and void createQuantitativeReportProteinTable(...) updated with new argument ProgressReporter progress. Its display(int percent, String message) method is called to display an updated progress percentage whenever the outer loop of the report generation reaches a new integer percentage value.
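The throttling described above can be sketched roughly as follows. This is an illustration under assumptions, not the plugin code: the Reporter interface is a hypothetical stand-in that only mirrors the display(int percent, String message) shape mentioned in this comment, and the class name, method name and message text are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not part of the Proteios code base. */
final class PercentProgressSketch {

    /** Stand-in for a progress reporter; only the display(int, String) shape is taken from the ticket. */
    interface Reporter {
        void display(int percent, String message);
    }

    /** Runs an outer loop of totalItems steps and reports only when the integer percentage changes. */
    static void processAll(int totalItems, Reporter progress) {
        int lastPercent = -1; // no percentage reported yet
        for (int i = 0; i < totalItems; i++) {
            // ... per-item report generation work would go here ...
            int percent = (int) ((100L * (i + 1)) / totalItems);
            if (percent != lastPercent) {
                lastPercent = percent;
                progress.display(percent, "Processed " + (i + 1) + " of " + totalItems);
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        processAll(1000, (percent, message) -> seen.add(percent));
        System.out.println(seen.size()); // 101 (percentages 0 through 100)
    }
}
```

Reporting only on a change of the integer percentage keeps the number of display calls bounded at about 100, regardless of how many items the outer loop processes.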
comment:11 Changed 11 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as the added update hopefully will increase performance of quantitative hits comparison for large data sets.
Ticket assigned to olle.