Opened 10 years ago
Closed 9 years ago
#807 closed (fixed)
PRIDE export of MGF-based searches
Reported by: | Fredrik Levander | Owned by: | olle |
---|---|---|---|
Milestone: | Proteios SE 2.19.0 | Keywords: | |
Cc: |
Description
Currently, PRIDE export only supports mzData-based searches. However, MGF is nowadays the most frequently used peak list format, and the PRIDE export should be updated to generate the mzData part from the MGF file in cases where MGF was used as the peak list.
Probably what needs to be done is to update the code so that, after calling writePrideXmlHeader(dc, xmlCrudeWriter) in doExport, the file type of the peak list is checked. If it is mzData, the current code can be used; otherwise the new conversion needs to be called (provided that the file is an MGF; if not, an exception should be thrown). It would also be good if the writer had functionality for writing custom sample and contact blocks, which are options in the PRIDE export (as can currently be done when using mzData peak lists).
There is functionality for writing mzData based on peak lists in the PKL to mzData converter plugin. A change in current PRIDE XML is that the MS ontology is used instead of the old PSI ontology. This means that the instrument part can be retrieved from an associated mzML file, if present. The last instrumentConfiguration block in that file should be used (there is probably a referenceable param group holding the instrument name). Furthermore, other PSI terms should be updated to MS terms, as can be seen in recent PRIDE XML files.
A caveat is that spectrum IDs in the mzData part are integers, starting at one, while the hits in the hits table refer to spectra by string ids that match the MGF TITLE lines. The spectrum id in the hits table is not equal to the spectrum number in the MGF file. This means that a map from mzData spectrum id to MGF TITLE will have to be generated when converting the MGF and writing the mzData part of the file, and this map will then be needed for writing the correct SpectrumReference for peptides.
Currently, protein assembly needs to be done per file for the export to work properly. This is due to limitations in the PRIDE XML, which only allows one peak list per PRIDE XML file. A warning about this should probably turn up somewhere.
Change History (32)
comment:1 Changed 10 years ago by
Status: | new → assigned |
---|
comment:2 Changed 10 years ago by
Traceability note:
- PRIDE XML export was introduced in Ticket #405 (Support export for publication).
- PRIDE XML export was updated in Ticket #694 (Non-gel PRIDE export fails).
- PRIDE XML export was updated in Ticket #701 (Add more info to PRIDE export).
- PRIDE XML export was updated in Ticket #805 (Support for simplifying ProteomeXchange submission) to annotate the created PRIDE XML file with the filename of the peaklist file.
- A PKL to mzData converter was introduced in Ticket #188 (Batch conversion of peaklists).
- An MGF to mzData converter was introduced in Ticket #808 (MGF to mzData Converter).
- Automatic file conversion before basic processing was introduced in Ticket #591 (Workflow for using mzData files with OMSSA), where mzData files are converted to MGF files before OMSSA search.
comment:3 Changed 10 years ago by
(In [4478]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated to add the contents of the `TITLE=` header line in the MGF file to a `comments` tag in the `SpectrumDesc` tag of the corresponding spectrum in the created mzData file. The title string will be prefixed by "`title=`" in the comments tag:
- Public method `void doConvert(InputStream instream, OutputStream outstream, ProgressReporter progress)` updated to read the contents of the `TITLE=` header line in the MGF file and transfer it to private method `void writeMzDataSpectrumDescBlock(...)` through new argument `String title`.
- Private method `void writeMzDataSpectrumDescBlock(...)` updated with new argument `String title`. It will add the title to a `comments` tag in the `SpectrumDesc` tag of the spectrum, after adding prefix "`title=`".
comment:4 Changed 10 years ago by
Design update to allow automatic conversion of peaklist MGF files to mzData files for PRIDE export:
- OMSSA search from Proteios SE already contains automatic conversion of input peaklist files, see Ticket #591 (Workflow for using mzData files with OMSSA), where mzData files are optionally converted to MGF before search, i.e. conversion in the opposite direction to the one of interest here. The design here will be modeled on the one for OMSSA search.
Update of class/file `action/hit/CreatePrideExportJob.java` in `client/servlet/`:
- If the peaklist file is an MGF file, a flag is set to create an mzData file, provided a converted file does not already exist. The value of the flag variable is transferred to the created job as a job parameter.
- If the conversion flag is set, a conversion job is created using plug-in class `MgfToMzDataPlugin`. The conversion job is set as blocker to the PRIDE XML export job, ensuring that the former is executed first.
- Three private convenience methods are added:
a. `boolean convertedMgfToMzDataFileExists(DbControl dc, File spectrumFile)`
b. `File fetchConvertedMgfToMzDataFile(DbControl dc, File spectrumFile)`
c. `Job createMgfToMzDataConvertJob(DbControl dc, File spectrumFile)`
Update of class/file `plugins/PrideExportPlugin.java` in `plugin/`:
- For clarity, a new variable `File mzDataFile` is introduced for the file from which the mzData XML block is to be copied, in addition to previous variable `File peakListFile`, which is the file for which peptide identification searches have been made. If the file conversion flag from job parameters is false, the value of `mzDataFile` is set equal to `peakListFile`, otherwise new private method `File fetchConvertedMgfToMzDataFile(ItemFactory factory, File spectrumFile)` is called to retrieve the mzData file.
- If an MGF to mzData file conversion has been performed, the latter file is parsed for comments tags with spectrum title strings from the MGF file, in order to create a spectrum string id/spectrum id hash map.
- The input peaklist file is used in references to search hits in the PRIDE XML file, while the mzData file is used to copy the mzData part of the PRIDE XML file from.
- If an MGF to mzData file conversion has been performed, the spectrum id in the PRIDE XML `SpectrumReference` tag is obtained from the created spectrum string id/spectrum id hash map, using the spectrum string id of the search hit in the Hits table as key.
- Two private convenience methods are added:
a. `File fetchConvertedMgfToMzDataFile(ItemFactory factory, File spectrumFile)`
b. `private HashMap<String, Integer> createSpectrumStringIdSpectrumIdHashMap(DbControl dc, Integer sourceFileId)`
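The spectrum string id/spectrum id map could be built roughly as in the sketch below. It is an illustration only, not the code in `PrideExportPlugin`: the class and method names are hypothetical, the mzData content is passed in as a string rather than read through Proteios file items, and it assumes that each `spectrum` element carries an integer `id` attribute and that the converter stored the MGF title as `title=...` in a `comments` element inside it.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Proteios SE.
final class SpectrumTitleMap {
    private static final String PREFIX = "title=";

    /** Returns a map from MGF title string to mzData spectrum id. */
    static Map<String, Integer> fromMzData(String mzDataXml) {
        Map<String, Integer> titleToId = new HashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(mzDataXml));
            Integer currentId = null;
            while (reader.hasNext()) {
                if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                String name = reader.getLocalName();
                if (name.equals("spectrum")) {
                    // Remember the id of the spectrum currently being read.
                    currentId = Integer.valueOf(reader.getAttributeValue(null, "id"));
                } else if (name.equals("comments") && currentId != null) {
                    String text = reader.getElementText().trim();
                    // Only comments written by the converter carry the prefix.
                    if (text.startsWith(PREFIX)) {
                        titleToId.put(text.substring(PREFIX.length()), currentId);
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed mzData XML", e);
        }
        return titleToId;
    }
}
```

When writing a `SpectrumReference`, the spectrum string id of a hit would then be looked up with `titleToId.get(hitSpectrumStringId)`; a `null` result would indicate a hit whose title is missing from the converted file.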
comment:5 Changed 10 years ago by
(In [4479]) Refs #807. PRIDE XML export is updated to support automatic conversion of MGF files to mzData for use in the mzData part of the PRIDE XML file:
Update of class/file `action/hit/CreatePrideExportJob.java` in `client/servlet/`:
- If the peaklist file is an MGF file, a flag is set to create an mzData file, provided a converted file does not already exist. The value of the flag variable is transferred to the created job as a job parameter.
- If the conversion flag is set, a conversion job is created using plug-in class `MgfToMzDataPlugin`. The conversion job is set as blocker to the PRIDE XML export job, ensuring that the former is executed first.
- Three private convenience methods are added:
a. `boolean convertedMgfToMzDataFileExists(DbControl dc, File spectrumFile)`
b. `File fetchConvertedMgfToMzDataFile(DbControl dc, File spectrumFile)`
c. `Job createMgfToMzDataConvertJob(DbControl dc, File spectrumFile)`
Update of class/file `plugins/PrideExportPlugin.java` in `plugin/`:
- For clarity, a new variable `File mzDataFile` is introduced for the file from which the mzData XML block is to be copied, in addition to previous variable `File peakListFile`, which is the file for which peptide identification searches have been made. If the file conversion flag from job parameters is false, the value of `mzDataFile` is set equal to `peakListFile`, otherwise new private method `File fetchConvertedMgfToMzDataFile(ItemFactory factory, File spectrumFile)` is called to retrieve the mzData file.
- If an MGF to mzData file conversion has been performed, the latter file is parsed for comments tags with spectrum title strings from the MGF file, in order to create a spectrum string id/spectrum id hash map.
- The input peaklist file is used in references to search hits in the PRIDE XML file, while the mzData file is used to copy the mzData part of the PRIDE XML file from.
- If an MGF to mzData file conversion has been performed, the spectrum id in the PRIDE XML `SpectrumReference` tag is obtained from the created spectrum string id/spectrum id hash map, using the spectrum string id of the search hit in the Hits table as key.
- Two private convenience methods are added:
a. `File fetchConvertedMgfToMzDataFile(ItemFactory factory, File spectrumFile)`
b. `private HashMap<String, Integer> createSpectrumStringIdSpectrumIdHashMap(DbControl dc, Integer sourceFileId)`
comment:6 Changed 10 years ago by
comment:7 Changed 10 years ago by
(In [4481]) Refs #807. PRIDE XML export is updated to convert selected cvParam tags from PSI to MS ontology, when copying XML data from mzData file to the mzData part of the PRIDE XML file:
Update of class/file `plugins/PrideExportPlugin.java` in `plugin/`:
- Public method `void doExport(DbControl dc, File outCoreFile, ProgressReporter progress)` updated to call private method `void copySelectedXMLBlocks(..., boolean convertPsiToMsOntology)` with value of new argument `boolean convertPsiToMsOntology` set to `true` when copying XML blocks that may contain the selected cvParam tags.
- Private method `copySelectedXMLBlocks(...)` updated with a variant having an extra argument `boolean convertPsiToMsOntology`. If the value of the latter is `true`, new private method `String exchangePsiForMsOntology(String line)` is called to convert selected cvParam tags from PSI to MS ontology.
- New private method `String exchangePsiForMsOntology(String line)` added. It converts selected cvParam tags from PSI to MS ontology. Lines that do not contain the selected cvParam tags are returned unmodified.
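A line-based conversion of this kind might look like the sketch below. It is not the plugin's implementation: the ticket does not list which cvParam tags are selected, so the accession mapping is passed in as an extra, hypothetical argument, and the sketch assumes that the mzData lines use `cvLabel="PSI"` with accessions of the form `PSI:...`. The `name` attribute is left untouched here, although a real conversion may need to update it as well.

```java
import java.util.Map;

// Hypothetical helper, not part of Proteios SE.
final class OntologyExchange {

    /**
     * Rewrites a single XML line. Lines without a mapped PSI cvParam
     * are returned unmodified.
     */
    static String exchangePsiForMsOntology(String line, Map<String, String> accessionMap) {
        if (!line.contains("<cvParam") || !line.contains("cvLabel=\"PSI\"")) {
            return line;
        }
        for (Map.Entry<String, String> entry : accessionMap.entrySet()) {
            String oldAttribute = "accession=\"" + entry.getKey() + "\"";
            if (line.contains(oldAttribute)) {
                // Swap both the accession and the ontology label.
                return line
                        .replace(oldAttribute, "accession=\"" + entry.getValue() + "\"")
                        .replace("cvLabel=\"PSI\"", "cvLabel=\"MS\"");
            }
        }
        return line;
    }
}
```

Keeping the mapping outside the method means that further terms can be added without touching the copy loop, which only has to pass each line through the method when `convertPsiToMsOntology` is `true`.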
comment:8 Changed 10 years ago by
(In [4482]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated to ensure that `DbControl` instances are closed after use. Private method `void writeSpectraData(XMLCrudeWriter2 xmlCWriter, ...)` updated to use existing `DbControl` instance, instead of creating new one:
- Public method `void run(Request request, Response response, ProgressReporter progress)` updated by closing `DbControl` used to get directory information from input file item.
- Public method `void doConvert(InputStream instream, Directory outCoreDir, String filename, ProgressReporter progress)` updated in call of public method `void doConvert(... InputStream instream, OutputStream outstream, ProgressReporter progress)` to set new argument `DbControl dc` to existing `DbControl` instance.
- Public method `void doConvert(InputStream instream, OutputStream outstream, ProgressReporter progress)` updated with new initial argument `DbControl dc`, whose value is used in call of private method `void writeSpectraData(... XMLCrudeWriter2 xmlCWriter, List<Double> peakMassData, List<Double> peakIntensityData, boolean mz_double_precision, boolean inten_double_precision)` to set new argument `DbControl dc` to existing `DbControl` instance.
- Private method `void writeSpectraData(XMLCrudeWriter2 xmlCWriter, List<Double> peakMassData, List<Double> peakIntensityData, boolean mz_double_precision, boolean inten_double_precision)` updated with new initial argument `DbControl dc`, whose value is used instead of creating new `DbControl` instance.
comment:9 Changed 10 years ago by
(In [4484]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated in private method `void writeMzDataSpectrumDescBlock(XMLCrudeWriter2 xmlCWriter, Float massToChargeRatio, Float intensity, Integer chargeState, Float rtInSeconds, String title)` to put the `comments` tag in the `SpectrumDesc` tag of the corresponding spectrum in the created mzData file after the `spectrumSettings` and `precursorList` tags, in order to be valid according to the mzData XSD (XML Schema Definition).
comment:10 Changed 10 years ago by
Design update to retrieve instrument part in the mzData XML block in the PRIDE XML file from an mzML file associated with the peaklist file:
- Use of class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly.
- New instance variables to store the instrument data retrieved from an associated mzML file.
- New private method `File fetchAlternativeSpectrumFile(ItemFactory factory, File spectrumFile, String alternativeFileExtension)` to fetch an alternative spectrum file.
- Private method `void copySelectedXMLBlocks(...)` updated with new argument `int firstLineExtraIndentation`.
- New private method `String fetchInstrumentName(File sourceFile)` for retrieving the instrument name from a cvParam tag in "`referenceableParamGroup`" XML tag in a source mzML file.
- New private method `List<String> fetchInstrumentCvParamList(File sourceFile, String tagName)` for retrieving a list of inside contents of cvParam tags in selected XML tag in a source mzML file. Only data from the last block of the selected XML tag is returned.
- New private method `void writeInstrumentBlock(XMLCrudeWriter3Impl xmlCrudeWriter)` to write an mzData `instrument` XML block with data from instance variables, whose values have been retrieved from an associated mzML file.
- Public method `void doExport(DbControl dc, File outCoreFile, ProgressReporter progress)` updated to write instrument data in mzData part of PRIDE XML file from information retrieved from an associated mzML file. First part of method handling different options also rewritten, in order to increase clarity and simplify future additions.
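The "last block only" rule for the cvParam list can be sketched as follows. This is an assumption-laden illustration rather than the plugin code: the class and method names are hypothetical, the mzML content is passed in as a string, and only the `name` attribute of each cvParam is collected, whereas the real method returns the inside contents of the tags.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Proteios SE.
final class LastBlockCvParams {

    /** Collects cvParam names from the last element called tagName. */
    static List<String> cvParamNamesInLastBlock(String xml, String tagName) {
        List<String> names = new ArrayList<>();
        boolean inside = false;
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals(tagName)) {
                        // A new block starts: discard data from earlier blocks.
                        names.clear();
                        inside = true;
                    } else if (inside && name.equals("cvParam")) {
                        names.add(reader.getAttributeValue(null, "name"));
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals(tagName)) {
                    inside = false;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return names;
    }
}
```

Clearing the list at every start tag means that a single pass over the file is enough, and cvParam tags nested deeper inside the block (for example in component elements) are included as well.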
comment:11 Changed 10 years ago by
(In [4486]) Refs #807. PRIDE XML export is updated to retrieve instrument part in the mzData XML block in the PRIDE XML file from an mzML file associated with the peaklist file:
Class/file `plugins/PrideExportPlugin.java` in `plugin/` update:
- Use of class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly.
- New instance variables to store the instrument data retrieved from an associated mzML file.
- New private method `File fetchAlternativeSpectrumFile(ItemFactory factory, File spectrumFile, String alternativeFileExtension)` to fetch an alternative spectrum file.
- Private method `void copySelectedXMLBlocks(...)` updated with new argument `int firstLineExtraIndentation`.
- New private method `String fetchInstrumentName(File sourceFile)` for retrieving the instrument name from a cvParam tag in "`referenceableParamGroup`" XML tag in a source mzML file.
- New private method `List<String> fetchInstrumentCvParamList(File sourceFile, String tagName)` for retrieving a list of inside contents of cvParam tags in selected XML tag in a source mzML file. Only data from the last block of the selected XML tag is returned.
- New private method `void writeInstrumentBlock(XMLCrudeWriter3Impl xmlCrudeWriter)` to write an mzData `instrument` XML block with data from instance variables, whose values have been retrieved from an associated mzML file.
- Public method `void doExport(DbControl dc, File outCoreFile, ProgressReporter progress)` updated to write instrument data in mzData part of PRIDE XML file from information retrieved from an associated mzML file. First part of method handling different options also rewritten, in order to increase clarity and simplify future additions.
comment:12 Changed 10 years ago by
Design update for adding sample and contact information to PRIDE XML export file:
Class/file `plugins/PrideExportPlugin.java` in `plugin/` update:
- Public method `void doExport(DbControl dc, File outCoreFile, ProgressReporter progress)` update:
a. Bug fix: Start tag after adding sample information changed from "`contact`" to "`sourceFile`".
b. First line extra indentation specified for the different cases.
c. Old code that was commented out is now removed.
- Private method `void writeContactBlock(...)` updated to use class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly. Extra indentation specified.
- Private method `void writeSampleBlock(...)` updated to use class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly. Extra indentation specified.
comment:13 Changed 10 years ago by
(In [4487]) Refs #807. PRIDE XML export is updated for adding sample and contact information to PRIDE XML export file:
Class/file `plugins/PrideExportPlugin.java` in `plugin/` update:
- Public method `void doExport(DbControl dc, File outCoreFile, ProgressReporter progress)` update:
a. Bug fix: Start tag after adding sample information changed from "`contact`" to "`sourceFile`".
b. First line extra indentation specified for the different cases.
c. Old code that was commented out is now removed.
- Private method `void writeContactBlock(...)` updated to use class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly. Extra indentation specified.
- Private method `void writeSampleBlock(...)` updated to use class `XMLCrudeWriter3Impl` instead of `XMLCrudeWriterImpl` for writing XML data, since the former allows the indentation level to be specified explicitly. Extra indentation specified.
comment:14 Changed 10 years ago by
comment:15 Changed 10 years ago by
Design update:
- PRIDE XML export should be updated to allow addition of species information in the `sampleDescription` XML tag in the mzData part of the PRIDE XML file. The species information will be entered by the user in the form used to create the PRIDE XML export job. Species ontology, accession number, and name are all required for the information to be added to the PRIDE XML file.
- Class/file `action/hit/PrideExport.java` in `client/servlet/` updated by adding new fields related to species information to the sample section of the form. The fields are coupled to new valid parameters defined in class `CreatePrideExportJob`. New private method `Fieldset getSampleFieldset()` added to increase code readability.
- Class/file `action/hit/SelectPrideProtocolFileStep1a.java` in `client/servlet/` updated to retrieve values of new valid parameters related to species information from the request and save them as session attributes.
- Class/file `action/hit/CreatePrideExportJob.java` in `client/servlet/` updated with new valid parameters related to species information. The values of the parameters are retrieved from session attributes and transferred to the created job as job parameters.
- Class/file `plugins/PrideExportPlugin.java` in `plugin/` updated to retrieve the values of new variables related to species information from job parameters. Private method `void writeSampleBlock(XMLCrudeWriter3Impl xmlCrudeWriter)` updated to write a `cvParam` tag in the `sampleDescription` tag with the species information, provided that this option has been selected and species ontology, accession number, and name have all been specified.
- English dictionary file `locale/en/dictionary` in `client/servlet/` updated with new entries for various string keys.
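The "all three values required" rule for the species cvParam could be expressed as in the sketch below. The class and method names are hypothetical, the tag is built as a plain string instead of through `XMLCrudeWriter3Impl`, and the attribute layout (`cvLabel`, `accession`, `name`) is assumed from the mzData cvParam convention.

```java
import java.util.Optional;

// Hypothetical helper, not part of Proteios SE.
final class SpeciesCvParam {

    /**
     * Returns the species cvParam tag, or an empty Optional if the option
     * is not selected or any of the three values is missing.
     */
    static Optional<String> build(boolean optionSelected, String ontology,
                                  String accession, String name) {
        if (!optionSelected || isBlank(ontology) || isBlank(accession) || isBlank(name)) {
            return Optional.empty();
        }
        return Optional.of("<cvParam cvLabel=\"" + escape(ontology)
                + "\" accession=\"" + escape(accession)
                + "\" name=\"" + escape(name) + "\"/>");
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }

    // Minimal escaping for attribute values.
    private static String escape(String value) {
        return value.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }
}
```

For example, `SpeciesCvParam.build(true, "NEWT", "9606", "Homo sapiens")` yields a tag with `cvLabel="NEWT"`, `accession="9606"` and `name="Homo sapiens"`, while leaving any one of the three values empty yields no tag at all.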
comment:16 Changed 10 years ago by
(In [4489]) Refs #807. PRIDE XML export updated to allow addition of species information in the `sampleDescription` XML tag in the mzData part of the PRIDE XML file. The species information will be entered by the user in the form used to create the PRIDE XML export job. Species ontology, accession number, and name are all required for the information to be added to the PRIDE XML file.
- Class/file `action/hit/PrideExport.java` in `client/servlet/` updated by adding new fields related to species information to the sample section of the form. The fields are coupled to new valid parameters defined in class `CreatePrideExportJob`. New private method `Fieldset getSampleFieldset()` added to increase code readability.
- Class/file `action/hit/SelectPrideProtocolFileStep1a.java` in `client/servlet/` updated to retrieve values of new valid parameters related to species information from the request and save them as session attributes.
- Class/file `action/hit/CreatePrideExportJob.java` in `client/servlet/` updated with new valid parameters related to species information. The values of the parameters are retrieved from session attributes and transferred to the created job as job parameters.
- Class/file `plugins/PrideExportPlugin.java` in `plugin/` updated to retrieve the values of new variables related to species information from job parameters. Private method `void writeSampleBlock(XMLCrudeWriter3Impl xmlCrudeWriter)` updated to write a `cvParam` tag in the `sampleDescription` tag with the species information, provided that this option has been selected and species ontology, accession number, and name have all been specified.
- English dictionary file `locale/en/dictionary` in `client/servlet/` updated with new entries for various string keys.
comment:17 Changed 10 years ago by
comment:18 Changed 10 years ago by
(In [4491]) Refs #807. PRIDE XML export updated to exclude Hit table entries with score type "Proteios aligned", that are created using "Propagate Feature Sequences":
- Class/file `plugins/PrideExportPlugin.java` in `plugin/` updated in private method `ItemQuery<Hit> createPeptideHitQuery(...)` to exclude hits with score type "Proteios aligned".
comment:19 Changed 10 years ago by
comment:20 Changed 10 years ago by
comment:21 Changed 10 years ago by
comment:22 Changed 10 years ago by
comment:23 Changed 10 years ago by
(In [4496]) Refs #807. PRIDE XML export updated to exclude Hit table entries with score type "Proteios aligned" when finding peaklist files, for which to create export jobs:
- Class/file `action/hit/CreatePrideExportJob.java` in `client/servlet/` updated in protected method `void runMe()` to exclude Hit table entries with score type "Proteios aligned" when finding peaklist files, for which to create export jobs.
comment:24 Changed 10 years ago by
(In [4497]) Refs #807. PRIDE XML export updated to search for an alternative mzML file in other directories in project, if not found in same directory as the peaklist file:
- Class/file `plugins/PrideExportPlugin.java` in `plugin/` updated:
a. Public method `void run(Request request, Response response, ProgressReporter progress)` updated to call new private method `File fetchAlternativeSpectrumFileInProject(ItemFactory factory, File spectrumFile, String alternativeFileExtension)` to obtain an alternative mzML file, if none is found in same directory as the peaklist file.
b. New private method `File fetchAlternativeSpectrumFileInProject(ItemFactory factory, File spectrumFile, String alternativeFileExtension)` added. It searches for an alternative spectrum file in the project. If several exist, the first found is returned.
comment:25 Changed 10 years ago by
(In [4498]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated in public method `void doConvert(DbControl dc, InputStream instream, OutputStream outstream, ProgressReporter progress)` to set total intensity for a spectrum to 0 if no value exists in spectrum header line starting with "`PEPMASS=`".
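The fallback can be illustrated with a small parser for the `PEPMASS=` header line, where the m/z value may be followed by an optional intensity. This is a sketch with hypothetical names, not the code in `MgfToMzDataPlugin`, and it uses `double` values where the plugin's methods take `Float`.

```java
// Hypothetical helper, not part of Proteios SE.
final class PepMass {
    private static final String KEY = "PEPMASS=";

    /** Returns {m/z, intensity}; intensity is 0 if the line gives none. */
    static double[] parse(String line) {
        String trimmed = line.trim();
        if (!trimmed.startsWith(KEY)) {
            throw new IllegalArgumentException("Not a PEPMASS line: " + line);
        }
        // Values are separated by whitespace: "<m/z> [<intensity>]".
        String[] parts = trimmed.substring(KEY.length()).trim().split("\\s+");
        double massToCharge = Double.parseDouble(parts[0]);
        double intensity = parts.length > 1 ? Double.parseDouble(parts[1]) : 0.0;
        return new double[] { massToCharge, intensity };
    }
}
```

With this sketch, `PepMass.parse("PEPMASS=500.25")` gives an intensity of 0, while `PepMass.parse("PEPMASS=500.25 1234.5")` keeps the stated intensity.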
comment:26 Changed 10 years ago by
comment:27 Changed 10 years ago by
(In [4500]) Refs #807. PRIDE XML export updated to show link to NEWT ontology look-up service in form:
- Class/file `action/hit/PrideExport.java` in `client/servlet/` updated in private method `Fieldset getSampleFieldset()` by adding help text with link to NEWT ontology look-up service at http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=NEWT.
comment:28 Changed 9 years ago by
comment:29 Changed 9 years ago by
comment:30 Changed 9 years ago by
comment:31 Changed 9 years ago by
comment:32 Changed 9 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as first version of PRIDE export of MGF-based searches has been added.
Ticket accepted.