#808 closed (fixed)
MGF to mzData Converter
Reported by: | olle | Owned by: | olle |
---|---|---|---|
Milestone: | Proteios SE 2.19.0 | Keywords: | |
Cc: |
Description
A plug-in for converting MGF files to mzData files could be useful in cases where one needs a spectrum file in mzData format, but only an MGF file is available.
Change History (17)
comment:1 Changed 10 years ago by
Status: | new → assigned |
---|
comment:2 Changed 10 years ago by
Design description:
- MGF (Mascot Generic Format) stores data in a plain non-XML text format, where each spectrum starts with a "BEGIN IONS" line and a number of header lines, followed by lines of mass peak data value pairs for m/z and intensity values, and ends with an "END IONS" line. The mass peak lines are similar to those in PKL files, where each spectrum begins with a line with base peak mass, total intensity, and charge, followed by lines of mass peak data value pairs for m/z and intensity values, and ends with an empty line (except for the last spectrum). The converter will therefore be based on the existing "PKL to mzData" converter plug-in developed in Ticket #188 (Batch conversion of peaklists).
Modifications made for the MGF to mzData converter, other than those needed to parse the MGF file:
- Values from the spectrum header line starting with "RTINSECONDS=" are added in a `cvParam` tag "TimeInSeconds" with PSI accession number 1000039 to the `spectrumInstrument` tag.
- Values for base peak mass and total intensity are obtained from the spectrum header line starting with "PEPMASS=".
- The value for charge state is obtained from the spectrum header line starting with "CHARGE=". Note that in MGF files the charge sign is added as a suffix, e.g. "CHARGE=2+".
- Peak m/z float values are allowed to omit the decimal point, e.g. "646.0000000" may be represented by "646".
- The name of the converted file is constructed by adding ".mzData" to the input filename, including the file extension ".mgf". The converted file will therefore have file extension ".mgf.mzData", clearly indicating that the file was converted from an MGF file.
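The header value rules listed above can be sketched in Java as follows. This is a minimal illustration, not the plug-in's code: the class and method names are hypothetical, treating a "-" suffix as a negative charge is an assumption, and charge lists such as "2+ and 3+" are not handled.

```java
/** Hypothetical helper illustrating the MGF value rules; not part of Proteios. */
final class MgfHeaderValues {
    /** Parses a CHARGE= value such as "2+" into an integer (assumption: "3-" gives -3). */
    static int parseCharge(String value) {
        String v = value.trim();
        int sign = 1;
        if (v.endsWith("+")) {
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("-")) {
            sign = -1;
            v = v.substring(0, v.length() - 1);
        }
        return sign * Integer.parseInt(v.trim());
    }

    /** Parses a peak m/z token; "646" and "646.0000000" give the same value. */
    static double parseMz(String token) {
        return Double.parseDouble(token.trim());
    }

    /** Appends ".mzData" to the full input name, keeping the ".mgf" extension. */
    static String outputName(String inputName) {
        return inputName + ".mzData";
    }

    public static void main(String[] args) {
        System.out.println(parseCharge("2+"));
        System.out.println(parseMz("646"));
        System.out.println(outputName("sample.mgf"));
    }
}
```

`Double.parseDouble` accepts integer-looking tokens, so no special handling is needed for peak values written without a decimal point.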
comment:3 Changed 10 years ago by
comment:4 Changed 10 years ago by
Design update:
- One of the main purposes of this converter is to make it possible to create a PRIDE XML file when the peak list file is in MGF format instead of the mzData format, which PRIDE XML supports. Since the current PRIDE XML standard prefers the MS ontology over the old PSI ontology, the former should preferably be used in the mzData file.
Currently four PSI ontology terms occur in the MGF to mzData converter, where all accession numbers exist in both the PSI and MS ontology:
PSI Term | PSI Name | MS Term | MS Name | Comment |
---|---|---|---|---|
PSI:1000039 | TimeInSeconds | MS:1000039 | second | This MS term is obsolete, as it has been replaced by MS:1000016, "scan time" |
PSI:1000040 | MassToChargeRatio | MS:1000040 | m/z | |
PSI:1000041 | ChargeState | MS:1000041 | charge state | |
PSI:1000042 | Intensity | MS:1000042 | peak intensity | |
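The term mapping in the table could be held in a small lookup, sketched below in Java. The class and method names are hypothetical, and the `cvParam` attribute names (`cvLabel`, `accession`, `name`, `value`) are an assumption based on mzData 1.05 that should be checked against the schema. The value is inserted without XML escaping, which is only safe for numeric values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup from PSI accessions to MS terms; not the plug-in's code. */
final class OntologyTerms {
    /** PSI accession -> {MS accession, MS name}, copied from the table. */
    static final Map<String, String[]> PSI_TO_MS = new LinkedHashMap<>();
    static {
        PSI_TO_MS.put("PSI:1000039", new String[] {"MS:1000039", "second"});
        PSI_TO_MS.put("PSI:1000040", new String[] {"MS:1000040", "m/z"});
        PSI_TO_MS.put("PSI:1000041", new String[] {"MS:1000041", "charge state"});
        PSI_TO_MS.put("PSI:1000042", new String[] {"MS:1000042", "peak intensity"});
    }

    /** Builds a cvParam element string for the MS term matching a PSI accession. */
    static String cvParam(String psiAccession, String value) {
        String[] ms = PSI_TO_MS.get(psiAccession);
        if (ms == null) {
            throw new IllegalArgumentException("No MS term for " + psiAccession);
        }
        // Attribute names are assumed from mzData 1.05; value is not escaped here.
        return "<cvParam cvLabel=\"MS\" accession=\"" + ms[0]
                + "\" name=\"" + ms[1] + "\" value=\"" + value + "\"/>";
    }

    public static void main(String[] args) {
        System.out.println(cvParam("PSI:1000039", "12.5"));
    }
}
```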
Simply replacing the obsolete term MS:1000039 with MS:1000016 leads to problems, as both PSI:1000038, "TimeInMinutes", and PSI:1000039, "TimeInSeconds", correspond to MS:1000016, "scan time", which needs to be supplemented with a unit accession number in order to be unambiguous. However, it is not clear whether an mzData `cvParam` tag accepts separate unit specifications (the mzData 1.05 specification did not). Until this has been resolved, the obsolete term MS:1000039, "second", will be used as replacement for PSI:1000039, "TimeInSeconds".
comment:5 Changed 10 years ago by
(In [4477]) Refs #808. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated to use the MS instead of the PSI ontology in `cvParam` tags. PSI term PSI:1000039, "TimeInSeconds", is exchanged for the obsolete MS term MS:1000039, "second", instead of the preferred term MS:1000016, "scan time", since the latter is ambiguous without a unit accession number.
comment:6 Changed 10 years ago by
Design update:
- Each spectrum in the MGF file may have a title line with prefix "TITLE=" in the header lines block. Since the order of spectra in the MGF files does not always correspond to the "Spectrum ID" in the Hits table, the title string should be added to a `comments` tag in the `SpectrumDesc` tag of the created mzData file, in order to unambiguously identify the spectra in the latter. The title string will be prefixed by "title=" in the `comments` tag.
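As a rough Java sketch of this design, the title of each spectrum block can be picked up while scanning the header lines and then wrapped in a `comments` element. The class and method names are hypothetical, and the XML escaping step is an addition not mentioned in the ticket (the plug-in's XML writer may already take care of it).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of TITLE= handling; not the plug-in's code. */
final class MgfTitles {
    /** Collects the TITLE= value of each BEGIN IONS ... END IONS block ("" if absent). */
    static List<String> titles(List<String> mgfLines) {
        List<String> titles = new ArrayList<>();
        String current = null; // null while outside a spectrum block
        for (String raw : mgfLines) {
            String line = raw.trim();
            if (line.equals("BEGIN IONS")) {
                current = "";
            } else if (line.equals("END IONS")) {
                if (current != null) {
                    titles.add(current);
                }
                current = null;
            } else if (current != null && line.startsWith("TITLE=")) {
                current = line.substring("TITLE=".length());
            }
        }
        return titles;
    }

    /** Escapes the characters that are special in XML text content. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps the title, prefixed by "title=", in a comments element. */
    static String commentsElement(String title) {
        return "<comments>" + escapeXml("title=" + title) + "</comments>";
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "BEGIN IONS", "TITLE=Scan 12", "PEPMASS=500.2 1000", "100.1 5", "END IONS");
        for (String title : titles(lines)) {
            System.out.println(commentsElement(title));
        }
    }
}
```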
comment:7 Changed 10 years ago by
(In [4478]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated to add the contents of the `TITLE=` header line in the MGF file to a `comments` tag in the `SpectrumDesc` tag of the corresponding spectrum in the created mzData file. The title string will be prefixed by "title=" in the `comments` tag:
- Public method `void doConvert(InputStream instream, OutputStream outstream, ProgressReporter progress)` updated to read the contents of the `TITLE=` header line in the MGF file and transfer it to private method `void writeMzDataSpectrumDescBlock(...)` through new argument `String title`.
- Private method `void writeMzDataSpectrumDescBlock(...)` updated with new argument `String title`. It will add the title to a `comments` tag in the `SpectrumDesc` tag of the spectrum, after adding prefix "title=".
comment:8 Changed 10 years ago by
comment:9 Changed 10 years ago by
(In [4482]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated to ensure that `DbControl` instances are closed after use. Private method `void writeSpectraData(XMLCrudeWriter2 xmlCWriter, ...)` updated to use the existing `DbControl` instance, instead of creating a new one:
- Public method `void run(Request request, Response response, ProgressReporter progress)` updated by closing the `DbControl` used to get directory information from the input file item.
- Public method `void doConvert(InputStream instream, Directory outCoreDir, String filename, ProgressReporter progress)` updated in call of public method `void doConvert(... InputStream instream, OutputStream outstream, ProgressReporter progress)` to set new argument `DbControl dc` to the existing `DbControl` instance.
- Public method `void doConvert(InputStream instream, OutputStream outstream, ProgressReporter progress)` updated with new initial argument `DbControl dc`, whose value is used in call of private method `void writeSpectraData(... XMLCrudeWriter2 xmlCWriter, List<Double> peakMassData, List<Double> peakIntensityData, boolean mz_double_precision, boolean inten_double_precision)` to set new argument `DbControl dc` to the existing `DbControl` instance.
- Private method `void writeSpectraData(XMLCrudeWriter2 xmlCWriter, List<Double> peakMassData, List<Double> peakIntensityData, boolean mz_double_precision, boolean inten_double_precision)` updated with new initial argument `DbControl dc`, whose value is used instead of creating a new `DbControl` instance.
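The pattern behind this change, one resource opened by the caller, passed down the call chain, and closed exactly once, can be sketched in Java as below. `FakeDbControl` is a hypothetical stand-in and deliberately not the real `DbControl` API; the method names only mirror those in the commit message.

```java
/** Hypothetical stand-in for a DbControl-like resource; NOT the Proteios API. */
final class FakeDbControl implements AutoCloseable {
    static int created = 0; // how many instances were ever created
    static int open = 0;    // how many are currently open
    private boolean closed = false;

    FakeDbControl() {
        created++;
        open++;
    }

    boolean isClosed() {
        return closed;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            open--;
        }
    }
}

final class ConverterSketch {
    /** Owns the resource: opens it once and closes it when the conversion ends. */
    static int run(int numberOfSpectra) {
        try (FakeDbControl dc = new FakeDbControl()) {
            return doConvert(dc, numberOfSpectra);
        }
    }

    /** Receives the existing instance as initial argument and passes it on. */
    static int doConvert(FakeDbControl dc, int numberOfSpectra) {
        int written = 0;
        for (int i = 0; i < numberOfSpectra; i++) {
            written += writeSpectraData(dc);
        }
        return written;
    }

    /** Uses the instance it was given instead of creating a new one per spectrum. */
    static int writeSpectraData(FakeDbControl dc) {
        if (dc.isClosed()) {
            throw new IllegalStateException("resource already closed");
        }
        return 1;
    }

    public static void main(String[] args) {
        System.out.println(run(3) + " spectra, open resources: " + FakeDbControl.open);
    }
}
```

With this structure the number of instances created no longer grows with the number of spectra, and none are left open after the run.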
comment:10 Changed 10 years ago by
(In [4483]) Refs #808. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated in public method `void doConvert(DbControl dc, InputStream instream, OutputStream outstream, ProgressReporter progress)` to only write a log message with the value of `numberOfSpectra` at the end of the conversion, instead of for each spectrum.
comment:11 Changed 10 years ago by
(In [4484]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated in private method `void writeMzDataSpectrumDescBlock(XMLCrudeWriter2 xmlCWriter, Float massToChargeRatio, Float intensity, Integer chargeState, Float rtInSeconds, String title)` to put the `comments` tag in the `SpectrumDesc` tag of the corresponding spectrum in the created mzData file after the `spectrumSettings` and `precursorList` tags, in order to be valid according to the mzData XSD (XML Schema Definition).
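A lightweight way to guard against this kind of ordering mistake is sketched below in Java. The expected order is taken from the comment above; the class and method names are hypothetical, and the check is only an illustration. Real validation would go against the mzData XSD itself, for example with the standard `javax.xml.validation` package.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical order check for the children of a spectrum description. */
final class SpectrumDescOrder {
    /** Assumed child order: settings, then precursors, then comments. */
    static final List<String> EXPECTED =
            Arrays.asList("spectrumSettings", "precursorList", "comments");

    /** Returns true if all children are known and appear in the expected order. */
    static boolean inSchemaOrder(List<String> children) {
        int last = -1;
        for (String name : children) {
            int index = EXPECTED.indexOf(name);
            if (index < 0 || index < last) {
                return false; // unknown child, or a child placed too late
            }
            last = index;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inSchemaOrder(
                Arrays.asList("spectrumSettings", "precursorList", "comments")));
        System.out.println(inSchemaOrder(
                Arrays.asList("comments", "spectrumSettings", "precursorList")));
    }
}
```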
comment:12 Changed 10 years ago by
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
Ticket closed as the requested functionality has been added.
comment:13 Changed 10 years ago by
(In [4498]) Refs #808. Refs #807. MGF to mzData converter in class/file `plugins/MgfToMzDataPlugin.java` in `plugin/` updated in public method `void doConvert(DbControl dc, InputStream instream, OutputStream outstream, ProgressReporter progress)` to set the total intensity for a spectrum to 0 if no value exists in the spectrum header line starting with "PEPMASS=".
Ticket accepted.
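The "PEPMASS=" handling from comment:13 can be illustrated with a short Java sketch. The class name is hypothetical, and splitting the value on whitespace is an assumption about how mass and intensity are separated.

```java
/** Hypothetical holder for the values of a PEPMASS= header line. */
final class PepMassValue {
    final float mass;
    final float intensity;

    private PepMassValue(float mass, float intensity) {
        this.mass = mass;
        this.intensity = intensity;
    }

    /**
     * Parses the text after "PEPMASS=": a mass, optionally followed by an
     * intensity. A missing intensity is set to 0.
     */
    static PepMassValue parse(String value) {
        String[] parts = value.trim().split("\\s+");
        float mass = Float.parseFloat(parts[0]); // throws if the value is empty
        float intensity = parts.length > 1 ? Float.parseFloat(parts[1]) : 0f;
        return new PepMassValue(mass, intensity);
    }

    public static void main(String[] args) {
        PepMassValue withIntensity = parse("646.5 1200");
        PepMassValue massOnly = parse("646.5");
        System.out.println(withIntensity.mass + " " + withIntensity.intensity);
        System.out.println(massOnly.mass + " " + massOnly.intensity);
    }
}
```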