IDFileConverter

Converts identification engine file formats.

potential predecessor tools → IDFileConverter → potential successor tools
Potential predecessor tools: TPP tools (PeptideProphet, ProteinProphet); Sequest protein identification engine
Potential successor tools: TPP tools (ProteinProphet; for conversion from idXML to pepXML)

IDFileConverter can be used to convert identification results from external tools/pipelines (like TPP, Sequest, Mascot, OMSSA, X! Tandem) into other (OpenMS-specific) formats. For search engine results, it might be advisable to use the respective TOPP Adapters (e.g. OMSSAAdapter) to avoid the extra conversion step.

The simplest format accepted is '.tsv': a tab-separated text file that contains one or more peptide sequences per line. Each line represents one spectrum and is stored as a PeptideIdentification with one or more PeptideHits. Lines starting with "#" are ignored by the parser.
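As a rough illustration of the '.tsv' layout described above, the following Python sketch reads such a file into one list of peptide sequences per spectrum. It is not the OpenMS parser: the function name is hypothetical, and skipping blank lines and empty fields is an assumption made for the example.

```python
def read_peptide_tsv(text):
    """Return one list of peptide sequences per spectrum (one spectrum per line)."""
    spectra = []
    for line in text.splitlines():
        # Lines starting with '#' are comments; skipping blank lines is an assumption.
        if not line.strip() or line.startswith("#"):
            continue
        # Each tab-separated field is taken to be one peptide sequence (one hit).
        hits = [field.strip() for field in line.split("\t") if field.strip()]
        spectra.append(hits)
    return spectra


example = "# top hits per spectrum\nPEPTIDEK\tPEPTLDEK\nSAMPLER\n"
spectra = read_peptide_tsv(example)
print(spectra)
```

Each inner list corresponds to what the text calls a PeptideIdentification, and each sequence in it to a PeptideHit.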

Conversion from the TPP file formats pepXML and protXML to OpenMS' idXML is quite comprehensive, to the extent that the original data can be represented in the simpler idXML format.

In contrast, support for converting from idXML to pepXML is limited. The purpose here is simply to create pepXML files containing the relevant information for the use of ProteinProphet.

Support for conversion to/from mzIdentML (.mzid) is still experimental and may lose information.

Details on additional parameters:

mz_file:
Some search engine output files (like pepXML, mascotXML, Sequest .out files) may not contain retention times, only scan numbers. To be able to look up the actual RT values, the raw file has to be provided using the parameter mz_file. (If the identification results should be used later to annotate feature maps or consensus maps, it is critical that they contain RT values. See also IDMapper.)

mz_name:
pepXML files can contain results from multiple experiments. However, the idXML format does not support this. The mz_name parameter (or mz_file, if given) thus serves to define what parts to extract from the pepXML.
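The selection described for mz_name can be pictured with a small Python sketch. It only illustrates the idea of removing the extension and matching against the pepXML 'base_name' attribute. The helper names are hypothetical, and comparing only the final path component is an assumption, not the documented matching rule.

```python
from pathlib import PurePosixPath


def experiment_key(mz_name="", mz_file=""):
    """Pick mz_name if given, else mz_file, and remove the file extension."""
    source = mz_name or mz_file
    if not source:
        return ""
    path = PurePosixPath(source)
    return str(path.with_suffix("")) if path.suffix else str(path)


def select_runs(base_names, key):
    """Keep the 'base_name' values whose last path component matches the key.

    Comparing only the last path component is an assumption for this sketch.
    """
    wanted = PurePosixPath(key).name
    return [name for name in base_names if PurePosixPath(name).name == wanted]


key = experiment_key(mz_file="data/run1.mzML")
print(key, select_runs(["/tmp/run1", "/tmp/run2"], key))
```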

scan_regex:
For Mascot results exported to XML, the scan numbers (used to look up retention times using mz_file) should be given in the "pep_scan_title" XML elements, but the format can vary. If the defaults fail to extract the scan numbers, a Perl-style regular expression can be given through the advanced parameter scan_regex, and will be used instead. The regular expression should contain a named group "SCAN" matching the scan number or "RT" matching the actual retention time. For example, if the format of the "pep_scan_title" elements is "scan=123", where 123 is the scan number, the expression "scan=(?<SCAN>\d+)" can be used to extract the number. (However, the format in this example is actually covered by the defaults.)
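To try out a candidate scan_regex before running the tool, a pattern can be tested against sample "pep_scan_title" values. The Python sketch below does this for the "scan=123" example. Python's re module writes named groups as (?P<NAME>...) rather than the Perl-style (?<NAME>...), so the sketch converts the syntax first. The helper names are hypothetical, and the conversion is a convenience for testing, not part of IDFileConverter.

```python
import re


def to_python_regex(perl_style):
    """Rewrite Perl-style named groups (?<NAME>...) as Python's (?P<NAME>...).

    Lookbehinds such as (?<=...) and (?<!...) are left untouched.
    """
    return re.sub(r"\(\?<(?=[A-Za-z_])", "(?P<", perl_style)


def extract_from_title(title, perl_style_regex):
    """Return the 'SCAN' and/or 'RT' values found in a scan title, if any."""
    match = re.search(to_python_regex(perl_style_regex), title)
    if match is None:
        return {}
    groups = match.groupdict()
    found = {}
    if groups.get("SCAN") is not None:
        found["SCAN"] = int(groups["SCAN"])
    if groups.get("RT") is not None:
        found["RT"] = float(groups["RT"])
    return found


print(extract_from_title("scan=123", r"scan=(?<SCAN>\d+)"))
```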
For Percolator tab-delimited output, information is extracted from the "PSMId" column. By default, extraction of scan numbers and charge states is supported for MS-GF+ Percolator results (retention times and precursor m/z values can then be looked up in the raw data via mz_file). In a user-defined regular expression, the named groups "SCAN" (scan number), "CHARGE" (charge state), "RT" (retention time) and "MZ" (precursor m/z) are supported. The parameter count_from_zero defines whether scans are counted from zero or from one (default) in the number extracted via "SCAN". If "CHARGE", "RT" and "MZ" are present, it is not necessary to look up any information in the raw data, so mz_file is not needed.
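The interplay of the named groups, count_from_zero and mz_file for Percolator input can be sketched in Python as follows. The PSMId layout used here ("run_scan=41_z=2") is invented for illustration and is not the MS-GF+ format. The pattern uses Python's (?P<NAME>...) group syntax, and converting a zero-based scan number to a one-based one by adding 1 is an assumption about how the option is applied.

```python
import re


def parse_psm_id(psm_id, pattern, count_from_zero=False):
    """Extract SCAN, CHARGE, RT and MZ (where present) from a PSMId string."""
    match = re.search(pattern, psm_id)
    if match is None:
        return None
    groups = {k: v for k, v in match.groupdict().items() if v is not None}
    info = {}
    if "SCAN" in groups:
        scan = int(groups["SCAN"])
        # Assumption: zero-based scan numbers are shifted to one-based numbering.
        info["SCAN"] = scan + 1 if count_from_zero else scan
    if "CHARGE" in groups:
        info["CHARGE"] = int(groups["CHARGE"])
    if "RT" in groups:
        info["RT"] = float(groups["RT"])
    if "MZ" in groups:
        info["MZ"] = float(groups["MZ"])
    # A raw-data lookup is unnecessary only if CHARGE, RT and MZ were all extracted.
    info["needs_mz_file"] = not {"CHARGE", "RT", "MZ"} <= set(groups)
    return info


pattern = r"scan=(?P<SCAN>\d+)_z=(?P<CHARGE>\d+)"
print(parse_psm_id("run_scan=41_z=2", pattern, count_from_zero=True))
```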

Some information about the supported input types: mzIdentML, pepXML, protXML, idXML, mascotXML, omssaXML, XTandem.xml, Sequest .out directory, Percolator tab-delimited output

The command line parameters of this tool are:

IDFileConverter -- Converts identification engine file formats.
Version: 2.0.0 Aug 19 2015, 22:19:33, Revision: GIT-NOTFOUND

Usage:
  IDFileConverter <options>

Options (mandatory options marked with '*'):
  -in <path/file>*           Input file or directory containing the data to convert. This may be:
                             - a single file in a multi-purpose XML format (pepXML, protXML, idXML, mzid),
                             - a single file in a search engine-specific format (Mascot: mascotXML, OMSSA:
                             omssaXML, X! Tandem: xml, Percolator: psms),
                             - a single text file (tab separated) with one line for all peptide sequences
                             matching a spectrum (top N hits),
                             - for Sequest results, a directory containing .out files.
                             (valid formats: 'pepXML', 'protXML', 'mascotXML', 'omssaXML', 'xml', 'psms',
                             'tsv', 'idXML', 'mzid')
  -out <file>*               Output file (valid formats: 'idXML', 'mzid', 'pepXML', 'FASTA')
  -out_type <type>           Output file type (default: determined from file extension) (valid: 'idXML',
                             'mzid', 'pepXML', 'FASTA')
                             
  -mz_file <file>            [pepXML, Sequest, Mascot, X! Tandem, Percolator only] Retention times will be 
                             looked up in this file (valid formats: 'mzML', 'mzXML', 'mzData')
                             
  -mz_name <file>            [pepXML only] Experiment filename/path (extension will be removed) to match in
                             the pepXML file ('base_name' attribute). Only necessary if different from
                             'mz_file'.
  -use_precursor_data        [pepXML only] Use precursor RTs (and m/z values) from 'mz_file' for the
                             generated peptide identifications, instead of the RTs of MS2 spectra.
  -peptideprophet_analyzed   [pepXML output only] Write output in the format of a PeptideProphet analysis 
                             result. By default a 'raw' pepXML is produced that contains only search engine
                             results.
  -score_type <choice>       [Percolator only] Which of the Percolator scores to report as 'the' score for a 
                             peptide hit (default: 'qvalue' valid: 'qvalue', 'PEP', 'score')
                             
Common TOPP options:
  -ini <file>                Use the given TOPP INI file
  -threads <n>               Sets the number of threads allowed to be used by the TOPP tool (default: '1')
  -write_ini <file>          Writes the default configuration file
  --help                     Shows options
  --helphelp                 Shows all options (including advanced)
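
When the tool is driven from a script, the options listed above can be assembled into an argument list. The Python sketch below builds such a list using only options from the help text. The function name and the example file names are hypothetical, and the command is only constructed, not executed.

```python
def build_idfileconverter_command(in_path, out_path, out_type=None, mz_file=None,
                                  mz_name=None, use_precursor_data=False,
                                  peptideprophet_analyzed=False, score_type=None):
    """Assemble an IDFileConverter argument list from the documented options."""
    cmd = ["IDFileConverter", "-in", in_path, "-out", out_path]
    if out_type:
        cmd += ["-out_type", out_type]
    if mz_file:
        cmd += ["-mz_file", mz_file]
    if mz_name:
        cmd += ["-mz_name", mz_name]
    # Boolean options are passed as bare flags, as shown in the help text.
    if use_precursor_data:
        cmd.append("-use_precursor_data")
    if peptideprophet_analyzed:
        cmd.append("-peptideprophet_analyzed")
    if score_type:
        cmd += ["-score_type", score_type]
    return cmd


# Hypothetical file names: convert a pepXML file, looking up RTs in the raw data.
cmd = build_idfileconverter_command("results.pepXML", "results.idXML",
                                    mz_file="run1.mzML")
print(" ".join(cmd))
```

With OpenMS installed and the tool on the PATH, the resulting list could be passed to subprocess.run(cmd, check=True).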

INI file documentation of this tool:

Legend: '*' marks a required parameter, '(advanced)' marks an advanced parameter.

IDFileConverter: Converts identification engine file formats.
  version (default: 2.0.0): Version of the tool that generated this parameters file.
  1: Instance '1' section for 'IDFileConverter'
    in*: Input file or directory containing the data to convert. This may be:
      - a single file in a multi-purpose XML format (pepXML, protXML, idXML, mzid),
      - a single file in a search engine-specific format (Mascot: mascotXML, OMSSA: omssaXML, X! Tandem: xml, Percolator: psms),
      - a single text file (tab separated) with one line for all peptide sequences matching a spectrum (top N hits),
      - for Sequest results, a directory containing .out files.
      Type: input file. Restrictions: *.pepXML, *.protXML, *.mascotXML, *.omssaXML, *.xml, *.psms, *.tsv, *.idXML, *.mzid
    out*: Output file. Type: output file. Restrictions: *.idXML, *.mzid, *.pepXML, *.FASTA
    out_type: Output file type (default: determined from file extension). Restrictions: idXML, mzid, pepXML, FASTA
    mz_file: [pepXML, Sequest, Mascot, X! Tandem, Percolator only] Retention times will be looked up in this file. Type: input file. Restrictions: *.mzML, *.mzXML, *.mzData
    mz_name: [pepXML only] Experiment filename/path (extension will be removed) to match in the pepXML file ('base_name' attribute). Only necessary if different from 'mz_file'.
    use_precursor_data (default: false): [pepXML only] Use precursor RTs (and m/z values) from 'mz_file' for the generated peptide identifications, instead of the RTs of MS2 spectra. Restrictions: true, false
    peptideprophet_analyzed (default: false): [pepXML output only] Write output in the format of a PeptideProphet analysis result. By default a 'raw' pepXML is produced that contains only search engine results. Restrictions: true, false
    score_type (default: qvalue): [Percolator only] Which of the Percolator scores to report as 'the' score for a peptide hit. Restrictions: qvalue, PEP, score
    ignore_proteins_per_peptide (advanced; default: false): [Sequest only] Workaround to deal with .out files that contain e.g. "+1" in references column, but do not list extra references in subsequent lines (try -debug 3 or 4). Restrictions: true, false
    scan_regex (advanced): [Mascot, Percolator only] Regular expression used to extract the scan number or retention time. See documentation for details.
    count_from_zero (advanced; default: false): [Percolator only] Scan numbers extracted by 'scan_regex' start counting at zero (default: start at one). Restrictions: true, false
    log (advanced): Name of log file (created only when specified)
    debug (advanced; default: 0): Sets the debug level
    threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool
    no_progress (advanced; default: false): Disables progress logging to command line. Restrictions: true, false
    force (advanced; default: false): Overwrite tool specific checks. Restrictions: true, false
    test (advanced; default: false): Enables the test mode (needed for internal use only). Restrictions: true, false

OpenMS / TOPP release 2.0.0 Documentation generated on Thu Aug 20 2015 01:44:31 using doxygen 1.8.9.1