Tool to estimate the false discovery rate on peptide and protein level
This TOPP tool calculates the false discovery rate (FDR) from a forward and a reverse (decoy) search, or from a single search against a combined target/decoy database. It is most useful at the protein level, but it can also be applied to peptides.
The false discovery rate is defined as the number of false discoveries (hits from the reversed/decoy search) divided by the total number of false and correct discoveries (hits from both databases), counting only hits whose score is better than a given threshold.
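As an illustration of this definition, here is a minimal Python sketch that computes the FDR at a single score threshold. It assumes each hit is a (score, is_decoy) pair, that higher scores are better, and that hits scoring exactly at the threshold count as accepted; the function name `fdr_at_threshold` is hypothetical and not part of OpenMS.

```python
def fdr_at_threshold(hits, threshold):
    """Return #decoy / (#target + #decoy) among hits scoring >= threshold.

    hits: list of (score, is_decoy) pairs; higher scores are assumed better.
    Hypothetical helper for illustration, not an OpenMS API.
    """
    accepted = [is_decoy for score, is_decoy in hits if score >= threshold]
    if not accepted:
        return 0.0  # nothing passes the threshold, so there are no discoveries
    n_decoy = sum(accepted)  # hits from the reversed (decoy) search
    return n_decoy / len(accepted)  # decoys over decoys + targets


hits = [(10, False), (9, False), (8, True), (7, False), (6, True)]
print(fdr_at_threshold(hits, 8))  # 1 decoy among 3 accepted hits
print(fdr_at_threshold(hits, 6))  # 2 decoys among 5 accepted hits
```

Lowering the threshold accepts more hits and, typically, a larger share of decoys, which is why the FDR is always reported relative to a score cutoff.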
Prerequisites:
- When using a combined database of forward and reverse sequences (i.e. only one search run per ID engine), use PeptideIndexer to index the idXML file generated by a search engine adapter, e.g. MascotAdapter. This annotates which peptides come from the target and which from the decoy database.
Notes:
- When no decoy hits are found, you will get a warning similar to:
"FalseDiscoveryRate: #decoy sequences is zero! Setting all target sequences to q-value/FDR 0!"
This should be a serious concern: it usually means that the database used for the target/decoy annotation in a previous step is misconfigured (see PeptideIndexer).
- Currently mzIdentML (mzid) is not directly supported as an input/output format of this tool. Convert mzid files to/from idXML using IDFileConverter if necessary.
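At the protein level, the algorithm parameter `decoy_string` (default `_rev`, listed in the algorithm section of the INI documentation) marks a protein as a decoy when that string is appended to its accession. The sketch below assumes exactly that suffix convention: it splits accessions into target and decoy lists and raises a Python warning when no decoys are present, which is the situation behind the FalseDiscoveryRate warning quoted above. The function name `split_target_decoy` and its warning text are hypothetical, not OpenMS output.

```python
import warnings


def split_target_decoy(accessions, decoy_string="_rev"):
    """Split protein accessions into (targets, decoys).

    Assumes a decoy accession is a target accession with `decoy_string`
    appended, as described for the `decoy_string` parameter.
    Hypothetical helper for illustration, not an OpenMS API.
    """
    targets = [acc for acc in accessions if not acc.endswith(decoy_string)]
    decoys = [acc for acc in accessions if acc.endswith(decoy_string)]
    if not decoys:
        # Without decoys every FDR/q-value would come out as 0,
        # so flag the likely misconfiguration early.
        warnings.warn("no decoy accessions found; check the decoy database "
                      "and the decoy_string setting")
    return targets, decoys


targets, decoys = split_target_decoy(["P12345", "P12345_rev", "Q99999"])
print(targets, decoys)
```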
The command line parameters of this tool are:
FalseDiscoveryRate -- Estimates the false discovery rate on peptide and protein level using decoy searches.
Version: 2.0.0 Aug 19 2015, 22:19:33, Revision: GIT-NOTFOUND
Usage:
FalseDiscoveryRate <options>
This tool has algorithm parameters that are not shown here! Please check the ini file for a detailed description or use the --helphelp option.
Options (mandatory options marked with '*'):
-in <file> Identification input file which contains a search against a concatenated sequence database. Either specify '-in' alone or 'fwd_in' together with 'rev_in' as input. (valid formats: 'idXML')
-fwd_in <file> Identification input to estimate FDR, forward run. (valid formats: 'idXML')
-rev_in <file> Identification input to estimate FDR, decoy run. (valid formats: 'idXML')
-out <file>* Identification output with annotated FDR (valid formats: 'idXML')
-proteins_only If set, the FDR of the proteins only is calculated
-peptides_only If set, the FDR of the peptides only is calculated
Common TOPP options:
-ini <file> Use the given TOPP INI file
-threads <n> Sets the number of threads allowed to be used by the TOPP tool (default: '1')
-write_ini <file> Writes the default configuration file
--help Shows options
--helphelp Shows all options (including advanced)
The following configuration subsections are valid:
- algorithm Parameter section for the FDR calculation algorithm
You can write an example INI file using the '-write_ini' option.
Documentation of subsection parameters can be found in the doxygen documentation or the INIFileEditor.
Have a look at the OpenMS documentation for more information.
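The input rules in the help text (either `-in` alone, or `-fwd_in` together with `-rev_in`; `-out` is mandatory) can be checked before the tool is launched. The following Python sketch only assembles an argument list from the options listed above; the helper name `build_fdr_command` is hypothetical, and running the result assumes the `FalseDiscoveryRate` executable is on the PATH.

```python
def build_fdr_command(out, in_file=None, fwd_in=None, rev_in=None,
                      proteins_only=False, peptides_only=False):
    """Assemble a FalseDiscoveryRate command line as a list of strings.

    Enforces the documented input rule: '-in' alone, or '-fwd_in'
    together with '-rev_in'. Hypothetical helper for illustration.
    """
    separate = fwd_in is not None or rev_in is not None
    combined_ok = in_file is not None and not separate
    separate_ok = in_file is None and fwd_in is not None and rev_in is not None
    if not (combined_ok or separate_ok):
        raise ValueError("specify either -in alone or -fwd_in together with -rev_in")
    cmd = ["FalseDiscoveryRate"]
    if combined_ok:
        cmd += ["-in", in_file]
    else:
        cmd += ["-fwd_in", fwd_in, "-rev_in", rev_in]
    cmd += ["-out", out]
    if proteins_only:
        cmd.append("-proteins_only")  # boolean flag, takes no value
    if peptides_only:
        cmd.append("-peptides_only")
    return cmd


print(build_fdr_command("fdr.idXML", in_file="indexed.idXML"))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues with file names.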
INI file documentation of this tool:
+FalseDiscoveryRate: Estimates the false discovery rate on peptide and protein level using decoy searches.
version (default: 2.0.0): Version of the tool that generated this parameters file.
++1: Instance '1' section for 'FalseDiscoveryRate'
in: Identification input file which contains a search against a concatenated sequence database. Either specify '-in' alone or 'fwd_in' together with 'rev_in' as input. (input file; valid formats: *.idXML)
fwd_in: Identification input to estimate FDR, forward run. (input file; valid formats: *.idXML)
rev_in: Identification input to estimate FDR, decoy run. (input file; valid formats: *.idXML)
out: Identification output with annotated FDR. (output file; valid formats: *.idXML)
proteins_only (default: false): If set, the FDR of the proteins only is calculated. (valid values: true, false)
peptides_only (default: false): If set, the FDR of the peptides only is calculated. (valid values: true, false)
log: Name of log file (created only when specified)
debug (default: 0): Sets the debug level
threads (default: 1): Sets the number of threads allowed to be used by the TOPP tool
no_progress (default: false): Disables progress logging to command line. (valid values: true, false)
force (default: false): Overwrite tool specific checks. (valid values: true, false)
test (default: false): Enables the test mode (needed for internal use only). (valid values: true, false)
+++algorithm: Parameter section for the FDR calculation algorithm
q_value (default: true): If 'true', the q-values will be calculated instead of the FDRs. (valid values: true, false)
use_all_hits (default: false): If 'true' not only the first hit, but all are used (peptides only). (valid values: true, false)
split_charge_variants (default: false): If set to 'true' charge variants are treated separately (for peptides of combined target/decoy searches only). (valid values: true, false)
treat_runs_separately (default: false): If set to 'true' different search runs are treated separately (for peptides of combined target/decoy searches only). (valid values: true, false)
decoy_string (default: _rev): String which is appended at the accession of the protein to indicate that it is a decoy protein (for proteins only).
add_decoy_peptides (default: false): If set to true, decoy peptides will be written to output file, too. The q-value is set to the closest target score. (valid values: true, false)