Metabolite Annotation File (MAF)
MAF
When you create an assay, a corresponding metabolite annotation file is added to your study. It should list all metabolites, unknowns and features identified within the study, and you should complete the table with as much information as possible. Sample names should be included automatically as columns to the right of the table (if they are not, please contact us, or download the file and edit it in, e.g., Excel). Each sample column should hold that sample's value for each metabolite.
The Metabolite Annotation File (m_MTBLSxxx.tsv) should be referenced in the metabolite assignment file column of each assay. If the results combine several assays, enter the same file for each of them.
Validation Rules
- The MAF cannot be empty.
- Every MAF should be referenced in an assay.
- The MAF file should be readable.
- "database_identifier" should be the first column.
- The columns 'database_identifier', 'chemical_formula', 'smiles', 'inchi' and 'metabolite_identification' should be found in the correct column positions.
- The MS/NMR assay name can be found in the MAF.
- Each sample name can be found in the MAF.
- The MAF contains no empty rows.
For comprehensive details on the validation rules that apply to Assays, please visit our GitHub validation-rules docs
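A few of the rules listed above can be checked locally before upload. The following is a minimal sketch, not the MetaboLights validator: the function name `check_maf`, the `LEADING_COLUMNS` list and the wording of the messages are assumptions made for illustration, and only the empty-file, column-order, sample-name and empty-row rules are covered.

```python
import csv
import io

# Columns expected at the start of the file, in this order
# (assumed from the column tables in this guide).
LEADING_COLUMNS = [
    "database_identifier",
    "chemical_formula",
    "smiles",
    "inchi",
    "metabolite_identification",
]


def check_maf(tsv_text, sample_names):
    """Return a list of problems found in a MAF supplied as tab-separated text."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if len(rows) < 2:
        return ["MAF is empty (no header or no data rows)"]

    header, data = rows[0], rows[1:]
    problems = []

    # Rule: the identification columns come first, in the expected order.
    if header[: len(LEADING_COLUMNS)] != LEADING_COLUMNS:
        problems.append("leading columns are missing or out of order")

    # Rule: every sample name appears as a column.
    for name in sample_names:
        if name not in header:
            problems.append(f"sample column not found: {name}")

    # Rule: no empty rows (line numbers count the header as line 1).
    for number, row in enumerate(data, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"empty row at line {number}")

    return problems
```

To check a file on disk, read it as text (for example `open("m_MTBLSxxx.tsv", encoding="utf-8").read()`) and pass the sample names from your sample sheet; an empty list means none of these particular checks failed.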
LC-MS and GC-MS MAF
| Column Name | Description | Example |
|---|---|---|
| database_identifier | Stable external accession for the compound/feature. Recommended, if available | e.g., CHEBI:27389 |
| chemical_formula | Molecular formula of the identified compound. Optional | Plain text, e.g., C4H9NO2 |
| smiles | SMILES string for the compound. Optional | Canonical if available; plain text, e.g., CC(CN)C(O)=O |
| inchi | Full InChI string for the compound (starting with InChI=; not the InChIKey). Optional | e.g., InChI=1S/C4H9NO2/c1-3(2-5)4(6)7/h3H,2,5H2,1H3,(H,6,7) |
| metabolite_identification | Human-readable metabolite name or, if unknown, your metabolites/unknowns/features as reported. Required | e.g., 3-Aminoisobutyric acid, or Feature 001, Feature 002, Unknown1, etc. |
| mass_to_charge | Precursor/Parent/Feature m/z (numeric). Typically the molecular ion. For MS/MS or MSn provide precursor ion m/z. Required | e.g., '104' for 3-Aminoisobutyric acid as the [M+H]+ ion |
| fragmentation | Fragment/Daughter/Product m/z (numeric). For MS/MS or MSn provide product ion(s). Recommended, if available (Required for MRM, Precursor Ion Scan) | e.g., for '3-Aminoisobutyric acid': 86, 87, 69, 58 |
| modifications | Chemical modifications/adducts/derivatisation noted for the feature. Recommended, if available | e.g., [M+Na]+, [M+K]+, TMS-deriv., etc. |
| charge | Observed charge state for the compound/feature. Recommended, if available | e.g., [M+H]+, [M-H]-, OR Positive, Negative, OR +1, -1 |
| retention_time | Chromatographic RT used for the assignment (numeric; unit as used by your pipeline). Required | e.g., 3.15 |
| taxid | NCBI Taxonomy ID for the biological source associated with this identification (if applicable). Optional | e.g., NCBI:txid9606 |
| species | Species name corresponding to taxid, aligned with your Sample sheet's organism term. Optional | e.g., Homo sapiens, Mus musculus, etc. |
| database | Name of the spectral/compound database or library used for the match (text). Field often used to report Compound ID present in secondary database. Optional | e.g., PubChem Compound identifier (CID:XXXX) |
| database_version | Direct URL/URI to the external record or evidence for the identification (resolvable link). Optional | e.g., https://pubchem.ncbi.nlm.nih.gov/ |
| reliability | Qualitative confidence level/category for the identification or Metabolomics Standards Initiative (MSI) level (see https://doi.org/10.1021/es5002105). Recommended, if available | e.g., 'MSI:2', 'MSI:1', etc. |
| uri | Direct URL/URI to the external record or evidence for the identification (resolvable link). Optional | --- |
| search_engine | Name (and optional version) of the software/tool used for the identification or match. Optional | e.g., library search, MZmine, XCMS, etc. |
| search_engine_score | Primary score reported by the search tool (numeric); specify score type in your methods/protocol. Optional | --- |
| smallmolecule_abundance_sub | Aggregated abundance for the metabolite (e.g., subject/condition-level summary). If reporting per-sample values, put those in the auto-added sample columns on the right and leave this blank. Optional | --- |
| smallmolecule_abundance_stdev_sub | Standard deviation corresponding to smallmolecule_abundance_sub (numeric). Optional | --- |
| smallmolecule_abundance_std_error_sub | Standard error corresponding to smallmolecule_abundance_sub (numeric). Optional | --- |
| SampleName1 | Quantification value (e.g., concentration, AUC, etc.). Recommended, if available | --- |
| SampleName2 | Quantification value (e.g., concentration, AUC, etc.). Recommended, if available | --- |
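The mass_to_charge, modifications and charge columns are linked by simple arithmetic: the ion's m/z is the neutral monoisotopic mass plus or minus the adduct mass. The sketch below works this through for the table's example compound, 3-Aminoisobutyric acid (C4H9NO2). The names `neutral_mass` and `adduct_mz` are hypothetical helpers, not part of any MetaboLights tool, and only singly charged ions are handled.

```python
# Monoisotopic masses in unified atomic mass units (standard reference values).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Mass shift applied to the neutral molecule for a few singly charged ions.
ADDUCT_SHIFT = {
    "[M+H]+": 1.00727647,    # gain of a proton
    "[M-H]-": -1.00727647,   # loss of a proton
    "[M+Na]+": 22.98921800,  # gain of a sodium cation
}


def neutral_mass(composition):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in composition.items())


def adduct_mz(composition, adduct):
    """m/z of a singly charged ion (|z| = 1) for one of the adducts above."""
    return neutral_mass(composition) + ADDUCT_SHIFT[adduct]


# 3-Aminoisobutyric acid, C4H9NO2
compound = {"C": 4, "H": 9, "N": 1, "O": 2}
protonated = adduct_mz(compound, "[M+H]+")
deprotonated = adduct_mz(compound, "[M-H]-")
```

Here `protonated` comes out at roughly 104.07 and `deprotonated` at roughly 102.06, so the nominal value '104' in the mass_to_charge example corresponds to the protonated ion.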
NMR MAF
| Column Name | Description | Example |
|---|---|---|
| database_identifier | Stable external accession for the compound/feature. Recommended, if available | e.g., CHEBI:27389 |
| chemical_formula | Molecular formula of the identified compound. Optional | Plain text, e.g., C4H9NO2 |
| smiles | SMILES string for the compound. Optional | Canonical if available; plain text, e.g., CC(CN)C(O)=O |
| inchi | Full InChI string for the compound (starting with InChI=; not the InChIKey). Optional | e.g., InChI=1S/C4H9NO2/c1-3(2-5)4(6)7/h3H,2,5H2,1H3,(H,6,7) |
| metabolite_identification | Human-readable metabolite name or, if unknown, your metabolites/unknowns/features as reported. Required | e.g., 3-Aminoisobutyric acid, or Feature 001, Feature 002, Unknown1, etc. |
| chemical_shift | Electronic environment of nuclei relative to a reference standard such as tetramethylsilane (TMS). Reported in parts per million (ppm; numeric). Recommended, if available | e.g., 4.16 |
| multiplicity | Indicates the splitting or coupling of an NMR signal, reflecting the number of adjacent hydrogens. Recommended, if available | e.g., 1-H, 2-H, 3-H, etc. Other reporting options include 'Singlet' or 's'; 'Doublet' or 'd'; 'Triplet' or 't'; 'Quartet' or 'q'; 'Multiplet' or 'm'; 'Doublet of doublets' or 'dd'; etc. |
| taxid | NCBI Taxonomy ID for the biological source associated with this identification (if applicable). Optional | e.g., NCBI:txid9606 |
| species | Species name corresponding to taxid, aligned with your Sample sheet's organism term. Optional | e.g., Homo sapiens, Mus musculus, etc. |
| database | Name of the spectral/compound database or library used for the match (text). Field often used to report Compound ID present in secondary database. Optional | e.g., PubChem Compound identifier (CID:XXXX) |
| database_version | Direct URL/URI to the external record or evidence for the identification (resolvable link). Optional | e.g., https://pubchem.ncbi.nlm.nih.gov/ |
| reliability | Qualitative confidence level/category for the identification or Metabolomics Standards Initiative (MSI) level (see https://doi.org/10.1021/es5002105). Recommended, if available | e.g., 'MSI:2', 'MSI:1', etc. |
| uri | Direct URL/URI to the external record or evidence for the identification (resolvable link). Optional | --- |
| search_engine | Name (and optional version) of the software/tool used for the identification or match. Optional | e.g., library search, MZmine, XCMS, etc. |
| search_engine_score | Primary score reported by the search tool (numeric); specify score type in your methods/protocol. Optional | --- |
| smallmolecule_abundance_sub | Aggregated abundance for the metabolite (e.g., subject/condition-level summary). If reporting per-sample values, put those in the auto-added sample columns on the right and leave this blank. Optional | --- |
| smallmolecule_abundance_stdev_sub | Standard deviation corresponding to smallmolecule_abundance_sub (numeric). Optional | --- |
| smallmolecule_abundance_std_error_sub | Standard error corresponding to smallmolecule_abundance_sub (numeric). Optional | --- |
| SampleName1 | Quantification value (e.g., concentration, AUC, etc.). Recommended, if available | --- |
| SampleName2 | Quantification value (e.g., concentration, AUC, etc.). Recommended, if available | --- |
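Because the multiplicity column accepts both full names and abbreviations, a submitter may want to make the values consistent within one file before upload. This is a minimal sketch of such a clean-up step, covering only the options named in the table; it is not a MetaboLights requirement, and `normalise_multiplicity` is a hypothetical helper name.

```python
# Abbreviations and full names as listed in the multiplicity row above.
MULTIPLICITY_NAMES = {
    "s": "Singlet",
    "d": "Doublet",
    "t": "Triplet",
    "q": "Quartet",
    "m": "Multiplet",
    "dd": "Doublet of doublets",
}


def normalise_multiplicity(value):
    """Expand a known abbreviation or full name; return anything else unchanged."""
    cleaned = value.strip()
    key = cleaned.lower()

    # Known abbreviation, e.g. 'dd' -> 'Doublet of doublets'.
    if key in MULTIPLICITY_NAMES:
        return MULTIPLICITY_NAMES[key]

    # Known full name in any letter case, e.g. 'doublet' -> 'Doublet'.
    for name in MULTIPLICITY_NAMES.values():
        if key == name.lower():
            return name

    # Unrecognised values such as '2-H' are kept as reported.
    return cleaned
```

Values the mapping does not recognise are passed through untouched, so other reporting styles such as '1-H' or '2-H' are preserved.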