Metabolite Annotation File (MAF)

When you create an assay, a corresponding metabolite annotation file (MAF) is added to your study. The MAF should list all metabolites, unknowns and features identified within the study, and you should complete the table with as much information as possible. Sample names should appear automatically as columns on the right of the table (if they do not, please contact us, or download the file and edit it in, e.g., Excel). Each sample column should contain that sample's value for each metabolite.

The Metabolite Annotation File (m_MTBLSxxx.tsv) should be referenced in the metabolite assignment file column of each assay. If the results combine several assays, enter the same file name for each of those assays.
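
The shared-file case can be checked with a short script. The following is a minimal sketch, not part of any official tooling: it assumes the assay tables are tab-separated and that the column is headed "Metabolite Assignment File" (an assumed header; adjust it to match your assay file). The two assay tables are hypothetical.

```python
import csv
import io

# Assumed header of the assay column that names the MAF; adjust as needed.
MAF_COLUMN = "Metabolite Assignment File"

def referenced_mafs(assay_text, column=MAF_COLUMN):
    """Return the set of MAF file names referenced in a tab-separated assay table."""
    reader = csv.DictReader(io.StringIO(assay_text), delimiter="\t")
    return {row[column].strip() for row in reader if (row.get(column) or "").strip()}

# Two hypothetical assays whose results are reported in one shared MAF.
assay_pos = (
    "Sample Name\tMetabolite Assignment File\n"
    "SampleName1\tm_MTBLSxxx.tsv\n"
    "SampleName2\tm_MTBLSxxx.tsv\n"
)
assay_neg = (
    "Sample Name\tMetabolite Assignment File\n"
    "SampleName1\tm_MTBLSxxx.tsv\n"
)

# File names referenced by both assays.
shared = referenced_mafs(assay_pos) & referenced_mafs(assay_neg)
print(shared)
```

If the printed set is empty, the two assays do not point at a common MAF.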

Validation Rules

  • The MAF cannot be empty.

  • Every MAF should be referenced in an assay.

  • The MAF should be readable.

  • "database_identifier" should be the first column.

  • The columns 'database_identifier', 'chemical_formula', 'smiles', 'inchi' and 'metabolite_identification' should be in the correct column positions.

  • Each MS/NMR assay name should be found in the MAF.

  • Each sample name should be found in the MAF.

  • The MAF should contain no empty rows.

For comprehensive details on the validation rules that apply to Assays, please visit our GitHub validation-rules docs.
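
Several of the rules listed above can be pre-checked locally before submission. The sketch below is an illustration only, not the official validator: `check_maf` is a hypothetical helper, the five leading columns are assumed to follow the order in which the rules name them, and the sample value in the example is made up.

```python
import csv
import io

# Leading columns named in the rules, assumed to appear in this order.
LEADING_COLUMNS = [
    "database_identifier", "chemical_formula", "smiles",
    "inchi", "metabolite_identification",
]

def check_maf(maf_text, sample_names):
    """Return a list of problems found in a tab-separated MAF (empty list = none found)."""
    rows = list(csv.reader(io.StringIO(maf_text), delimiter="\t"))
    if not rows:
        return ["MAF is empty"]
    header, data = rows[0], rows[1:]
    problems = []
    if not data:
        problems.append("MAF has a header but no metabolite rows")
    if header[:len(LEADING_COLUMNS)] != LEADING_COLUMNS:
        problems.append("leading columns are missing or out of order")
    for name in sample_names:
        if name not in header:
            problems.append(f"sample column not found: {name}")
    # Data rows start at line 2 of the file (line 1 is the header).
    for number, row in enumerate(data, start=2):
        if not any(cell.strip() for cell in row):
            problems.append(f"row {number} is empty")
    return problems

good = (
    "database_identifier\tchemical_formula\tsmiles\tinchi\t"
    "metabolite_identification\tSampleName1\n"
    "CHEBI:27389\tC4H9NO2\tCC(CN)C(O)=O\t\t3-Aminoisobutyric acid\t12.5\n"
)
bad = "metabolite_identification\tdatabase_identifier\n\t\n"

print(check_maf(good, ["SampleName1"]))
print(check_maf(bad, ["SampleName1"]))
```

The first call reports no problems; the second reports the column order, the missing sample column and the empty row. Checks that need the assay files (for example, that the MAF is referenced in an assay) are deliberately left out.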

LC-MS and GC-MS MAF

| Column Name | Description | Requirement | Example |
|---|---|---|---|
| database_identifier | Stable external accession for the compound/feature. | Recommended, if available | CHEBI:27389 |
| chemical_formula | Molecular formula of the identified compound. | Optional | Plain text, e.g., C4H9NO2 |
| smiles | SMILES string for the compound. | Optional | Canonical if available; plain text, e.g., CC(CN)C(O)=O |
| inchi | Full InChI string for the compound (starting with InChI=; not the InChIKey). | Optional | InChI=1S/C4H9NO2/c1-3(2-5)4(6)7/h3H,2,5H2,1H3,(H,6,7) |
| metabolite_identification | Human-readable metabolite name or, if unknown, your metabolites/unknowns/features as reported. | Required | 3-Aminoisobutyric acid, or Feature 001, Feature 002, Unknown1, etc. |
| mass_to_charge | Precursor/parent/feature m/z (numeric). Typically the molecular ion. For MS/MS or MSn, provide the precursor ion m/z. | Required | '104' for 3-Aminoisobutyric acid in [M+H]+ mode |
| fragmentation | Fragment/daughter/product m/z (numeric). For MS/MS or MSn, provide the product ion(s). | Recommended, if available (Required for MRM, Precursor Ion Scan) | For 3-Aminoisobutyric acid: 86, 87, 69, 58 |
| modifications | Chemical modifications/adducts/derivatisation noted for the feature. | Recommended, if available | [M+Na]+, [M+K]+, TMS-deriv., etc. |
| charge | Observed charge state for the compound/feature. | Recommended, if available | [M+H]+, [M-H]-, OR Positive, Negative, OR +1, -1 |
| retention_time | Chromatographic RT used for the assignment (numeric; unit as used by your pipeline). | Required | 3.15 |
| taxid | NCBI Taxonomy ID for the biological source associated with this identification (if applicable). | Optional | NCBI:txid9606 |
| species | Species name corresponding to taxid, aligned with your Sample sheet's organism term. | Optional | Homo sapiens, Mus musculus, etc. |
| database | Name of the spectral/compound database or library used for the match (text). Often used to report the compound ID present in a secondary database. | Optional | PubChem Compound identifier (CID:XXXX) |
| database_version | Direct URL/URI to the external record or evidence for the identification (resolvable link). | Optional | https://pubchem.ncbi.nlm.nih.gov/ |
| reliability | Qualitative confidence level/category for the identification, or Metabolomics Standards Initiative (MSI) level (see https://doi.org/10.1021/es5002105). | Recommended, if available | 'MSI:2', 'MSI:1', etc. |
| uri | Direct URL/URI to the external record or evidence for the identification (resolvable link). | Optional | --- |
| search_engine | Name (and optional version) of the software/tool used for the identification or match. | Optional | Library search, MZmine, XCMS, etc. |
| search_engine_score | Primary score reported by the search tool (numeric); specify the score type in your methods/protocol. | Optional | --- |
| smallmolecule_abundance_sub | Aggregated abundance for the metabolite (e.g., subject/condition-level summary). If reporting per-sample values, put those in the auto-added sample columns on the right and leave this blank. | Optional | --- |
| smallmolecule_abundance_stdev_sub | Standard deviation corresponding to smallmolecule_abundance_sub (numeric). | Optional | --- |
| smallmolecule_abundance_std_error_sub | Standard error corresponding to smallmolecule_abundance_sub (numeric). | Optional | --- |
| SampleName1 | Quantification value (e.g., concentration, AUC, etc.). | Recommended, if available | --- |
| SampleName2 | Quantification value (e.g., concentration, AUC, etc.). | Recommended, if available | --- |
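
As a worked illustration of the mass_to_charge example, the sketch below computes the m/z of singly protonated ([M+H]+) and deprotonated ([M-H]-) 3-Aminoisobutyric acid from standard monoisotopic masses, then writes one tab-separated LC-MS row. It is a sketch under stated assumptions: the retention time is the placeholder value from the table, the two sample values are hypothetical, and columns not set in `row` are left blank.

```python
import csv
import io

# Standard monoisotopic masses (u) and the proton mass (u).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}
PROTON_MASS = 1.00727647

def neutral_mass(counts):
    """Monoisotopic mass of a neutral molecule from its element counts."""
    return sum(MONOISOTOPIC[element] * n for element, n in counts.items())

mass = neutral_mass({"C": 4, "H": 9, "N": 1, "O": 2})  # C4H9NO2
mz_pos = mass + PROTON_MASS  # [M+H]+, about 104.07
mz_neg = mass - PROTON_MASS  # [M-H]-, about 102.06

# LC-MS/GC-MS MAF columns in the order of the table, plus two sample columns.
columns = [
    "database_identifier", "chemical_formula", "smiles", "inchi",
    "metabolite_identification", "mass_to_charge", "fragmentation",
    "modifications", "charge", "retention_time", "taxid", "species",
    "database", "database_version", "reliability", "uri", "search_engine",
    "search_engine_score", "smallmolecule_abundance_sub",
    "smallmolecule_abundance_stdev_sub", "smallmolecule_abundance_std_error_sub",
    "SampleName1", "SampleName2",
]
row = {
    "database_identifier": "CHEBI:27389",
    "chemical_formula": "C4H9NO2",
    "smiles": "CC(CN)C(O)=O",
    "metabolite_identification": "3-Aminoisobutyric acid",
    "mass_to_charge": f"{mz_pos:.4f}",
    "charge": "+1",
    "retention_time": "3.15",   # placeholder value from the table
    "SampleName1": "12.5",      # hypothetical quantification value
    "SampleName2": "9.8",       # hypothetical quantification value
}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                        restval="", lineterminator="\n")
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Rounded to the nearest integer, the positive-mode value is the '104' quoted in the table.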

NMR MAF

| Column Name | Description | Requirement | Example |
|---|---|---|---|
| database_identifier | Stable external accession for the compound/feature. | Recommended, if available | CHEBI:27389 |
| chemical_formula | Molecular formula of the identified compound. | Optional | Plain text, e.g., C4H9NO2 |
| smiles | SMILES string for the compound. | Optional | Canonical if available; plain text, e.g., CC(CN)C(O)=O |
| inchi | Full InChI string for the compound (starting with InChI=; not the InChIKey). | Optional | InChI=1S/C4H9NO2/c1-3(2-5)4(6)7/h3H,2,5H2,1H3,(H,6,7) |
| metabolite_identification | Human-readable metabolite name or, if unknown, your metabolites/unknowns/features as reported. | Required | 3-Aminoisobutyric acid, or Feature 001, Feature 002, Unknown1, etc. |
| chemical_shift | Electronic environment of nuclei relative to a reference standard such as tetramethylsilane (TMS). Reported in parts per million (ppm; numeric). | Recommended, if available | 4.16 |
| multiplicity | Indicates the splitting or coupling of an NMR signal, reflecting the number of adjacent hydrogens. | Recommended, if available | 1-H, 2-H, 3-H, etc. Other reporting options include 'Singlet' or 's'; 'Doublet' or 'd'; 'Triplet' or 't'; 'Quartet' or 'q'; 'Multiplet' or 'm'; 'Doublet of doublets' or 'dd'; etc. |
| taxid | NCBI Taxonomy ID for the biological source associated with this identification (if applicable). | Optional | NCBI:txid9606 |
| species | Species name corresponding to taxid, aligned with your Sample sheet's organism term. | Optional | Homo sapiens, Mus musculus, etc. |
| database | Name of the spectral/compound database or library used for the match (text). Often used to report the compound ID present in a secondary database. | Optional | PubChem Compound identifier (CID:XXXX) |
| database_version | Direct URL/URI to the external record or evidence for the identification (resolvable link). | Optional | https://pubchem.ncbi.nlm.nih.gov/ |
| reliability | Qualitative confidence level/category for the identification, or Metabolomics Standards Initiative (MSI) level (see https://doi.org/10.1021/es5002105). | Recommended, if available | 'MSI:2', 'MSI:1', etc. |
| uri | Direct URL/URI to the external record or evidence for the identification (resolvable link). | Optional | --- |
| search_engine | Name (and optional version) of the software/tool used for the identification or match. | Optional | Library search, MZmine, XCMS, etc. |
| search_engine_score | Primary score reported by the search tool (numeric); specify the score type in your methods/protocol. | Optional | --- |
| smallmolecule_abundance_sub | Aggregated abundance for the metabolite (e.g., subject/condition-level summary). If reporting per-sample values, put those in the auto-added sample columns on the right and leave this blank. | Optional | --- |
| smallmolecule_abundance_stdev_sub | Standard deviation corresponding to smallmolecule_abundance_sub (numeric). | Optional | --- |
| smallmolecule_abundance_std_error_sub | Standard error corresponding to smallmolecule_abundance_sub (numeric). | Optional | --- |
| SampleName1 | Quantification value (e.g., concentration, AUC, etc.). | Recommended, if available | --- |
| SampleName2 | Quantification value (e.g., concentration, AUC, etc.). | Recommended, if available | --- |
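
Because the multiplicity column accepts both full names and abbreviations, you may want to harmonise the values within one file before upload. The sketch below is optional housekeeping rather than a MAF requirement: `normalise_multiplicity` is a hypothetical helper that expands the abbreviations listed in the table and leaves anything else (such as '2-H') unchanged.

```python
# Abbreviations and full names as listed in the multiplicity row of the table.
MULTIPLICITY = {
    "s": "Singlet",
    "d": "Doublet",
    "t": "Triplet",
    "q": "Quartet",
    "m": "Multiplet",
    "dd": "Doublet of doublets",
}
# Lower-cased full names, so 'triplet' and 'Triplet' map to the same spelling.
FULL_NAMES = {name.lower(): name for name in MULTIPLICITY.values()}

def normalise_multiplicity(value):
    """Expand a known abbreviation or fix the case of a known full name;
    return any other value unchanged (apart from surrounding whitespace)."""
    text = value.strip()
    key = text.lower()
    return MULTIPLICITY.get(key) or FULL_NAMES.get(key) or text

print([normalise_multiplicity(v) for v in ["d", "Triplet", "dd", "2-H"]])
# ['Doublet', 'Triplet', 'Doublet of doublets', '2-H']
```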