Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index

MOTIVATION:

Matching both the retention index (RI) and the mass spectrum of an unknown compound against a mass spectral reference library provides strong evidence for a correct identification of that compound. Data on retention indices are, however, available for only a small fraction of the compounds in such libraries. We propose a quantitative structure-RI model that makes it possible to rank putative identifications and to filter out those for which the predicted RI falls outside a predefined window.
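
The window-based filtering described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Hit` class, the function names, the example values, the window width, and the choice to rank by absolute RI deviation are all assumptions made for illustration. The predicted RI values are taken as given, as if produced by some structure-RI model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate from a library search (hypothetical structure)."""
    name: str
    match_score: float   # spectral match score reported by the library search
    predicted_ri: float  # RI predicted for this candidate by a structure-RI model


def filter_hits(hits, observed_ri, window):
    """Keep hits whose predicted RI is within +/- window of the observed RI.

    The surviving hits are ranked by absolute RI deviation (an assumed
    ranking criterion), smallest deviation first.
    """
    kept = [h for h in hits if abs(h.predicted_ri - observed_ri) <= window]
    return sorted(kept, key=lambda h: abs(h.predicted_ri - observed_ri))


def reduction_percent(n_before, n_after):
    """Percentage by which the hit list was shortened."""
    return 100.0 * (n_before - n_after) / n_before


# Made-up example data, not taken from the paper.
hits = [
    Hit("candidate A", 0.92, 1210.0),
    Hit("candidate B", 0.90, 1485.0),
    Hit("candidate C", 0.88, 1195.0),
    Hit("candidate D", 0.85, 1320.0),
]

kept = filter_hits(hits, observed_ri=1200.0, window=50.0)
for h in kept:
    print(h.name, h.predicted_ri)
print(reduction_percent(len(hits), len(kept)))
```

In practice the window width would be chosen from the prediction error of the RI model, so that correct candidates are rarely discarded.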

RESULTS:

We constructed multiple linear regression and support vector regression (SVR) models using a set of descriptors obtained with a genetic algorithm as the variable selection method. The SVR model is a significant improvement over previous models built for structurally diverse compounds, as it covers a large range (360-4100) of RI values and predicts isomeric compounds more accurately. The hit list reduction varied from 41% to 60% and depended on the size of the original hit list. Large hit lists were reduced to a greater extent than small hit lists.

Authors: 
V.V. Mihaleva, H.A. Verhoeven, R.C.H. de Vos, R.D. Hall, R.C. van Ham
Publication data (text): 
2009
DOI: 
10.1093/bioinformatics/btp056
Pages: 
2009; 25 (6): 787-794
Publisher: 
OUP
Published in: 
Bioinformatics
Date of publication: 
March, 2009
Status of the publication: 
Published/accepted
Source: 
Centre for BioSystems Genomics