Metabolomics data exploration guided by prior knowledge

In metabolomics research, it is often important to focus the data analysis on specific areas of interest within the metabolome. In this paper, we describe the application of consensus principal component analysis (CPCA) and canonical correlation analysis (CCA) as a means to explore the relation between metabolome data and (i) biochemically related metabolites and (ii) an amino acid biosynthesis pathway. CPCA searches for major trends in the behavior of metabolite concentrations that are common to the metabolites of interest and the remainder of the metabolome. CCA identifies the strongest correlations between the metabolites of interest and the remainder of the metabolome. CPCA and CCA were applied to two different microbial metabolomics data sets. The first data set, derived from Pseudomonas putida S12, was relatively simple, as it contained metabolomes obtained under only four environmental conditions. The second data set, obtained from Escherichia coli, was much more complex, as it consisted of metabolomes obtained under 28 different environmental conditions. For the simple and coherent P. putida S12 data set, CCA and CPCA gave similar results, because the variation in the selected subset of metabolites was similar to that in the remainder of the metabolome. In contrast, CCA and CPCA yielded different results for the E. coli data set. With CPCA, the trends in the selected subset (the phenylalanine biosynthesis pathway) dominated the results. The main trends were related to high and low phenylalanine productivity; the metabolites showing similar concentration behavior were metabolites regulating the phenylalanine biosynthesis route in the subset and metabolites related to general amino acid metabolism in the remainder of the metabolome. With CCA, neither subset truly dominated the data analysis. CCA described the differences between the wild type and the overproducing strain and between succinate-grown and glucose-grown cells.
For the difference between the wild type and the overproducing strain, metabolites from the beginning and the end of the aromatic amino acid pathways, such as erythrose-4-phosphate, tryptophan, and phenylalanine, were important within the selected subset. CCA and CPCA proved to be complementary data analysis tools that allow the analysis to focus on groups of metabolites of specific interest in relation to the remainder of the metabolome. Compared to ordinary PCA, focusing the data analysis on biologically relevant metabolites led, especially for the complex E. coli data set, to a better biological interpretation of the data.
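The two methods contrasted in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (block_scale, cpca_first_component, cca) are hypothetical. Before CPCA, each block (selected metabolites, remainder of the metabolome) is assumed to be mean-centered and scaled to unit total sum of squares. The CPCA variant shown is the NIPALS form with normalized block loadings and super weights, often called CPCA-W. With these choices, the super score converges to the first principal component score of the concatenated, block-scaled data. In this sketch, therefore, it is the block scaling that lets a small selected subset weigh as much as the much larger remainder. The CCA sketch assumes more samples than metabolites in the two blocks combined. With typical metabolomics data, which have far more metabolites than samples, the canonical correlations become trivially 1, so some regularization or prior dimension reduction would be needed.

```python
import numpy as np


def block_scale(X):
    """Mean-center a block and scale it to unit total sum of squares (assumed preprocessing)."""
    Xc = X - X.mean(axis=0)
    return Xc / np.linalg.norm(Xc)


def cpca_first_component(blocks, tol=1e-10, max_iter=500):
    """First consensus PCA component, NIPALS form with normalized super weights (CPCA-W).

    blocks: list of preprocessed (n_samples x n_vars_b) arrays sharing the same samples.
    Returns the super score t, one unit-length loading vector per block, and the
    unit-length super weight vector w (one entry per block).
    """
    # Start from the highest-variance column of the first block.
    X0 = blocks[0]
    t = X0[:, np.argmax((X0 ** 2).sum(axis=0))].copy()
    for _ in range(max_iter):
        loadings, block_scores = [], []
        for Xb in blocks:
            p = Xb.T @ t
            p /= np.linalg.norm(p)        # block loading: which variables follow the trend
            loadings.append(p)
            block_scores.append(Xb @ p)   # block score t_b
        T = np.column_stack(block_scores)
        w = T.T @ t
        w /= np.linalg.norm(w)            # super weight: contribution of each block
        t_new = T @ w                     # consensus (super) score
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    return t, loadings, w


def cca(X, Y):
    """Canonical correlation analysis via QR decomposition and SVD.

    Assumes n_samples > n_vars(X) + n_vars(Y); otherwise correlations are trivially 1.
    Returns canonical correlations r (descending) and weight matrices A, B such that
    the centered X @ A[:, i] and centered Y @ B[:, i] have correlation r[i].
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)   # orthonormal basis of the X column space
    Qy, Ry = np.linalg.qr(Yc)   # orthonormal basis of the Y column space
    U, r, Vt = np.linalg.svd(Qx.T @ Qy)
    k = min(X.shape[1], Y.shape[1])
    A = np.linalg.solve(Rx, U[:, :k])      # map back to per-variable weights
    B = np.linalg.solve(Ry, Vt.T[:, :k])
    return r[:k], A, B
```

In this sketch, the CPCA block loadings indicate which metabolites in each block follow the shared trend, and the super weights show how strongly each block drives it. The CCA weight vectors indicate which metabolites in each block contribute to the strongest correlation between the blocks.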

Authors:

R.A. van den Berg, C.M. Rubingh, J.A. Westerhuis, M.J. van der Werf, A.K. Smilde

Authors from the NMC:

Johan Westerhuis

Age Smilde

Publication data (text):

2009

DOI:

10.1016/j.aca.2009.08.029

Pages:

2009; 651 (2): 173-181

Published in:

Analytica Chimica Acta

Date of publication:

October, 2009

Status of the publication:

Published/accepted

Link to publication:

http://www.sciencedirect.com/science/article/pii/S0003267009011416