Data fusion of complex-structured metabolomics data

In metabolomics, analytical platforms such as GC/MS or LC/MS can measure many metabolites in biological samples (e.g. urine or blood plasma), but various error sources related to these complex platforms contribute to the measurement error of each metabolite analyzed. In this research, measurement error (intra-assay accuracy or analytical repeatability) refers to the error associated with the repeated analysis of the same sample within the same assay. The measurement errors of different metabolites may be correlated with each other. Nowadays, multiple platforms are used together in order to analyze as many different metabolites as possible in the same sample. Existing approaches to integrating data from these platforms (data fusion) do not account for platform-related measurement errors, so the fused data may not give a reliable quantification of the metabolites’ responses in a sample. This calls for a new data fusion approach in which within- and between-platform variation in analytical repeatability is properly accounted for. In this project, we first developed new figures of merit to quantify the measurement errors of each platform. These figures have been applied to GC/MS data to show their potential for testing preprocessing tools. The results of this work found their way into a joint publication with the University of Amsterdam, TNO and Wageningen UR. On the basis of these figures, a Maximum-Likelihood Principal Components Analysis (MLPCA) for exploratory research has been developed that fully accounts for the platform-specific and metabolite-specific structure in analytical repeatability. It improves on existing approaches, in which analytical repeatability is not fully accounted for. In addition, we developed a statistical multi-laboratory data fusion approach and applied it to replicate gene expression data of different tissues to identify tissue-preferential chromosomal modules of genes. We validated these modules using Gene Ontology databases and gene expression data of different tissues.
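
The idea of accounting for analytical repeatability in a principal components model can be illustrated with a minimal sketch. It is not the MLPCA developed in this project. It assumes a much simpler special case in which every sample shares one error covariance matrix, estimated by pooling replicate measurements. Under that assumption the maximum-likelihood low-rank fit can be obtained by whitening the data with the error covariance, truncating an SVD, and transforming back. The function names, array layout and simulated data below are hypothetical.

```python
import numpy as np

def pooled_error_covariance(replicates):
    """Pool the within-sample replicate scatter into one error covariance.

    replicates: array of shape (n_samples, n_replicates, n_metabolites).
    Assumption: all samples share the same measurement error covariance.
    """
    r = np.asarray(replicates, dtype=float)
    n_samples, n_rep, n_vars = r.shape
    # Remove each sample's own mean so only replicate-to-replicate error remains.
    centered = r - r.mean(axis=1, keepdims=True)
    flat = centered.reshape(-1, n_vars)
    dof = n_samples * (n_rep - 1)
    return flat.T @ flat / dof

def error_weighted_low_rank(X, sigma, rank):
    """Rank-`rank` fit of X that minimizes the error-weighted residual
    sum_i (x_i - xhat_i)^T sigma^{-1} (x_i - xhat_i)."""
    evals, evecs = np.linalg.eigh(sigma)
    evals = np.clip(evals, 1e-12, None)          # guard against a singular estimate
    sqrt = (evecs * np.sqrt(evals)) @ evecs.T     # sigma^{1/2}
    inv_sqrt = (evecs / np.sqrt(evals)) @ evecs.T # sigma^{-1/2}
    Z = X @ inv_sqrt                              # whitened data: errors become i.i.d.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Z_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]  # ordinary truncated SVD
    return Z_hat @ sqrt                           # back to the original scale

# Simulated example: 20 samples, 3 replicates each, 4 metabolites.
rng = np.random.default_rng(0)
reps = rng.normal(size=(20, 3, 4))
sigma = pooled_error_covariance(reps)
X = reps.mean(axis=1)                             # one row per sample
X_hat = error_weighted_low_rank(X, sigma, rank=2)
```

With an identity error covariance this reduces to ordinary PCA by truncated SVD. Error structure that differs between samples or between platforms, as addressed in the project, requires an iterative estimation scheme that this sketch does not cover.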

Main project title: 
Analysing complex-structured Metabolomics data
PD 1-1-09