Modelling relationships between multiple data matrices from different origins

High-level data fusion is frequently used in classification studies to improve predictive performance. It is not clear, however, when two predictor sets add enough information to each other for predictive performance to improve. This project shows when, and by how much, predictive performance increases if two classifiers are combined.
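
As an illustration of what high-level fusion means here, the following minimal sketch combines the class probabilities of two classifiers by simple averaging. The averaging rule, the function names `fuse_mean` and `accuracy`, and the toy probabilities are all assumptions made for illustration; they are not the project's method or data. The numbers are chosen so that the two classifiers err on different samples, which is the situation in which fusion can help.

```python
# Toy sketch of high-level (decision-level) fusion of two classifiers.
# All values below are made up for illustration only.

def fuse_mean(probs_a, probs_b):
    """Average the class probabilities of two classifiers, sample by sample."""
    return [
        [(a + b) / 2.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]

def accuracy(probs, labels):
    """Fraction of samples whose most probable class equals the true label."""
    predicted = [row.index(max(row)) for row in probs]
    return sum(p == t for p, t in zip(predicted, labels)) / len(labels)

labels = [0, 0, 1, 1]

# Hypothetical classifier A (e.g. built on one predictor set): wrong on sample 2.
probs_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]
# Hypothetical classifier B (built on another predictor set): wrong on sample 3.
probs_b = [[0.6, 0.4], [0.8, 0.2], [0.7, 0.3], [0.1, 0.9]]

fused = fuse_mean(probs_a, probs_b)

print("accuracy A:    ", accuracy(probs_a, labels))
print("accuracy B:    ", accuracy(probs_b, labels))
print("accuracy fused:", accuracy(fused, labels))
```

In this toy case the fused classifier outperforms both individual classifiers because their errors do not coincide. If both classifiers made the same mistakes, averaging would not help, which is the kind of question the project addresses.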

Sensory traits are very important for commercial crops such as tomatoes. Sensory analysis by taste panels, however, is costly in both time and money. Metabolites determine the sensory traits of tomatoes to a large extent, and measuring metabolic profiles is much cheaper and faster than using taste panels. Therefore, metabolomics data are of major importance for assessing the quality of tomatoes.
Metabolomics data can originate from multiple types of analytical measurements. Data from multiple metabolomics platforms are fused to find the most relevant metabolites that act as predictors for specific sensory traits. Reducing the number of metabolic analyses while still obtaining sufficient information about quality may reduce costs. A protocol has been developed to determine the amount of additional and redundant information between two mass spectrometry data sets and a specific sensory trait.

The genetic basis of sensory and metabolic traits is important because tomato quality might be improved through breeding. Therefore, methods have been developed to relate groups of metabolites originating from a metabolic pathway to genetic markers. These tools can also be used to relate groups of sensory traits to genetic markers. The genetic markers identified in this way can be used by plant breeders to improve their genetic material by means of so-called marker-assisted selection.

Dependencies between genotypes (objects or rows in the data matrix) can mask relevant relationships between metabolites and sensory information. Genetic dependencies between genotypes can be obtained from genetic marker information. This information is incorporated in generalized least squares regression procedures for the prediction of sensory traits.
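
The idea of accounting for dependencies between genotypes can be sketched with a small generalized least squares (GLS) example. This is a minimal illustration under stated assumptions, not the project's procedure: the covariance between genotypes is assumed to be a marker-based similarity matrix plus a ridge term (the hypothetical helper `marker_covariance`), the predictors stand in for metabolite data, and all data are simulated.

```python
import numpy as np

def marker_covariance(markers, ridge=1.0):
    """Assumed covariance between genotypes: centred marker similarity
    plus a ridge term so that the matrix is positive definite."""
    centered = markers - markers.mean(axis=0)
    kinship = centered @ centered.T / markers.shape[1]
    return kinship + ridge * np.eye(markers.shape[0])

def gls_fit(X, y, V):
    """GLS estimate beta = (X' V^-1 X)^-1 X' V^-1 y.

    Computed by whitening X and y with the Cholesky factor of V and then
    solving an ordinary least squares problem."""
    L = np.linalg.cholesky(V)
    X_white = np.linalg.solve(L, X)
    y_white = np.linalg.solve(L, y)
    beta, *_ = np.linalg.lstsq(X_white, y_white, rcond=None)
    return beta

# Simulated example: 30 genotypes, 50 binary markers, 2 predictors.
rng = np.random.default_rng(0)
markers = rng.integers(0, 2, size=(30, 50)).astype(float)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
true_beta = np.array([1.0, 0.5, -2.0])

V = marker_covariance(markers)
# Errors correlated between genotypes according to V.
noise = 0.1 * (np.linalg.cholesky(V) @ rng.normal(size=30))
y = X @ true_beta + noise

beta_hat = gls_fit(X, y, V)
print("GLS coefficients:", beta_hat)
```

Whitening with the Cholesky factor avoids forming the inverse of `V` explicitly; the resulting estimate is the same as the closed-form GLS expression given in the docstring.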

All subprojects were based on data originating from the Centre of Biosystems Genomics (CBSG). The results have been or will be reported in scientific papers.