Bioinformatics II: Theoretical Bioinformatics and Machine Learning (4VO)

Course no.: 365.034
Lecturer: Sepp Hochreiter
Time and place: Mon 14:30-16:15, room HS14, and
Wed 15:30-17:00, room T111 (start: 5.3.2007)
Type: VO (lecture), 4h, weekly
Registration: KUSSS
Exam: The written exam scheduled for Tue 9.10.2007, 10:15-11:45 will not take place!
An oral exam is possible at any time, however. Arrange an appointment by email.

Contents:

Classification, regression, kernels, sequence analysis, neural networks, support vector machines, hidden Markov models, clustering, projection methods, principal component analysis (PCA), independent component analysis (ICA), factor analysis, error models, optimization techniques, regularization, Bayes approach, hyper-parameter optimization, feature selection, statistical learning theory, generalization error, maximum likelihood, model selection, etc.

Motivation:

Machine learning methods, for example neural networks used for the secondary and 3D structure prediction of proteins, have proven their value as essential bioinformatics tools. Modern measurement techniques in both biology and medicine create a huge demand for new machine learning approaches. One such technique is the measurement of mRNA concentrations with microarrays, where the data is first preprocessed, then genes of interest are identified, and finally predictions are made. In other examples, DNA data is integrated with complementary measurements in order to detect alternative splicing, nucleosome positions, gene regulation, etc. All of these tasks are performed by machine learning algorithms. Alongside neural networks, the most prominent machine learning techniques are support vector machines, kernel approaches, projection methods, and belief networks. These methods provide noise reduction, feature selection, structure extraction, and classification / regression, and they assist modeling. In the biomedical context, machine learning algorithms predict cancer treatment outcomes based on gene expression profiles, classify novel protein sequences into structural or functional classes, and extract new dependencies between DNA markers (SNPs, single nucleotide polymorphisms) and diseases (such as schizophrenia or alcohol dependence).

In this course the most prominent machine learning techniques are introduced and their mathematical foundations are shown. However, because of the restricted space, neither mathematical nor practical details are presented. Only a few selected applications of machine learning in biology and medicine are given, as the focus is on understanding the machine learning techniques. If the techniques are well understood, then new applications will arise, old ones can be improved, and the methods that best fit the problem can be selected.

Students should learn how to choose appropriate methods from a given pool of approaches for solving a specific problem. Therefore they must understand and evaluate the different approaches, know their advantages and disadvantages, and know where to obtain them and how to use them. As a further step, students should be able to adapt standard algorithms for their own purposes or to modify those algorithms for specific applications with certain prior knowledge or special constraints.