6th Brazilian School of Probability (EBP6)

SHORT COURSES

Gregory Lawler (Cornell and Duke Universities - USA)
Conformally Invariant Processes in the Plane

Bernard Prum (Génopole Evry - France)
Markov Models and Hidden Markov Models in Genome Analysis

Conformally Invariant Processes in the Plane

Gregory Lawler (Cornell and Duke Universities - USA)

Many models in statistical physics have been conjectured to have conformally invariant scaling limits "at criticality". Such limits are now being understood by mathematicians. The chief new tool is the stochastic Loewner evolution (SLE), introduced by Oded Schramm. This, combined with ideas of universality, has led to a number of rigorous results. This course will be an introduction to conformally invariant processes and SLE, with applications to intersection properties of Brownian motion, percolation, loop-erased walks, and self-avoiding walks. Much of what I will discuss is joint work of Schramm, Wendelin Werner, and myself. If time allows, I will also discuss the situation in three dimensions, where mathematicians are still very far from a rigorous understanding.
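For orientation, chordal SLE is usually defined through the Loewner differential equation driven by a Brownian motion. The form below uses one common parametrization; the symbols g_t, B_t and κ are notation assumed here, and conventions differ between authors.

```latex
% Chordal Loewner equation in the upper half-plane, driven by
% a standard Brownian motion B_t scaled by sqrt(kappa).
\[
  \partial_t g_t(z) \;=\; \frac{2}{g_t(z) - \sqrt{\kappa}\, B_t},
  \qquad g_0(z) = z .
\]
```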

Markov Models and Hidden Markov Models in Genome Analysis

Bernard Prum (Génopole Evry - France)

Biological sequences consist essentially of DNA chains, the chromosomes, which transmit information from one generation to the next, and protein chains, proteins being the essential components of all phenomena in living cells. The former are written in a 4-letter alphabet {a, c, g, t}, while the latter are written in a 20-letter alphabet, the amino acids.

Every day, more than 20 million newly deciphered letters arrive in the data banks, and a challenge for statisticians is to help biologists find the relevant information in this huge amount of data.

A first topic of interest is the search for words whose frequency is too high to be plausibly the result of pure randomness. For example, bacterial genomes contain a signal (called CHI) which takes part in their defenses and must therefore be sufficiently frequent to be effective. CHI's role is thus unrelated to the usual genetic code, but it has another kind of importance for the organism.

To search for these "exceptional" words, we look for a model that is both satisfactory for the biologist and tractable for the mathematician. One has to take into account the frequencies of the letters, of the 2-letter words, of the 3-letter words, etc., and hence to work conditionally on the sufficient statistics of a Markov chain model. In these models, for each word W, using a conditional approach, we compute the expectation and the variance of the number of occurrences and give results about its (asymptotic) law.
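As a rough illustration of the idea, the Python sketch below compares the observed count of a word with a plug-in estimate of its expected count under a first-order Markov chain fitted to the sequence itself. The function names, the restriction to order 1, and the toy sequence are assumptions made for this illustration; the variance and the asymptotic law mentioned above are not computed here.

```python
from collections import Counter


def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, overlapping occurrences included."""
    h = len(word)
    return sum(1 for i in range(len(seq) - h + 1) if seq[i:i + h] == word)


def expected_count_order1(seq, word):
    """Plug-in estimate of the expected count of `word` under a first-order
    Markov chain whose letter and 2-letter-word counts match those of `seq`.

    Estimate: product of the counts of the consecutive 2-letter words of
    `word`, divided by the product of the counts of its interior letters.
    """
    if len(word) < 2:
        raise ValueError("word must contain at least 2 letters")
    letters = Counter(seq)
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    numerator = 1.0
    for i in range(len(word) - 1):
        numerator *= pairs[word[i:i + 2]]
    denominator = 1.0
    for letter in word[1:-1]:  # interior letters only
        denominator *= letters[letter]
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    toy_sequence = "acgtacgtacgt"  # hypothetical toy data, not a real genome
    word = "acg"
    print("observed:", count_overlapping(toy_sequence, word))
    print("expected:", expected_count_order1(toy_sequence, word))
```

A word would be flagged as "exceptional" when the observed count is far from the expected one relative to its standard deviation, which this sketch leaves out.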

A very relevant criticism of this modeling is that it assumes the homogeneity of the sequence, a hypothesis that biologists accept less and less as they deal with larger and larger sequences. One way of answering this criticism is to allow the simultaneous existence of more than one Markovian model, and this led us to work with Hidden Markov Models (HMM). These models quickly turned out to be statistical tools permitting much more than the separate analysis of regions chosen to be homogeneous. At the beginning of the algorithm, we need fix neither the Markovian transitions in each state nor the positions of the various states; fitting an HMM to a sequence therefore produces its "segmentation", by allocating a common characteristic to all the segments related to the same state.
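To make the notion of segmentation concrete, here is a minimal Python sketch of Viterbi decoding for a two-state HMM. It is a simplification under stated assumptions: the parameters are fixed by hand rather than estimated from the sequence, each state emits independent letters rather than following its own Markov chain, and the state names "AT" and "GC", the probabilities, and the toy sequence are all hypothetical.

```python
import math
from itertools import groupby


def viterbi(seq, states, start, trans, emit):
    """Return the most likely hidden state path for `seq` (log-space Viterbi)."""
    # score[s]: best log-probability of a state path ending in s at this position
    score = {s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in states}
    back = []  # back[t][s]: best predecessor of state s at position t + 1
    for letter in seq[1:]:
        prev, score, ptr = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + math.log(trans[r][s]))
            score[s] = prev[best] + math.log(trans[best][s]) + math.log(emit[s][letter])
            ptr[s] = best
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


def segments(path):
    """Collapse a state path into (state, run length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(path)]


# Hypothetical parameters: an AT-rich state and a GC-rich state.
STATES = ("AT", "GC")
START = {"AT": 0.5, "GC": 0.5}
TRANS = {"AT": {"AT": 0.9, "GC": 0.1}, "GC": {"AT": 0.1, "GC": 0.9}}
EMIT = {
    "AT": {"a": 0.4, "c": 0.1, "g": 0.1, "t": 0.4},
    "GC": {"a": 0.1, "c": 0.4, "g": 0.4, "t": 0.1},
}

if __name__ == "__main__":
    toy_sequence = "aattatatgcgcgcgg"  # toy data for illustration only
    print(segments(viterbi(toy_sequence, STATES, START, TRANS, EMIT)))
```

In practice the transition and emission parameters would be estimated from the data together with the segmentation, which this sketch does not attempt.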

An important drawback of the 'classical' HMM modeling is that it implies that the areas corresponding to a given state have lengths distributed according to an exponential law (a geometric law, in discrete time), and this is not at all verified in real genomes. Semi-Markovian models solve this difficulty: they allow any law for the lengths of the various areas.
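The constraint on lengths can be written out explicitly. In a discrete-time HMM, the length of a run in a given state follows a geometric law, the discrete counterpart of the exponential law. The notation below (a_{ii} for the probability of staying in state i, L_i for the length of a run in state i) is assumed here for illustration.

```latex
% Length of a run in state i for a discrete-time hidden Markov chain
% with self-transition probability a_{ii}.
\[
  \mathbb{P}(L_i = k) \;=\; a_{ii}^{\,k-1}\,(1 - a_{ii}), \qquad k = 1, 2, \dots,
  \qquad \mathbb{E}[L_i] \;=\; \frac{1}{1 - a_{ii}} .
\]
```

A semi-Markovian model replaces this law by an arbitrary distribution on the positive integers.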

Combined with the use of characteristics of the biological context, these methods should significantly improve the performance of predictions of homogeneous regions. We will present a few applications, such as the search for "horizontal transfers" and "annotation".

For some 10 years, it has been accepted that, besides vertical transmission (from parents to offspring), a phenomenon of horizontal transmission of genetic information plays an important role in the evolution of life.

For example, some viruses may copy part of the genome of one individual, transport it, and incorporate it into the genome of another individual, possibly of another species. The potential benefit of this phenomenon is obvious: through such transposons, a new beneficial gene can spread to a great number of species. As it is well known that each species leads to a different fit of a Markov model (the frequencies of words change from one species to another), HMM modeling is perfectly suited to searching for transposons.

The aim of "annotation" is to contribute to the automatic search in DNA sequences for coding parts and, within these, for exons and introns (in "eukaryotes", essentially every species except bacteria, genes contain two kinds of regions: the exon message is ultimately translated into proteins, while introns disappear during the 'maturation' process). HMM is also a successful approach to this problem.
