*(Sandipan Dey, **20 August 2016)*

- In this article,
*Gibbs Sampling*is used to find the*Best K-mer Motifs*from a collection of*Dna strings*. This problem appears in the*Rosalind*and the*Stepic*Bioinformatics problem-solving sites and also as an assignment in a*Coursera BioInformatics Algorithms*course provided by*UCSD*. - A Profile-randomly generated k-mer in a string is defined as follows: For each
*k-mer Pattern*in*Text*, compute the probability*Pr(Pattern | Profile)*, resulting in*n = |Text| – k + 1*probabilities*(p1, ., pn)*. These probabilities do not necessarily sum to*1*, but can still form the random number generator*Random(p1, ., pn)*based on them. *GIBBSSAMPLER*uses this random number generator to select a*Profile-randomly generated k-mer*at each step: if the*n-sided*(biased) die with probability of face-showups*(p1, ., pn)*rolls the number*i*, then the*Profile-randomly generated k-mer*is defined as the*i-th k-mer*in Text.- The algorithm is then repeated a few times with a few
*random starts*and the best motifs are chosen in the end. - The following shows the algorithm used by to implement the
*Gibbs Sampler*and also couple of sample input datasets used and the output best motifs computed. - The animations show the
*Profile matrix*(with*Information Content Scaled / unscaled*) computed at each*iteration*of the*Repeated GibsSampling*algorithm and how it changes.

Advertisements