The following problem appeared as an assignment in the *Coursera course Natural Language Processing (by Stanford University)*. The description of the problem below is taken from the assignment description and the course slides.

The following figure shows the basic concepts of spelling correction using the **noisy channel model**. Given the **misspelled word**, the **most probable correct word** can be computed by:

- **generating** the possible correct words (from the *vocabulary*) with some **edit-distance model**,
- **estimating** the **prior** probability with a **language model** and the **likelihood** probability with a **channel (error) model**, and
- **computing** the **posterior** probabilities and the **MAP estimate** using **Bayes' theorem**.
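
The steps above amount to the standard noisy-channel decision rule (a standard formulation of the model described here, not copied from the assignment): given an observed misspelled word *x*, choose

```latex
\hat{w} \;=\; \arg\max_{w \in V} P(w \mid x)
        \;=\; \arg\max_{w \in V} \frac{P(x \mid w)\,P(w)}{P(x)}
        \;=\; \arg\max_{w \in V} \underbrace{P(x \mid w)}_{\text{likelihood (channel model)}}\;\underbrace{P(w)}_{\text{prior (language model)}}
```

where the denominator *P(x)* can be dropped because it is constant across all candidate words *w*.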

The following figure shows how the candidate words (possible correct words) can be generated (mostly the words within *edit distance 1*) and how a suitable language model can be used to estimate (with **MLE**) the *prior* and the *likelihood* (insertion / deletion etc.) probabilities from the training corpus.
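
Candidate generation within edit distance 1 is commonly implemented as below (a Norvig-style sketch, not the assignment's exact code): every string reachable by one deletion, transposition, substitution, or insertion is generated, then filtered against the vocabulary.

```python
import string

def edits1(word):
    """All strings within edit distance 1 of `word`
    (deletions, transpositions, substitutions, insertions)."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    transposes = [L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1]
    replaces = [L + c + R[1:] for L, R in splits if R for c in letters]
    inserts = [L + c + R for L, R in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def candidates(word, vocabulary):
    """Candidate corrections: the word itself (if in the vocabulary)
    plus all in-vocabulary words one edit away."""
    return ({word} & vocabulary) | (edits1(word) & vocabulary)

# toy vocabulary (hypothetical, not the assignment corpus)
vocab = {"spelling", "smelling", "dwelling"}
print(candidates("speling", vocab))  # → {'spelling'}
```

For a word of length *n*, `edits1` produces roughly *54n + 25* strings, which is why filtering against the vocabulary immediately is important.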

The following figure shows the **real-word spelling correction** methods using the **noisy channel model**; we are going to use these techniques to solve the spelling-correction problem. Given a *k*-word sentence **W = w1, w2, …, wk**, we apply the following steps to compute the most probable sentence with the possibly corrected **misspelled words**:

- For each word *w_i*, generate the candidate set (e.g., the words that are within *edit distance 1*).
- Use a *language model* to compute the probability *P(W)* of each candidate sequence and then find the *most probable word-sequence W*.
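
A minimal sketch of these two steps (the candidate sets and the unigram counts below are hypothetical toys, not the assignment's data): enumerate every combination of candidate words and keep the sequence that maximises the language-model score.

```python
import itertools
import math

def correct_sentence(words, candidates_fn, sentence_log_prob):
    """Enumerate every combination of candidate words and return the
    word sequence W that maximises the language-model score log P(W)."""
    candidate_sets = [candidates_fn(w) for w in words]
    return max(itertools.product(*candidate_sets), key=sentence_log_prob)

# toy counts and candidate sets (hypothetical)
counts = {"two": 10, "too": 5, "of": 20, "off": 2}
total = sum(counts.values())

def unigram_log_prob(sentence):
    # toy unigram LM; unseen words get a small floor count
    return sum(math.log(counts.get(w, 0.01) / total) for w in sentence)

cands = {"too": {"too", "two"}, "of": {"of", "off"}}
best = correct_sentence(["too", "of"], lambda w: cands.get(w, {w}),
                        unigram_log_prob)
# best == ("two", "of") under these toy counts
```

Exhaustive enumeration is exponential in the sentence length in the worst case; it is feasible here because each word usually has only a handful of edit-distance-1 candidates.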

The following figure shows the different **language models** that are going to be used to compute the **MLE** (without / with **smoothing**) probabilities and also how they can be used to compute the probability of a sentence (i.e. a sequence of words).

- Given a **training corpus**, the **parameters** for the **language models** are first estimated from the corpus.
- Then these probabilities are used to compute the most probable (possibly spell-corrected) sequence of candidate words for each sentence in the **dev corpus**.
- The **accuracy** of **spell-correction** on the **dev corpus** is computed using the ground truth for each **language model** and reported.
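
As one concrete instance of the parameter-estimation step, an add-one (Laplace) smoothed bigram model can be trained as follows (a minimal sketch under the usual definitions, not the assignment's exact code):

```python
import math
from collections import Counter

def train_laplace_bigram(corpus):
    """Estimate add-one (Laplace) smoothed bigram probabilities from a
    list of tokenised sentences; return a sentence log-probability function."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    V = len(unigrams)  # vocabulary size used by add-one smoothing

    def log_prob(sentence):
        tokens = ["<s>"] + list(sentence)
        return sum(
            math.log((bigrams[(a, b)] + 1) / (unigrams[a] + V))
            for a, b in zip(tokens, tokens[1:])
        )

    return log_prob
```

Smoothing matters here because candidate sentences routinely contain bigrams never seen in training; an unsmoothed MLE would assign them probability zero and make all such candidates incomparable.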

## Results

The following results show the **accuracy** of **spell-correction** using the different **language models** on the **dev corpus**:

- **Uniform Language Model**: correct: 31, total: 471, accuracy: **0.065817**
- **Unigram Language Model**: correct: 29, total: 471, accuracy: **0.061571**
- **Laplace Unigram Language Model**: correct: 52, total: 471, accuracy: **0.110403**
- **Laplace Bigram Language Model**: correct: 64, total: 471, accuracy: **0.135881**
- **Stupid Backoff Language Model**: correct: 85, total: 471, accuracy: **0.180467**
- **Custom Language Model (Simple Linear Interpolation, λ1=0.17, λ2=0.39, λ3=0.44)**: correct: 91, total: 471, accuracy: **0.193206**
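
The best-scoring custom model mixes unigram, bigram, and trigram MLE probabilities with fixed weights. A minimal sketch of simple linear interpolation using the reported weights (the mapping of λ1/λ2/λ3 to n-gram orders is an assumption; the assignment's exact convention may differ):

```python
def interpolated_prob(w, w_prev, w_prev2, uni, bi, tri, total,
                      l1=0.17, l2=0.39, l3=0.44):
    """Simple linear interpolation of unigram, bigram and trigram MLEs:
    P(w | w_prev2, w_prev) = l1*P_uni(w) + l2*P_bi(w | w_prev)
                             + l3*P_tri(w | w_prev2, w_prev),
    with l1 + l2 + l3 = 1 so the result is a valid probability."""
    p_uni = uni.get(w, 0) / total
    p_bi = bi.get((w_prev, w), 0) / uni[w_prev] if uni.get(w_prev) else 0.0
    p_tri = (tri.get((w_prev2, w_prev, w), 0) / bi[(w_prev2, w_prev)]
             if bi.get((w_prev2, w_prev)) else 0.0)
    return l1 * p_uni + l2 * p_bi + l3 * p_tri
```

Because every component receives nonzero weight, an unseen trigram or bigram only reduces the estimate instead of zeroing it out, which is what lets this model outperform the single-order models above.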

The following figure shows sentences from the **dev corpus** with **misspelled word(s)**, along with the **ground truths** and the **noisy channel model**-corrected outputs with different **language models**.

The next figure shows the **simple linear interpolation language model** accuracy on the **dev corpus** with different values of **λ1** and **λ2**.