*(Sandipan Dey, 12 Jan 2017)*

This problem appeared as an exercise in the *Coursera course Algorithms-I (by Prof. Robert Sedgewick, Princeton)* as an application of the *union-find* data structure.

### Percolation

(as defined in http://coursera.cs.princeton.edu/algs4/assignments/percolation.html)

Given a composite system comprised of randomly distributed insulating and metallic materials: what fraction of the materials need to be metallic so that the composite system is an electrical conductor? Given a porous landscape with water on the surface (or oil below), under what conditions will the water be able to drain through to the bottom (or the oil to gush through to the surface)? Scientists have defined an abstract process known as percolation to model such situations.

### The model

A *percolation system* is modeled using an *n-by-n grid* of sites. Each site is either *open* or *blocked*. A *full site* is an open site that can be connected to an *open site* in the *top row* via a chain of neighboring (left, right, up, down) open sites. The system is said to *percolate* if there is a *full site* in the *bottom row*. In other words, a system percolates if we fill all open sites connected to the top row and that process fills some open site on the bottom row. (For the insulating/metallic materials example, the open sites correspond to metallic materials, so that a system that percolates has a metallic path from top to bottom, with full sites conducting. For the porous substance example, the open sites correspond to empty space through which water might flow, so that a system that percolates lets water fill open sites, flowing from top to bottom.)
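The definition of a full site can be sketched as a breadth-first flood fill from the open sites in the top row. This is a minimal illustration, not the approach used later in the post (which relies on union-find); the function names `full_sites` and `percolates` and the 3-by-3 example grid are assumptions made for this sketch.

```python
from collections import deque

def full_sites(grid):
    """Return the set of (row, col) full sites of a square grid.

    grid[r][c] is True for an open site and False for a blocked one.
    """
    n = len(grid)
    # Every open site in the top row is full by definition.
    queue = deque((0, c) for c in range(n) if grid[0][c])
    full = set(queue)
    while queue:
        r, c = queue.popleft()
        # Spread to the four neighbors (down, up, right, left).
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] and (nr, nc) not in full:
                full.add((nr, nc))
                queue.append((nr, nc))
    return full

def percolates(grid):
    """A system percolates if some site in the bottom row is full."""
    n = len(grid)
    return any(r == n - 1 for r, _ in full_sites(grid))

# Hypothetical 3-by-3 example: 1 = open, 0 = blocked.
example = [[bool(v) for v in row] for row in ([1, 0, 0],
                                             [1, 1, 0],
                                             [0, 1, 0])]
print(percolates(example))
```

Note that diagonal neighbors do not count: a grid whose only open sites touch at corners does not percolate.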

### The problem

In a famous scientific problem, researchers are interested in the following question: if sites are independently set to be open with probability *p* (and therefore blocked with probability *1 – p*), what is the probability that the system percolates? When *p* equals *0*, the system does not percolate; when *p* equals *1*, the system percolates. The plots below show the site vacancy probability *p* versus the percolation probability for a 20-by-20 random grid (left) and a 100-by-100 random grid (right).

When *n* is sufficiently large, there is a threshold value *p** such that when *p < p** a *random n-by-n grid* almost never percolates, and when *p > p**, a random *n-by-n grid* almost always percolates. No mathematical solution for determining the percolation threshold *p** has yet been derived. We shall estimate *p** with *Monte Carlo simulation*.
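The sharp transition described above can be observed numerically by generating many random grids for a fixed *p* and counting how often they percolate. The following is a rough sketch under stated assumptions: the helper names, the number of trials (200) and the probed values of *p* are chosen for illustration only, and a simple depth-first search stands in for any more efficient connectivity test.

```python
import random

def random_grid_percolates(n, p, rng):
    """Open each site independently with probability p, then test for percolation."""
    grid = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    # Depth-first search starting from every open top-row site.
    stack = [(0, c) for c in range(n) if grid[0][c]]
    seen = set(stack)
    while stack:
        r, c = stack.pop()
        if r == n - 1:
            return True  # reached the bottom row through open sites
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and grid[nr][nc] and (nr, nc) not in seen:
                seen.add((nr, nc))
                stack.append((nr, nc))
    return False

def percolation_probability(n, p, trials=200, seed=0):
    """Estimate the probability that a random n-by-n grid percolates."""
    rng = random.Random(seed)
    hits = sum(random_grid_percolates(n, p, rng) for _ in range(trials))
    return hits / trials

for p in (0.4, 0.5, 0.6, 0.7, 0.8):
    print(p, percolation_probability(20, p))
```

Increasing `n` should make the jump from "almost never" to "almost always" steeper, at the cost of a longer run time.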

### Monte Carlo simulation

To estimate the *percolation threshold*, let’s consider the following computational experiment:

- Initialize all sites to be blocked.
- Repeat the following until the system percolates:
  - Choose a site uniformly at random among all blocked sites.
  - Open the site.
- The fraction of sites that are opened when the system percolates provides an estimate of the percolation threshold.

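- By repeating this computational experiment *T* times and averaging the results, we obtain a more accurate estimate of the *percolation threshold*. Let *x_t* be the fraction of open sites in computational experiment *t*. The sample mean provides an estimate of the percolation threshold; the sample standard deviation *s* measures the *sharpness* of the threshold:

  $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_T}{T}, \qquad s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_T - \bar{x})^2}{T - 1}$$

- Assuming *T* is sufficiently large (say, at least *30*, by the *CLT*), we can construct a *95% confidence interval* for the percolation threshold as follows:

  $$\left[\bar{x} - \frac{1.96\,s}{\sqrt{T}},\ \bar{x} + \frac{1.96\,s}{\sqrt{T}}\right]$$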
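The statistics step can be sketched in a few lines. The sketch assumes the usual sample mean, the sample standard deviation with a *T − 1* denominator, and the normal-approximation interval (mean ± 1.96·s/√T). The function name `threshold_summary` and the input numbers are made up for illustration and are not results from the simulation.

```python
import math
import statistics

def threshold_summary(fractions):
    """Return (mean, stddev, ci_low, ci_high) for per-trial open-site fractions."""
    T = len(fractions)
    mean = statistics.mean(fractions)
    s = statistics.stdev(fractions)  # sample standard deviation (divides by T - 1)
    half_width = 1.96 * s / math.sqrt(T)  # normal approximation, assumes large T
    return mean, s, mean - half_width, mean + half_width

# Made-up fractions, for illustration only (not results from the post).
sample = [0.58, 0.61, 0.59, 0.60, 0.62]
mean, s, lo, hi = threshold_summary(sample)
print(f"mean={mean:.4f} s={s:.4f} CI=[{lo:.4f}, {hi:.4f}]")
```

With only five values the normal approximation is not really justified; the tiny sample is used here just to keep the example readable.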

### Union Find

This problem can be solved efficiently by using the (weighted) *quick union-find* algorithm. When a new site is opened, we can use *union* to merge the site with its adjacent open sites. After each opening we can check whether the top row is *connected* to the bottom row, which tells us whether we have reached the percolation threshold *p**. The figure below explains the data structures and algorithms used.
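One way to put these pieces together is sketched below. It is not the author's implementation: the class name `WeightedQuickUnion`, the function `percolation_trial`, and the two virtual sites (one joined to every open top-row site, one to every open bottom-row site) are assumptions of this sketch, as are the grid size and the seed. Opening sites in a shuffled order is equivalent to repeatedly picking a blocked site uniformly at random.

```python
import random

class WeightedQuickUnion:
    """Weighted quick-union: always attach the smaller tree under the larger."""

    def __init__(self, size):
        self.parent = list(range(size))
        self.size = [1] * size

    def find(self, i):
        while self.parent[i] != i:
            i = self.parent[i]
        return i

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

    def connected(self, a, b):
        return self.find(a) == self.find(b)

def percolation_trial(n, rng):
    """Open random blocked sites until the grid percolates; return the open fraction."""
    top, bottom = n * n, n * n + 1    # two virtual sites (an assumption of this sketch)
    uf = WeightedQuickUnion(n * n + 2)
    is_open = [False] * (n * n)
    order = list(range(n * n))
    rng.shuffle(order)                # a uniformly random order in which to open sites
    for opened, site in enumerate(order, start=1):
        is_open[site] = True
        r, c = divmod(site, n)
        if r == 0:
            uf.union(site, top)
        if r == n - 1:
            uf.union(site, bottom)
        # Merge with each open neighbor (down, up, right, left).
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < n and 0 <= nc < n and is_open[nr * n + nc]:
                uf.union(site, nr * n + nc)
        if uf.connected(top, bottom):
            return opened / (n * n)

rng = random.Random(42)
estimates = [percolation_trial(20, rng) for _ in range(50)]
print(sum(estimates) / len(estimates))
```

The virtual sites reduce the percolation check to a single `connected` call instead of comparing every top-row site with every bottom-row site.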

The following figure shows the connected sites at the percolation threshold for different grid sizes.

The following animation shows how the system evolves until the percolation threshold is reached.

Finally, the following figures show how the *average percolation threshold* and the corresponding *95% confidence interval* vary with *n*, using the *Monte-Carlo Simulation* with *T=50*. As can be seen, the higher *n* is, the *narrower* the confidence interval and the more certain we are about the value of the *percolation threshold*.

### Erdos-Renyi Random Graph, the Giant Component and Connectivity

- The following figure shows similar results for *E-R* random graphs. As the probability *p* increases and reaches a threshold around *1/n*, a giant component appears in the graph. Also, an *E-R random graph* becomes almost surely connected at the probability threshold *ln(n)/n*.

- The following animation shows how the giant component appears for an *E-R* random graph with *n=100* as the probability *p* increases.

- The following figures show the results of a *Monte-Carlo Simulation* (with *T=100*), which shows that there is a narrow region around the probability threshold *p* below which *G(n,p)* is *not connected a.s.* and above which *G(n,p)* is *connected a.s.*, for *n=100* (shown in the *log-scale* too).
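Both thresholds can be probed with a small experiment. This is a sketch under assumptions: the function names, the seeds and the probed multiples of *1/n* and *ln(n)/n* are illustrative choices, and the union-find here uses path halving in addition to weighting, which the post does not mention.

```python
import math
import random
from itertools import combinations

def largest_component(n, p, rng):
    """Sample G(n, p) and return the size of its largest connected component."""
    parent = list(range(n))
    size = [1] * n

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for u, v in combinations(range(n), 2):
        if rng.random() < p:          # include each possible edge independently
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru       # attach the smaller tree under the larger
                size[ru] += size[rv]
    return max(size[find(i)] for i in range(n))

def connectivity_probability(n, p, trials=100, seed=0):
    """Fraction of sampled graphs that are connected (largest component has n vertices)."""
    rng = random.Random(seed)
    return sum(largest_component(n, p, rng) == n for _ in range(trials)) / trials

n = 100
rng = random.Random(1)
for c in (0.5, 1.0, 2.0):             # multiples of the 1/n giant-component threshold
    print("p = %.1f/n, largest component:" % c, largest_component(n, c / n, rng))
for c in (0.5, 1.0, 2.0):             # multiples of the ln(n)/n connectivity threshold
    p = c * math.log(n) / n
    print("p = %.1f ln(n)/n, P(connected) ~" % c, connectivity_probability(n, p))
```

A graph is counted as connected exactly when its largest component contains all *n* vertices, so a single routine serves both the giant-component and the connectivity experiment.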

More theories regarding **Random Graphs** can be found here.