In this article, a dual perceptron implementation together with some non-linear kernels is used to separate linearly inseparable datasets.
- The following figure shows the dual perceptron algorithm used. The implementation is comparatively fast, since we don't need to generate the non-linear features ourselves; the kernel trick does that for us.
- A few 2-D training datasets were generated uniformly at random, along with randomly generated binary class labels. The dual implementation of the perceptron with a few non-linear kernels, such as the Gaussian (with different bandwidths) and the polynomial (with different degrees), was then used to separate the positive from the negative class data points. The following figures show the decision boundaries learnt by the perceptron upon convergence.
- The next animation shows, iteration by iteration, how the dual perceptron (with a Gaussian kernel of bandwidth 0.1) converges on a given training dataset.
- The next figure shows how the dual perceptron with a polynomial kernel overfits as the degree of the polynomial increases. With the Gaussian kernel, it learns a different decision boundary altogether.
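The dual perceptron described above can be sketched roughly as follows. This is a minimal illustration, not the exact implementation from the figures: the function names (`dual_perceptron`, `gaussian_kernel`, `polynomial_kernel`) and the choice of defaults (bandwidth 0.1, degree 3) are my own, and the kernels are the standard textbook forms. The key point is that training and prediction only ever touch the data through kernel evaluations, so no explicit non-linear features are constructed.

```python
import numpy as np

def gaussian_kernel(X1, X2, bandwidth=0.1):
    # K(x, z) = exp(-||x - z||^2 / (2 * bandwidth^2))
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

def polynomial_kernel(X1, X2, degree=3):
    # K(x, z) = (1 + x . z)^degree
    return (1.0 + X1 @ X2.T) ** degree

def dual_perceptron(X, y, kernel, max_epochs=100):
    """Train the kernel (dual) perceptron; returns per-example mistake counts alpha."""
    n = len(X)
    alpha = np.zeros(n)
    K = kernel(X, X)  # precompute the Gram matrix once
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            # decision value for x_i uses only kernel entries, never explicit features
            if y[i] * np.sum(alpha * y * K[:, i]) <= 0:
                alpha[i] += 1  # mistake: bump this example's weight
                mistakes += 1
        if mistakes == 0:  # converged: a full pass with no mistakes
            break
    return alpha

def predict(X_train, y_train, alpha, kernel, X_test):
    # sign of the kernel-weighted vote of the training examples
    K = kernel(X_train, X_test)
    return np.sign((alpha * y_train) @ K)
```

For example, the XOR labelling of four corner points is linearly inseparable, yet the Gaussian kernel (with a small bandwidth such as 0.1) separates it, since each training point dominates its own neighbourhood; a high-degree polynomial kernel fits such data too, which is exactly the overfitting behaviour shown in the last figure.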