In this article, the impact of varying the regularization parameters of the *logistic regression* (with the L2 penalty) and *SVM* binary classifiers on the *decision boundaries* learnt during training (i.e., how the models *overfit* or *underfit*) will be shown for a few datasets.

- The following animation shows the impact of varying the *lambda* parameter for the *logistic regression* classifier trained with *polynomial features* (up to degree 6, with 2 predictor variables) on a dataset taken from Andrew Ng’s Coursera Machine Learning course. As can be seen from the change in the decision boundary contours, the model learnt *overfits* the training data for low values of lambda and *underfits* for high values.
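The lambda sweep above can be sketched as follows. This is a minimal illustration, not the course code: scikit-learn is used instead of Octave, a synthetic `make_circles` dataset stands in for the course data, and scikit-learn parameterizes L2 regularization strength as `C = 1/lambda`.

```python
# Sketch of the lambda sweep for logistic regression with degree-6
# polynomial features; make_circles is a stand-in for the course dataset.
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_circles(n_samples=200, noise=0.2, factor=0.5, random_state=0)

# scikit-learn uses the inverse regularization strength C = 1 / lambda
for lam in [0.0001, 0.01, 1.0, 100.0]:
    model = make_pipeline(
        PolynomialFeatures(degree=6),                     # features up to degree 6
        LogisticRegression(C=1.0 / lam, max_iter=10000),  # L2-penalized by default
    )
    model.fit(X, y)
    print(f"lambda={lam:>8}: training accuracy = {model.score(X, y):.3f}")
```

Low lambda (large `C`) lets the degree-6 boundary bend to fit the noise; high lambda shrinks the polynomial coefficients toward zero and flattens the boundary.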

- The following animation shows the impact of varying the parameters *C* and *sigma* for the *support vector machine* classifier (with a *Gaussian kernel*) on another dataset (taken from Andrew Ng’s Coursera Machine Learning course). As can be seen from the change in the decision boundary contours again, the model learnt *overfits* the training data for low values of *C* and *underfits* for high values.
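A corresponding sweep over *C* and *sigma* can be sketched with scikit-learn’s RBF-kernel SVM. Again this is an illustrative stand-in, not the post’s code: `make_moons` replaces the course dataset, and the Gaussian kernel bandwidth *sigma* is mapped onto scikit-learn’s `gamma` via `gamma = 1/(2*sigma^2)`.

```python
# Sketch of the C / sigma sweep for an SVM with a Gaussian (RBF) kernel;
# make_moons is a stand-in for the course dataset.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

for C in [0.01, 1.0, 100.0]:
    for sigma in [0.1, 1.0, 3.0]:
        # Gaussian kernel exp(-||x - z||^2 / (2*sigma^2))  =>  gamma = 1/(2*sigma^2)
        gamma = 1.0 / (2.0 * sigma ** 2)
        clf = SVC(C=C, kernel="rbf", gamma=gamma).fit(X, y)
        print(f"C={C:>6}, sigma={sigma}: training accuracy = {clf.score(X, y):.3f}")
```

Small *sigma* (large `gamma`) makes the kernel very local and the boundary wiggly; large *sigma* smooths it out, so the two parameters trade off flexibility in the same way the animation shows.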

- Next, the following *apples and oranges* image will be used for binary classification: the orange corresponds to the positive class label and the apples to the negative class label. Only two color channels (red and green) will be used as predictor variables.

- First, a training dataset is selected from the image to train the models. The yellow rectangles in the following two figures show the training data with positive and negative labels taken from the image, respectively. As can be noticed, a little noise is introduced into the training data (for the positive class label) to test the robustness of the classifiers.
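Assembling the training set from rectangular crops might look like the sketch below. The image here is a synthetic random array standing in for the actual photograph, and the rectangle coordinates and noise level are illustrative, not the ones used in the post.

```python
# Sketch: build (red, green) features from two rectangular crops of the
# image and add noise to the positive (orange) samples.
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((100, 150, 3))            # stand-in for the RGB photo, values in [0, 1]

def crop_features(img, r0, r1, c0, c1):
    """Red/green features for all pixels inside a rectangle."""
    return img[r0:r1, c0:c1, :2].reshape(-1, 2)

pos = crop_features(img, 10, 30, 20, 60)    # "orange" rectangle -> label 1
neg = crop_features(img, 50, 80, 70, 140)   # "apples" rectangle -> label 0

# perturb the positive samples slightly to test classifier robustness
pos_noisy = np.clip(pos + rng.normal(scale=0.05, size=pos.shape), 0.0, 1.0)

X = np.vstack([pos_noisy, neg])
y = np.concatenate([np.ones(len(pos_noisy)), np.zeros(len(neg))])
print(X.shape, y.shape)
```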

- Next, a few *logistic regression* models with *polynomial features* (up to degree 6) are learnt with the two predictors (namely the red and green channels), since the data is not linearly separable. The following animation shows the impact of varying the *lambda* parameter for the trained *logistic regression* classifier. As can be seen from the change in the decision boundary contours in the next animated figure, the model learnt *overfits* the training data for low values of lambda and *underfits* for high values.

- The logistic regression models learnt with different values of the regularization parameter are then used to classify the entire image. The pixels predicted by the model as positive are marked black (ideally the model should predict all the pixels of the orange in the image as positive). Again, as can be seen from the next animation, with high values of the regularization parameter lambda the model *underfits* and can’t classify the entire orange, while with low values of lambda it does a pretty good job of separating the orange out from the apples in the image.
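The whole-image classification step can be sketched as follows: every pixel’s (red, green) pair is fed to the fitted model and the predicted positives are painted black. The image and training data below are synthetic stand-ins; any fitted binary classifier (here a plain scikit-learn `LogisticRegression`) slots into the same pattern.

```python
# Sketch: classify every pixel of the image and black out predicted positives.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
img = rng.random((60, 80, 3))          # stand-in for the HxWx3 apples-and-oranges image
X_train = rng.random((50, 2))          # stand-in (red, green) training features
y_train = np.tile([0, 1], 25)          # stand-in labels
model = LogisticRegression().fit(X_train, y_train)

X_all = img[:, :, :2].reshape(-1, 2)   # (red, green) features for every pixel
mask = model.predict(X_all).reshape(img.shape[:2]).astype(bool)

out = img.copy()
out[mask] = 0.0                        # predicted positives painted black
print(mask.sum(), "pixels predicted positive")
```

The same loop over the full pixel grid applies unchanged to the SVM models in the next step; only `model` differs.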

- Next, a few *SVM* models with a *Gaussian kernel* (since the decision boundary is non-linear) are learnt with the two predictors (namely the red and green channels). The following animation shows the impact of varying the regularization parameter *C* and the kernel bandwidth parameter *sigma* for the trained *SVM* classifier.

- The SVM models learnt with different values of the regularization / kernel parameters are then used to classify the entire image. The pixels predicted by the model as positive are marked black (ideally the model should predict all the pixels of the orange in the image as positive). Again, as can be seen from the next animation, with high values of the regularization parameter C the model *underfits* and can’t classify the entire orange, while with low values of C it does a pretty good job of separating the orange out from the apples in the image.
