Perceptron, Average Perceptron and Pegasos

Implementation of supervised learning algorithms - Perceptron, Average Perceptron, and Pegasos - in pure Python
machine-learning
tutorial

Abstract

The goal of this project is to implement three supervised learning algorithms - Perceptron, Average Perceptron, and Pegasos - in pure Python, using numpy only for creating and handling numerical arrays. Parameters are estimated with Stochastic Gradient Descent (SGD).

import numpy as np

Test Cases

I’ve created test cases for each algorithm to verify its correctness, e.g. check_get_order(), check_hinge_loss_single(), and check_hinge_loss_full(). A sketch of one such check appears after the hinge loss implementation below.

Algorithms

Hinge Loss

HINGE_LOSS (\(\{(x^{(i)}, y^{(i)}) : i = 1 \dots n\}\), \(\theta\), \(\theta_0\)):
  \(loss \gets 0\)
  for \(i \gets 1 \dots n\):
    \(z \gets y^{(i)} (\theta \cdot x^{(i)} + \theta_0)\)
    \(loss \gets loss + \max(0,\ 1 - z)\)
  return \(loss / n\)

#collapse
def hinge_loss_single(feature_vector, label, theta, theta_0):
    # agreement z = y * (theta . x + theta_0); hinge loss is max(0, 1 - z)
    z = label * (theta @ feature_vector + theta_0)
    return np.maximum(0, 1 - z)
#collapse
def hinge_loss_full(feature_matrix, labels, theta, theta_0):
    # average hinge loss over all n samples
    loss = 0
    for i in range(len(labels)):
        loss += hinge_loss_single(feature_matrix[i], labels[i], theta, theta_0)
    return loss / len(labels)
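
As an example of the checks mentioned above, here is a minimal sketch of check_hinge_loss_single. The original test bodies aren't shown in this post, so the hand-worked values below are assumptions for illustration:

def check_hinge_loss_single():
    # theta . x + theta_0 = -1*1 + 1*2 - 0.2 = 0.8 and y = 1,
    # so z = 0.8 and the hinge loss should be 1 - 0.8 = 0.2
    feature_vector = np.array([1.0, 2.0])
    label, theta, theta_0 = 1, np.array([-1.0, 1.0]), -0.2
    assert np.isclose(hinge_loss_single(feature_vector, label, theta, theta_0), 0.2)
    print("check_hinge_loss_single passed")

check_hinge_loss_single()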

1. Perceptron

For the perceptron algorithm, the per-sample loss is zero whenever a point is classified correctly and \(1 - z\) otherwise (the perceptron loss), which keeps the update rule simple.
The algorithm takes the feature matrix, the labels, and T (the number of passes over the data) as input, and returns the estimated \(\theta\) and \(\theta_0\).

Proof for the perceptron update:
Loss function (for a particular sample \((x^{(i)}, y^{(i)})\)), with agreement \(z = y^{(i)}(\theta \cdot x^{(i)} + \theta_0)\) and learning rate \(\eta\):

\(\hat{E}(\{x^{(i)}, y^{(i)}\}, \theta, \theta_0) = \begin{cases} 0, & \text{if } z > 0 \\ 1-z, & \text{otherwise} \end{cases}\)

\(\nabla_{\theta}\hat{E} = \begin{cases} 0, & \text{if } \hat{E} = 0 \\ -y^{(i)}\, x^{(i)}, & \text{otherwise} \end{cases}\)

\(\nabla_{\theta_0}\hat{E} = \begin{cases} 0, & \text{if } \hat{E} = 0 \\ -y^{(i)}, & \text{otherwise} \end{cases}\)

Therefore the stochastic gradient descent update \(\theta \gets \theta - \eta\,\nabla_{\theta}\hat{E}\) for a misclassified sample becomes:
\(\theta \gets \theta + \eta\, y^{(i)} x^{(i)}\)
\(\theta_0 \gets \theta_0 + \eta\, y^{(i)}\)
With \(\eta = 1\) this is exactly the classic perceptron update used in the code below.

PERCEPTRON (\(\{(x^{(i)}, y^{(i)}) : i = 1 \dots n\}\), T):
  \(\theta \gets \mathbf{0}\) (zero vector)
  \(\theta_0 \gets 0.0\)
  for \(t \gets 1 \dots T\):
    Randomly shuffle indices
    for i in indices:
      if \(y^{(i)} (\theta \cdot x^{(i)} + \theta_0) \le 0\):
        \(\theta \gets \theta + y^{(i)} x^{(i)}\)
        \(\theta_0 \gets \theta_0 + y^{(i)}\)

#collapse
def perceptron_single_step_update(
        feature_vector,
        label,
        current_theta,
        current_theta_0):
    # update only when the sample is misclassified (or lies exactly on the
    # boundary); the 1e-7 tolerance guards against floating-point noise
    if label * (np.dot(current_theta, feature_vector) + current_theta_0) <= 1e-7:
        current_theta += label * feature_vector
        current_theta_0 += label
    return current_theta, current_theta_0
#collapse
def perceptron(feature_matrix, labels, T):
    import random
    random.seed(1)  # fixed seed so the shuffles (and hence results) are reproducible
    nsamples, nfeatures = feature_matrix.shape
    theta = np.zeros(nfeatures)
    theta_0 = 0.0
    for t in range(T):  # T full passes over the data
        indices = list(range(nsamples))
        random.shuffle(indices)  # visit the samples in a random order each pass
        for i in indices:
            theta, theta_0 = perceptron_single_step_update(
                feature_matrix[i], labels[i], theta, theta_0)
    return theta, theta_0
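
As a quick, illustrative sanity check (not part of the original post), the perceptron should separate a trivial two-point dataset:

X_toy = np.array([[1.0, 1.0], [-1.0, -1.0]])
y_toy = np.array([1, -1])
toy_theta, toy_theta_0 = perceptron(X_toy, y_toy, 5)
print(toy_theta, toy_theta_0)  # theta should point toward the positive class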

2. Average Perceptron

The Average Perceptron algorithm usually gives better results than the simple Perceptron, because averaging smooths out the dependence of the final parameters on the order in which the samples happen to be visited.
Average Perceptron is analogous to the simple Perceptron, except that we also keep a running sum of the \(\theta\)'s obtained after every single update and finally return their average.
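
Concretely, with \(n\) samples and \(T\) passes there are \(nT\) updates; if \(\theta^{(k)}\) denotes the parameters after the \(k\)-th update, the returned values are:

\(\bar{\theta} = \frac{1}{nT}\sum_{k=1}^{nT} \theta^{(k)}, \qquad \bar{\theta}_0 = \frac{1}{nT}\sum_{k=1}^{nT} \theta_0^{(k)}\)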

#collapse
def average_perceptron(feature_matrix, labels, T):
    import random
    random.seed(1)
    nsamples, nfeatures = feature_matrix.shape
    theta = np.zeros(nfeatures)
    theta_sum = np.zeros(nfeatures)
    theta_0 = 0.0
    theta_0_sum = 0.0
    for t in range(T):
        indices = list(range(nsamples))
        random.shuffle(indices)
        for i in indices:
            theta, theta_0 = perceptron_single_step_update(
                feature_matrix[i], labels[i], theta, theta_0)
            # accumulate the parameters after every single update
            theta_sum += theta
            theta_0_sum += theta_0
    # average over all n*T updates
    return theta_sum / (nsamples * T), theta_0_sum / (nsamples * T)

3. Pegasos
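
Pegasos (Primal Estimated sub-GrAdient SOlver for SVM) trains a linear SVM by running SGD on the L2-regularized hinge loss. With regularization parameter \(\lambda\) (L in the code below) and step size \(\eta\), the per-sample update implemented below is:

\(\theta \gets \begin{cases} (1 - \eta\lambda)\,\theta + \eta\, y^{(i)} x^{(i)}, & \text{if } y^{(i)}(\theta \cdot x^{(i)} + \theta_0) \le 1 \\ (1 - \eta\lambda)\,\theta, & \text{otherwise} \end{cases}\)

The offset \(\theta_0\) gets the \(+\eta\, y^{(i)}\) term on a margin violation but, as is conventional, is not shrunk by the regularizer.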

#collapse
def pegasos_single_step_update(feature_vector, label, L, eta, current_theta,
                               current_theta_0):
    c = 1 - eta * L  # shrinkage factor coming from the L2 regularizer
    d = eta * label
    # margin violation: step on both the hinge term and the regularizer
    if label * (current_theta @ feature_vector + current_theta_0) <= 1:
        return c * current_theta + (d * feature_vector), current_theta_0 + d
    # no violation: only the regularizer shrinks theta (theta_0 is left alone)
    return c * current_theta, current_theta_0
#collapse
def pegasos(feature_matrix, labels, T, L):
    import random
    random.seed(1)
    nsamples, nfeatures = feature_matrix.shape
    theta = np.zeros(nfeatures)
    theta_0 = 0.0
    count = 0
    for t in range(T):
        indices = list(range(nsamples))
        random.shuffle(indices)
        for i in indices:
            count += 1
            eta = 1 / np.sqrt(count)  # decreasing step size: eta_k = 1 / sqrt(k)
            theta, theta_0 = pegasos_single_step_update(
                feature_matrix[i], labels[i], L, eta, theta, theta_0)
    return theta, theta_0

Testing

Apply the algorithms to synthetic data.

Generate Synthetic Data

#collapse
def synth_data_2d(n):
    import scipy.stats as ss
    np.random.seed(1)
    # two well-separated Gaussian blobs: positives around (4, 5), negatives around (-2, -2)
    positive_data = ss.norm.rvs(loc=(4, 5), scale=(1, 1), size=(n, 2))
    negative_data = ss.norm.rvs(loc=(-2, -2), scale=(1, 1), size=(n, 2))
    data = np.concatenate((positive_data, negative_data), axis=0)
    class1 = np.repeat(1, positive_data.shape[0])
    class2 = np.repeat(-1, negative_data.shape[0])
    labels = np.concatenate([class1, class2])
    return data, labels
X, y = synth_data_2d(1000)
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib notebook
mpl.style.use("ggplot")

colors = ['darkcyan' if label == 1 else 'salmon' for label in y]

plt.scatter(X[:, 0], X[:, 1], c=colors, s=10, alpha=0.75)
plt.xlabel("$X_1$")
plt.ylabel("$X_2$")
plt.title("Scatter Plot - Synthetic data")
plt.show()

Decision Boundaries

Get the parameters \(\theta\) and \(\theta_0\) from each algorithm and plot the corresponding decision boundary.

#collapse
ap_theta, ap_theta_0 = average_perceptron(X, y, 10)  # T = 10 passes over the data
p_theta, p_theta_0 = perceptron(X, y, 10)
peg_theta, peg_theta_0 = pegasos(X, y, 10, 0.01)  # lambda = 0.01
plt.scatter(X[:, 0], X[:, 1], s=10, c=colors)
xmin, xmax = plt.axis()[:2]

# plot the decision boundaries: solve theta . x + theta_0 = 0 for x2
xs = np.linspace(xmin, xmax)
ap_ys = -(ap_theta[0]*xs + ap_theta_0) / (ap_theta[1] + 1e-16)  # 1e-16 avoids division by zero
p_ys = -(p_theta[0]*xs + p_theta_0) / (p_theta[1] + 1e-16)
peg_ys = -(peg_theta[0]*xs + peg_theta_0) / (peg_theta[1] + 1e-16)

plt.plot(xs, ap_ys, '--', label="Avg Perceptron")
plt.plot(xs, p_ys, '--', label="Perceptron")
plt.plot(xs, peg_ys, '-', label="Pegasos")
plt.legend(loc="upper left")
plt.xlabel("$X_1$")
plt.ylabel("$X_2$")
plt.title("Decesion Boundary Comparison")
plt.show()
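
To put a number on the comparison, here is a quick training-accuracy check for each model (an assumed follow-up, not part of the original post):

def accuracy(theta, theta_0):
    # predict +1 on the positive side of the boundary, -1 otherwise
    preds = np.where(X @ theta + theta_0 > 0, 1, -1)
    return np.mean(preds == y)

for name, th, th0 in [("Perceptron", p_theta, p_theta_0),
                      ("Avg Perceptron", ap_theta, ap_theta_0),
                      ("Pegasos", peg_theta, peg_theta_0)]:
    print(f"{name}: {accuracy(th, th0):.3f}")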

Because the Pegasos algorithm trains a linear Support Vector Machine, let's also plot its decision and margin boundaries.
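The margin boundaries are the lines where the functional margin equals one, i.e. \(\theta \cdot x + \theta_0 = \pm 1\); they lie at geometric distance \(1 / \lVert\theta\rVert\) on either side of the decision boundary \(\theta \cdot x + \theta_0 = 0\).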

%matplotlib notebook
plt.scatter(X[:, 0], X[:, 1], s=30, c=colors)
xmin, xmax = plt.axis()[:2]

# plot the decision boundary and the margin boundaries (theta . x + theta_0 = +1 and -1)
xs = np.linspace(xmin, xmax)
ys = -(peg_theta[0]*xs + peg_theta_0) / (peg_theta[1] + 1e-16)
upper_margin = -(peg_theta[0]*xs + peg_theta_0 - 1) / (peg_theta[1] + 1e-16)
lower_margin = -(peg_theta[0]*xs + peg_theta_0 + 1) / (peg_theta[1] + 1e-16)
plt.plot(xs, ys, '-', label="Decision Boundary")
plt.plot(xs, upper_margin,'--', color = "grey", label="Margin Boundary")
plt.plot(xs, lower_margin, '--', color = "grey")
plt.legend(loc="upper left")
plt.xlabel("$X_1$")
plt.ylabel("$X_2$")
plt.title("Pegasos Decesion and Margin Boundaries")
plt.show()