BUDGETED SOLUTION


Budgeted Support Vector Machines

The solution of the SVM (see the full solution section) is non-parametric: the size of the classification function (that is, the number of Support Vectors) is not predetermined, but depends on the training samples. To keep the complexity under control, budgeted models can be adopted. A set of $m$ basis elements $\{\mathbf{c}_1,...,\mathbf{c}_m\}$ is selected and $\mathbf{w}$ (see the full solution) is approximated by $\mathbf{w}\simeq \sum_{j=1}^m\beta_j\phi(\mathbf{c}_j)$, giving rise to a classifier of size $m$:

$f(\mathbf{x}_i)=\sum_{j=1}^m\beta_j K(\mathbf{x}_i,\mathbf{c}_j)+b$
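As an illustration, here is a minimal sketch of evaluating this budgeted decision function, assuming an RBF kernel and NumPy; the function and parameter names are ours, not the library's:

```python
import numpy as np

def decision_function(x, centers, beta, b, gamma=1.0):
    # Budgeted classifier: f(x) = sum_j beta_j * K(x, c_j) + b,
    # using an RBF kernel K(x, c) = exp(-gamma * ||x - c||^2).
    k = np.exp(-gamma * ((centers - x) ** 2).sum(axis=1))
    return float(k @ beta + b)
```

Evaluating the classifier now costs $O(m)$ kernel computations, regardless of the number of training samples.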

Finally, the weights $\boldsymbol{\beta}$ and the bias $b$ are obtained by solving the following minimization problem:

$\min_{\boldsymbol{\beta},b,\boldsymbol{\xi}} \frac{1}{2}\boldsymbol{\beta}^T\mathbf{K}_c\boldsymbol{\beta}+C\sum_{i=1}^n{\xi_i}$

subject to:

$y_i(\boldsymbol{\beta}^T\mathbf{k}_i+b)\ge 1-\xi_i,\quad i=1,...,n$
$\xi_i \ge 0,\quad i=1,...,n$

where $(\mathbf{K}_c)_{i,j} = K(\mathbf{c}_i,\mathbf{c}_j)$ for $i,j=1,...,m$, and $(\mathbf{k}_i)_j=K(\mathbf{x}_i,\mathbf{c}_j)$ for $j=1,...,m$.

IRWLS to train a budgeted SVM:

An IRWLS scheme has been successfully used to train budgeted SVMs by rearranging the cost function (Parrado, 2003):

$\min_{\boldsymbol{\beta}}\frac{1}{2}\boldsymbol{\beta}^T\mathbf{K}_c\boldsymbol{\beta}+\sum_{i}a_{i}e_i^2$
$e_i=y_i-\left(\sum_{j=1}^m\beta_jK(\mathbf{x}_i,\mathbf{c}_j)+b\right)$
$a_i=\begin{cases}0, & y_ie_i \le 0\\\frac{C}{y_ie_i}, & y_ie_i > 0\end{cases}$

The procedure works in a similar way to the full SVM: within each iteration $a_i$ is treated as a constant (not a function of $\boldsymbol{\beta}$), so the algorithm alternates between solving the weighted least squares problem for $\boldsymbol{\beta}$ and recalculating the $a_i$ using these new weights.
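As a concrete reference, the following is a minimal sketch of this iterative procedure, assuming an RBF kernel and NumPy; the names, the ridge term, and the fixed iteration count are our illustrative choices, not the library's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def irwls_budgeted_svm(X, y, centers, C=1.0, gamma=1.0, n_iter=20, ridge=1e-8):
    n, m = X.shape[0], centers.shape[0]
    Kc = rbf_kernel(centers, centers, gamma)     # m x m basis kernel matrix
    Knm = rbf_kernel(X, centers, gamma)          # n x m data/basis kernel block
    Phi = np.hstack([Knm, np.ones((n, 1))])      # constant column models the bias b
    a = np.full(n, C)                            # initial IRWLS weights
    R = np.zeros((m + 1, m + 1))                 # regularizer (1/2) beta^T Kc beta,
    R[:m, :m] = 0.5 * Kc + ridge * np.eye(m)     # with no penalty on the bias term
    for _ in range(n_iter):
        # Weighted least squares step with the a_i held fixed:
        # (Phi^T A Phi + R) w = Phi^T A y, where A = diag(a).
        PhA = Phi.T * a
        w = np.linalg.solve(PhA @ Phi + R, PhA @ y)
        beta, b = w[:m], w[m]
        # Recalculate the weights a_i from the new errors e_i = y_i - f(x_i).
        e = y - (Knm @ beta + b)
        ye = y * e
        a = np.where(ye > 0, np.minimum(C / np.maximum(ye, 1e-12), 1e9), 0.0)
    return beta, b
```

The clipping of $a_i$ guards against the division blowing up as $y_ie_i$ approaches zero, a common numerical safeguard in IRWLS-style solvers.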

The software has been parallelized using shared memory, as described in (Díaz Morales, 2016a) and (Díaz Morales, 2016b).

Basis element selection

This algorithm contains two different methods to select the basis elements of the budgeted model:

Random Selection

A simple random selection of training set elements.
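A minimal sketch of this strategy with NumPy (the function name and seeding are illustrative):

```python
import numpy as np

def random_basis(X, m, seed=0):
    # Draw m distinct training points uniformly at random as basis elements.
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=m, replace=False)
    return X[idx]
```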

Sparse Greedy Matrix Approximation

SGMA (Smola, 2000) is a widely used greedy technique to approximate a kernel matrix using outer products of a reduced number of vectors, and hence it can be used to select the basis elements of the budgeted model. It has been applied successfully in several budgeted SVM implementations (Díaz Morales, 2016a)(Díaz Morales, 2016b).

It works by iteratively (greedily) adding to the selected set the element that maximizes a given criterion, as explained below.

Given a data set $\{\mathbf{x}_1,...,\mathbf{x}_n\}$ and a set of basis elements $\{\mathbf{c}_1,...,\mathbf{c}_m\}$, we can approximate each kernel function as a linear combination of the basis kernels:

$k(\mathbf{x}_i,\cdot)\simeq\sum_{j=1}^m\alpha_{i,j}k(\mathbf{c}_j,\cdot)$
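For context, the coefficients that minimize this approximation error in the RKHS norm are obtained by projecting $k(\mathbf{x}_i,\cdot)$ onto the span of the basis kernels, a standard least squares result that connects directly with the error expression below:

$\boldsymbol{\alpha}_i=\mathbf{K}_c^{-1}\mathbf{k}_i,\quad (\mathbf{k}_i)_j=k(\mathbf{x}_i,\mathbf{c}_j),\ j=1,...,m$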

The approximation error of this linear combination is:

$Err(\boldsymbol{\alpha}^{n,m})=\mathrm{tr}\,\mathbf{K}-\sum_{i=1}^n\sum_{j=1}^m\alpha_{i,j}k(\mathbf{x}_i,\mathbf{c}_j)$
$(\mathbf{K})_{i,j}=k(\mathbf{x}_i,\mathbf{x}_j),\ \forall i,j = 1,...,n$

The approximation error after adding a new element $\mathbf{c}_{m+1}$ to the basis set can be expressed as a function of the previous error:

$Err(\boldsymbol{\alpha}^{n,m+1})=Err(\boldsymbol{\alpha}^{n,m})-\underbrace{\eta^{-1}\Vert\mathbf{K}_{nm}\mathbf{z}-\mathbf{k}_{m+1,n}\Vert^2}_{\text{Error Decrease}}$

Where:

$(\mathbf{k}_{m+1,n})_i=k(\mathbf{c}_{m+1},\mathbf{x}_{i}),\ \forall i=1,...,n$
$(\mathbf{k}_{mc})_i=k(\mathbf{c}_i,\mathbf{c}_{m+1}),\ \forall i=1,...,m$
$(\mathbf{K}_{nm})_{i,j}=k(\mathbf{x}_i,\mathbf{c}_j),\ \forall i=1,...,n,\ j=1,...,m$
$\mathbf{z}=\mathbf{K}_c^{-1}\mathbf{k}_{mc}$
$\eta=1-\mathbf{z}^T\mathbf{k}_{mc}$

(the expression for $\eta$ assumes a normalized kernel with $k(\mathbf{c},\mathbf{c})=1$, such as the RBF kernel; in general the leading $1$ is replaced by $k(\mathbf{c}_{m+1},\mathbf{c}_{m+1})$).

SGMA looks for the best elements to use in the budgeted model: it must identify the elements of the training set whose projections most accurately represent the projected Support Vectors. To do so, in every iteration it evaluates a group of new candidates and grows the set of basis elements with the candidate that achieves the highest Error Decrease (ED).
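To make the procedure concrete, here is a minimal sketch of SGMA-style selection, assuming an RBF kernel and NumPy; the candidate pool size, the jitter term, and all names are our illustrative choices rather than the library's implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF kernel between the rows of A and B (unit diagonal).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def sgma_select(X, m, n_candidates=10, gamma=1.0, seed=0, eps=1e-10):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    basis_idx = []
    for _ in range(m):
        # Kernel blocks that depend on the current basis.
        C = X[basis_idx]
        Kc = rbf_kernel(C, C, gamma) + eps * np.eye(len(basis_idx))
        Knm = rbf_kernel(X, C, gamma)
        best_ed, best_c = -np.inf, None
        for c in rng.choice(n, size=n_candidates, replace=False):
            if c in basis_idx:
                continue
            k_mc = rbf_kernel(C, X[c:c + 1], gamma).ravel()  # k(c_i, c_new)
            k_cn = rbf_kernel(X, X[c:c + 1], gamma).ravel()  # k(c_new, x_i)
            z = np.linalg.solve(Kc, k_mc) if basis_idx else np.zeros(0)
            eta = 1.0 - z @ k_mc           # k(c, c) = 1 for the RBF kernel
            ed = np.sum((Knm @ z - k_cn) ** 2) / max(eta, eps)
            if ed > best_ed:
                best_ed, best_c = ed, c
        basis_idx.append(best_c)
    return X[basis_idx]
```

In practice, implementations typically cache and update $\mathbf{K}_c^{-1}$ incrementally instead of re-solving from scratch at every step; the sketch recomputes it for clarity.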