PYTHON EXTENSION

Import the library:

Once the library has been compiled and the Python extension has been installed, you can import the library with the following statement:

import LIBIRWLS

Functions:

The library provides five functions:

LIBIRWLS.full_train: To create a classifier using the PIRWLS algorithm.
LIBIRWLS.budgeted_train: To create a classifier using the PSIRWLS algorithm.
LIBIRWLS.predict: To use a classifier on a dataset.
LIBIRWLS.save: To save a classifier to a file.
LIBIRWLS.load: To load a classifier from a file.

LIBIRWLS.full_train:

It creates a classifier object using our parallel IRWLS procedure. See the library webpage for a detailed description.

model = LIBIRWLS.full_train(data, labels, gamma=1, C=1, threads=1, workingSet=500, eta=0.001, kernel=1, verbose=1)

Parameters:

data: Training set (numpy 2d array)
labels: Training set labels (numpy 1d array with values +1 and -1)
kernel: kernel type:
- 0 for Linear kernel u'*v
- 1 for radial basis function exp(-gamma*|u-v|^2)
gamma: gamma in the radial basis kernel function
C: SVM Cost
workingSet: Size of the Least Squares Problem in every iteration
threads: Number of parallel threads
eta: IRWLS stopping criterion
verbose (default 1):
- 0 Silent mode
- 1 Print information about the training procedure on screen
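The two kernel options above can be written out directly. The following is a minimal NumPy sketch of the documented formulas (kernel=0: u'*v, kernel=1: exp(-gamma*|u-v|^2)) computed for every pair of rows. The names linear_kernel and rbf_kernel are hypothetical helpers for illustration only; they are not part of LIBIRWLS, and the library's internal C implementation may compute the kernel differently.

```python
import numpy as np

def linear_kernel(U, V):
    """kernel=0: K[i, j] = U[i] . V[j] (the u'*v product for every pair of rows)."""
    return np.asarray(U, dtype=float) @ np.asarray(V, dtype=float).T

def rbf_kernel(U, V, gamma=1.0):
    """kernel=1: K[i, j] = exp(-gamma * ||U[i] - V[j]||^2)."""
    U = np.asarray(U, dtype=float)
    V = np.asarray(V, dtype=float)
    # Expand ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u'v for all pairs at once.
    sq_dist = (
        np.sum(U ** 2, axis=1)[:, None]
        + np.sum(V ** 2, axis=1)[None, :]
        - 2.0 * U @ V.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    np.maximum(sq_dist, 0.0, out=sq_dist)
    return np.exp(-gamma * sq_dist)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(linear_kernel(X, X))
print(rbf_kernel(X, X, gamma=0.5))
```

A small gamma makes the RBF kernel wide (distant points still look similar), while a large gamma makes it narrow, which is why gamma usually has to be tuned together with C.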

LIBIRWLS.budgeted_train:

It creates a classifier object using our parallel approximate IRWLS procedure. See the library webpage for a detailed description.

model = LIBIRWLS.budgeted_train(data, labels, gamma=1, C=1, threads=1, size=500, algorithm=1, kernel=1, verbose=1)

Parameters:

data: Training set (numpy 2d array)
labels: Training set labels (numpy 1d array with values +1 and -1)
kernel: kernel type:
- 0 for Linear kernel u'*v
- 1 for radial basis function exp(-gamma*|u-v|^2)
gamma: gamma in the radial basis kernel function
C: SVM Cost
algorithm (default 1): The procedure to obtain the basis elements (see PSIRWLS)
- 0 Random Selection
- 1 SGMA (Sparse Greedy Matrix Approximation)
size: The size of the classifier (number of kernel evaluations, see PSIRWLS)
threads: Number of parallel threads
verbose (default 1):
- 0 Silent mode
- 1 Print information about the training procedure on screen
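With algorithm=0 the basis elements are obtained by random selection. As an illustration only, assuming this means drawing size distinct training samples uniformly at random (the exact PSIRWLS procedure is described on the library webpage and may differ), the selection step could look like this. random_basis is a hypothetical helper, not a LIBIRWLS function.

```python
import numpy as np

def random_basis(data, size, seed=None):
    """Pick `size` distinct rows of `data` uniformly at random.

    Hypothetical illustration of algorithm=0 (Random Selection); the real
    PSIRWLS implementation may select basis elements differently.
    """
    data = np.asarray(data, dtype=float)
    if not 0 < size <= data.shape[0]:
        raise ValueError("size must be between 1 and the number of samples")
    rng = np.random.default_rng(seed)
    # Sample row indices without replacement so no basis element is repeated.
    idx = rng.choice(data.shape[0], size=size, replace=False)
    return idx, data[idx]

X = np.arange(20, dtype=float).reshape(10, 2)
idx, basis = random_basis(X, size=4, seed=0)
print(idx)
print(basis.shape)  # (4, 2)
```

Since the trained classifier is built on these size elements, classifying one sample then needs size kernel evaluations, consistent with the description of the size parameter above.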

LIBIRWLS.predict:

It uses a classifier object to classify a new dataset.

predictions = LIBIRWLS.predict(model, data, labels=None, Threads=1, Soft=0, verbose=1)

Parameters:

model: A classifier obtained with this library (full_train or budgeted_train).
data: The dataset to classify (numpy 2d array format)
labels: Dataset labels (optional; numpy 1d array with values +1 and -1). If provided, the function shows the accuracy of the classifier.
Threads: Number of parallel threads
Soft (default 0):
- 0 The classifier returns the class prediction (see outputs)
- 1 The classifier returns the output of the classification function (see outputs)
verbose (default 1):
- 0 Silent mode
- 1 Print information about the classification procedure on screen
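The two Soft modes are related: for an SVM, the class prediction is normally the sign of the classification function output. Assuming LIBIRWLS follows this convention (check the outputs description, including how an output of exactly 0 is labeled), outputs obtained with Soft=1 can be turned into +1/-1 labels with a helper like the hypothetical soft_to_hard below, which is not part of the library.

```python
import numpy as np

def soft_to_hard(scores):
    """Map classification-function outputs to +1/-1 class labels.

    Assumption: positive outputs -> +1, everything else (including 0) -> -1.
    """
    # ravel() also accepts column vectors, e.g. an (n, 1) array of outputs.
    scores = np.asarray(scores, dtype=float).ravel()
    return np.where(scores > 0, 1.0, -1.0)

# Expected labels: +1, -1, -1, +1
print(soft_to_hard([2.3, -0.7, 0.0, 0.01]))
```

Keeping the soft outputs is useful when you need a confidence score (for example, to rank samples or draw a ROC curve) and only later a hard decision.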

LIBIRWLS.save:

It saves a classifier to a file.

LIBIRWLS.save(model, file)

Parameters:

model: A classifier obtained with this library (using full_train or budgeted_train).
file: The file where the model will be saved

LIBIRWLS.load:

It loads a classifier from a file.

model = LIBIRWLS.load(file)

Parameters:

file: The file where a classifier was saved

Example

Download datasets:

To test this tool you need a dataset in the correct format. The following Python commands download the adult dataset from the libsvm repository (training set file: a9a, test set file: a9a.t):

try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a", "a9a")
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a.t", "a9a.t")

Creating the numpy matrices and vectors:

These datasets are in libsvm format. You can load them and convert them into numpy arrays with the following commands:

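from sklearn.datasets import load_svmlight_file
import numpy as np
Xtr, Ytr = load_svmlight_file("a9a")
# a9a.t may contain fewer feature columns than a9a; force the training width
Xtst, Ytst = load_svmlight_file("a9a.t", n_features=Xtr.shape[1])
# Convert the sparse matrices into dense 2d numpy arrays
Xtr = Xtr.toarray()
Xtst = Xtst.toarray()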
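full_train and budgeted_train expect a numpy 2d array of samples and a 1d array of +1/-1 labels. A small pre-flight check such as the hypothetical check_training_inputs below (not part of LIBIRWLS) can catch format problems before a long training run. It accepts sparse or array-like input, returns contiguous float64 arrays, and verifies the labels; whether the extension strictly requires float64 or contiguous memory is an assumption here, not something this page states.

```python
import numpy as np

def check_training_inputs(data, labels):
    """Return (data, labels) as contiguous float64 arrays in the documented format.

    Hypothetical helper: data must be 2d, labels 1d with values +1/-1,
    and both must describe the same number of samples.
    """
    if hasattr(data, "toarray"):  # e.g. a scipy.sparse matrix
        data = data.toarray()
    data = np.ascontiguousarray(data, dtype=np.float64)
    labels = np.ascontiguousarray(labels, dtype=np.float64).ravel()
    if data.ndim != 2:
        raise ValueError("data must be a 2d array")
    if data.shape[0] != labels.shape[0]:
        raise ValueError("data and labels have different numbers of samples")
    if not np.all(np.isin(labels, (-1.0, 1.0))):
        raise ValueError("labels must contain only +1 and -1")
    return data, labels

# In the example above: Xtr, Ytr = check_training_inputs(Xtr, Ytr)
X_ok, y_ok = check_training_inputs([[0, 1], [1, 0]], [1, -1])
print(X_ok.dtype, X_ok.shape, y_ok)
```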

Training a classifier:

To create a classifier with the PIRWLS algorithm (LIBIRWLS.full_train) from the training set (Xtr, a 2d numpy array with the input variables, and Ytr, a 1d array with the class labels), using gamma = 0.001, C = 1000 and a single thread, run:

model = LIBIRWLS.full_train(Xtr, Ytr, gamma=0.001, C=1000, threads=1)

Testing the classifier:

To classify a new dataset (e.g. the matrix Xtst) and obtain hard class predictions (Soft=0, see outputs), run:

predictions = LIBIRWLS.predict(model, Xtst, Soft=0, threads=2)

You can then compare the predictions with the true labels (the Ytst array) to evaluate the accuracy of the classifier.
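Assuming predictions holds the hard +1/-1 labels returned with Soft=0, the accuracy is the fraction of test samples whose prediction matches the true label. A minimal NumPy sketch, where accuracy is a hypothetical helper and not a LIBIRWLS function:

```python
import numpy as np

def accuracy(predictions, true_labels):
    """Fraction of samples whose predicted +1/-1 label equals the true label."""
    predictions = np.asarray(predictions, dtype=float).ravel()
    true_labels = np.asarray(true_labels, dtype=float).ravel()
    if predictions.shape != true_labels.shape:
        raise ValueError("predictions and labels must have the same length")
    return float(np.mean(predictions == true_labels))

# In the example above: print("Test accuracy:", accuracy(predictions, Ytst))
print(accuracy([1, -1, 1, 1], [1, -1, -1, 1]))  # 0.75
```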