PYTHON EXTENSION
Import the library:
Once the library has been compiled and the Python extension has been installed, you can import the library with the following command:
import LIBIRWLS
Functions:
The library provides five functions:
- LIBIRWLS.full_train: To create a classifier using the PIRWLS algorithm.
- LIBIRWLS.budgeted_train: To create a classifier using the PSIRWLS algorithm.
- LIBIRWLS.predict: To use a classifier on a dataset.
- LIBIRWLS.save: To save a classifier in a file.
- LIBIRWLS.load: To load a classifier from a file.
LIBIRWLS.full_train:
It creates a classifier object using our parallel IRWLS procedure. See the library webpage for a detailed description.
- model = LIBIRWLS.full_train(data, labels, gamma=1, C=1, threads=1, workingSet=500, eta=0.001, kernel=1, verbose=1)
Parameters:
- data: Training set, format numpy 2d array
- labels: Training set labels (numpy 1d array with values +1 and -1)
- kernel: kernel type:
- 0 for Linear kernel u'*v
- 1 for radial basis function exp(-gamma*|u-v|^2)
- gamma: gamma in the radial basis kernel function
- C: SVM Cost
- workingSet: Size of the Least Squares Problem in every iteration
- threads: Number of parallel threads
- eta: IRWLS stopping criterion
- verbose (default 1):
- 0 Silent mode
- 1 Print screen information about the training procedure
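The two kernel options listed above can be written out directly from their formulas. The following is a minimal pure-Python sketch for illustration only. The function names linear_kernel and rbf_kernel are hypothetical and are not part of LIBIRWLS, which evaluates the kernels internally.

```python
import math

def linear_kernel(u, v):
    """Linear kernel (kernel=0): the dot product u'*v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel (kernel=1): exp(-gamma * |u - v|^2)."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * squared_distance)

u = [1.0, 2.0]
v = [3.0, 4.0]
k_linear = linear_kernel(u, v)          # 1*3 + 2*4
k_rbf = rbf_kernel(u, v, gamma=0.5)     # exp(-0.5 * ((1-3)^2 + (2-4)^2))
```

A larger gamma makes the radial basis kernel decay faster with the distance between u and v, so only very close samples influence each other.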
LIBIRWLS.budgeted_train:
It creates a classifier object using our parallel approximate IRWLS procedure. See the library webpage for a detailed description.
- model = LIBIRWLS.budgeted_train(data, labels, gamma=1, C=1, threads=1, size=500, algorithm=1, kernel=1, verbose=1)
Parameters:
- data: Training set, format numpy 2d array
- labels: Training set labels (numpy 1d array with values +1 and -1)
- kernel: kernel type:
- 0 for Linear kernel u'*v
- 1 for radial basis function exp(-gamma*|u-v|^2)
- gamma: gamma in the radial basis kernel function
- C: SVM Cost
- algorithm (default 1): The procedure to obtain the basis elements (see PSIRWLS)
- 0 Random Selection
- 1 SGMA (Sparse Greedy Matrix Approximation)
- size: The size of the classifier (number of kernel evaluations, see PSIRWLS)
- threads: Number of parallel threads
- verbose (default 1):
- 0 Silent mode
- 1 Print screen information about the training procedure
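To give an intuition for the algorithm=0 option, the sketch below picks a fixed number of distinct training rows at random to act as basis elements. It is an illustration of the idea only and makes no claim about how LIBIRWLS draws its sample. The helper random_basis_indices, its seed argument and the sample counts are assumptions introduced for this example.

```python
import random

def random_basis_indices(n_samples, size, seed=None):
    """Pick `size` distinct row indices out of `n_samples` training rows.

    If `size` exceeds `n_samples`, every row is selected.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), min(size, n_samples)))

# Hypothetical training set with 1000 rows and a budget of 500 basis elements
indices = random_basis_indices(n_samples=1000, size=500, seed=0)
```

The size parameter bounds the number of kernel evaluations needed per prediction, so a smaller value gives a faster but usually less accurate classifier.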
LIBIRWLS.predict:
It uses a classifier object to classify a new dataset.
- predictions = LIBIRWLS.predict(model, data, labels=None, Threads=1, Soft=0, verbose=1)
Parameters:
- model: A classifier obtained with this library (full_train or budgeted_train).
- data: The dataset to classify (numpy 2d array format)
- labels: Dataset labels (optional; numpy 1d array with values +1 and -1). If provided, the function shows the accuracy of the classifier.
- Threads: Number of parallel threads
- Soft (default 0):
- 0 The classifier returns the class prediction (see outputs)
- 1 The classifier returns the output of the classification function (see outputs)
- verbose (default 1):
- 0 Silent mode
- 1 Print screen information about the classification procedure
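The difference between the two Soft modes can be sketched as follows, assuming the usual SVM convention that the class prediction is the sign of the classification function output. The helper hard_from_soft is hypothetical, and mapping an output of exactly 0 to +1 is a choice made for this example, not documented library behaviour.

```python
def hard_from_soft(soft_outputs):
    """Map classification function outputs to +1 / -1 labels by their sign.

    An output of exactly 0 is mapped to +1 here (an assumption of this sketch).
    """
    return [1 if value >= 0 else -1 for value in soft_outputs]

# Made-up soft outputs, as might be returned with Soft=1
soft = [0.7, -1.3, 0.0, 2.1]
hard = hard_from_soft(soft)
```

Soft outputs are useful when a confidence score or a custom decision threshold is needed. Hard outputs are enough for computing accuracy.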
LIBIRWLS.save:
To save a classifier in a file
- LIBIRWLS.save(model, file)
Parameters:
- model: A classifier obtained with this library (using full_train or budgeted_train).
- file: The file where the model will be saved
LIBIRWLS.load:
To load a classifier from a file
- model = LIBIRWLS.load(file)
Parameters:
- file: The file where a classifier was saved
Example
Download datasets:
To test this tool, you need a dataset in the correct format. The following Python commands download the adult dataset from the libsvm repository (training dataset file = a9a, test dataset file = a9a.t):
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a", "a9a")
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a.t", "a9a.t")
Creating the numpy matrices and vectors:
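These datasets are in libsvm format. You can load them and convert them into numpy arrays with the following commands:
from sklearn.datasets import load_svmlight_file
import numpy as np
Xtr, Ytr = load_svmlight_file("a9a")
# n_features keeps the test matrix the same width as the training matrix
Xtst, Ytst = load_svmlight_file("a9a.t", n_features=Xtr.shape[1])
# Convert the sparse matrices to dense numpy 2d arrays
Xtr = Xtr.toarray()
Xtst = Xtst.toarray()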
Training a classifier:
To create a classifier with the PIRWLS algorithm (LIBIRWLS.full_train) from the training dataset (Xtr, a 2d numpy array with the variables, and Ytr, a 1d array with the classes), using gamma = 0.001, C = 1000 and a single thread, run:
model = LIBIRWLS.full_train(Xtr, Ytr, gamma=0.001, C=1000, threads=1)
Testing the classifier:
To use the classifier on a new dataset (e.g. the matrix Xtst) and obtain the hard classification (Soft=0, see outputs), run:
predictions = LIBIRWLS.predict(model, Xtst, Soft=0, Threads=2)
You can then compare the predictions with the true labels (Ytst array) to evaluate the accuracy of this classifier.
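The comparison with the true labels can be sketched with a small helper. The function accuracy below is hypothetical and not part of LIBIRWLS. It assumes that the predictions and the labels are sequences of the same length holding +1 and -1 values.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that match the true labels."""
    predictions = list(predictions)
    labels = list(labels)
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / float(len(labels))

# Made-up values for illustration: three of the four predictions are correct
example_accuracy = accuracy([1, -1, 1, 1], [1, -1, -1, 1])
```

With the variables from this example, the call would be accuracy(predictions, Ytst).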