PYTHON EXTENSION


Import the library:

Once the library has been compiled and the Python extension has been installed, you can import the library with the following statement:

  • import LIBIRWLS

Functions:

The library provides five functions:

  • LIBIRWLS.full_train: To create a classifier using the PIRWLS algorithm.
  • LIBIRWLS.budgeted_train: To create a classifier using the PSIRWLS algorithm.
  • LIBIRWLS.predict: To use a classifier on a dataset.
  • LIBIRWLS.save: To save a classifier in a file.
  • LIBIRWLS.load: To load a classifier from a file.

LIBIRWLS.full_train:

It creates a classifier object using our parallel IRWLS procedure. See the library webpage for a detailed description.

  • model = LIBIRWLS.full_train(data, labels, gamma=1, C=1, threads=1, workingSet=500, eta=0.001, kernel=1, verbose=1)

Parameters:

  • data: Training set (numpy 2d array)
  • labels: Training set labels (numpy 1d array with values +1 and -1)
  • kernel: kernel type:
    • 0 for Linear kernel u'*v
    • 1 for radial basis function exp(-gamma*|u-v|^2)
  • gamma: gamma in the radial basis kernel function
  • C: SVM Cost
  • workingSet: Size of the least squares problem solved in every iteration
  • threads: Number of parallel threads
  • eta: IRWLS stopping criterion
  • verbose (default 1):
    • 0 Silent mode
    • 1 Print screen information about the training procedure
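The two kernel options can be written out directly from the formulas in the list above. The following is a minimal pure-Python sketch of those formulas, not the library's own implementation; the function names linear_kernel and rbf_kernel are hypothetical and are not part of LIBIRWLS.

```python
import math

def linear_kernel(u, v):
    """Linear kernel (kernel=0): the inner product u'*v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def rbf_kernel(u, v, gamma=1.0):
    """Radial basis function kernel (kernel=1): exp(-gamma * |u - v|^2)."""
    squared_distance = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * squared_distance)

u = [1.0, 0.0, 2.0]
v = [0.0, 1.0, 2.0]
k_lin = linear_kernel(u, v)           # 1*0 + 0*1 + 2*2 = 4.0
k_rbf = rbf_kernel(u, v, gamma=0.5)   # |u - v|^2 = 2, so exp(-0.5 * 2) = exp(-1)
```

A larger gamma makes the radial basis kernel decay faster with the distance between the two samples.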

LIBIRWLS.budgeted_train:

It creates a classifier object using our parallel approximate IRWLS procedure. See the library webpage for a detailed description.

  • model = LIBIRWLS.budgeted_train(data, labels, gamma=1, C=1, threads=1, size=500, algorithm=1, kernel=1, verbose=1)

Parameters:

  • data: Training set (numpy 2d array)
  • labels: Training set labels (numpy 1d array with values +1 and -1)
  • kernel: kernel type:
    • 0 for Linear kernel u'*v
    • 1 for radial basis function exp(-gamma*|u-v|^2)
  • gamma: gamma in the radial basis kernel function
  • C: SVM Cost
  • algorithm (default 1): The procedure to obtain the basis elements (see PSIRWLS)
    • 0 Random Selection
    • 1 SGMA (Sparse Greedy Matrix Approximation)
  • size: The size of the classifier (number of kernel evaluations, see PSIRWLS)
  • threads: Number of parallel threads
  • verbose (default 1):
    • 0 Silent mode
    • 1 Print screen information about the training procedure
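To give an intuition for algorithm=0, the sketch below picks a fixed number of training samples uniformly at random to act as basis elements. It only illustrates the idea of random selection under the assumption that basis elements are drawn from the training set; it is not the code used inside the library, and the helper name random_basis_indices is hypothetical.

```python
import random

def random_basis_indices(n_samples, size, seed=None):
    """Pick `size` distinct training-sample indices uniformly at random.

    If `size` exceeds the number of samples, every sample is selected.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_samples), min(size, n_samples)))

# Example: choose 5 basis elements out of 1000 training samples
basis = random_basis_indices(n_samples=1000, size=5, seed=0)
```

The size parameter bounds the number of kernel evaluations per prediction, so a smaller value gives a faster but usually less accurate classifier.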

LIBIRWLS.predict:

It uses a classifier object to classify a new dataset.

  • predictions = LIBIRWLS.predict(model, data, labels=None, Threads=1, Soft=0, verbose=1)

Parameters:

  • model: A classifier obtained with this library (full_train or budgeted_train).
  • data: The dataset to classify (numpy 2d array)
  • labels: Dataset labels (optional; numpy 1d array with values +1 and -1). If provided, the function shows the accuracy of the classifier
  • Threads: Number of parallel threads
  • Soft (default 0):
    • 0 The classifier returns the class prediction (see outputs)
    • 1 The classifier returns the output of the classification function (see outputs)
  • verbose (default 1):
    • 0 Silent mode
    • 1 Print screen information about the classification procedure
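The two Soft modes are related: a class prediction can be derived from the output of the classification function by thresholding it. The sketch below assumes the usual SVM convention of a threshold at zero and assigns a value of exactly zero to class +1; both choices are assumptions rather than documented library behaviour, and the helper name hard_from_soft is hypothetical.

```python
def hard_from_soft(soft_outputs, threshold=0.0):
    """Map real-valued classifier outputs to class labels +1 / -1.

    Assumption: outputs at or above `threshold` are assigned to class +1.
    """
    return [1 if value >= threshold else -1 for value in soft_outputs]

soft = [0.7, -1.2, 0.05, -0.3]
hard = hard_from_soft(soft)   # [1, -1, 1, -1]
```

Keeping the soft outputs is useful when you want to move the decision threshold or rank the samples by confidence.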

LIBIRWLS.save:

It saves a classifier to a file.

  • LIBIRWLS.save(model, file)

Parameters:

  • model: A classifier obtained with this library (using full_train or budgeted_train).
  • file: The file where the model will be saved

LIBIRWLS.load:

It loads a classifier from a file.

  • model = LIBIRWLS.load(file)

Parameters:

  • file: The file where a classifier was saved

Example

Download datasets:

To test this tool you need a dataset in the correct format. The following Python commands download the adult dataset from the libsvm repository (training dataset file = a9a, test dataset file = a9a.t):

try:  # Python 3
    from urllib.request import urlretrieve
except ImportError:  # Python 2
    from urllib import urlretrieve
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a", "a9a")
urlretrieve("https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary/a9a.t", "a9a.t")

Creating the numpy matrices and vectors:

These datasets are in libsvm format. You can load them and convert them into numpy arrays with the following commands:

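from sklearn.datasets import load_svmlight_file
Xtr, Ytr = load_svmlight_file("a9a")
# Use the training set's feature count so both matrices have the same number of columns
Xtst, Ytst = load_svmlight_file("a9a.t", n_features=Xtr.shape[1])
# Convert the sparse matrices to dense numpy 2d arrays
Xtr = Xtr.toarray()
Xtst = Xtst.toarray()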
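The training functions expect labels coded as +1 and -1. The a9a files already use that coding, but other binary datasets may use 0/1 or text labels. The sketch below shows one way to recode them; the helper name to_plus_minus_one is hypothetical and is not part of LIBIRWLS.

```python
def to_plus_minus_one(labels, positive_label):
    """Return labels recoded as +1 for `positive_label` and -1 otherwise."""
    return [1 if label == positive_label else -1 for label in labels]

raw = [0, 1, 1, 0]
coded = to_plus_minus_one(raw, positive_label=1)   # [-1, 1, 1, -1]
```

The result is a plain list, so convert it to a numpy 1d array before passing it as the labels argument.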

Training a classifier:

To create a classifier with the PIRWLS algorithm (LIBIRWLS.full_train) from the training dataset (Xtr, a 2d numpy array with the variables, and Ytr, a 1d array with the classes), using gamma = 0.001, C = 1000 and a single thread, use:

model = LIBIRWLS.full_train(Xtr, Ytr, gamma=0.001, C=1000, threads=1)

Testing the classifier:

To use the classifier on a new dataset (e.g. the matrix Xtst) and obtain the hard classification (Soft=0, see outputs), use:

predictions = LIBIRWLS.predict(model, Xtst, Soft=0, Threads=2)

You can then compare the predictions with the true labels (Ytst array) to evaluate the accuracy of the classifier.
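The comparison with the true labels can be sketched as a small accuracy function. It is written in plain Python and assumes that the predictions and the labels are sequences of the same length (lists or numpy 1d arrays) that use the same +1 / -1 coding; the function name accuracy is hypothetical.

```python
def accuracy(predictions, true_labels):
    """Fraction of samples whose predicted class equals the true class."""
    pairs = list(zip(predictions, true_labels))
    if not pairs:
        raise ValueError("cannot compute the accuracy of an empty dataset")
    correct = sum(1 for predicted, true in pairs if predicted == true)
    return correct / float(len(pairs))

acc = accuracy([1, -1, 1, 1], [1, -1, -1, 1])   # 3 of 4 correct -> 0.75
```

With the variables of this example, the call would be accuracy(predictions, Ytst), provided the predictions were obtained with Soft=0.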


Copyright © 2014-2017