COMMAND LINE


Commands:

This library can currently train full SVMs and budgeted SVMs. Once the library has been compiled, you can find the executable files under the folder "bin". The syntax to train a model is very intuitive and similar to other SVM-based tools such as libsvm or svmlight.

To train a model, run the command of the chosen algorithm followed by the options (see the options section), the file that contains the labeled training set (see the datasets section for the supported formats) and the file where the trained model will be saved (see the model files section):

  • ./full-train [options] training_set_file model_file
  • ./budgeted-train [options] training_set_file model_file

To classify a new dataset, run the command LIBIRWLS-predict followed by the options (see the options section), the file that contains the dataset to classify (it can be labeled or unlabeled, see the datasets section for the supported formats), the file that contains a trained model (previously obtained with one of the training commands, see the model files section) and finally the file where the classification of the dataset will be saved (see the output files section):

  • ./LIBIRWLS-predict [options] dataset_file model_file output_file
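
The same workflow can also be scripted, for example from Python through the standard subprocess module. The following is only a minimal sketch: the binaries are assumed to be under ./bin, and the file names used here (training.txt, model.mod, test.txt, output.txt) are placeholders.

  import subprocess

  # Train a full SVM (placeholder file names, options as documented below).
  subprocess.run(["./bin/full-train", "-g", "0.001", "-c", "1000", "-t", "4",
                  "training.txt", "model.mod"], check=True)

  # Classify a labeled test set with the resulting model.
  subprocess.run(["./bin/LIBIRWLS-predict", "-l", "1",
                  "test.txt", "model.mod", "output.txt"], check=True)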

Options:

Training options:

General options

  • -k kernel type (see algorithms, default 1):
    • 0 = Linear kernel u'v
    • 1 = radial basis function exp(-gamma|u-v|^2)
  • -g Gamma: Set gamma in the radial basis kernel function (see algorithms, default 1)
  • -c Cost: Set the SVM Cost (see algorithms, default 1)
  • -t Number_of_Threads: Number of parallel threads (see parallelization, default 1)
  • -f File format (see datasets, default 1):
    • 0 = CSV format
    • 1 = libsvm format
  • -p separator: csv separator character (only applicable if CSV format is selected, default ",")
  • -v verbose (default 1):
    • 0 = Silent mode, no screen messages
    • 1 = Screen messages

Specific options of full-train

  • -w Working_set_size: Size of the Least Squares Problem in every iteration (default 500)
  • -e eta: Stopping criterion (default 0.001)

Specific options of budgeted-train

  • -s Classifier_size: Size of the classifier (see budgeted solution, default 50)
  • -a Algorithm: Algorithm for centroid selection (see budgeted solution, default 1)
    • 0 = Random Selection
    • 1 = SGMA (Sparse Greedy Matrix Approximation)

Classification options:

To use with the command LIBIRWLS-predict.

  • -s Soft output (see output, default 0):
    • 0 = Hard classification (class prediction)
    • 1 = Soft classification (value of the classification function)
  • -l Labeled: (see datasets, default 0)
    • 1 = the dataset is labeled
    • 0 = the dataset is unlabeled
  • -t Number_of_Threads: Number of parallel threads (see parallelization, default 1)
  • -f File format (see datasets, default 1):
    • 0 = CSV format
    • 1 = libsvm format
  • -p separator: csv separator character (only applicable if CSV format is selected, default ",")
  • -v verbose (default 1):
    • 0 = Silent mode, no screen messages
    • 1 = Screen messages

Dataset Files:

This library supports datasets in two different file formats: LibSVM and CSV (the format must be specified using the options). Datasets must be labeled for training, and can be labeled or unlabeled for testing. The labels must take the values +1 and -1.

You can find a large number of datasets in LibSVM format in the libsvm repository. You can test this library with them, but take into account that many of them do not use the labels +1 and -1, so you will need to adapt them first.
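
For example, a downloaded dataset whose classes are encoded as 1 and 0 could be adapted with a short script like the one below. This is only a minimal sketch: the file names and the original label values are assumptions that depend on the dataset you actually use.

  # Remap the label (first token of every LibSVM-format line) to +1 / -1.
  # The original labels are assumed to be "1" and "0"; adjust the mapping as needed.
  mapping = {"1": "+1", "0": "-1"}
  with open("original.txt") as src, open("training.txt", "w") as dst:
      for line in src:
          label, _, features = line.strip().partition(" ")
          dst.write(mapping[label] + " " + features + "\n")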

LibSVM format

Labeled - Every line is a sample and consists of the class (+1 or -1) followed by pairs of feature index and feature value:

+1 1:5 7:2 15:6
+1 1:5 7:2 13:6 23:3
-1 2:4 7:3 10:6 23:1

Unlabeled - Every line is a sample and consists of pairs of feature index and feature value:

1:5 7:2 15:6
1:5 7:2 13:6 23:3
2:4 7:3 10:6 23:1
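
To make the interpretation of these lines explicit, the sketch below parses a LibSVM-format file into (label, {feature index: value}) pairs, treating lines without a leading class as unlabeled. This is only an illustration for inspecting data; it is not part of the library.

  # Parse LibSVM-format lines into (label, sparse feature dict) pairs;
  # unlabeled lines get the label None.
  def parse_libsvm(path):
      samples = []
      with open(path) as f:
          for line in f:
              tokens = line.split()
              if not tokens:
                  continue
              label = None
              if ":" not in tokens[0]:               # first token is the class (+1 / -1)
                  label = int(tokens[0])
                  tokens = tokens[1:]
              features = {int(i): float(v)           # feature index -> feature value
                          for i, v in (t.split(":") for t in tokens)}
              samples.append((label, features))
      return samples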

CSV format

Each line of the file is a data record. Each record consists of one or more fields, separated by commas (the separator character can be modified using the command options).

Labeled - The first value of every row is the class of the sample and must take the value +1 or -1:

+1,5,0,0,3,0,0,25
+1,5,0,2,0,6,3,0
-1,0,4,3,0,0,6,1

Unlabeled - Every line contains only the feature values:

5,0,0,3,0,0,25
5,0,2,0,6,3,0
0,4,3,0,0,6,1
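
Since both formats carry the same information, a labeled CSV file can be converted to LibSVM format with a few lines of Python. This is a minimal sketch assuming the class is in the first column, "," is the separator and the file names are placeholders; zero-valued features are simply omitted in the LibSVM output.

  import csv

  # Convert a labeled CSV file (class in the first column) to LibSVM format.
  with open("training.csv", newline="") as src, open("training.txt", "w") as dst:
      for row in csv.reader(src):
          label, values = row[0], row[1:]
          pairs = ["%d:%s" % (i + 1, v)              # LibSVM feature indices start at 1
                   for i, v in enumerate(values) if float(v) != 0.0]
          dst.write(label + " " + " ".join(pairs) + "\n")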

Model Files:

The model files store a trained classifier and are not directly editable. They contain the numerical values of the model in binary format.

Output Files:

The classification function of the SVM takes the form of a linear combination of kernel functions:

$f(\mathbf{x}_i)=\sum_{j=1}^m\beta_j K(\mathbf{x}_i,\mathbf{c}_j)$

If this value is positive, the sample is classified as +1; otherwise, it is classified as -1. A hard classification directly gives the class, while a soft classification gives the output of this classification function.
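
As a purely numerical illustration of this rule, the sketch below evaluates the classification function with the RBF kernel defined above and thresholds it at zero. The weights, centroids and gamma are toy values, not values read from a real model file (the model file is binary and its layout is not described here).

  import numpy as np

  beta = np.array([0.7, -1.2, 0.4])                  # toy weights beta_j
  centroids = np.array([[1.0, 0.0],                  # toy centroids c_j
                        [0.0, 2.0],
                        [3.0, 1.0]])
  gamma = 0.5
  x = np.array([0.8, 0.3])                           # sample to classify

  # f(x) = sum_j beta_j * exp(-gamma * ||x - c_j||^2)
  soft = np.sum(beta * np.exp(-gamma * np.sum((centroids - x) ** 2, axis=1)))
  hard = +1 if soft > 0 else -1                      # hard prediction = sign of the soft output
  print(soft, hard)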

When the command "LIBIRWLS-predict" is used to classify a dataset, the result is stored in a file where every row is the classification of one sample. The format (soft or hard) can be specified using the options.
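
When the test set is labeled, these per-row predictions can be compared with the true labels to estimate the accuracy. The sketch below assumes hard predictions (+1/-1), a test set in LibSVM format, and placeholder file names output.txt and test.txt.

  # Compare hard predictions (one per line of output.txt) with the true labels
  # stored as the first token of each line of a LibSVM-format test file.
  with open("output.txt") as f:
      predictions = [int(float(line)) for line in f if line.strip()]
  with open("test.txt") as f:
      labels = [int(line.split()[0]) for line in f if line.strip()]

  correct = sum(p == y for p, y in zip(predictions, labels))
  print("accuracy: %.4f" % (correct / len(labels)))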

Examples

Example 1 - Training using full-train

If you have a training set in a file called "training.txt" with libsvm format (-f 1) and you want to create a classifier using our full SVM training procedure (full-train command) with gaussian kernel (-k 1), gamma = 0.001 (-g 0.001), cost = 1000 (-c 1000) and using 4 threads to speed up the training (-t 4) you can use the following command:


  • ./bin/full-train -k 1 -g 0.001 -c 1000 -f 1 -t 4 training.txt model.mod

The classifier will be saved in a file called "model.mod".

In this example the options -k 1 and -f 1 are not necessary because they are the default values of these parameters.

Example 2 - Training using budgeted-train

If you have a training set in a file called "training.csv" with csv format (-f 0) and for computational reasons you want to obtain an approximate solution (budgeted-train command) with a fixed classifier size = 100 (-s 100) with gaussian kernel (-k 1), gamma = 0.1 (-g 0.1), cost = 10 (-c 10) and using 2 threads to speed up the training (-t 2) you can use the following command:


  • ./bin/budgeted-train -k 1 -g 0.1 -a 0 -s 100 -c 10 -f 0 -t 2 training.csv model2.mod

The classifier will be saved in a file called "model2.mod".

Example 3 - Classification

If you want to use the model created in Example 2 (model2.mod) to classify a labeled dataset (-l 1) stored in a file called "test.txt" in libsvm format (-f 1) and save the hard predictions (the default, -s 0, see outputs) in a file called output.txt, you can use this command:


  • ./bin/LIBIRWLS-predict -f 1 -l 1 test.txt model2.mod output.txt


Copyright © 2014-2017