This library can currently train full SVMs and budgeted SVMs. Once the library has been compiled, the executable files are in the "bin" folder. The syntax to train a model is intuitive and similar to that of other SVM-based tools such as libsvm or svmlight.
To train a model, run the command of the chosen algorithm followed by the options (see options section), the file that contains the labeled training set (see datasets section for the supported formats) and the file where the trained model will be saved (see model file section).
To classify a new dataset, run the command LIBIRWLS-predict followed by the options (see options section), the file that contains the dataset to classify (it can be labeled or unlabeled, see datasets section for the supported formats), the file that contains a trained model (previously obtained with one of the training commands, see model file section) and finally the file where the classification of the dataset will be saved (see output section).
General options
Specific options of full-train
Specific options of budgeted-train
To use with the command LIBIRWLS-predict.
This library supports datasets in two file formats, LibSVM and CSV (the format must be specified using the options). Datasets must be labeled for training and can be labeled or unlabeled for testing. Every label must be either +1 or -1.
Many datasets in LibSVM format are available in the libsvm repository, and you can use them to test this library. Note that many of them do not use +1 and -1 as labels, so you need to adapt them first.
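As a minimal sketch of that adaptation, the Python function below rewrites the label of one LibSVM-format line to +1 or -1. The function name `relabel_line` and the choice of which original label counts as the positive class are assumptions made for illustration, not part of this library.

```python
def relabel_line(line, positive_label):
    """Map the label of one LibSVM-format line to +1 or -1.

    positive_label is the original label (as text) that should become +1;
    every other label becomes -1. This mapping is an assumption for the sketch.
    """
    label, _, features = line.strip().partition(" ")
    new_label = "+1" if label == positive_label else "-1"
    return f"{new_label} {features}".rstrip()


# Hypothetical lines from a dataset whose classes are labeled 3 and 0.
lines = ["3 1:5 7:2 15:6", "0 2:4 7:3", "3 4:1"]
converted = [relabel_line(line, positive_label="3") for line in lines]
print("\n".join(converted))
```

The comparison is done on the label text, so a file that writes the same class as both `3` and `3.0` would need the labels normalized first.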
LibSVM format
Labeled - Every line is a sample and consists of the class (+1 or -1) followed by pairs of feature index and feature value, written as index:value:
+1 1:5 7:2 15:6
+1 1:5 7:2 13:6 23:3
-1 2:4 7:3 10:6 23:1
Unlabeled - Every line is a sample and consists only of pairs of feature index and feature value, written as index:value:
1:5 7:2 15:6
1:5 7:2 13:6 23:3
2:4 7:3 10:6 23:1
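To make the two variants concrete, here is a short Python sketch that parses one line of either kind. It assumes that a first token without a colon is the label; the function `parse_libsvm_line` is hypothetical and only illustrates the format, it is not how the library reads files.

```python
def parse_libsvm_line(line):
    """Return (label, features) for one line; label is None if the line is unlabeled."""
    tokens = line.split()
    label = None
    # Assumption: a first token without ':' is the class label (+1 or -1).
    if tokens and ":" not in tokens[0]:
        label = int(tokens[0])
        tokens = tokens[1:]
    features = {}
    for token in tokens:
        index, value = token.split(":")
        features[int(index)] = float(value)
    return label, features


print(parse_libsvm_line("+1 1:5 7:2 15:6"))
print(parse_libsvm_line("2:4 7:3 10:6 23:1"))
```

Features that do not appear in a line are simply absent from the dictionary, which mirrors the sparse nature of the format.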
CSV format
Each line of the file is a data record. Each record consists of one or more fields, separated by commas (the separator character can be modified using the command options).
Labeled - The first value of every row is the class of the sample and must be +1 or -1:
+1,5,0,0,3,0,0,25
+1,5,0,2,0,6,3,0
-1,0,4,3,0,0,6,1
Unlabeled - Every line contains only the feature values:
5,0,0,3,0,0,25
5,0,2,0,6,3,0
0,4,3,0,0,6,1
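The two formats carry the same information, and the sketch below converts one CSV row into a LibSVM-style line. It assumes that CSV columns map to feature indices starting at 1 and that zero-valued features can be omitted from the sparse line; both are assumptions for illustration, and `csv_to_libsvm` is a hypothetical helper rather than a tool shipped with the library.

```python
def csv_to_libsvm(line, labeled, separator=","):
    """Convert one CSV row to a LibSVM-style line, skipping zero-valued features."""
    fields = line.strip().split(separator)
    parts = []
    if labeled:
        parts.append(fields[0])  # keep the label text (+1 or -1) unchanged
        fields = fields[1:]
    # Assumption: the first feature column has index 1.
    for index, value in enumerate(fields, start=1):
        if float(value) != 0.0:
            parts.append(f"{index}:{value}")
    return " ".join(parts)


print(csv_to_libsvm("+1,5,0,0,3,0,0,25", labeled=True))
print(csv_to_libsvm("0,4,3,0,0,6,1", labeled=False))
```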
The model files store a trained classifier and are not directly editable. They contain the numerical values of the model in binary format.
The classification function of the SVM takes the form of a linear combination of kernel functions:
$f(\mathbf{x}_i)=\sum_{j=1}^m\beta_j K(\mathbf{x}_i,\mathbf{c}_j)$
If this value is positive, the sample is classified as class +1. Otherwise, it is classified as class -1. A hard classification gives the class directly, while a soft classification gives the output of this classification function.
When the command "LIBIRWLS-predict" is used to classify a dataset, the result is stored in a file where every row is the classification of one sample. The format (soft or hard) can be specified using the options.
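The difference between soft and hard outputs can be illustrated with a small Python sketch of the classification function. It assumes a Gaussian kernel of the form exp(-gamma * ||x - c||^2), which is a common parameterization but is not confirmed here as the one used by the library; the centers, weights and function names are made up for the example.

```python
import math


def gaussian_kernel(x, c, gamma):
    """Assumed kernel form: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, c))
    return math.exp(-gamma * squared_distance)


def soft_output(x, centers, betas, gamma):
    """Value of f(x): weighted sum of kernel evaluations against every center."""
    return sum(beta * gaussian_kernel(x, c, gamma) for beta, c in zip(betas, centers))


def hard_output(x, centers, betas, gamma):
    """Class +1 if f(x) is positive, -1 otherwise."""
    return 1 if soft_output(x, centers, betas, gamma) > 0 else -1


# Toy model with two centers and opposite weights (illustrative values only).
centers = [[0.0, 0.0], [1.0, 1.0]]
betas = [1.0, -1.0]
gamma = 0.5

for sample in ([0.0, 0.0], [1.0, 1.0]):
    print(sample, soft_output(sample, centers, betas, gamma), hard_output(sample, centers, betas, gamma))
```

A sample that coincides with the first center gets a positive soft output and is assigned to class +1, while one that coincides with the second center gets a negative output and is assigned to class -1.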
If you have a training set in a file called "training.txt" in LibSVM format (-f 1) and you want to create a classifier using our full SVM training procedure (full-train command) with a Gaussian kernel (-k 1), gamma = 0.001 (-g 0.001), cost = 1000 (-c 1000) and 4 threads to speed up the training (-t 4), run full-train with these options followed by the training file "training.txt" and the model file "model.mod".
The classifier will be saved in the file "model.mod".
In this example the options -k 1 and -f 1 are not necessary because they are the default values of these parameters.
If you have a training set in a file called "training.csv" in CSV format (-f 0) and, for computational reasons, you want to obtain an approximate solution (budgeted-train command) with a fixed classifier size of 100 (-s 100), a Gaussian kernel (-k 1), gamma = 0.1 (-g 0.1), cost = 10 (-c 10) and 2 threads to speed up the training (-t 2), run budgeted-train with these options followed by the training file "training.csv" and the model file "model2.mod".
The classifier will be saved in the file "model2.mod".
If you want to use the model created in Example 2 (model2.mod) to classify a labeled dataset (-l 1) stored in a file called "test.txt" in LibSVM format (-f 1) and save the hard predictions (-s 1, see output section) in a file called "output.txt", run LIBIRWLS-predict with these options followed by the dataset file "test.txt", the model file "model2.mod" and the output file "output.txt".
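The three examples can be sketched in Python by assembling each command line from the options and files described above. The helper `build_command` is hypothetical, the "bin/" prefix assumes the commands are launched from the folder that contains "bin", and the order of the options is an arbitrary choice for the sketch; only the flags, values and file names come from the examples.

```python
def build_command(executable, options, *files):
    """Return the argument list: executable, then '-flag value' pairs, then file names."""
    args = [executable]
    for flag, value in options.items():
        args += [f"-{flag}", value]
    args += list(files)
    return args


# Example 1: full SVM training.
full_cmd = build_command(
    "bin/full-train",
    {"f": "1", "k": "1", "g": "0.001", "c": "1000", "t": "4"},
    "training.txt", "model.mod",
)

# Example 2: budgeted training with a classifier of size 100.
budgeted_cmd = build_command(
    "bin/budgeted-train",
    {"f": "0", "s": "100", "k": "1", "g": "0.1", "c": "10", "t": "2"},
    "training.csv", "model2.mod",
)

# Example 3: hard predictions on a labeled test set.
predict_cmd = build_command(
    "bin/LIBIRWLS-predict",
    {"l": "1", "f": "1", "s": "1"},
    "test.txt", "model2.mod", "output.txt",
)

for cmd in (full_cmd, budgeted_cmd, predict_cmd):
    print(" ".join(cmd))
```

Each list could then be passed to `subprocess.run(cmd, check=True)` once the library has been compiled. Keeping the arguments as a list avoids shell quoting issues with file names.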