pyplt.plalgorithms package¶

Submodules¶

pyplt.plalgorithms.backprop_tf module¶

class pyplt.plalgorithms.backprop_tf.BackpropagationTF(ann_topology=None, learn_rate=0.001, error_threshold=0.001, epochs=10, activation_functions=None, batch_size=32, debug=False)¶

Bases: pyplt.plalgorithms.base.PLAlgorithm

Backpropagation algorithm implemented with the tensorflow package.

This is a gradient-descent algorithm that iteratively (over a given number of epochs) optimizes an error function by adjusting the weights of an artificial neural network (ANN) model proportionally to the gradient of the error with respect to the current value of the weights and current data samples. The proportion and therefore the strength of each update is regulated by the given learning rate. The error function used is the Rank Margin function which for a given pair of data samples (A and B, with A preferred over B) yields 0 if the network output for A (fA) is more than one unit larger than the network output for B (fB) and 1.0-((fA)-(fB)) otherwise. The total error is averaged over the complete set of pairs in the training set. If the error is below a given threshold, training stops before reaching the specified number of epochs, and the current weight values are returned as the final model. In PLT, the algorithm was implemented using the tensorflow library.
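
The Rank Margin error described above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the tensorflow code used by PLT; the function names `rank_margin_error` and `mean_rank_margin_error` and the sample outputs are hypothetical.

```python
def rank_margin_error(f_a, f_b):
    """Error for one pair in which A is preferred over B.

    Yields 0 when f_a exceeds f_b by at least one unit,
    and 1.0 - (f_a - f_b) otherwise.
    """
    return max(0.0, 1.0 - (f_a - f_b))


def mean_rank_margin_error(output_pairs):
    """Average the pairwise error over a list of (f_a, f_b) network outputs."""
    return sum(rank_margin_error(f_a, f_b) for f_a, f_b in output_pairs) / len(output_pairs)


# Hypothetical network outputs for three training pairs (A preferred over B).
output_pairs = [(2.5, 1.0), (0.5, 0.5), (0.0, 1.0)]
total_error = mean_rank_margin_error(output_pairs)
```

The first pair is ranked correctly with a margin above one unit and contributes no error; the tie contributes 1.0 and the inverted pair contributes 2.0.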

Initializes the BackpropagationTF object.

Parameters:

ann_topology (list or None, optional) – a list indicating the topology of the artificial neural network (ANN) to be used with the algorithm. The list contains the number of neurons in each layer of the ANN, excluding the input layer but including the output layer (which must always be 1 neuron in size); a value of None is equivalent to [1], indicating an ANN with no hidden layers and only an output layer (consisting of 1 neuron) (default None).
learn_rate (float, optional) – the learning rate used in the weight update step of the Backpropagation algorithm (default 0.001).
error_threshold (float, optional) – the error value at or below which the model is considered to be sufficiently trained, so that training stops (default 0.001).
epochs (int, optional) – the maximum number of iterations the algorithm should make over the entire pairwise rank training set (default 10).
activation_functions (list of pyplt.util.enums.ActivationType or None, optional) – a list of the activation functions to be used across the neurons for each layer of the ANN (default None); if None, all layers will use the Rectified Linear Unit (ReLU) function i.e. pyplt.util.enums.ActivationType.RELU, except for the output layer which will use the Logistic Sigmoid function i.e. pyplt.util.enums.ActivationType.SIGMOID.
batch_size (int, optional) – number of samples per gradient update (default 32).
debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).
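
The default handling of ann_topology and activation_functions described above could be resolved roughly as follows. This is a sketch under stated assumptions: `resolve_topology` is a hypothetical helper, the `ActivationType` enum below is a local stand-in for pyplt.util.enums.ActivationType (its member values are made up), and the length check is an assumption rather than documented behaviour.

```python
from enum import Enum


class ActivationType(Enum):
    """Local stand-in for pyplt.util.enums.ActivationType (values are arbitrary)."""
    RELU = 1
    SIGMOID = 2


def resolve_topology(ann_topology=None, activation_functions=None):
    """Return (topology, activations) with the documented defaults filled in."""
    # None is equivalent to [1]: no hidden layers, a single output neuron.
    topology = [1] if ann_topology is None else list(ann_topology)
    if topology[-1] != 1:
        raise ValueError("the output layer must contain exactly 1 neuron")
    if activation_functions is None:
        # ReLU for every hidden layer, logistic sigmoid for the output layer.
        hidden = [ActivationType.RELU] * (len(topology) - 1)
        activations = hidden + [ActivationType.SIGMOID]
    else:
        activations = list(activation_functions)
        if len(activations) != len(topology):  # assumed constraint
            raise ValueError("expected one activation function per layer")
    return topology, activations


default_topology, default_activations = resolve_topology()
topology, activations = resolve_topology([5, 3, 1])
```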

calc_train_accuracy(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to calculating the training accuracy of the learned model.

This method is implemented explicitly for this algorithm since this approach is substantially more efficient for algorithms using the tensorflow package than the calc_train_accuracy() method of pyplt.plalgorithms.base.PLAlgorithm objects allows.

The training accuracy is determined by calculating the percentage of how many of the training ranks the model is able to predict correctly.

Parameters:

train_objects (pandas.DataFrame) – the objects data the model was trained on.
train_ranks (pandas.DataFrame) – the pairwise rank data the model was trained on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the training accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float
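
As a rough illustration of the accuracy measure, the sketch below counts how many pairwise ranks are reproduced by a set of model outputs. It is not the tensorflow-based routine used by this class. The helper name `pairwise_accuracy`, the 0 to 100 scale, and the treatment of ties as incorrect are assumptions.

```python
def pairwise_accuracy(outputs, ranks):
    """Percentage of ranks (preferred_id, other_id) that the outputs reproduce.

    outputs maps an object ID to the model output for that object.
    """
    correct = sum(1 for preferred, other in ranks if outputs[preferred] > outputs[other])
    return 100.0 * correct / len(ranks)


# Hypothetical model outputs for three objects and four pairwise ranks.
outputs = {0: 0.9, 1: 0.4, 2: 0.7}
ranks = [(0, 1), (2, 1), (1, 0), (0, 2)]
accuracy = pairwise_accuracy(outputs, ranks)
```

Three of the four ranks agree with the outputs, so the sketch reports 75.0.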

clean_up()¶

Close the tensorflow session once the algorithm class instance is no longer needed.

Important: this method must be called once the class instance is no longer in use, unless the BackpropagationTF instance is used within a context manager.
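
One way to make sure clean_up() always runs, even when training raises an exception, is to wrap the instance in a small context manager. The sketch below is an assumption-laden illustration: `cleaned_up` is a hypothetical helper, and `FakeAlgorithm` is a stand-in used so that the example runs without tensorflow; any object exposing a clean_up() method would work the same way.

```python
from contextlib import contextmanager


@contextmanager
def cleaned_up(algorithm):
    """Yield the algorithm and call its clean_up() on exit, even after an error."""
    try:
        yield algorithm
    finally:
        algorithm.clean_up()


class FakeAlgorithm:
    """Stand-in for an algorithm instance that holds a session."""

    def __init__(self):
        self.closed = False

    def clean_up(self):
        self.closed = True


algo = FakeAlgorithm()
with cleaned_up(algo) as running:
    pass  # train / test / predict would go here

failing = FakeAlgorithm()
try:
    with cleaned_up(failing):
        raise RuntimeError("training failed")
except RuntimeError:
    pass  # clean_up() has still been called at this point
```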

init_train(n_features)¶

Initialize the model (topology).

This method is to be called if one wishes to initialize the model (topology) explicitly. This is done by declaring tensorflow placeholders, variables, and operations. This may be used, for example, to use the same BackpropagationTF object but simply modify its topology while evaluating different feature sets during wrapper-type feature selection processes. If not called explicitly, the train() method will call it once implicitly.

Parameters:	n_features (int) – the number of features to be used during the training process.

load_model()¶

Load a model which was trained using this algorithm.

Note: this method is not yet implemented.

predict(input_object, progress_window=None, exec_stopper=None)¶

Predict the output of a given input object by running it through the learned model.

Parameters:

input_object (one row from a pandas.DataFrame) – the input data corresponding to a single object.
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:	the predicted output resulting from running the learned model using the given input.
Return type:	float

save_model(timestamp, path='', suppress=False)¶

Save the ANN model to a Comma Separated Value (CSV) file at the path indicated by the user.

Optionally, the file creation may be suppressed and a pandas.DataFrame representation of the model returned instead.

The file/DataFrame stores the weights, biases, and activation functions of each neuron in each layer of the ANN. Each row represents these values for a neuron in a layer, starting from the first neuron in the first hidden layer (if applicable), and moving forward neuron-by-neuron, layer-by-layer, until the output neuron is reached. The number of columns is variable as the file stores enough columns to represent the maximum number of weights across all neurons in the network.

Weight columns are labelled with the letter ‘w’ followed by the index of the incoming neuron from which the given weight is connected to the current neuron. Hidden layers in the ‘layer’ column are labelled with the letter ‘h’ followed by the index of the layer. The output layer is simply labelled ‘OUTPUT’.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
path (str, optional) – the path at which the file is to be saved (default “”). If “”, the file is saved to a logs folder in the project root directory by default.
suppress (bool, optional) – specifies whether or not to suppress the file creation and return a pandas.DataFrame representation of the model instead (default False).

Returns:

a pandas.DataFrame representation of the model, if the suppress parameter was set to True, otherwise None.

Return type:

pandas.DataFrame – if suppress is True
None – otherwise
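
The row layout described above can be pictured with the following sketch, which builds one row per neuron and pads the weight columns to the widest neuron. Only the ‘layer’ column, the ‘w’ plus index weight columns, and the ‘OUTPUT’ label come from the description; the helper `model_rows`, the other column names, the ‘h1’ numbering, and the use of None as padding are assumptions.

```python
def model_rows(layers):
    """Build one row (dict) per neuron, layer by layer.

    layers is a list of (label, activation, neurons), where each neuron
    is a (bias, incoming_weights) tuple.
    """
    max_weights = max(len(weights) for _, _, neurons in layers for _, weights in neurons)
    rows = []
    for label, activation, neurons in layers:
        for index, (bias, weights) in enumerate(neurons):
            row = {"layer": label, "neuron": index, "activation": activation, "bias": bias}
            # Pad with None so every row has the same number of weight columns.
            for i in range(max_weights):
                row["w%d" % i] = weights[i] if i < len(weights) else None
            rows.append(row)
    return rows


# Hypothetical network: 3 inputs, one hidden layer of 2 neurons, 1 output neuron.
layers = [
    ("h1", "RELU", [(0.1, [0.5, -0.2, 0.3]), (0.0, [0.4, 0.1, -0.6])]),
    ("OUTPUT", "SIGMOID", [(-0.3, [0.7, -0.9])]),
]
rows = model_rows(layers)
```

The output neuron has only two incoming weights, so its ‘w2’ entry is left empty.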

test(objects, test_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to testing/validating the model using the given test data.

This method is implemented explicitly for this algorithm since this approach is substantially more efficient for algorithms using the tensorflow package than the test() method of the base class pyplt.plalgorithms.base.PLAlgorithm.

Parameters:

objects (pandas.DataFrame) – the objects data for the model to be tested/validated on.
test_ranks (pandas.DataFrame) – the pairwise rank data for the model to be tested/validated on.
use_feats (list of str or None, optional) – a subset of the original features to be used during the testing/validation process; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the test/validation accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

train(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Run a tensorflow session to infer an ANN model using the given training data.

The given pairwise rank data is split into a set of preferred objects and a set of non-preferred objects, which are then fed into the ANN. The resulting (predicted) model output of each object in a given rank pair is compared to the actual preference and the error is calculated via a Rank Margin error function. The algorithm attempts to optimize the average error over the entire set of ranks across several iterations (epochs) until it reaches the maximum number of iterations (epochs) or reaches the error threshold.

Parameters:

train_objects (pandas.DataFrame) – the objects data to train the model on.
train_ranks (pandas.DataFrame) – the pairwise rank data to train the model on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

True – if execution is completed successfully.
None – if experiment is aborted before completion by exec_stopper.
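
The two ideas behind this method, splitting the ranks into preferred and non-preferred sets and stopping once the error threshold is reached, can be sketched without tensorflow as follows. The helpers `split_ranks` and `train_until`, the dict-based data layout, and the `run_epoch` callable are hypothetical; the real method operates on pandas.DataFrame objects inside a tensorflow session.

```python
def split_ranks(objects, ranks):
    """Split ranks (preferred_id, other_id) into two aligned lists of feature vectors."""
    preferred = [objects[p] for p, _ in ranks]
    non_preferred = [objects[n] for _, n in ranks]
    return preferred, non_preferred


def train_until(run_epoch, epochs, error_threshold):
    """Call run_epoch() at most `epochs` times; stop early at or below the threshold."""
    epoch, error = 0, None
    for epoch in range(1, epochs + 1):
        error = run_epoch()
        if error <= error_threshold:
            break
    return epoch, error


objects = {0: [0.2, 0.8], 1: [0.5, 0.1], 2: [0.9, 0.9]}
ranks = [(2, 0), (0, 1)]
preferred, non_preferred = split_ranks(objects, ranks)

# Simulated per-epoch average errors stand in for a real optimization step.
simulated_errors = iter([0.5, 0.2, 0.0005, 0.0001])
epochs_run, final_error = train_until(lambda: next(simulated_errors), epochs=10, error_threshold=0.001)
```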

static transform_data(object_)¶

Transform an object into the format required by this particular algorithm implementation.

In this case, nothing changes.

Parameters:	object_ (one row from a pandas.DataFrame) – the object to be transformed.
Returns:	the transformed object in the form of an array.
Return type:	numpy.ndarray

pyplt.plalgorithms.base module¶

class pyplt.plalgorithms.base.PLAlgorithm(description='A preference learning algorithm.', name='', debug=False, **kwargs)¶

Bases: object

Base class for all preference learning algorithms.

Initializes the PLAlgorithm object.

Parameters:

description (str, optional) – a description of the algorithm (default “A preference learning algorithm.”).
name (str, optional) – the name of the algorithm (default “”).
debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).
kwargs – any additional parameters for the algorithm.

calc_train_accuracy(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Base method for calculating the training accuracy of the learned model.

The training accuracy is determined by calculating the percentage of how many of the training ranks the model is able to predict correctly.

Parameters:

train_objects (pandas.DataFrame) – the objects data the model was trained on.
train_ranks (pandas.DataFrame) – the pairwise rank data the model was trained on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the training accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

clean_up()¶

Base method for any potential final clean up instructions to be carried out.

Does nothing unless overridden in a child class.

get_description()¶

Get the description of the preference learning algorithm.

Returns:	the description of the algorithm.
Return type:	str

get_name()¶

Get the name of the preference learning algorithm.

Returns:	the name of the algorithm.
Return type:	str

get_params()¶

Return all additional parameters of the preference learning algorithm (if applicable).

Returns:	a dict containing all additional parameters of the algorithm with the parameter names as the dict’s keys and the corresponding parameter values as the dict’s values (if applicable).
Return type:	dict

get_params_string()¶

Return a string representation of all additional parameters of the preference learning algorithm (if applicable).

Returns:	the string representation of all additional parameters of the algorithm (if applicable).
Return type:	str

get_train_accuracy()¶

Get the training accuracy of the learned model.

Returns:	the training accuracy of the learned model.
Return type:	float

init_train(n_features)¶

Abstract method for carrying out any initializations prior to the training stage.

All children classes must implement this method.

Parameters:	n_features (int) – the number of features to be used during the training process.

load_model()¶

Abstract method for loading a model which was trained using this algorithm.

All children classes must implement this method.

predict(input_object, progress_window=None, exec_stopper=None)¶

Abstract method for predicting the output of a given input by running it through the learned model.

All children classes must implement this method.

Parameters:

input_object (one row from a pandas.DataFrame) – the input data corresponding to a single object.
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:	a list containing the predicted output resulting from running the learned model using the given input.
Return type:	list of float (size 1)

save_model(timestamp, path='', suppress=False)¶

Abstract method to save the model to file.

Optionally, the file creation may be suppressed and a pandas.DataFrame representation of the model returned instead.

All children classes must implement this method.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
path (str, optional) – the path at which the file is to be saved (default “”). If “”, the file is saved to a logs folder in the project root directory by default.
suppress (bool, optional) – specifies whether or not to suppress the file creation and return a pandas.DataFrame representation of the model instead (default False).

Returns:

a pandas.DataFrame representation of the model, if the suppress parameter was set to True, otherwise None.

Return type:

pandas.DataFrame – if suppress is True
None – otherwise

save_model_with_dialog(timestamp, parent_window, suffix='')¶

Open a file dialog window (GUI) and save the learned model to file at the path indicated by the user.

The model file must be a Comma Separated Value (CSV)-type file with the extension ‘.csv’.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
parent_window (tkinter.Toplevel) – the window widget which the file dialog window widget will be stacked on top of.
suffix (str, optional) – an additional string to add at the end of the file name (default “”).

Returns:	specifies whether or not the file was successfully saved.
Return type:	bool

test(objects, test_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Base method for calculating the prediction accuracy of the learned model on a given dataset (test set).

The prediction accuracy is determined by calculating the percentage of how many of the test ranks the model is able to predict correctly.

Parameters:

objects (pandas.DataFrame) – the objects data to be predicted by the model.
test_ranks (pandas.DataFrame) – the pairwise rank data to be predicted by the model.
use_feats (list of str or None, optional) – a subset of the original features to be used during the prediction process; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the prediction accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

train(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Abstract method for the training stage in the machine learning process.

All children classes must implement this method.

Parameters:

train_objects (pandas.DataFrame) – the objects data to train the model on.
train_ranks (pandas.DataFrame) – the pairwise rank data to train the model on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

True or any other value – if execution is completed successfully.
None – if experiment is aborted before completion by exec_stopper.

static transform_data(object_)¶

Abstract method to transform a sample (object) into the format required by this particular algorithm implementation.

All children classes must implement this method.

Parameters:	object_ (one row from a pandas.DataFrame) – the data sample (object) to be transformed.
Returns:	the transformed object.
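
The contract that the abstract methods above impose on child classes can be illustrated with a toy example. To keep the sketch runnable without PLT installed, it uses a stand-in base class (`AlgorithmContract`) built on Python's abc module rather than PLAlgorithm itself; whether PLAlgorithm enforces its abstract methods this way is an assumption. The `FeatureSum` class is hypothetical, and save_model() and load_model() are left out for brevity.

```python
from abc import ABC, abstractmethod


class AlgorithmContract(ABC):
    """Stand-in listing some of the methods a child class must implement."""

    @abstractmethod
    def init_train(self, n_features): ...

    @abstractmethod
    def train(self, train_objects, train_ranks, use_feats=None,
              progress_window=None, exec_stopper=None): ...

    @abstractmethod
    def predict(self, input_object, progress_window=None, exec_stopper=None): ...

    @staticmethod
    @abstractmethod
    def transform_data(object_): ...


class FeatureSum(AlgorithmContract):
    """Toy algorithm: scores an object by the sum of its features."""

    def init_train(self, n_features):
        self.n_features = n_features

    def train(self, train_objects, train_ranks, use_feats=None,
              progress_window=None, exec_stopper=None):
        return True  # nothing to learn in this toy example

    def predict(self, input_object, progress_window=None, exec_stopper=None):
        # Mirrors the documented return type: a list of float of size 1.
        return [float(sum(self.transform_data(input_object)))]

    @staticmethod
    def transform_data(object_):
        return list(object_)


algo = FeatureSum()
algo.init_train(n_features=2)
trained = algo.train(train_objects=None, train_ranks=None)
prediction = algo.predict([1.0, 2.5])
```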

pyplt.plalgorithms.ranknet module¶

class pyplt.plalgorithms.ranknet.RankNet(ann_topology=None, learn_rate=0.001, epochs=100, hidden_activation_functions=None, batch_size=32, debug=False)¶

Bases: pyplt.plalgorithms.base.PLAlgorithm

RankNet algorithm implemented with the keras package.

The RankNet algorithm is an extension of the Backpropagation algorithm which uses a probabilistic cost function to handle ordered pairs of data. As in Backpropagation, the algorithm iteratively (over a given number of epochs) optimizes the error function by adjusting the weights of an artificial neural network (ANN) model proportionally to the gradient of the error with respect to the current value of the weights and current data samples. The error function used is the binary cross-entropy function. The proportion and therefore the strength of each update is regulated by the given learning rate. The total error is averaged over the complete set of pairs in the training set. In PLT, the algorithm was implemented using the keras library.
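
In the standard RankNet formulation, the probabilistic cost is obtained by passing the difference between the two network outputs through a logistic sigmoid and applying binary cross-entropy to the result. The sketch below shows that formulation for a single pair; that PLT's keras model is wired exactly this way is an assumption, and `ranknet_pair_loss` is a hypothetical name.

```python
import math


def ranknet_pair_loss(f_a, f_b, target=1.0):
    """Binary cross-entropy on P(A preferred over B) = sigmoid(f_a - f_b).

    target is 1.0 when A is the preferred object of the pair.
    """
    p = 1.0 / (1.0 + math.exp(-(f_a - f_b)))
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


tie_loss = ranknet_pair_loss(0.0, 0.0)       # equal outputs: loss is ln(2)
correct_loss = ranknet_pair_loss(2.0, 0.0)   # A scored higher: smaller loss
inverted_loss = ranknet_pair_loss(-2.0, 0.0) # A scored lower: larger loss
```

Unlike the Rank Margin error, this loss never reaches exactly zero, which is consistent with RankNet having no error_threshold parameter.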

Initialize the RankNet instance.

Parameters:

ann_topology (list or None, optional) – a list indicating the topology of the artificial neural network (ANN) to be used with the algorithm. The list contains the number of neurons in each layer of the ANN, excluding the input layer but including the output layer (which must always be 1 neuron in size); a value of None is equivalent to [1], indicating an ANN with no hidden layers and only an output layer (consisting of 1 neuron) (default None).
hidden_activation_functions (list of pyplt.plalgorithms.backprop_tf.ActivationType or None, optional) – a list of the activation functions to be used across the neurons for each hidden layer of the ANN; if None, all hidden layers will use the Rectified Linear Unit (ReLU) function i.e. pyplt.plalgorithms.backprop_tf.ActivationType.RELU (default None). Note that this parameter excludes the activation function at the output layer of the network, which is fixed.
learn_rate (float, optional) – the learning rate used in the weight update step of the Backpropagation algorithm (default 0.001).
epochs (int, optional) – the maximum number of iterations the algorithm should make over the entire pairwise rank training set (default 100).
batch_size (int, optional) – number of samples per gradient update (default 32).
debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).

calc_train_accuracy(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to calculating the training accuracy of the learned model.

This method is implemented explicitly for this algorithm since this approach is substantially more efficient for algorithms using the keras package than the calc_train_accuracy() method of pyplt.plalgorithms.base.PLAlgorithm objects allows.

The training accuracy is determined by calculating the percentage of how many of the training ranks the model is able to predict correctly.

Parameters:

train_objects (pandas.DataFrame) – the objects data the model was trained on.
train_ranks (pandas.DataFrame) – the pairwise rank data the model was trained on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the training accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

clean_up()¶

Close the backend tensorflow session once the algorithm class instance is no longer needed.

init_train(n_features)¶

Initialize the model (topology).

This is done by declaring keras placeholders, variables, and operations. This may also be used, for example, to simply modify (re-initialize) the topology of the model while evaluating different feature sets during wrapper-type feature selection processes.

Parameters:	n_features (int) – the number of features to be used during the training process.

predict(input_object, progress_window=None, exec_stopper=None)¶

Predict the output of a given input object by running it through the learned model.

Parameters:

input_object (one row from a pandas.DataFrame) – the input data corresponding to a single object.
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:	a list containing the predicted output resulting from running the learned model using the given input.
Return type:	list of float (size 1)

save_model(timestamp, path='', suppress=False)¶

Save the trained model to file in a human-readable format.

Optionally, the file creation may be suppressed and a pandas.DataFrame representation of the model returned instead.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
path (str, optional) – the path at which the file is to be saved (default “”). If “”, the file is saved to a logs folder in the project root directory by default.
suppress (bool, optional) – specifies whether or not to suppress the file creation and return a pandas.DataFrame representation of the model instead (default False).

Returns:

a pandas.DataFrame representation of the model, if the suppress parameter was set to True, otherwise None.

Return type:

pandas.DataFrame – if suppress is True
None – otherwise

test(objects, test_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to testing/validating the model using the given test data.

This method is implemented explicitly for this algorithm since this approach is substantially more efficient for algorithms using the keras package than the test() method of the base class pyplt.plalgorithms.base.PLAlgorithm.

Parameters:

objects (pandas.DataFrame) – the objects data for the model to be tested/validated on.
test_ranks (pandas.DataFrame) – the pairwise rank data for the model to be tested/validated on.
use_feats (list of str or None, optional) – a subset of the original features to be used during the testing/validation process; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the test/validation accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

train(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Infer an ANN model using the given training data.

Parameters:

train_objects (pandas.DataFrame) – the objects data to train the model on.
train_ranks (pandas.DataFrame) – the pairwise rank data to train the model on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

True – if execution is completed successfully.
None – if experiment is aborted before completion by exec_stopper.

transform_data(object_)¶

Transform a sample (object) into the format required by this particular algorithm implementation.

In this case, no transformation is needed.

Parameters:	object_ (one row from a pandas.DataFrame) – the data sample (object) to be transformed.
Returns:	the transformed object (same as object_ in this case).

pyplt.plalgorithms.ranksvc module¶

class pyplt.plalgorithms.ranksvc.RankSVC(kernel=<KernelType.RBF: 1>, gamma='auto', degree=3, debug=False)¶

Bases: pyplt.plalgorithms.base.PLAlgorithm

RankSVM algorithm implemented using the scikit-learn library.

N.B. This implementation is similar to the implementation in the pyplt.plalgorithms.ranksvm.RankSVM class but instead of using the OneClassSVM class of the scikit-learn library, this implementation uses the SVC class of the same library. The input and output of the model are treated differently as the SVC model is a binary classifier (see pairwise_transform_from_ranks()). Consequently, unlike the RankSVM implementation, the model cannot predict a real-valued output for a single object/instance. Rather, the model can only be used on pairs of objects in order for the output to make sense. This implementation is only available in the API of PLT.

A Support Vector Machine (SVM) is a binary classifier that separates the input samples linearly in a projected space. The decision boundary of the classifier is given by a linear combination of training samples (called support vectors) in the projected space. The projection is provided by the kernel function that the user must select. The support vectors and weights are selected to satisfy a set of constraints derived from the input samples and a cost parameter which regulates the penalization of misclassified training samples. In PLT, the algorithm was implemented using the scikit-learn library. In this implementation, the quadratic programming solver contained in LIBSVM is used. The RankSVM algorithm is a rank-based version of traditional SVM training algorithms. It uses the same solver as standard training algorithms for binary SVMs; the only difference lies in the set of constraints which are defined in terms of pairwise preferences between training samples.
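
The phrase "a linear combination of training samples in the projected space" can be made concrete with a small sketch of a kernel decision function. This is a simplified illustration and not the scikit-learn or LIBSVM code: the intercept term is omitted, the alpha values are taken as already signed, and the names `rbf_kernel` and `decision_value` are hypothetical. The ‘auto’ setting follows the rule documented for the gamma parameter of this class (1/n_features).

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between x and y)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def decision_value(x, support_vectors, alphas, gamma="auto"):
    """Weighted sum of kernel evaluations between x and each support vector."""
    if gamma == "auto":
        gamma = 1.0 / len(x)  # 1 / n_features
    return sum(alpha * rbf_kernel(sv, x, gamma)
               for sv, alpha in zip(support_vectors, alphas))


# Hypothetical model with two support vectors in a 2-feature space.
support_vectors = [[0.0, 0.0], [1.0, 0.0]]
alphas = [2.0, -1.0]
at_origin = decision_value([0.0, 0.0], support_vectors, alphas)
single_sv = decision_value([1.0, 0.0], [[0.0, 0.0]], [1.0])
```

In a binary classifier the sign of this value decides the predicted class, which for RankSVC corresponds to which object of a pair is preferred.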

Initializes the RankSVC object.

Parameters:

kernel (pyplt.util.enums.KernelType, optional) – the kernel function mapping the input samples to the projected space (default pyplt.util.enums.KernelType.RBF).
gamma (float or 'auto', optional) – the kernel coefficient for the ‘rbf’, ‘poly’ and ‘sigmoid’ kernels. If gamma is set to ‘auto’ then 1/n_features will be used instead (default ‘auto’).
degree (float, optional) – the degree of the polynomial (‘poly’) kernel function (default 3).
debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).

Raises:

InvalidParameterValueException – if the user attempts to use a gamma value <= 0.0.

calc_train_accuracy(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to calculating the training accuracy of the learned model.

This method is tailored specifically for this algorithm implementation and therefore replaces the calc_train_accuracy() method of pyplt.plalgorithms.base.PLAlgorithm.

The training accuracy is determined by calculating the percentage of how many of the training ranks the binary classification model is able to predict correctly.

Parameters:

train_objects (pandas.DataFrame) – the objects data the model was trained on.
train_ranks (pandas.DataFrame) – the pairwise rank data the model was trained on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the training accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

pairwise_transform_from_ranks(objects, ranks, use_feats=None)¶

Convert a rank-based dataset into the required format for use by RankSVM prior to the training stage.

For each rank (pair of objects) in ranks, a feature vector subtraction is carried out between the two objects (both feature vectors) from either side (i.e., a-b and b-a for a given pair of objects/feature vectors a and b where a is preferred over b) and stored as a new transformed data point in X_trans. Additionally, for each positive difference (a-b), a value of +1 is stored as its corresponding target class label in y_trans, whereas a value of -1 is stored for each negative difference (b-a).

Parameters:

objects (pandas.DataFrame) – the objects data to be converted.
ranks (pandas.DataFrame) – the pairwise rank data to be converted.
use_feats (list of str or None, optional) – a subset of the original features to be used when training (default None). If None, all original features are used.

Returns:

a tuple containing:

the converted dataset ready to be used by RankSVM in the form of two arrays:
- array of shape (n_ranks*2, n_features) which stores the positive and negative feature vector differences for each rank.
- array of shape (n_ranks*2,) which stores the corresponding target class labels (alternating +1s and -1s).
a copy of the actual objects data (pandas.DataFrame) used in the transformation.

Return type:

tuple (size 3)
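
The transformation described above can be sketched in a few lines. This is a minimal illustration only, not the PLT implementation: the function name `pairwise_transform`, the dict-of-lists input, and the plain Python lists standing in for the returned arrays are all assumptions made for the example.

```python
def pairwise_transform(objects, ranks):
    """Sketch of a pairwise transform (hypothetical helper, not PLT's API).

    objects: dict mapping object ID -> list of feature values.
    ranks:   list of (preferred_id, non_preferred_id) pairs.
    """
    x_trans, y_trans = [], []
    for preferred_id, other_id in ranks:
        a, b = objects[preferred_id], objects[other_id]
        # Positive difference (a - b) gets the label +1.
        x_trans.append([ai - bi for ai, bi in zip(a, b)])
        y_trans.append(1)
        # Negative difference (b - a) gets the label -1.
        x_trans.append([bi - ai for ai, bi in zip(a, b)])
        y_trans.append(-1)
    return x_trans, y_trans


objects = {0: [1.0, 4.0], 1: [3.0, 1.0], 2: [2.0, 2.0]}
ranks = [(0, 1), (2, 0)]  # object 0 preferred over 1, object 2 preferred over 0
x_trans, y_trans = pairwise_transform(objects, ranks)
```

Each rank contributes two rows, so the transformed data has n_ranks*2 entries and the labels alternate between +1 and -1.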

save_model(timestamp, path='', suppress=False)¶

Save the RankSVM model to a Comma Separated Value (CSV) file at the path indicated by the user.

Optionally, the file creation may be suppressed and a pandas.DataFrame representation of the model returned instead.

The file/DataFrame stores support vectors and corresponding alpha values of the SVM model.

The first column contains the support vectors each representing an object ID. The second column contains the alpha values corresponding to the support vectors in the first column.

The parameters (kernel, gamma and degree) used to construct the model are stored within the file name.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
path (str, optional) – the path at which the file is to be saved (default “”). If “”, the file is saved to a logs folder in the project root directory by default. The kernel, gamma, and degree parameters are automatically included in the file name.
suppress (bool, optional) – specifies whether or not to suppress the file creation and return a pandas.DataFrame representation of the model instead (default False).

Returns:

a pandas.DataFrame representation of the model, if the suppress parameter was set to True, otherwise None.

Return type:

pandas.DataFrame – if suppress is True
None – otherwise

test(objects, test_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to testing/validating the model using the given test data.

Parameters:

objects (pandas.DataFrame) – the objects data that the model was trained on.
test_ranks (pandas.DataFrame) – the pairwise rank data for the model to be tested/validated on.
use_feats (list of str or None, optional) – a subset of the original features to be used during the testing/validation process; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the test/validation accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

train(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Train a RankSVM model on the given training data.

Parameters:

train_objects (pandas.DataFrame) – the objects data to train the model on.
train_ranks (pandas.DataFrame) – the pairwise rank data to train the model on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

None – if experiment is aborted before completion by exec_stopper.

static transform_data(object_)¶

Transform an object into the format required by this particular implementation of RankSVM.

Parameters:	object_ (one row from a pandas.DataFrame) – the object to be transformed.
Returns:	the transformed object in the form of an array.
Return type:	numpy.ndarray

pyplt.plalgorithms.ranksvm module¶

class pyplt.plalgorithms.ranksvm.RankSVM(kernel=<KernelType.RBF: 1>, gamma='auto', degree=3, debug=False)¶

Bases: pyplt.plalgorithms.base.PLAlgorithm

RankSVM algorithm implemented using the scikit-learn library.

A Support Vector Machine (SVM) is a binary classifier that separates the input samples linearly in a projected space. The decision boundary of the classifier is given by a linear combination of training samples (called support vectors) in the projected space. The projection is provided by the kernel function that the user must select. The support vectors and weights are selected to satisfy a set of constraints derived from the input samples and a cost parameter which regulates the penalization of misclassified training samples. In PLT, the algorithm was implemented using the scikit-learn library. In this implementation, the quadratic programming solver contained in LIBSVM is used. The RankSVM algorithm is a rank-based version of traditional SVM training algorithms. It uses the same solver as standard training algorithms for binary SVMs; the only difference lies in the set of constraints, which are defined in terms of pairwise preferences between training samples.
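
To make the "linear combination of support vectors in the projected space" concrete, the following is a minimal sketch of a generic kernel decision function with an RBF kernel. It illustrates the general SVM idea only and is not taken from the PLT source: the helper names, the toy support vectors, and the signed coefficients in `alphas` are assumptions for the example.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel: exp(-gamma * ||u - v||^2)."""
    sq_dist = sum((ui - vi) ** 2 for ui, vi in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_value(x, support_vectors, alphas, gamma, bias=0.0):
    """Weighted sum of kernel evaluations between x and each support vector.

    alphas are assumed to be signed coefficients (weight times class label).
    """
    return sum(
        alpha * rbf_kernel(sv, x, gamma)
        for sv, alpha in zip(support_vectors, alphas)
    ) + bias


# Toy model (assumed values): one support vector per side of the boundary.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, -1.0]
value_near_first = decision_value([0.0, 0.0], support_vectors, alphas, gamma=0.5)
value_near_second = decision_value([1.0, 1.0], support_vectors, alphas, gamma=0.5)
```

The sign of the decision value gives the predicted class; in this toy setup a sample close to the first support vector yields a positive value and a sample close to the second yields a negative one.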

Initializes the RankSVM object.

Parameters:

kernel (pyplt.util.enums.KernelType, optional) – the kernel function mapping the input samples to the projected space (default pyplt.util.enums.KernelType.RBF).
gamma (float or 'auto', optional) – the kernel coefficient for the ‘rbf’, ‘poly’ and ‘sigmoid’ kernels. If gamma is set to ‘auto’ then 1/n_features will be used instead (default ‘auto’).
degree (float, optional) – the degree of the polynomial (‘poly’) kernel function (default 3).
debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).

Raises:

InvalidParameterValueException – if the user attempts to use a gamma value <= 0.0.
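
The gamma rules listed above ('auto' meaning 1/n_features, and values <= 0.0 being rejected) can be sketched as follows. The function `resolve_gamma` is a hypothetical helper, and the built-in `ValueError` is used here only as a stand-in for InvalidParameterValueException; where exactly PLT performs these checks is not shown in this documentation.

```python
def resolve_gamma(gamma, n_features):
    """Return the numeric kernel coefficient for a given gamma setting.

    Hypothetical helper: 'auto' maps to 1 / n_features; non-positive
    values are rejected (ValueError stands in for PLT's own exception).
    """
    if gamma == 'auto':
        return 1.0 / n_features
    if gamma <= 0.0:
        raise ValueError("gamma must be greater than 0.0")
    return float(gamma)


auto_gamma = resolve_gamma('auto', n_features=4)
fixed_gamma = resolve_gamma(0.5, n_features=4)
```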

calc_train_accuracy(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to calculating the training accuracy of the learned model.

This method is tailored specifically for this algorithm implementation and therefore replaces the calc_train_accuracy() method of pyplt.plalgorithms.base.PLAlgorithm.

The training accuracy is determined by calculating the percentage of how many of the training ranks the model is able to predict correctly.

Parameters:

train_objects (pandas.DataFrame) – the objects data the model was trained on.
train_ranks (pandas.DataFrame) – the pairwise rank data the model was trained on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the training accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float
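
As a rough illustration of the accuracy measure described above, the sketch below counts how many pairwise ranks are ordered correctly. It assumes, for the sake of the example, that the model's output can be summarized as one score per object and that the result is reported on a 0 to 100 scale; `pairwise_accuracy` and its inputs are hypothetical and not part of PLT.

```python
def pairwise_accuracy(scores, ranks):
    """Percentage of (preferred_id, non_preferred_id) pairs ordered correctly.

    scores: dict mapping object ID -> predicted output (assumed form).
    ranks:  list of (preferred_id, non_preferred_id) pairs.
    """
    if not ranks:
        raise ValueError("ranks must contain at least one pair")
    correct = sum(
        1 for preferred_id, other_id in ranks
        if scores[preferred_id] > scores[other_id]
    )
    return 100.0 * correct / len(ranks)


scores = {0: 0.9, 1: 0.2, 2: 0.5}
ranks = [(0, 1), (2, 1), (1, 0), (0, 2)]
accuracy = pairwise_accuracy(scores, ranks)  # 3 of the 4 pairs are correct
```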

predict_m(input_objects, progress_window=None, exec_stopper=None)¶

Predict the output of a given set of input samples by running them through the learned RankSVM model.

Parameters:

input_objects (numpy.ndarray) – array of shape [n_samples, n_feats] containing the input data corresponding to a set of (test) objects.
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

a list containing the average predicted output resulting from running the learned model using the given input objects – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

list of float (size 1)

save_model(timestamp, path='', suppress=False)¶

Save the RankSVM model to a Comma Separated Value (CSV) file at the path indicated by the user.

Optionally, the file creation may be suppressed and a pandas.DataFrame representation of the model returned instead.

The file/DataFrame stores support vectors and corresponding alpha values of the SVM model.

The first column contains the support vectors each representing a rank in the form of a tuple (int, int) containing the ID of the preferred object in the pair, followed by the ID of the non-preferred object in the pair. The second column contains the alpha values corresponding to the support vectors in the first column.

The parameters (kernel, gamma and degree) used to construct the model are stored within the file name.

Parameters:

timestamp (float) – the timestamp to be included in the file name.
path (str, optional) – the path at which the file is to be saved (default “”). If “”, the file is saved to a logs folder in the project root directory by default. The kernel, gamma, and degree parameters are automatically included in the file name.
suppress (bool, optional) – specifies whether or not to suppress the file creation and return a pandas.DataFrame representation of the model instead (default False).

Returns:

a pandas.DataFrame representation of the model, if the suppress parameter was set to True, otherwise None.

Return type:

pandas.DataFrame – if suppress is True
None – otherwise
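
The two-column layout described above (a rank tuple per support vector, followed by its alpha value) might look roughly like the output of the sketch below. The header names, the helper `model_to_csv`, and the sample values are assumptions for illustration; the actual column names and file naming used by PLT are not specified here.

```python
import csv
import io


def model_to_csv(support_ranks, alphas):
    """Write (rank tuple, alpha) rows as CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["support_vector", "alpha"])  # assumed header names
    for rank, alpha in zip(support_ranks, alphas):
        # Each rank is (preferred object ID, non-preferred object ID).
        writer.writerow([str(rank), alpha])
    return buffer.getvalue()


csv_text = model_to_csv([(3, 7), (5, 2)], [0.8, 0.25])
rows = list(csv.reader(io.StringIO(csv_text)))
```

Because the tuple's text form contains a comma, the csv module quotes that field, so it reads back as a single column.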

test(objects, test_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

An algorithm-specific approach to testing/validating the model using the given test data.

Parameters:

objects (pandas.DataFrame) – the objects data that the model was trained on.
test_ranks (pandas.DataFrame) – the pairwise rank data for the model to be tested/validated on.
use_feats (list of str or None, optional) – a subset of the original features to be used during the testing/validation process; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

the test/validation accuracy of the learned model – if execution is completed successfully.
None – if aborted before completion by exec_stopper.

Return type:

float

train(train_objects, train_ranks, use_feats=None, progress_window=None, exec_stopper=None)¶

Train a RankSVM model on the given training data.

Parameters:

train_objects (pandas.DataFrame) – the objects data to train the model on.
train_ranks (pandas.DataFrame) – the pairwise rank data to train the model on.
use_feats (list of str or None, optional) – a subset of the original features to be used when training; if None, all original features are used (default None).
progress_window (pyplt.gui.experiment.progresswindow.ProgressWindow, optional) – a GUI object (extending the tkinter.Toplevel widget) used to display a progress log and progress bar during the experiment execution (default None).
exec_stopper (pyplt.util.AbortFlag, optional) – an abort flag object used to abort the execution before completion (default None).

Returns:

True – if execution is completed successfully.
None – if experiment is aborted before completion by exec_stopper.
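
The return convention above (True on completion, None when aborted via exec_stopper) follows a cooperative-abort pattern that can be sketched as below. The `StopFlag` class and `run_steps` function are hypothetical stand-ins written for this example; they are not pyplt.util.AbortFlag or the actual training loop.

```python
class StopFlag:
    """Hypothetical stand-in for an abort flag object."""

    def __init__(self):
        self._stopped = False

    def stop(self):
        self._stopped = True

    def stopped(self):
        return self._stopped


def run_steps(n_steps, exec_stopper=None):
    """Return True if all steps complete, or None if aborted part-way."""
    for _ in range(n_steps):
        # Check the flag before each unit of work so an abort takes effect promptly.
        if exec_stopper is not None and exec_stopper.stopped():
            return None
        # ... one unit of training work would go here ...
    return True


completed = run_steps(3)
flag = StopFlag()
flag.stop()
aborted = run_steps(3, exec_stopper=flag)
```

Callers can then distinguish the two outcomes with an `is None` check rather than relying on truthiness.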

static transform_data(object_)¶

Transform an object into the format required by this particular implementation of RankSVM.

Parameters:	object_ (one row from a pandas.DataFrame) – the object to be transformed.
Returns:	the transformed object in the form of an array.
Return type:	numpy.ndarray

Module contents¶

This package contains backend modules that manage the preference learning step of an experiment.