pyplt.evaluation package

Submodules

pyplt.evaluation.base module

class pyplt.evaluation.base.Evaluator(description='A validation/testing method.', name='', debug=False, **kwargs)

Bases: object

Base class for all evaluation (validation or testing) methods.

Initializes the Evaluator object.

Parameters:
  • description (str, optional) – a description of the evaluation method (default “A validation/testing method.”).
  • name (str, optional) – the name of the evaluation method (default “”).
  • debug (bool, optional) – specifies whether or not to print notes to console for debugging purposes (default False).
  • kwargs – any additional parameters for the evaluation method.

get_description()

Get the description of the evaluation method.

Returns: the description of the evaluation method.
Return type: str

get_name()

Get the name of the evaluation method.

Returns: the name of the evaluation method.
Return type: str

get_params()

Return all additional parameters of the evaluation method (if applicable).

Returns: a dict mapping the name of each additional parameter of the evaluation method to its value (if applicable).
Return type: dict

get_params_string()

Return a string representation of all additional parameters of the evaluation method (if applicable).

Returns: the string representation of all additional parameters of the evaluation method (if applicable).
Return type: str
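To illustrate how the constructor's kwargs relate to get_params() and get_params_string(), here is a minimal, self-contained sketch that mirrors the documented interface. It is not pyplt's implementation: EvaluatorSketch and HoldOutSketch are hypothetical names, the name and description strings passed by the subclass are illustrative, and the "key=value" format of get_params_string() is an assumption.

```python
class EvaluatorSketch:
    """Hypothetical stand-in mirroring the documented Evaluator interface."""

    def __init__(self, description="A validation/testing method.", name="", debug=False, **kwargs):
        self._description = description
        self._name = name
        self._debug = debug
        # Any extra keyword arguments become the method's additional parameters.
        self._params = dict(kwargs)

    def get_description(self):
        return self._description

    def get_name(self):
        return self._name

    def get_params(self):
        # Return a copy so callers cannot mutate the stored parameters.
        return dict(self._params)

    def get_params_string(self):
        # Assumed format: "name=value" pairs joined by ", " (not taken from pyplt).
        return ", ".join(f"{key}={value}" for key, value in self._params.items())


class HoldOutSketch(EvaluatorSketch):
    """Hypothetical subclass that passes its own setting through **kwargs."""

    def __init__(self, test_proportion=0.3, debug=False):
        super().__init__(description="Holdout evaluator.", name="Holdout",
                         debug=debug, test_proportion=test_proportion)


evaluator = HoldOutSketch(test_proportion=0.25)
print(evaluator.get_name())           # Holdout
print(evaluator.get_params())         # {'test_proportion': 0.25}
print(evaluator.get_params_string())  # test_proportion=0.25
```

Passing subclass-specific settings through **kwargs lets the base class report them generically, without knowing which evaluator it belongs to.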

pyplt.evaluation.cross_validation module

class pyplt.evaluation.cross_validation.KFoldCrossValidation(k=3, test_folds=None)

Bases: pyplt.evaluation.base.Evaluator

K-Fold Cross Validation.

Initializes the KFoldCrossValidation object.

The dataset may be split into folds in one of two ways: automatically or manually. In the automatic approach, the data is split uniformly into k folds. In the manual approach, the user specifies the fold index of each sample in the dataset via the test_folds argument.

Parameters:
  • k (int, optional) – the number of folds to uniformly split the data into when using the automatic approach (default 3).
  • test_folds (numpy.ndarray or None, optional) – an array specifying the fold index for each sample in the dataset when using the manual approach (default None). The entry test_folds[i] specifies the index of the test set that sample i belongs to. It is also possible to exclude sample i from any test set (i.e., include sample i in every training set) by setting test_folds[i] to -1. If test_folds is None, the automatic approach is assumed and only the k parameter is considered. Otherwise, the manual approach is assumed and only the test_folds parameter is considered.
Raises:

InvalidParameterValueException – if a k parameter value less than 2 is used.
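The manual approach can be sketched in plain Python as follows. It assumes each distinct non-negative value in test_folds defines one fold and that folds are visited in ascending order of their index; both are assumptions, as is the hypothetical helper name predefined_splits. A list stands in for the numpy.ndarray. These semantics resemble those of scikit-learn's PredefinedSplit.

```python
def predefined_splits(test_folds):
    """Yield (train, test) index lists for a manual fold assignment.

    test_folds[i] is the test-fold index of sample i; -1 keeps sample i
    in every training set (it never appears in a test set).
    """
    # Distinct fold indices in ascending order, ignoring the -1 marker.
    fold_ids = sorted({f for f in test_folds if f != -1})
    for fold in fold_ids:
        test = [i for i, f in enumerate(test_folds) if f == fold]
        train = [i for i, f in enumerate(test_folds) if f != fold]
        yield train, test


# Sample 2 is marked -1, so it is in both training sets and in no test set.
for train, test in predefined_splits([0, 1, -1, 0, 1]):
    print(train, test)
# [1, 2, 4] [0, 3]
# [0, 2, 3] [1, 4]
```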

split(data)

Get the indices for the training set and test set of the next fold of the dataset.

If the single file format is used, the indices are given with respect to objects. Otherwise (if the dual file format is used), the indices are given with respect to the ranks.

Parameters: data (pandas.DataFrame or tuple of pandas.DataFrame (size 2)) – the data to be split into folds. If the single file format is used, a single pandas.DataFrame containing the data should be passed. If the dual file format is used, a tuple containing both the objects and ranks (each a pandas.DataFrame) should be passed.
Returns: yields two arrays containing the integer-based indices for the training set and test set of the next fold.
Return type:
  • train: numpy.ndarray
  • test: numpy.ndarray
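A minimal sketch of the automatic approach and of the format-dependent indexing described above. It assumes contiguous, unshuffled folds whose sizes differ by at most one (the rule scikit-learn's KFold uses without shuffling); pyplt may shuffle or assign folds differently. Plain lists stand in for pandas.DataFrame objects, a tuple signals the dual file format, and kfold_splits, n_split_units and InvalidParameterValueError are hypothetical names.

```python
class InvalidParameterValueError(ValueError):
    """Hypothetical stand-in for pyplt's InvalidParameterValueException."""


def n_split_units(data):
    # Dual file format: an (objects, ranks) tuple, so split with respect to ranks.
    if isinstance(data, tuple):
        _, ranks = data
        return len(ranks)
    # Single file format: split with respect to the objects (rows).
    return len(data)


def kfold_splits(data, k=3):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    # Checked when iteration starts, since this function is a generator.
    if k < 2:
        raise InvalidParameterValueError("k must be at least 2")
    n = n_split_units(data)
    # The first n % k folds get one extra sample so fold sizes differ by at most 1.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


objects = ["o1", "o2", "o3"]
ranks = [("o1", "o2"), ("o2", "o3"), ("o1", "o3"), ("o3", "o2"),
         ("o2", "o1"), ("o3", "o1"), ("o2", "o3")]
for train, test in kfold_splits((objects, ranks), k=3):
    print(len(train), test)
# 4 [0, 1, 2]
# 5 [3, 4]
# 5 [5, 6]
```

Because the dual-format input is a tuple, the seven ranks (not the three objects) are what gets split.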

class pyplt.evaluation.cross_validation.PreprocessedFolds(folds)

Bases: object

Class for neatly storing and working with a dataset that has been split into two or more folds.

The data in each fold is assumed to be pre-processed prior to instantiation of this class.

Initializes the PreprocessedFolds instance and stores the given fold data.

Parameters: folds (list of tuples (size 4)) –

a list of tuples containing the pre-processed training set and test set (if applicable) of each fold. Each tuple (fold) should contain:

  • train_objects: pandas.DataFrame
  • train_ranks: pandas.DataFrame
  • test_objects: pandas.DataFrame (if applicable) or None
  • test_ranks: pandas.DataFrame (if applicable) or None

If either the test_objects or test_ranks of the first fold is None, it is assumed that only training will be carried out.
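The fold layout and the training-only rule above can be illustrated with a small sketch. Plain lists stand in for the pandas.DataFrame objects, and is_training_only here is a hypothetical free function rather than the class method.

```python
def is_training_only(folds):
    """Apply the rule above: only the first fold's test data is inspected."""
    _train_objects, _train_ranks, test_objects, test_ranks = folds[0]
    return test_objects is None or test_ranks is None


# Each fold is a 4-tuple: (train_objects, train_ranks, test_objects, test_ranks).
with_test = [
    (["o1", "o2"], [("o1", "o2")], ["o3", "o4"], [("o3", "o4")]),
]
train_only = [
    (["o1", "o2", "o3"], [("o1", "o2"), ("o3", "o1")], None, None),
]
print(is_training_only(with_test))   # False
print(is_training_only(train_only))  # True
```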

get_features()

Get the features defining the objects in the data.

These are determined by looking at the features of the training objects in the first fold.

get_n_folds()

Get the number of folds in the data.

is_training_only()

Indicate whether only training (and no testing) is to be carried out on the given data.

next_fold()

Get the pre-processed training set and test set (if applicable) of the next fold.

Returns: yields the pre-processed training set and test set (if applicable) of the next fold.
Return type:
  • train_objects: pandas.DataFrame
  • train_ranks: pandas.DataFrame
  • test_objects: pandas.DataFrame (if applicable) or None
  • test_ranks: pandas.DataFrame (if applicable) or None
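A typical consumer of next_fold() iterates over the folds and skips evaluation when no test set is present. The sketch below uses a hypothetical stand-in class (PreprocessedFoldsSketch) with plain lists in place of DataFrames; it assumes next_fold() behaves like a generator, as the "yields" wording suggests, and does not reproduce pyplt's internals.

```python
class PreprocessedFoldsSketch:
    """Hypothetical stand-in for the documented PreprocessedFolds behaviour."""

    def __init__(self, folds):
        self._folds = list(folds)

    def get_n_folds(self):
        return len(self._folds)

    def next_fold(self):
        # A generator: each iteration yields one fold's 4-tuple, in order.
        for train_objects, train_ranks, test_objects, test_ranks in self._folds:
            yield train_objects, train_ranks, test_objects, test_ranks


folds = PreprocessedFoldsSketch([
    (["o1", "o2", "o3"], [("o1", "o2")], ["o4", "o5"], [("o4", "o5")]),
    (["o3", "o4", "o5"], [("o4", "o5")], ["o1", "o2"], [("o1", "o2")]),
])
for i, (train_objects, train_ranks, test_objects, test_ranks) in enumerate(folds.next_fold()):
    if test_objects is None or test_ranks is None:
        print(f"fold {i}: training only on {len(train_ranks)} pairs")
        continue
    # A real experiment would fit a model on the training set and score it on the test set.
    print(f"fold {i}: {len(train_ranks)} training pairs, {len(test_ranks)} test pairs")
```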

pyplt.evaluation.holdout module

class pyplt.evaluation.holdout.HoldOut(test_proportion=0.3, debug=False)

Bases: pyplt.evaluation.base.Evaluator

Holdout evaluator.

This evaluation method splits the pairwise rank data into a training set and a test set. The training set is used to train the model via preference learning, whereas the test set is used to estimate the prediction accuracy of the model. Commonly, 70% of the data is used as the training set and the remaining 30% as the test set (i.e., a test proportion of 0.3); however, the user may choose a different proportion.

Initializes the HoldOut object.

Parameters:
  • test_proportion (float, optional) – the proportion of data to be used as the test set; the remaining data is used as the training set (default 0.3).
  • debug (bool, optional) – specifies whether or not to print notes to console for debugging (default False).

split(data)

Split the given dataset into a training set and a test set according to the given proportion parameter.

If the single file format is used, the indices are given with respect to objects. Otherwise (if the dual file format is used), the indices are given with respect to the ranks.

Parameters: data (pandas.DataFrame or tuple of pandas.DataFrame (size 2)) – the data to be split into a training set and a test set. If the single file format is used, a single pandas.DataFrame containing the data should be passed. If the dual file format is used, a tuple containing both the objects and ranks (each a pandas.DataFrame) should be passed.
Returns: two arrays containing the indices for the training set and test set.
Return type:
  • train: numpy.ndarray
  • test: numpy.ndarray
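A single random holdout split can be sketched as below, where n_samples stands for the number of objects (single file format) or ranks (dual file format). The helper name holdout_split, the seed parameter, rounding the test size up with math.ceil (as scikit-learn's train_test_split does for a float test_size), and the range check on test_proportion are all assumptions, not documented pyplt behaviour.

```python
import math
import random


def holdout_split(n_samples, test_proportion=0.3, seed=None):
    """Return (train, test) index lists for a single random holdout split."""
    # Assumed validation: the proportion must leave both sets non-trivial.
    if not 0.0 < test_proportion < 1.0:
        raise ValueError("test_proportion must lie strictly between 0 and 1")
    # Assumption: the test-set size is rounded up.
    n_test = math.ceil(n_samples * test_proportion)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle for reproducibility
    test = sorted(indices[:n_test])
    train = sorted(indices[n_test:])
    return train, test


train, test = holdout_split(10, test_proportion=0.3, seed=0)
print(len(train), len(test))  # 7 3
```

Sorting each index list is purely cosmetic; the sets are disjoint and together cover every sample.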

Module contents

This package contains backend modules that manage the evaluation step of an experiment.