learningmodels.scikit

Module Contents

class learningmodels.scikit.GaussianProcessRegressorModel(units=None, **kwargs)

Learns the duration of a task from data using scikit-learn’s GaussianProcessRegressor.

model

GaussianProcessRegressor – The underlying scikit-learn model used to predict durations.

units

TimeUnits, optional – The time units the resulting durations should be in. Defaults to TimeUnits.seconds

is_trained

bool – Whether the model has been trained.

ordering

list[str] – The order of the data names used to arrange the input data for the model.

Parameters:units (TimeUnits, optional) – The time units the resulting durations should be in. Defaults to TimeUnits.seconds
Keyword Arguments:
 kernel – The kernel to use in the regressor model. Defaults to ConstantKernel() + Matern(length_scale=1, nu=3 / 2) + WhiteKernel(noise_level=1)
__init__(units=None, **kwargs)
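
The default kernel listed above can be built directly with scikit-learn. The following is a minimal sketch, not the class's actual implementation: it assumes the kernel keyword is forwarded to GaussianProcessRegressor unchanged, and the toy data is invented purely for illustration.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# The default kernel as documented for the `kernel` keyword argument.
default_kernel = (
    ConstantKernel()
    + Matern(length_scale=1, nu=3 / 2)
    + WhiteKernel(noise_level=1)
)

# Assumption: the wrapper passes the kernel straight to scikit-learn.
regressor = GaussianProcessRegressor(kernel=default_kernel)

# Toy data (made up): one input column, four observed durations.
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([10.0, 19.0, 31.0, 40.0])
regressor.fit(X, y)

# A Gaussian process yields a mean and a standard deviation per query point,
# which is the kind of information a duration distribution would need.
mean, std = regressor.predict(np.array([[2.5]]), return_std=True)
```

The WhiteKernel term models observation noise, so the predicted standard deviation does not collapse to zero at the training points.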
train(input_data, durations, ordering=None)

Trains the model from input data and durations

Note

If a pandas DataFrame is used for the input data, the ordering of the data is determined by the ordering of its columns. If a pandas DataFrame is not used, the ordering must be provided. Each Task must provide its data as a dictionary whose keys match the names in the ordering (or the column names of the DataFrame).

Parameters:
  • input_data (array-like) – The data to train the model from
  • durations (array-like) – The durations associated with the data
  • ordering (list[str], optional) – The ordering of the data
Raises:

ValueError – When a non-DataFrame is provided as the input_data and no ordering is provided
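
The ordering rule described for train can be sketched as follows. The helper resolve_ordering is hypothetical and not part of this module, and detecting a DataFrame by its columns attribute is an assumption made so the sketch runs without pandas installed.

```python
def resolve_ordering(input_data, ordering=None):
    """Hypothetical helper: pick the ordering that training would use."""
    # Assumption: DataFrame-like input is recognised by a `columns` attribute.
    columns = getattr(input_data, "columns", None)
    if columns is not None:
        # DataFrame input: the column order defines the ordering.
        return list(columns)
    if ordering is None:
        # Mirrors the documented ValueError for non-DataFrame input.
        raise ValueError(
            "ordering must be provided when input_data is not a DataFrame"
        )
    return list(ordering)


# Plain nested lists carry no column names, so the ordering is explicit.
rows = [[120.0, 3], [80.0, 2]]
names = resolve_ordering(rows, ordering=["area", "crew_size"])
```

Calling resolve_ordering(rows) without an ordering raises ValueError, matching the Raises entry above.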

predict(input_data)

Predicts the duration of a task given its data

Parameters:
  • input_data (dict) – A dict containing the data necessary to predict the duration. The format must be key-value pairs in which the key is the name of the data and the value is its value.
Returns:

The estimated duration of the task.

Return type:

DurationPdf
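
Because predict receives a dict while the model was trained on ordered columns, the dict has to be arranged by the stored ordering at some point. The following is a minimal sketch of that step. The helper to_feature_row and the key names are hypothetical, and the real method may handle missing keys differently.

```python
def to_feature_row(task_data, ordering):
    """Hypothetical helper: arrange key-value task data in training order."""
    missing = [name for name in ordering if name not in task_data]
    if missing:
        raise KeyError(f"task data is missing keys: {missing}")
    # The order of keys in the dict is irrelevant; only `ordering` matters.
    return [task_data[name] for name in ordering]


ordering = ["area", "crew_size"]  # e.g. DataFrame column names from training
row = to_feature_row({"crew_size": 3, "area": 120.0}, ordering)
```

Here row holds the values in training order (area first, then crew_size), regardless of how the dict was written.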