hyperparameter_hunter package¶
Subpackages¶
- hyperparameter_hunter.callbacks package
- hyperparameter_hunter.data package
- hyperparameter_hunter.keys package
- hyperparameter_hunter.library_helpers package
- hyperparameter_hunter.optimization package
- hyperparameter_hunter.space package
- hyperparameter_hunter.utils package
- Submodules
- hyperparameter_hunter.utils.boltons_utils module
- hyperparameter_hunter.utils.file_utils module
- hyperparameter_hunter.utils.general_utils module
- hyperparameter_hunter.utils.learning_utils module
- hyperparameter_hunter.utils.optimization_utils module
- hyperparameter_hunter.utils.parsing_utils module
- hyperparameter_hunter.utils.result_utils module
- hyperparameter_hunter.utils.version_utils module
- Module contents
Submodules¶
hyperparameter_hunter.algorithm_handlers module¶
- hyperparameter_hunter.algorithm_handlers.identify_algorithm(model_initializer)¶
Determine the name and module of the algorithm provided by model_initializer
- Parameters
- model_initializer: functools.partial, or class, or class instance
The algorithm class being used to initialize a model
- Returns
- algorithm_name: str
The name of the algorithm provided by model_initializer
- module_name: str
The name of the module housing the algorithm provided by model_initializer
Examples
>>> from sklearn.cluster import DBSCAN, SpectralClustering
>>> from functools import partial
>>> identify_algorithm(DBSCAN)
('DBSCAN', 'sklearn')
>>> identify_algorithm(DBSCAN())
('DBSCAN', 'sklearn')
>>> identify_algorithm(partial(SpectralClustering))
('SpectralClustering', 'sklearn')
- hyperparameter_hunter.algorithm_handlers.identify_algorithm_hyperparameters(model_initializer)¶
Determine the keyword arguments accepted by model_initializer, along with their default values
- Parameters
- model_initializer: functools.partial, or class, or class instance
The algorithm class being used to initialize a model
- Returns
- hyperparameter_defaults: dict
The dict of kwargs accepted by model_initializer and their default values
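Examples
A minimal sketch of the expected behavior, assuming scikit-learn is installed (eps=0.5 and min_samples=5 are DBSCAN's defaults at the time of writing):
>>> from sklearn.cluster import DBSCAN
>>> defaults = identify_algorithm_hyperparameters(DBSCAN)
>>> defaults["eps"], defaults["min_samples"]
(0.5, 5)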
hyperparameter_hunter.environment module¶
This module is central to the proper functioning of the entire library. It defines Environment, which (when activated) is used by the vast majority of the other operation-critical modules in the library. Environment can be viewed as a simple storage container that defines the settings that characterize the Experiments/OptimizationProtocols to be conducted, and that influence how those processes are carried out
Notes¶
Although hyperparameter_hunter.settings is the only module listed as “related”, pretty much all the other modules in the library are related to hyperparameter_hunter.environment.Environment by way of this relation
- class hyperparameter_hunter.environment.Environment(train_dataset, environment_params_path=None, *, results_path=None, metrics=None, holdout_dataset=None, test_dataset=None, target_column=None, id_column=None, do_predict_proba=None, prediction_formatter=None, metrics_params=None, cv_type=None, runs=None, global_random_seed=None, random_seeds=None, random_seed_bounds=None, cv_params=None, verbose=None, file_blacklist=None, reporting_params=None, to_csv_params=None, do_full_save=None, experiment_callbacks=None, experiment_recorders=None, save_transformed_metrics=None)¶
Bases: object
Class to organize the parameters that allow Experiments/OptPros to be fairly compared
Environment is the collective starting point for all of HyperparameterHunter’s biggest and best toys: Experiments and OptimizationProtocols. Without an Environment, neither of these will work.
The Environment is where we declare all the parameters that transcend traditional “hyperparameters”. It houses the stuff without which machine learning can’t even really start. Specifically, Environment cares about 1) The data used for fitting/predicting, 2) The cross-validation scheme used to split the data and fit models; and 3) How to evaluate the predictions made on that data. There are plenty of other goodies documented below, but the absolutely mission-critical parameters concerned with the above tasks are train_dataset, cv_type, cv_params, and metrics. Additionally, it’s important to provide results_path, so Experiment/OptPro results can be saved, which is kind of what HyperparameterHunter is all about
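A minimal sketch of declaring these mission-critical parameters (train_df is a hypothetical DataFrame containing a “target” column, and the metric name is illustrative):
>>> from hyperparameter_hunter import Environment
>>> env = Environment(
...     train_dataset=train_df,  # hypothetical pd.DataFrame with a "target" column
...     results_path="HyperparameterHunterAssets",  # where result files are saved
...     metrics=["roc_auc_score"],  # name of an attribute in sklearn.metrics
...     cv_type="KFold",
...     cv_params=dict(n_splits=5, shuffle=True, random_state=32),
... )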
- Parameters
- train_dataset: Pandas.DataFrame, or str path
The training data for the experiment. Will be split into train/holdout data, if applicable, and train/validation data if cross-validation is to be performed. If str, will attempt to read the file at that path via pandas.read_csv(). For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below
- environment_params_path: String path, or None, default=None
If not None and a valid .json filepath containing an object (dict), the file’s contents are treated as the default values for all keys that match any of the below kwargs used to initialize Environment
- results_path: String path, or None, default=None
If valid directory path and the results directory has not yet been created, it will be created here. If this does not end with <ASSETS_DIRNAME>, it will be appended. If <ASSETS_DIRNAME> already exists at this path, new results will also be stored here. If None or invalid, results will not be stored
- metrics: Dict, List, or None, default=None
Iterable describing the metrics to be recorded, along with a means to compute the value of each metric. Should take one of the two following forms:
List Form:
- [“<metric name>”, “<metric name>”, …]: Where each value is a string that names an attribute in sklearn.metrics
- [Metric, Metric, …]: Where each value of the list is an instance of metrics.Metric
- [(<name>, <metric_function>, [<direction>]), (<*args>), …]: Where each value of the list is a tuple of arguments that will be used to instantiate a metrics.Metric. Arguments given in tuples must be in the order expected by metrics.Metric: (name, metric_function, direction)
Dict Form:
- {“<metric name>”: <metric_function>, …}: Where each key is a name for the corresponding metric callable, which is used to compute the value of the metric
- {“<metric name>”: (<metric_function>, <direction>), …}: Where each key is a name for the corresponding metric callable and direction, all of which are used to instantiate a metrics.Metric
- {“<metric name>”: “<sklearn metric name>”, …}: Where each key is a name for the metric, and each value is the name of the attribute in sklearn.metrics for which the corresponding key is an alias
- {“<metric name>”: None, …}: Where each key is the name of the attribute in sklearn.metrics
- {“<metric name>”: Metric, …}: Where each key names an instance of metrics.Metric. This is the internally-used format to which all other formats will be converted
Metric callable functions should expect inputs of form (target, prediction), and should return floats. See the documentation of metrics.Metric for information regarding expected parameters and types. A short sketch of several equivalent forms appears after this parameter list
- holdout_dataset: Pandas.DataFrame, callable, str path, or None, default=None
If pd.DataFrame, this is the holdout dataset. If callable, expects a function that takes (self.train: DataFrame, self.target_column: str) as input and returns the new (self.train: DataFrame, self.holdout: DataFrame). If str, will attempt to read the file at that path via pandas.read_csv(). Else, there is no holdout set. For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below
- test_dataset: Pandas.DataFrame, str path, or None, default=None
The testing data for the experiment. Structure should be identical to that of train_dataset, except its target_column column can be empty or non-existent, because test_dataset predictions will never be evaluated. If str, will attempt to read the file at that path via pandas.read_csv(). For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below
- target_column: Str, or list, default=’target’
If str, denotes the column name in all provided datasets (except test) that contains the target output. If list, should be a list of strs designating multiple target columns. For example, in a multi-class classification dataset like UCI’s hand-written digits, target_column would be a list containing ten strings. In this example, the target_column data would be sparse, with a 1 to signify that a sample is a written example of a digit (0-9). For a working example, see ‘hyperparameter_hunter/examples/lib_keras_multi_classification_example.py’
- id_column: Str, or None, default=None
If not None, str denoting the column name in all provided datasets containing sample IDs
- do_predict_proba: Boolean, or int, default=False
If False, models.Model.fit() will call models.Model.model.predict(). If True, it will call models.Model.model.predict_proba(), and the values in all columns will be used as the actual prediction values. If do_predict_proba is an int, models.Model.fit() will call models.Model.model.predict_proba(), as is the case when do_predict_proba is True, but the int supplied as do_predict_proba declares the column index to use as the actual prediction values. For example, for a model to call the predict method, use do_predict_proba=False (default). For a model to call the predict_proba method and use all of the class probabilities, use do_predict_proba=True. To call the predict_proba method and use the class probabilities in the first column, use do_predict_proba=0. To use the second column (index 1) of the result, use do_predict_proba=1 - this often corresponds to the positive class’s probabilities in binary classification problems. To use the third column, use do_predict_proba=2, and so on
- prediction_formatter: Callable, or None, default=None
If callable, expected to have the same signature as utils.result_utils.format_predictions(). That is, the callable will receive (raw_predictions: np.array, dataset_df: pd.DataFrame, target_column: str, id_column: str or None) as input and should return a properly formatted prediction DataFrame. The callable uses raw_predictions as the content, dataset_df to provide any id column, and target_column to identify the column in which to place raw_predictions
- metrics_params: Dict, or None, default=dict()
Dictionary of extra parameters to provide to metrics.ScoringMixIn.__init__(). metrics must be provided either 1) as an input kwarg to Environment.__init__() (see metrics), or 2) as a key in metrics_params, but not both. An Exception will be raised if both are given, or if neither is given
- cv_type: Class or str, default=’KFold’
The class to define cross-validation splits. If str, it must be an attribute of sklearn.model_selection._split, and it must be a cross-validation class that inherits one of the following sklearn classes: BaseCrossValidator, or _RepeatedSplits. Valid str values include ‘KFold’, and ‘RepeatedKFold’, although there are many more. It must implement the following methods: [__init__, split]. If using a custom class, see the following tested sklearn classes for proper implementations: [KFold, StratifiedKFold, RepeatedKFold, RepeatedStratifiedKFold]. The arguments provided to cv_type.__init__() will be Environment.cv_params, which should include the following: [‘n_splits’ <int>, ‘n_repeats’ <int> (if applicable)]. cv_type.split() will receive the following arguments: [BaseExperiment.train_input_data, BaseExperiment.train_target_data]
- runs: Int, default=1
The number of times to fit a model within each fold to perform multiple-run-averaging with different random seeds
- global_random_seed: Int, default=32
The initial random seed used just before generating an Experiment’s random_seeds. This ensures consistency for random_seeds between Experiments, without having to explicitly provide it here
- random_seeds: None, or List, default=None
If None, random_seeds of the appropriate shape will be created automatically. Else, must be a list of ints of shape (cv_params[‘n_repeats’], cv_params[‘n_splits’], runs). If cv_params does not have the key n_repeats (because standard cross-validation is being used), the value will default to 1. See experiments.BaseExperiment._random_seed_initializer() for info on the expected shape
- random_seed_bounds: List, default=[0, 100000]
A list containing two integers: the lower and upper bounds, respectively, for generating an Experiment’s random seeds in experiments.BaseExperiment._random_seed_initializer(). Generally, leave this kwarg alone
- cv_params: dict, or None, default=dict()
Parameters provided upon initialization of cv_type. Keys may be any args accepted by cv_type.__init__(). The number of fold splits must be provided via “n_splits”, and the number of repeats (if applicable for cv_type) must be provided via “n_repeats”
- verbose: Int, boolean, default=3
Verbosity of printing for any experiments performed while this Environment is active
Higher values indicate more frequent logging. Logs are still recorded in the heartbeat file regardless of verbosity level. verbose only dictates which logs are visible in the console. The following table illustrates which types of logging messages will be visible with each verbosity level:
| Verbosity | Keys/IDs | Final Score | Repetitions* | Folds | Runs* | Run Starts* | Result Files | Other |
|:---------:|:--------:|:-----------:|:------------:|:-----:|:-----:|:-----------:|:------------:|:-----:|
| 0         |          |             |              |       |       |             |              |       |
| 1         | Yes      | Yes         |              |       |       |             |              |       |
| 2         | Yes      | Yes         | Yes          | Yes   |       |             |              |       |
| 3         | Yes      | Yes         | Yes          | Yes   | Yes   |             |              |       |
| 4         | Yes      | Yes         | Yes          | Yes   | Yes   | Yes         | Yes          | Yes   |
*: If such logging is deemed appropriate with the given cross-validation parameters. In other words, repetition/run logging will only be verbose if Environment was given more than one repetition/run, respectively
- file_blacklist: List of str, or None, or ‘ALL’, default=None
If a list of str, the result files named within are not saved to their respective directory in “<ASSETS_DIRNAME>/Experiments”. If None, all result files are saved. If ‘ALL’, nothing at all will be saved for the Experiments. If the path of the file that initializes an Experiment does not end with a “.py” extension, the Experiment proceeds as if “script_backup” had been added to file_blacklist. This means that backup files will not be created for Jupyter notebooks (or any other non-“.py” files). For info on acceptable values, see validate_file_blacklist()
- reporting_params: Dict, default=dict()
Parameters passed to initialize reporting.ReportingHandler
- to_csv_params: Dict, default=dict()
Parameters passed to the calls to pandas.frame.DataFrame.to_csv() in recorders. In particular, this is where an Experiment’s final prediction files are saved, so the values here will affect the format of the .csv prediction files. Warning: If to_csv_params contains the key “path_or_buf”, it will be removed. Otherwise, all items are supplied directly to to_csv(), including kwargs it might not be expecting if they are given
, including kwargs it might not be expecting if they are given- do_full_save: None, or callable, default=:func:`utils.result_utils.default_do_full_save`
If callable, expected to take an Experiment’s result description dict as input and return a boolean. If None, treated as a callable that returns True. This parameter is used by
recorders.DescriptionRecorder
to determine whether the Experiment result files following the description should also be created. If do_full_save returns False, result file-saving is stopped early, and only the description is saved. If do_full_save returns True, all files not in file_blacklist are saved normally. This allows you to skip creation of an Experiment’s predictions, logs, and heartbeats if its score does not meet some threshold you set, for example. do_full_save receives the Experiment description dict as input, so for help setting do_full_save, just look into one of your Experiment descriptions- experiment_callbacks: `LambdaCallback`, or list of `LambdaCallback` (optional)
Callbacks injected directly into Experiments, adding new functionality, or customizing existing processes. Should be a LambdaCallback or a list of such classes. LambdaCallback can be created using callbacks.bases.lambda_callback(), which documents the options for creating callbacks. experiment_callbacks will be added to the MRO of the executed Experiment class by experiment_core.ExperimentMeta at __call__ time, making experiment_callbacks new base classes of the Experiment. See callbacks.bases.lambda_callback() for more information. Note that the Experiments conducted by OptPros will still benefit from experiment_callbacks. The presence of LambdaCallbacks will affect neither Environment keys, nor Experiment keys. In other words, for the purposes of Experiment matching/recording, all other factors being equal, an Experiment with experiment_callbacks is considered identical to an Experiment without, despite whatever custom functionality was added by the LambdaCallbacks
- experiment_recorders: List, None, default=None
If not None, may be a list whose values are tuples of (<recorders.BaseRecorder descendant>, <str result_path>). The result_path str should be a path relative to results_path that specifies the directory/file in which the product of the custom recorder should be saved. The contents of experiment_recorders will be provided to recorders.RecorderList upon completion of an Experiment, and, if the subclassing documentation in recorders is followed properly, will create or update a result file for the just-executed Experiment
- save_transformed_metrics: Boolean (optional)
Declares the manner in which a model’s predictions should be evaluated through the provided metrics, with regard to target data transformations. This setting can be ignored if no transformation of the target variable takes place (either through FeatureEngineer, EngineerStep, or otherwise). The default value of save_transformed_metrics depends on the dtype of the target data in train_dataset. If all target columns are numeric, save_transformed_metrics=False, meaning metric evaluation should use the original/inverted targets and predictions. Else if any target column is non-numeric, save_transformed_metrics=True, meaning evaluation should use the transformed targets and predictions, because most metrics require numeric inputs. This is described further in the save_transformed_metrics attribute. A more descriptive name for this may be “calculate_metrics_using_transformed_predictions”, but that’s a bit verbose–even by my standards
- Other Parameters
- cross_validation_type: …
Alias for cv_type
- cross_validation_params: …
Alias for cv_params
- metrics_map: …
Alias for metrics
- reporting_handler_params: …
Alias for reporting_params
- root_results_path: …
Alias for results_path
Notes
Dataset columns: In order to specify the columns to be used by the three dataset kwargs (train_dataset, holdout_dataset, test_dataset) during fitting and predicting, a few attributes can be used. On Environment initialization, the columns specified by the following kwargs will be separated from the rest of the dataset during training/predicting: 1) target_column, which names the column containing the target output labels for the input data; and 2) id_column, which (if given) represents the name of the column that contains identifying information for each data sample, and should otherwise have no relation to the actual data. Additionally, the feature_selector kwarg of the descendants of hyperparameter_hunter.experiments.BaseExperiment (like hyperparameter_hunter.experiments.CVExperiment) is used to filter out columns of the given datasets prior to fitting. See its documentation for more information, but it can effectively be used to remove any columns from the datasets
Overriding default kwargs at environment_params_path: If you have any of the above kwargs specified in the .json file at environment_params_path (except environment_params_path, which will be ignored), you can override its value by passing it as a kwarg when initializing Environment. The contents at environment_params_path are only used when the matching kwarg supplied at initialization is None. See “/examples/environment_params_path_example.py” for details
The order of precedence for determining the value of each parameter is as follows, with items at the top having the highest priority, and deferring only to the items below if their own value is None:
1) kwargs passed directly to Environment.__init__() on initialization,
2) keys of the file at environment_params_path (if valid .json object),
3) keys of hyperparameter_hunter.environment.Environment.DEFAULT_PARAMS
do_predict_proba: Because this parameter can be either a boolean or an integer, it is important to explicitly pass booleans rather than truthy or falsey values. Similarly, only pass integers if you intend for the value to be used as a column index. Do not pass 0 to mean False, or 1 to mean True
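A minimal sketch of this precedence (the file name and its contents are hypothetical): if “env_params.json” contains {"cv_type": "RepeatedKFold", "verbose": 1}, then:
>>> env = Environment(
...     train_dataset=train_df,  # hypothetical DataFrame
...     environment_params_path="env_params.json",
...     cv_type="KFold",  # passed directly, so it overrides the file's "RepeatedKFold"
... )  # verbose is not passed (None here), so the file's value of 1 is used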
- Attributes
- train_input: DatasetSentinel
Sentinel replaced with current train input data during Model fitting/predicting. Commonly given in the model_extra_params kwargs of hyperparameter_hunter.experiments.BaseExperiment or hyperparameter_hunter.optimization.protocol_core.BaseOptPro.forge_experiment() for eval_set-like hyperparameters. Importantly, the actual value of this Sentinel is determined after performing cross-validation data splitting, and after executing FeatureEngineer
- train_target: DatasetSentinel
Like train_input, except for current train target data
- validation_input: DatasetSentinel
Like train_input, except for current validation input data
- validation_target: DatasetSentinel
Like train_input, except for current validation target data
- holdout_input: DatasetSentinel
Like train_input, except for current holdout input data
- holdout_target: DatasetSentinel
Like train_input, except for current holdout target data
Methods
environment_workflow(self)
Execute all methods required to validate the environment and run Experiments
format_result_paths(self)
Remove paths contained in file_blacklist, and format others to prepare for saving results
generate_cross_experiment_key(self)
Generate a key to describe the current Environment’s cross-experiment parameters
initialize_reporting(self)
Initialize reporting for the Environment and Experiments conducted during its lifetime
update_custom_environment_params(self)
Try to update null parameters from environment_params_path, or DEFAULT_PARAMS
validate_parameters(self)
Ensure the provided parameters are valid and properly formatted
- DEFAULT_PARAMS = {'cv_params': {}, 'cv_type': 'KFold', 'do_full_save': <function default_do_full_save>, 'do_predict_proba': False, 'environment_params_path': None, 'file_blacklist': None, 'global_random_seed': 32, 'id_column': None, 'metrics': None, 'metrics_params': {}, 'prediction_formatter': <function format_predictions>, 'random_seed_bounds': [0, 100000], 'random_seeds': None, 'reporting_params': {'console_params': None, 'float_format': '{:.5f}', 'heartbeat_params': None, 'heartbeat_path': None}, 'results_path': None, 'runs': 1, 'save_transformed_metrics': None, 'target_column': 'target', 'to_csv_params': {}, 'verbose': 3}¶
- property results_path¶
- property target_column¶
- property train_dataset¶
- property test_dataset¶
- property holdout_dataset¶
- property file_blacklist¶
- property cv_type¶
- property to_csv_params¶
- property cross_experiment_params¶
- property experiment_callbacks¶
- property save_transformed_metrics¶
If save_transformed_metrics is True, and target transformation does occur, then experiment metrics are calculated using the transformed targets and predictions, which is the form returned directly by a fitted model’s predict method. For example, if target data is label-encoded, and a feature_engineering.EngineerStep is used to one-hot encode the target, then metrics functions will receive the following as input: (one-hot-encoded targets, one-hot-encoded predictions).
Conversely, if save_transformed_metrics is False, and target transformation does occur, then experiment metrics are calculated using the inverse of the transformed targets and predictions, which is the same form as the original target data. Continuing the example of label-encoded target data and a feature_engineering.EngineerStep to one-hot encode the target, in this case metrics functions will receive the following as input: (label-encoded targets, label-encoded predictions)
- environment_workflow(self)¶
Execute all methods required to validate the environment and run Experiments
- validate_parameters(self)¶
Ensure the provided parameters are valid and properly formatted
- format_result_paths(self)¶
Remove paths contained in file_blacklist, and format others to prepare for saving results
- update_custom_environment_params(self)¶
Try to update null parameters from environment_params_path, or DEFAULT_PARAMS
- generate_cross_experiment_key(self)¶
Generate a key to describe the current Environment’s cross-experiment parameters
- initialize_reporting(self)¶
Initialize reporting for the Environment and Experiments conducted during its lifetime
- property train_input¶
Get a DatasetSentinel representing an Experiment’s fold_train_input
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.fold_train_input upon Model initialization
- property train_target¶
Get a DatasetSentinel representing an Experiment’s fold_train_target
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.fold_train_target upon Model initialization
- property validation_input¶
Get a DatasetSentinel representing an Experiment’s fold_validation_input
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.fold_validation_input upon Model initialization
- property validation_target¶
Get a DatasetSentinel representing an Experiment’s fold_validation_target
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.fold_validation_target upon Model initialization
- property holdout_input¶
Get a DatasetSentinel representing an Experiment’s holdout_input_data
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.holdout_input_data upon Model initialization
- property holdout_target¶
Get a DatasetSentinel representing an Experiment’s holdout_target_data
- Returns
- DatasetSentinel:
A Sentinel that will be converted to hyperparameter_hunter.experiments.BaseExperiment.holdout_target_data upon Model initialization
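As an illustration of these Sentinels, a minimal sketch of an eval_set-like hyperparameter (assuming an active Environment env and XGBoost installed; the hyperparameter values are illustrative):
>>> from hyperparameter_hunter import CVExperiment
>>> from xgboost import XGBClassifier
>>> experiment = CVExperiment(
...     model_initializer=XGBClassifier,
...     model_init_params=dict(subsample=0.5),
...     model_extra_params=dict(
...         fit=dict(
...             eval_set=[(env.validation_input, env.validation_target)],  # Sentinels resolve to fold data at fit time
...             early_stopping_rounds=5,
...         )
...     ),
... )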
- hyperparameter_hunter.environment.define_holdout_set(train_set: pandas.core.frame.DataFrame, holdout_set: Union[pandas.core.frame.DataFrame, callable, str, NoneType], target_column: Union[str, List[str]]) → Tuple[pandas.core.frame.DataFrame, Union[pandas.core.frame.DataFrame, NoneType]]¶
Create holdout_set (if necessary) by loading a DataFrame from a .csv file, or by separating train_set, and return the updated (train_set, holdout_set) pair
- Parameters
- train_set: Pandas.DataFrame
Training DataFrame. Will be split into train/holdout data, if holdout_set is callable
- holdout_set: Pandas.DataFrame, callable, str, or None
If pd.DataFrame, this is the holdout dataset. If callable, expects a function that takes (train_set, target_column) as input and returns the new (train_set, holdout_set). If str, will attempt to read the file at that path via pandas.read_csv(). Else, no holdout set
- target_column: Str, or list
If str, denotes the column name in provided datasets that contains the target output. If list, should be a list of strs designating multiple target columns
- Returns
- train_set: Pandas.DataFrame
train_set if holdout_set is not callable. Else train_set modified by holdout_set
- holdout_set: Pandas.DataFrame, or None
Original DataFrame, or DataFrame read from str filepath, or a portion of train_set if holdout_set is callable, or None
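For the callable form of holdout_set, a minimal sketch of a conforming splitter (the function name and split ratio are illustrative):
>>> from sklearn.model_selection import train_test_split
>>> def extract_holdout(train_set, target_column):
...     # Reserve 20% of train_set as holdout data; target_column is unused here,
...     # but the callable must accept it
...     train, holdout = train_test_split(train_set, test_size=0.2, random_state=32)
...     return train, holdout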
- hyperparameter_hunter.environment.validate_file_blacklist(blacklist)¶
Validate the contents of blacklist. For most values, the corresponding file is saved upon completion of the experiment. See the “Notes” section below for details on some special cases
- Parameters
- blacklist: List of strings, or None
The result files that should not be saved
- Returns
- blacklist: List
If not empty, acceptable list of result file types to blacklist
Notes
‘heartbeat’: If the heartbeat file is saved, a new file is not generated and saved to the “Experiments/Heartbeats” directory as is the case with most other files. Instead, the general “Heartbeat.log” file is copied and renamed to the current experiment id, then saved to the appropriate dir. This is because the general “Heartbeat.log” file represents the heartbeat for whatever experiment is currently in progress.
‘script_backup’: This file is saved as quickly as possible after starting a new experiment, rather than waiting for the experiment to end. There are two reasons for this behavior: 1) to avoid saving any changes that may have been made to a file after it has been executed, and 2) to have the offending file in the event of a catastrophic failure that results in no other files being saved. As stated in the documentation of the file_blacklist parameter of Environment, if the path of the file that initializes an Experiment does not end with a “.py” extension, the Experiment proceeds as if “script_backup” had been added to blacklist. This means that backup files will not be created for Jupyter notebooks (or any other non-“.py” files)
‘description’ and ‘tested_keys’: These two results types constitute a bare minimum of sorts for experiment recording. If either of these two are blacklisted, then as far as the library is concerned, the experiment never took place.
‘tested_keys’ (continued): If this string is included in the blacklist, then the contents of the “KeyAttributeLookup” directory will also be excluded from the list of files to update
‘current_heartbeat’: The general heartbeat file that should be stored at ‘HyperparameterHunterAssets/Heartbeat.log’. If this value is blacklisted, then ‘heartbeat’ is also added to blacklist automatically out of necessity. This is done because the heartbeat file for the current experiment cannot be created as a copy of the general heartbeat file if the general heartbeat file is never created in the first place
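As a sketch of blacklisting in practice (other Environment kwargs omitted for brevity), the special values discussed above can be combined freely:
>>> env = Environment(
...     train_dataset=train_df,  # hypothetical DataFrame
...     file_blacklist=["script_backup", "heartbeat"],  # skip these result files
... )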
hyperparameter_hunter.exceptions module¶
This module defines a few custom Exception classes, and it provides the means for Exceptions to be added to the Heartbeat result files of Experiments
Related¶
hyperparameter_hunter.reporting
This module executes hyperparameter_hunter.exception_handler.hook_exception_handler() to ensure that any raised Exceptions are also recorded in the Heartbeat files of the Experiment for which the Exception was raised, in order to assist in debugging
- hyperparameter_hunter.exceptions.handle_exception(exception_type, exception_value, exception_traceback)¶
Intercept raised exceptions to ensure they are included in an Experiment’s log files
- Parameters
- exception_type: Exception
The class type of the exception that was raised
- exception_value: Str
The message produced by the exception
- exception_traceback: Exception.traceback
The traceback provided by the raised exception
- Raises
- SystemExit
If exception_type is a subclass of KeyboardInterrupt
- hyperparameter_hunter.exceptions.hook_exception_handler()¶
Set sys.excepthook to hyperparameter_hunter.exception_handler.handle_exception()
- exception hyperparameter_hunter.exceptions.EnvironmentInactiveError(message=None, extra='')¶
Bases: Exception
Exception raised when an active instance of hyperparameter_hunter.environments.Environment is not detected
- Parameters
- message: String, or None, default=None
A message to provide upon raising EnvironmentInactiveError
- extra: String, default=’’
Extra content to append onto the end of message before raising the Exception
- exception hyperparameter_hunter.exceptions.EnvironmentInvalidError(message=None, extra='')¶
Bases: Exception
Exception raised when there is an active instance of hyperparameter_hunter.environments.Environment, but it is invalid for some reason
- Parameters
- message: String, or None, default=None
A message to provide upon raising EnvironmentInvalidError
- extra: String, default=’’
Extra content to append onto the end of message before raising the Exception
- exception hyperparameter_hunter.exceptions.RepeatedExperimentError(message=None, extra='')¶
Bases: Exception
Exception raised when a saved Experiment is found with the same hyperparameters as the Experiment being executed
- Parameters
- message: String, or None, default=None
A message to provide upon raising RepeatedExperimentError
- extra: String, default=’’
Extra content to append onto the end of message before raising the Exception
- exception hyperparameter_hunter.exceptions.IncompatibleCandidateError(candidate, template)¶
Bases: Exception
Exception raised when a candidate hyperparameter set is incompatible with a template
- Parameters
- candidate: Any
Hyperparameter set that is incompatible with the choices/concrete values of template
- template: Any
Hyperparameter set defined by forge_experiment(). May include any combination of space choices and concrete values
- exception hyperparameter_hunter.exceptions.ContinueRemap¶
Bases: Exception
- exception hyperparameter_hunter.exceptions.DeprecatedWarning(obj_name, v_deprecate, v_remove, details='')¶
Bases: DeprecationWarning
Warning class for deprecated callables. This is a specialization of the built-in DeprecationWarning, adding parameters that allow us to get information into the __str__ that ends up being sent through the warnings system. The attributes cannot be retrieved after the warning gets raised and passed through the system, as only the class–not the instance–and the message are preserved
- Parameters
- obj_name: String
The name of the callable being deprecated
- v_deprecate: String
The version that obj is deprecated in
- v_remove: String
The version that obj gets removed in
- details: String, default=””
Deprecation details, such as directions on what to use instead of the deprecated code
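A minimal sketch of issuing this warning (the callable name and version strings are illustrative):
>>> import warnings
>>> warnings.warn(DeprecatedWarning("old_func", "3.0.0", "3.2.0", details="Use `new_func` instead"))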
- exception hyperparameter_hunter.exceptions.UnsupportedWarning(obj_name, v_deprecate, v_remove, details='')¶
Bases: hyperparameter_hunter.exceptions.DeprecatedWarning
Warning class for a callable, warning that it is unsupported
hyperparameter_hunter.experiment_core module¶
This module is the core of all of the experimentation in hyperparameter_hunter, hence its name. It is impossible to understand hyperparameter_hunter.experiments without first having a grasp on what hyperparameter_hunter.experiment_core.ExperimentMeta is doing. This module serves to bridge the gap between Experiments and hyperparameter_hunter.callbacks by dynamically making Experiments inherit various callbacks, depending on the inputs given, in order to make Experiments completely functional
Related¶
hyperparameter_hunter.experiments
Defines the structure of the experimentation process. While certainly very important, hyperparameter_hunter.experiments wouldn’t do much at all without hyperparameter_hunter.callbacks, or hyperparameter_hunter.experiment_core
hyperparameter_hunter.callbacks
Defines parent classes to the classes defined in hyperparameter_hunter.experiments. This not only makes it very easy to find the entire workflow for a given task, but also ensures that each instance of an Experiment inherits exactly the functionality that it needs. For example, if no holdout data was given, then experiment_core.ExperimentMeta will not add callbacks.evaluators.EvaluatorHoldout or callbacks.predictors.PredictorHoldout to the list of callbacks inherited by the Experiment. This means that the Experiment never needs to check for the existence of holdout data in order to determine how it should proceed, because it literally doesn’t have the code that deals with holdout data
Notes¶
Was a metaclass really necessary here? Probably not, but it’s being used for two reasons: 1) metaclasses are fun, and programming (especially artificial intelligence) should be fun; and 2) it allowed for a very clean separation between the various functions demanded by Experiments that are provided by hyperparameter_hunter.callbacks. Having each of the callbacks separated in their own classes makes it very easy to debug existing functionality, and to add new callbacks in the future
- class hyperparameter_hunter.experiment_core.ExperimentMeta¶
Bases: type
Create a new class object that stores necessary class-wide callbacks to __class_wide_bases
Methods
__call__(cls, *args, **kwargs)
Store necessary instance-wide callbacks to __instance_bases, sort all dynamically added callback base classes, then add them to the instance
mro()
Return a type’s method resolution order
- hyperparameter_hunter.experiment_core.base_callback_class_sorter(auxiliary_bases, parent_class_order=None)¶
Sort callback classes in order to preserve the intended MRO of their descendant, and to enable callbacks that may depend on one another to function properly
- Parameters
- auxiliary_bases: List
The callback classes to be sorted according to the order in which their parent is found in parent_class_order. For example, if a class (x) in auxiliary_bases is the only descendant of the last class in parent_class_order, then class x will be moved to the last position in sorted_auxiliary_bases. If multiple classes in auxiliary_bases are descendants of the same parent in parent_class_order, they will be sorted alphabetically (from A-Z)
- parent_class_order: List, or None, default=<See description>
List of base callback classes that define the sort order for auxiliary_bases. Note that these are not the normal callback classes that add to the functionality of an Experiment, but the base classes from which the callback classes descend. All the classes in parent_class_order should be defined in hyperparameter_hunter.callbacks.bases. The last class in parent_class_order should be hyperparameter_hunter.callbacks.bases.BaseCallback, which is the parent class for all other base classes. This ensures that custom callbacks defined by hyperparameter_hunter.callbacks.bases.lambda_callback() will be recognized as valid and executed last
- Returns
- sorted_auxiliary_bases: List
The contents of auxiliary_bases sorted according to their parents’ location in parent_class_order, then alphabetically
- Raises
- ValueError
If auxiliary_bases contains a class that is not a descendant of any of the classes in parent_class_order
Examples
>>> in_0 = [AggregatorEvaluations, AggregatorTimes, EvaluatorOOF, EvaluatorHoldout, LoggerFitStatus, PredictorOOF, PredictorHoldout, PredictorTest]
>>> out_0 = [PredictorHoldout, PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations, AggregatorTimes, LoggerFitStatus]
>>> assert base_callback_class_sorter(in_0) == out_0
>>> in_1 = [AggregatorEvaluations, AggregatorTimes, EvaluatorOOF, EvaluatorHoldout, LoggerFitStatus, PredictorOOF, PredictorHoldout, PredictorTest]
>>> out_1 = [PredictorHoldout, PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations, AggregatorTimes, LoggerFitStatus]
>>> assert base_callback_class_sorter(in_1) == out_1
>>> in_2 = [PredictorOOF, PredictorHoldout, AggregatorTimes, PredictorTest, AggregatorEvaluations, EvaluatorOOF, EvaluatorHoldout, LoggerFitStatus]
>>> out_2 = [PredictorHoldout, PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations, AggregatorTimes, LoggerFitStatus]
>>> assert base_callback_class_sorter(in_2) == out_2
>>> in_3 = [PredictorTest, EvaluatorHoldout, LoggerFitStatus, AggregatorTimes, PredictorHoldout, PredictorOOF, AggregatorEvaluations, EvaluatorOOF]
>>> out_3 = [PredictorHoldout, PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations, AggregatorTimes, LoggerFitStatus]
>>> assert base_callback_class_sorter(in_3) == out_3
>>> in_4 = [LoggerFitStatus, EvaluatorOOF, PredictorTest, EvaluatorHoldout, AggregatorTimes, AggregatorEvaluations, PredictorHoldout, PredictorOOF]
>>> out_4 = [PredictorHoldout, PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations, AggregatorTimes, LoggerFitStatus]
>>> assert base_callback_class_sorter(in_4) == out_4
>>> in_5 = [AggregatorEvaluations, PredictorTest, PredictorOOF, EvaluatorOOF, EvaluatorHoldout]
>>> out_5 = [PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations]
>>> assert base_callback_class_sorter(in_5) == out_5
>>> in_6 = [EvaluatorOOF, PredictorOOF, EvaluatorHoldout, AggregatorEvaluations, PredictorTest]
>>> out_6 = [PredictorOOF, PredictorTest, EvaluatorHoldout, EvaluatorOOF, AggregatorEvaluations]
>>> assert base_callback_class_sorter(in_6) == out_6
>>> in_7 = [PredictorTest, EvaluatorHoldout, PredictorOOF]
>>> out_7 = [PredictorOOF, PredictorTest, EvaluatorHoldout]
>>> assert base_callback_class_sorter(in_7) == out_7
>>> in_8 = [PredictorTest, PredictorOOF, EvaluatorHoldout]
>>> out_8 = [PredictorOOF, PredictorTest, EvaluatorHoldout]
>>> assert base_callback_class_sorter(in_8) == out_8
>>> base_callback_class_sorter([type("Foo", (object,), {}), PredictorTest, EvaluatorHoldout, PredictorOOF])
Traceback (most recent call last):
  File "experiment_core.py", line ?, in base_callback_class_sorter
ValueError: Base class not descendant of acceptable parent class: [<class 'hyperparameter_hunter.experiment_core.Foo'>]
hyperparameter_hunter.experiments module¶
This module contains the classes used for constructing and conducting an Experiment (most notably, CVExperiment). Any class contained herein whose name starts with “Base” should not be used directly. CVExperiment is the preferred means of conducting one-off experimentation
Related¶
hyperparameter_hunter.experiment_core
Defines ExperimentMeta, an understanding of which is critical to being able to understand experiments
hyperparameter_hunter.metrics
Defines ScoringMixIn, a parent of experiments.BaseExperiment that enables scoring and evaluating models
hyperparameter_hunter.models
Used to instantiate the actual learning models, which are a single part of the entire experimentation workflow, albeit the most significant part
Notes¶
As mentioned above, the inner workings of experiments will be very confusing without a grasp on what’s going on in experiment_core and its related modules
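A minimal sketch of one-off experimentation with CVExperiment (assuming an active Environment and scikit-learn installed; the hyperparameter values are illustrative):
>>> from hyperparameter_hunter import CVExperiment
>>> from sklearn.ensemble import RandomForestClassifier
>>> experiment = CVExperiment(
...     model_initializer=RandomForestClassifier,
...     model_init_params=dict(n_estimators=200, max_depth=5),
... )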
- class hyperparameter_hunter.experiments.BaseExperiment(model_initializer, model_init_params=None, model_extra_params=None, feature_engineer=None, feature_selector=None, notes=None, do_raise_repeated=False, auto_start=True, target_metric=None)¶
Bases: hyperparameter_hunter.metrics.ScoringMixIn
One-off Experimentation base class
Bare-bones Description: Runs the cross-validation scheme defined by Environment, during which 1) Datasets are processed according to feature_engineer; 2) Models are built by instantiating model_initializer with model_init_params; 3) Models are trained on processed data, optionally using parameters from model_extra_params; 4) Results are logged and recorded for each fitting period; 5) Descriptions, predictions, results (both averages and individual periods), etc. are saved.
What’s the Big Deal? The most important takeaway from the above description is that descriptions/results are THOROUGH and REUSABLE. By thorough, I mean that all of a model’s hyperparameters are saved, not just the ones given in model_init_params. This may sound odd, but it’s important because it makes results reusable during optimization, when you may be using a different set of hyperparameters. It helps with other things like preventing duplicate experiments and ensembling, as well. But the big part is that this transforms hyperparameter optimization from an isolated, throwaway process we can only afford when an ML project is sufficiently “mature” to a process that covers the entire lifespan of a project. No Experiment is forgotten or wasted. Optimization is automatically given the data it needs to succeed by drawing on all your past Experiments and optimization rounds.
The Experiment has three primary missions:
1. Act as a scaffold for organizing ML Experimentation and optimization
2. Record Experiment descriptions and results
3. Eliminate lots of repetitive/error-prone boilerplate code
Providing a scaffold for the entire ML process is critical because without a standardized format, everything we do looks different. Without a unified scaffold, development is slower, more confusing, and less adaptable. One of the benefits of standardizing the format of ML Experimentation is that it enables us to exhaustively record all the important characteristics of an Experiment, as well as an assortment of customizable result files – all in a way that allows them to be reused in the future.
What About Data/Metrics? Experiments require an active Environment in order to function, from which the Experiment collects important cross-experiment parameters, such as datasets, metrics, cross-validation schemes, and even callbacks to inherit, among many other properties documented in Environment
- Parameters
- model_initializer: Class, or functools.partial, or class instance
Algorithm class used to initialize a model, such as XGBoost’s XGBRegressor, or SKLearn’s KNeighborsClassifier; although, there are hundreds of possibilities across many different ML libraries. model_initializer is expected to define at least fit and predict methods. model_initializer will be initialized with model_init_params, and its “extra” methods (fit, predict, etc.) will be invoked with parameters in model_extra_params
- model_init_params: Dict, or object (optional)
Dictionary of arguments given to create an instance of model_initializer. Any kwargs that are considered valid by the __init__ method of model_initializer are valid in model_init_params.
One of the key features that makes HyperparameterHunter so magical is that ALL hyperparameters in the signature of model_initializer (and their default values) are discovered – whether or not they are explicitly given in model_init_params. Not only does this make Experiment result descriptions incredibly thorough, it also makes optimization smoother, more effective, and far less work for the user. For example, take LightGBM’s LGBMRegressor, with model_init_params=dict(learning_rate=0.2). HyperparameterHunter recognizes that this differs from the default of 0.1. It also recognizes that LGBMRegressor is actually initialized with more than a dozen other hyperparameters we didn’t bother mentioning, and it records their values, too. So if we want to optimize num_leaves tomorrow, the OptPro doesn’t start from scratch. It knows that we ran an Experiment that didn’t explicitly mention num_leaves, but its default value was 31, and it uses this information to fuel optimization – all without us having to manually keep track of tons of janky collections of hyperparameters. In fact, we really don’t need to go out of our way at all. HyperparameterHunter just acts as our faithful lab assistant, keeping track of all the stuff we’d rather not worry about
- model_extra_params: Dict (optional)
Dictionary of extra parameters for models’ non-initialization methods (like fit, predict, predict_proba, etc.), and for neural networks. To specify parameters for an extra method, place them in a dict named for the extra method to which the parameters should be given. For example, to call fit with early_stopping_rounds=5, use model_extra_params=dict(fit=dict(early_stopping_rounds=5)).
For models whose fit methods have a kwarg like eval_set (such as XGBoost’s), one can use the DatasetSentinel attributes of the currently active Environment, documented under its “Attributes” section and under train_input. An example using several DatasetSentinels can be found in HyperparameterHunter’s XGBoost Classification Example: https://github.com/HunterMcGushion/hyperparameter_hunter/blob/master/examples/xgboost_examples/classification.py
- feature_engineer: FeatureEngineer, or list (optional)
Feature engineering/transformation/pre-processing steps to apply to datasets defined in Environment. If a list, it will be used to initialize FeatureEngineer, and can contain any of the following values:
1. EngineerStep instance
2. Function input to EngineerStep
For important information on properly formatting EngineerStep functions, please see the documentation of EngineerStep. OptPros can perform hyperparameter optimization of feature_engineer steps. This capability adds a third allowed value to the above list and is documented in forge_experiment()
- feature_selector: List of str, callable, or list of booleans (optional)
Column names to include as input data for all provided DataFrames. If None, feature_selector is set to all columns in train_dataset, less target_column, and id_column. feature_selector is provided as the second argument for calls to pandas.DataFrame.loc when constructing datasets
- notes: String (optional)
Additional information about the Experiment that will be saved with the Experiment’s description result file. This serves no purpose other than to facilitate saving Experiment details in a more readable format
- do_raise_repeated: Boolean, default=False
If True and this Experiment locates a previous Experiment’s results with matching Environment and Hyperparameter Keys, a RepeatedExperimentError will be raised. Else, a warning will be logged
- auto_start: Boolean, default=True
If True, after the Experiment is initialized, it will automatically call BaseExperiment.preparation_workflow(), followed by BaseExperiment.experiment_workflow(), effectively completing all essential tasks without requiring additional method calls
- target_metric: Tuple, str, default=(‘oof’, <environment.Environment.metrics[0]>)
Path denoting the metric to be used to compare completed Experiments or to use for certain early stopping procedures in some model classes. The first value should be one of [‘oof’, ‘holdout’, ‘in_fold’]. The second value should be the name of a metric being recorded according to the values supplied in hyperparameter_hunter.environment.Environment.metrics_params. See the documentation for hyperparameter_hunter.metrics.get_formatted_target_metric() for more info. Any values returned by, or used as the target_metric input to, this function are acceptable values for target_metric
- callbacks: LambdaCallback, or list of LambdaCallback (optional)
Callbacks injected directly into the concrete Experiment (CVExperiment), adding new functionality, or customizing existing processes. Should be a LambdaCallback or a list of such classes. LambdaCallback can be created using callbacks.bases.lambda_callback(), which documents the options for creating callbacks. callbacks will be added to the MRO of the Experiment by experiment_core.ExperimentMeta at __call__ time, making callbacks new base classes of the Experiment. See callbacks.bases.lambda_callback() for more information. The presence of LambdaCallbacks will not affect Experiment keys. In other words, for the purposes of Experiment matching/recording, all other factors being equal, an Experiment with callbacks is considered identical to an Experiment without, despite whatever custom functionality was added by the LambdaCallbacks
See also
hyperparameter_hunter.optimization.protocol_core.BaseOptPro.forge_experiment()
OptPro method to define a hyperparameter search scaffold for building Experiments during optimization. This method follows the same format as Experiment initialization, but it adds the ability to provide hyperparameter values as ranges to search over, via subclasses of Dimension. The other notable difference is that forge_experiment removes the auto_start and target_metric kwargs, which is described in the forge_experiment docstring Notes
Environment
Provides critical information on how Experiments should be conducted, as well as the data to be used by Experiments. An Environment must be active before executing any Experiment or OptPro
lambda_callback()
Enables customization of the Experimentation process and access to all Experiment internals through a collection of methods that are invoked at all the important periods over an Experiment’s lifespan. These can be provided via the experiment_callbacks kwarg of Environment, and the callback classes literally get thrown in to the parent classes of the Experiment, so they’re kind of a big deal
Methods
evaluate(self, data_type, target, prediction)
Apply metric(s) to the given data to calculate the value of the prediction
execute(self)
Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
experiment_workflow(self)
Define the actual experiment process, including execution, result saving, and cleanup
on_exp_start(self)
Prepare data prior to executing the fitting protocol (cross-validation), by 1) Initializing formal datasets attributes, 2) Invoking feature_engineer to perform “pre_cv”-stage preprocessing, and 3) Updating datasets to include their (transformed) counterparts in feature_engineer
preparation_workflow(self)
Execute all tasks that must take place before the experiment is actually started.
- experiment_workflow(self)¶
Define the actual experiment process, including execution, result saving, and cleanup
- preparation_workflow(self)¶
Execute all tasks that must take place before the experiment is actually started. Such tasks include (but are not limited to): Creating experiment IDs and hyperparameter keys, creating script backups, and validating parameters
- abstract execute(self)¶
Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
- class hyperparameter_hunter.experiments.BaseCVExperiment(model_initializer, model_init_params=None, model_extra_params=None, feature_engineer=None, feature_selector=None, notes=None, do_raise_repeated=False, auto_start=True, target_metric=None)¶
Bases: hyperparameter_hunter.experiments.BaseExperiment
Methods
cross_validation_workflow(self)
Execute workflow for cross-validation process, consisting of the following tasks: 1) Create train and validation split indices for all folds, 2) Iterate through folds, performing cv_fold_workflow for each, 3) Average accumulated predictions over fold splits, 4) Evaluate final predictions, 5) Format final predictions to prepare for saving
cv_fold_workflow(self)
Execute workflow for individual fold, consisting of the following tasks: 1) Execute overridden on_fold_start() tasks, 2) Perform cv_run_workflow for each run, 3) Execute overridden on_fold_end() tasks
cv_run_workflow(self)
Execute run workflow, consisting of: 1) Execute overridden on_run_start() tasks, 2) Initialize and fit Model, 3) Execute overridden on_run_end() tasks
evaluate(self, data_type, target, prediction)
Apply metric(s) to the given data to calculate the value of the prediction
execute(self)
Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
experiment_workflow(self)
Define the actual experiment process, including execution, result saving, and cleanup
on_exp_start(self)
Prepare data prior to executing the fitting protocol (cross-validation), by 1) Initializing formal datasets attributes, 2) Invoking feature_engineer to perform “pre_cv”-stage preprocessing, and 3) Updating datasets to include their (transformed) counterparts in feature_engineer
on_fold_start(self)
Override on_fold_start() tasks set by experiment_core.ExperimentMeta, consisting of: 1) Split train/validation data, 2) Make copies of holdout/test data for current fold (for feature engineering), 3) Log start, 4) Execute original tasks
on_run_start(self)
Override on_run_start() tasks organized by experiment_core.ExperimentMeta, consisting of: 1) Set random seed and update model parameters according to current seed, 2) Log run start, 3) Execute original tasks
preparation_workflow(self)
Execute all tasks that must take place before the experiment is actually started.
-
execute
(self)¶ Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
-
cross_validation_workflow
(self)¶ Execute workflow for cross-validation process, consisting of the following tasks: 1) Create train and validation split indices for all folds, 2) Iterate through folds, performing cv_fold_workflow for each, 3) Average accumulated predictions over fold splits, 4) Evaluate final predictions, 5) Format final predictions to prepare for saving
-
on_fold_start
(self)¶ Override
on_fold_start()
tasks set by experiment_core.ExperimentMeta
, consisting of: 1) Split train/validation data, 2) Make copies of holdout/test data for current fold (for feature engineering), 3) Log start, 4) Execute original tasks
-
cv_fold_workflow
(self)¶ Execute workflow for individual fold, consisting of the following tasks: 1) Execute overridden
on_fold_start()
tasks, 2) Perform cv_run_workflow for each run, 3) Execute overridden on_fold_end()
tasks
-
on_run_start
(self)¶ Override
on_run_start()
tasks organized by experiment_core.ExperimentMeta
, consisting of: 1) Set random seed and update model parameters according to current seed, 2) Log run start, 3) Execute original tasks
-
cv_run_workflow
(self)¶ Execute run workflow, consisting of: 1) Execute overridden
on_run_start()
tasks, 2) Initialize and fit Model, 3) Execute overridden on_run_end()
tasks
-
-
class
hyperparameter_hunter.experiments.
CVExperiment
(model_initializer, model_init_params=None, model_extra_params=None, feature_engineer=None, feature_selector=None, notes=None, do_raise_repeated=False, auto_start=True, target_metric=None, callbacks=None)¶ Bases:
hyperparameter_hunter.experiments.BaseCVExperiment
- Attributes
- source_script
Methods
cross_validation_workflow
(self)Execute workflow for cross-validation process, consisting of the following tasks: 1) Create train and validation split indices for all folds, 2) Iterate through folds, performing cv_fold_workflow for each, 3) Average accumulated predictions over fold splits, 4) Evaluate final predictions, 5) Format final predictions to prepare for saving
cv_fold_workflow
(self)Execute workflow for individual fold, consisting of the following tasks: 1) Execute overridden on_fold_start() tasks, 2) Perform cv_run_workflow for each run, 3) Execute overridden on_fold_end() tasks
cv_run_workflow
(self)Execute run workflow, consisting of: 1) Execute overridden on_run_start() tasks, 2) Initialize and fit Model, 3) Execute overridden on_run_end() tasks
evaluate
(self, data_type, target, prediction)Apply metric(s) to the given data to calculate the value of the prediction
execute
(self)Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
experiment_workflow
(self)Define the actual experiment process, including execution, result saving, and cleanup
on_exp_start
(self)Prepare data prior to executing fitting protocol (cross-validation), by 1) Initializing formal datasets attributes, 2) Invoking feature_engineer to perform “pre_cv”-stage preprocessing, and 3) Updating datasets to include their (transformed) counterparts in feature_engineer
on_fold_start
(self)Override on_fold_start() tasks set by experiment_core.ExperimentMeta, consisting of: 1) Split train/validation data, 2) Make copies of holdout/test data for current fold (for feature engineering), 3) Log start, 4) Execute original tasks
on_run_start
(self)Override on_run_start() tasks organized by experiment_core.ExperimentMeta, consisting of: 1) Set random seed and update model parameters according to current seed, 2) Log run start, 3) Execute original tasks
preparation_workflow
(self)Execute all tasks that must take place before the experiment is actually started.
-
source_script
= None¶
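The snippet below is a minimal usage sketch, not part of the API reference proper: it assumes an active Environment, and the my_train_df DataFrame and XGBoost classifier are illustrative stand-ins, not prescribed by the library.
>>> from hyperparameter_hunter import Environment, CVExperiment
>>> from xgboost import XGBClassifier
>>> env = Environment(
...     train_dataset=my_train_df,  # Hypothetical DataFrame containing a "target" column
...     results_path="HyperparameterHunterAssets",
...     metrics=["roc_auc_score"],
...     cv_type="StratifiedKFold",
...     cv_params=dict(n_splits=5, shuffle=True, random_state=32),
... )
>>> # With an active Environment and the default auto_start=True,
>>> # cross-validation runs on instantiation
>>> experiment = CVExperiment(XGBClassifier, model_init_params=dict(max_depth=3))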
-
hyperparameter_hunter.experiments.
get_cv_indices
(folds, cv_params, input_data, target_data)¶ Produce iterables of cross validation indices in the shape of (n_repeats, n_folds)
- Parameters
- folds: Instance of `cv_type`
Cross validation folds object, whose
split()
receives input_data and target_data
- cv_params: Dict
Parameters given to instantiate folds. Must contain n_splits. May contain n_repeats
- input_data: pandas.DataFrame
Input data to be split by folds, to which yielded indices will correspond
- target_data: pandas.DataFrame
Target data to be split by folds, to which yielded indices will correspond
- Yields
- Generator
Cross validation indices in shape of (<n_repeats or 1>, <n_splits>)
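Examples
A hedged sketch of driving get_cv_indices directly, assuming scikit-learn’s RepeatedKFold as the folds object; the toy DataFrames are illustrative, and the nesting follows the documented (n_repeats, n_folds) shape:
>>> import pandas as pd
>>> from sklearn.model_selection import RepeatedKFold
>>> X = pd.DataFrame(dict(a=range(6)))
>>> y = pd.DataFrame(dict(target=[0, 1, 0, 1, 0, 1]))
>>> folds = RepeatedKFold(n_splits=3, n_repeats=2)
>>> for repetition in get_cv_indices(folds, dict(n_splits=3, n_repeats=2), X, y):
...     for train_indices, validation_indices in repetition:
...         pass  # One (train, validation) index pair per fold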
hyperparameter_hunter.feature_engineering module¶
This module organizes and executes feature engineering/preprocessing step functions. The central
components of the module are FeatureEngineer
and EngineerStep
- everything else
is built to support those two classes. This module works with a very broad definition of
“feature engineering”. The following is a non-exhaustive list of transformations that are
considered valid for FeatureEngineer step functions:
Manual feature creation
Input data scaling/normalization/standardization
Target data transformation
Re-sampling
Data imputation
Feature selection/elimination
Encoding (one-hot, label, etc.)
Binarization/binning/discretization
Feature extraction (as for NLP/image recognition tasks)
Feature shuffling
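To make the definition concrete, here is a minimal sketch of one such step (data imputation) handed to FeatureEngineer; the function name is illustrative:
>>> def impute_negative_one(all_inputs):
...     all_inputs.fillna(-1, inplace=True)
...     return all_inputs
>>> fe = FeatureEngineer([impute_negative_one])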
Related¶
hyperparameter_hunter.space
Only related when optimizing FeatureEngineer steps within an Optimization Protocol, but defines
Categorical
, which is the mechanism for defining a feature engineer step search space, and RejectedOptional
, which is used to represent the absence of a feature engineer step, when labeled as optional
-
class
hyperparameter_hunter.feature_engineering.
EMPTY_SENTINEL
¶ Bases:
object
-
class
hyperparameter_hunter.feature_engineering.
DatasetNameReport
(params: Tuple[str], stage: str)¶ Bases:
object
Characterize the relationships between the dataset names in params
- Parameters
- params: Tuple[str]
Dataset names requested by a feature engineering step callable. Must be a subset of {“train_data”, “train_inputs”, “train_targets”, “validation_data”, “validation_inputs”, “validation_targets”, “holdout_data”, “holdout_inputs”, “holdout_targets”, “test_inputs”, “all_data”, “all_inputs”, “all_targets”, “non_train_data”, “non_train_inputs”, “non_train_targets”}
- stage: String in {“pre_cv”, “intra_cv”}
Feature engineering stage during which the datasets params are requested
- Attributes
- merged_datasets: List[tuple]
Tuples of strings denoting paths to datasets that represent a merge between multiple datasets. Merged datasets are those prefixed with either “all” or “non_train”. These paths are locations in descendants
- coupled_datasets: List[tuple]
Tuples of strings denoting paths to datasets that represent a coupling of “inputs” and “targets” datasets. Coupled datasets are those suffixed with “data”. These paths are locations in descendants, and the values at each path should be a dict containing keys with “inputs” and “targets” suffixes
- leaves: Dict[tuple, str]
Mapping of full path tuples in descendants to their leaf values. Tuple paths represent the steps necessary to reach the standard dataset leaf value in descendants by traversing merged and coupled datasets. Values in leaves should be identical to the last element of the corresponding tuple key
- descendants: DescendantsType
Nested dict in which all keys are dataset name strings, and all leaf values are None. Represents the structure of the requested dataset names, traversing over merged and coupled datasets (if necessary) in order to reach the standard dataset leaves
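A hedged sketch of direct instantiation (DatasetNameReport is primarily used internally; the comments restate what the attribute descriptions above imply, rather than verified output):
>>> report = DatasetNameReport(("train_inputs", "non_train_inputs"), "intra_cv")
>>> merged = report.merged_datasets  # Paths for the "non_train"-prefixed merge
>>> leaves = report.leaves           # Full traversal paths to standard dataset names
>>> tree = report.descendants        # Nested dict whose leaf values are all None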
-
hyperparameter_hunter.feature_engineering.
names_for_merge
(merge_to:str, stage:str) → List[str]¶ Retrieve the names of the standard datasets that are allowed to be included in a merged DataFrame of type merge_to at stage stage
- Parameters
- merge_to: String
Type of merged dataframe to produce. Should be one of the following: {“all_data”, “all_inputs”, “all_targets”, “non_train_data”, “non_train_inputs”, “non_train_targets”}
- stage: String in {“pre_cv”, “intra_cv”}
Feature engineering stage for which the merged dataframe is requested. The results produced with each option differ only in that a merged_df created with stage=”pre_cv” will never contain “validation” data because it doesn’t exist before cross-validation has begun. Conversely, a merged_df created with stage=”intra_cv” will contain the appropriate “validation” data if it exists
- Returns
- names: List
Subset of {“train_data”, “train_inputs”, “train_targets”, “validation_data”, “validation_inputs”, “validation_targets”, “holdout_data”, “holdout_inputs”, “holdout_targets”, “test_inputs”}
Examples
>>> names_for_merge("all_data", "intra_cv") ['train_data', 'validation_data', 'holdout_data'] >>> names_for_merge("all_inputs", "intra_cv") ['train_inputs', 'validation_inputs', 'holdout_inputs', 'test_inputs'] >>> names_for_merge("all_targets", "intra_cv") ['train_targets', 'validation_targets', 'holdout_targets'] >>> names_for_merge("all_data", "pre_cv") ['train_data', 'holdout_data'] >>> names_for_merge("all_inputs", "pre_cv") ['train_inputs', 'holdout_inputs', 'test_inputs'] >>> names_for_merge("all_targets", "pre_cv") ['train_targets', 'holdout_targets'] >>> names_for_merge("non_train_data", "intra_cv") ['validation_data', 'holdout_data'] >>> names_for_merge("non_train_inputs", "intra_cv") ['validation_inputs', 'holdout_inputs', 'test_inputs'] >>> names_for_merge("non_train_targets", "intra_cv") ['validation_targets', 'holdout_targets'] >>> names_for_merge("non_train_data", "pre_cv") ['holdout_data'] >>> names_for_merge("non_train_inputs", "pre_cv") ['holdout_inputs', 'test_inputs'] >>> names_for_merge("non_train_targets", "pre_cv") ['holdout_targets']
-
hyperparameter_hunter.feature_engineering.
merge_dfs
(merge_to:str, stage:str, dfs:Dict[str, pandas.core.frame.DataFrame]) → pandas.core.frame.DataFrame¶ Construct a multi-indexed DataFrame containing the values of dfs deemed necessary by merge_to and stage. This is the opposite of split_merged_df
- Parameters
- merge_to: String
Type of merged_df to produce. Should be one of the following: {“all_data”, “all_inputs”, “all_targets”, “non_train_data”, “non_train_inputs”, “non_train_targets”}
- stage: String in {“pre_cv”, “intra_cv”}
Feature engineering stage for which merged_df is requested
- dfs: Dict
Mapping of dataset names to their DataFrame values. Keys in dfs should be a subset of {“train_data”, “train_inputs”, “train_targets”, “validation_data”, “validation_inputs”, “validation_targets”, “holdout_data”, “holdout_inputs”, “holdout_targets”, “test_inputs”}
- Returns
- merged_df: pd.DataFrame
Multi-indexed DataFrame, in which the first index is a string naming the dataset in dfs from which the corresponding data originates. The following index(es) are the original index(es) from the dataset in dfs. All primary indexes in merged_df will be one of the strings considered to be valid keys for dfs
- Raises
- ValueError
If all the DataFrames that would have been used in merged_df are None. This can happen if requesting merge_to=”non_train_targets” during stage=”pre_cv” when there is no holdout dataset available. Under these circumstances, the holdout dataset targets would be the sole contents of merged_df, rendering merged_df invalid since the data is unavailable
See also
names_for_merge
Describes how stage values differ
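Examples
A minimal round-trip sketch using toy DataFrames; the primary-index structure is as described above:
>>> import pandas as pd
>>> dfs = {
...     "train_inputs": pd.DataFrame(dict(a=[0, 1])),
...     "holdout_inputs": pd.DataFrame(dict(a=[2])),
...     "test_inputs": pd.DataFrame(dict(a=[3])),
... }
>>> merged = merge_dfs("all_inputs", "pre_cv", dfs)  # Primary index names the source dataset
>>> recovered = split_merged_df(merged)  # The inverse operation restores the dict form
>>> sorted(recovered.keys())
['holdout_inputs', 'test_inputs', 'train_inputs']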
-
hyperparameter_hunter.feature_engineering.
split_merged_df
(merged_df:pandas.core.frame.DataFrame) → Dict[str, pandas.core.frame.DataFrame]¶ Separate a multi-indexed DataFrame into a dict mapping primary indexes in merged_df to DataFrames containing one fewer dimension than merged_df. This is the opposite of merge_dfs
- Parameters
- merged_df: pd.DataFrame
Multi-indexed DataFrame of the form returned by
merge_dfs()
to split into the separate DataFrames named by the primary indexes of merged_df
- Returns
- dfs: Dict
Mapping of dataset names to their DataFrame values. Keys in dfs will be a subset of {“train_data”, “train_inputs”, “train_targets”, “validation_data”, “validation_inputs”, “validation_targets”, “holdout_data”, “holdout_inputs”, “holdout_targets”, “test_inputs”} containing only those values that are also primary indexes in merged_df
-
hyperparameter_hunter.feature_engineering.
validate_dataset_names
(params:Tuple[str], stage:str) → List[str]¶ Produce the names of merged datasets in params and verify there are no duplicate references to any datasets in params
- Parameters
- params: Tuple[str]
Dataset names requested by a feature engineering step callable. Must be a subset of {“train_data”, “train_inputs”, “train_targets”, “validation_data”, “validation_inputs”, “validation_targets”, “holdout_data”, “holdout_inputs”, “holdout_targets”, “test_inputs”, “all_data”, “all_inputs”, “all_targets”, “non_train_data”, “non_train_inputs”, “non_train_targets”}
- stage: String in {“pre_cv”, “intra_cv”}
Feature engineering stage during which the dataset names in params are requested
- Returns
- List[str]
Names of merged datasets in params
- Raises
- ValueError
If requested params contain a duplicate reference to any dataset, either by way of merging/coupling or not
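Examples
A short sketch following the definitions above: “non_train_inputs” is the only merged name requested, so it should be the sole returned value, while an indirect duplicate raises (message format as shown in the EngineerStep examples below):
>>> validate_dataset_names(("train_inputs", "non_train_inputs"), "intra_cv")
['non_train_inputs']
>>> validate_dataset_names(("train_inputs", "all_inputs"), "intra_cv")  # doctest: +ELLIPSIS
Traceback (most recent call last):
ValueError: Requested params include duplicate references to `train_inputs` by way of: ...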
-
class
hyperparameter_hunter.feature_engineering.
EngineerStep
(f: Callable, stage=None, name=None, params=None, do_validate=False)¶ Bases:
object
Container for individual
FeatureEngineer
step functions
Compartmentalizes functions of singular engineer steps and allows for greater customization than a raw engineer step function
- Parameters
- f: Callable
Feature engineering step function that requests, modifies, and returns datasets params
Step functions should follow these guidelines:
Request as input a subset of the 11 data strings listed in params
Do whatever you want to the DataFrames given as input
Return new DataFrame values of the input parameters in same order as requested
If performing a task like target transformation, causing predictions to be transformed, it is often desirable to inverse-transform the predictions to be of the expected form. This can easily be done by returning an extra value from f (after the datasets) that is either a callable, or a transformer class that was fitted during the execution of f and implements an inverse_transform method. This is the only instance in which it is acceptable for f to return values that don’t mimic its input parameters. See the engineer function definition using SKLearn’s QuantileTransformer in the Examples section below for an actual inverse-transformation-compatible implementation
- stage: String in {“pre_cv”, “intra_cv”}, or None, default=None
Feature engineering stage during which the callable f will be given the datasets params to modify and return. If None, will be inferred based on params.
“pre_cv” functions are applied only once in the experiment: when it starts
“intra_cv” functions are reapplied for each fold in the cross-validation splits
If stage is left to be inferred, “pre_cv” will usually be selected. However, if any params (or parameters in the signature of f) are prefixed with “validation…” or “non_train…”, then stage will be inferred as “intra_cv”. See the Notes section below for suggestions on the stage to use for different functions
- name: String, or None, default=None
Identifier for the transformation applied by this engineering step. If None, f.__name__ will be used
- params: Tuple[str], or None, default=None
Dataset names requested by feature engineering step callable f. If None, will be inferred by parsing the signature of f. Must be a subset of the following 11 strings:
Input Data
“train_inputs”
“validation_inputs”
“holdout_inputs”
“test_inputs”
- “all_inputs”
("train_inputs" + ["validation_inputs"] + "holdout_inputs" + "test_inputs")
- “non_train_inputs”
(["validation_inputs"] + "holdout_inputs" + "test_inputs")
Target Data
“train_targets”
“validation_targets”
“holdout_targets”
“all_targets”
("train_targets" + ["validation_targets"] + "holdout_targets")
“non_train_targets”
(["validation_targets"] + "holdout_targets")
As an alternative to the above list, just remember that the first half of all parameter names should be one of {“train”, “validation”, “holdout”, “test”, “all”, “non_train”}, and the second half should be either “inputs” or “targets”. The only exception to this rule is “test_targets”, which doesn’t exist.
Inference of “validation” params is affected by stage. During the “pre_cv” stage, the validation dataset has not yet been created and is still a part of the train dataset. During the “intra_cv” stage, the validation dataset is created by removing a portion of the train dataset, and their values passed to f reflect this fact. This also means that the values of the merged (“all”/”non_train”-prefixed) datasets may or may not contain “validation” data depending on the stage; however, this is all handled internally, so you probably don’t need to worry about it.
params may not include multiple references to the same dataset, either directly or indirectly. This means (“train_inputs”, “train_inputs”) is invalid due to duplicate direct references. Less obviously, (“train_inputs”, “all_inputs”) is invalid because “all_inputs” includes “train_inputs”
- do_validate: Boolean, or “strict”, default=False
… Experimental… Whether to validate the datasets resulting from feature engineering steps. If True, hashes of the new datasets will be compared to those of the originals to ensure they were actually modified. Results will be logged. If do_validate = “strict”, an exception will be raised if any anomalies are found, rather than logging a message. If do_validate = False, no validation will be performed
See also
FeatureEngineer
The container for EngineerStep instances - EngineerSteps should always be provided to HyperparameterHunter through a FeatureEngineer
Categorical
Can be used during optimization to search through a group of EngineerSteps given as categories. The optional kwarg of Categorical designates a FeatureEngineer step that may be one of the EngineerSteps in categories, or may be omitted entirely
get_engineering_step_stage()
More information on stage inference and situations where overriding it may be prudent
Notes
stage: Generally, feature engineering conducted in the “pre_cv” stage should treat each sample/row as an independent entity. For example, steps like converting a string day of the week to one-hot encoded columns, or imputing missing values by replacement with -1 might be conducted “pre_cv”, since they are unlikely to introduce information leakage. Conversely, steps like scaling/normalization, whose results for the data in one row are affected by the data in other rows, should be performed “intra_cv” in order to recalculate the final values of the datasets for each cross validation split and avoid information leakage.
params: In the list of the 11 valid params strings, “test_inputs” is notably missing the “…_targets” counterpart accompanying the other datasets. The “targets” suffix is missing because test data targets are never given. Note that although “test_inputs” is still included in both “all_inputs” and “non_train_inputs”, its lack of a target column means that “all_targets” and “non_train_targets” may have different lengths than their “inputs”-suffixed counterparts
Examples
>>> from sklearn.preprocessing import StandardScaler, QuantileTransformer
>>> def s_scale(train_inputs, non_train_inputs):
...     s = StandardScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> # Sensible parameter defaults inferred based on `f`
>>> es_0 = EngineerStep(s_scale)
>>> es_0.stage
'intra_cv'
>>> es_0.name
's_scale'
>>> es_0.params
('train_inputs', 'non_train_inputs')
>>> # Override `stage` if you want to fit your scaler on OOF data like a crazy person
>>> es_1 = EngineerStep(s_scale, stage="pre_cv")
>>> es_1.stage
'pre_cv'
Watch out for multiple requests to the same data
>>> es_2 = EngineerStep(s_scale, params=("train_inputs", "all_inputs"))
Traceback (most recent call last):
    File "feature_engineering.py", line ? in validate_dataset_names
ValueError: Requested params include duplicate references to `train_inputs` by way of:
   - ('all_inputs', 'train_inputs')
   - ('train_inputs',)
Each dataset may only be requested by a single param for each function
Error is the same if `(train_inputs, all_inputs)` is in the actual function signature
EngineerStep functions aren’t just limited to transformations. Make your own features!
>>> import numpy as np
>>> def sqr_sum(all_inputs):
...     all_inputs["square_sum"] = all_inputs.agg(
...         lambda row: np.sqrt(np.sum([np.square(_) for _ in row])), axis="columns"
...     )
...     return all_inputs
>>> es_3 = EngineerStep(sqr_sum)
>>> es_3.stage
'pre_cv'
>>> es_3.name
'sqr_sum'
>>> es_3.params
('all_inputs',)
Inverse-transformation Implementation:
>>> def q_transform(train_targets, non_train_targets):
...     t = QuantileTransformer(output_distribution="normal")
...     train_targets[train_targets.columns] = t.fit_transform(train_targets.values)
...     non_train_targets[train_targets.columns] = t.transform(non_train_targets.values)
...     return train_targets, non_train_targets, t
>>> # Note that `train_targets` and `non_train_targets` must still be returned in order,
>>> # but they are followed by `t`, an instance of `QuantileTransformer` we just fitted,
>>> # whose `inverse_transform` method will be called on predictions
>>> es_4 = EngineerStep(q_transform)
>>> es_4.stage
'intra_cv'
>>> es_4.name
'q_transform'
>>> es_4.params
('train_targets', 'non_train_targets')
>>> # `params` does not include any returned transformers - Only data requested as input
- Attributes
Methods
__call__
(self, \*\*datasets, …)Apply f to datasets to produce updated datasets.
get_comparison_attrs
(step_obj)Build a dict of critical EngineerStep attributes
get_datasets_for_f
(self, datasets)Produce a dict of DataFrames containing only the merged datasets and standard datasets requested in params.
get_key_data
(self)Produce a dict of critical attributes describing the EngineerStep instance for use by key-making classes
honorary_step_from_dict
(step_dict, dimension)Get an EngineerStep from dimension that is equal to its dict form, step_dict
inverse_transform
(self, data)Perform the inverse transformation for this engineer step (if it exists)
stringify
(self)Make a stringified representation of self, compatible with EngineerStep.__eq__()
-
inverse_transform
(self, data)¶ Perform the inverse transformation for this engineer step (if it exists)
- Parameters
- data: Array-like
Data to inverse transform with inversion or inversion.inverse_transform
- Returns
- Array-like
If inversion is None, return data unmodified. Else, return the result of inversion or inversion.inverse_transform, given data
-
get_datasets_for_f
(self, datasets:Dict[str, pandas.core.frame.DataFrame]) → Dict[str, pandas.core.frame.DataFrame]¶ Produce a dict of DataFrames containing only the merged datasets and standard datasets requested in
params
. In other words, add the requested merged datasets and remove unnecessary standard datasets
- Parameters
- datasets: DFDict
Original dict of datasets, containing all datasets provided to
EngineerStep.__call__()
, some of which may be superfluous, or may require additional processing to resolve merged/coupled datasets
- Returns
- DFDict
Updated version of datasets, in which unnecessary datasets have been filtered out, and the requested merged datasets have been added
-
get_key_data
(self) → dict¶ Produce a dict of critical attributes describing the
EngineerStep
instance for use by key-making classes
- Returns
- Dict
Important attributes describing this
EngineerStep
instance
-
property
f
¶ Feature engineering step callable that requests, modifies, and returns datasets
-
property
name
¶ Identifier for the transformation applied by this engineering step
-
property
params
¶ Dataset names requested by feature engineering step callable
f
. See documentation in EngineerStep.__init__()
for more information/restrictions
-
property
stage
¶ Feature engineering stage during which the EngineerStep will be executed
-
static
get_comparison_attrs
(step_obj:Union[_ForwardRef('EngineerStep'), dict]) → dict¶ Build a dict of critical
EngineerStep
attributes
- Parameters
- step_obj: EngineerStep, dict
Object for which critical
EngineerStep
attributes should be collected
- Returns
- attr_vals: Dict
Critical
EngineerStep
attributes. If step_obj does not have a necessary attribute (for EngineerStep) or a necessary key (for dict), its value in attr_vals will be a placeholder object. This is to facilitate comparison, while also ensuring missing values will always be considered unequal to other values
Examples
>>> def dummy_f(train_inputs, non_train_inputs):
...     return train_inputs, non_train_inputs
>>> es_0 = EngineerStep(dummy_f)
>>> EngineerStep.get_comparison_attrs(es_0)  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
{'name': 'dummy_f',
 'f': <function dummy_f at ...>,
 'params': ('train_inputs', 'non_train_inputs'),
 'stage': 'intra_cv',
 'do_validate': False}
>>> EngineerStep.get_comparison_attrs(
...     dict(foo="hello", f=dummy_f, params=["all_inputs", "all_targets"], stage="pre_cv")
... )  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
{'name': <object object at ...>,
 'f': <function dummy_f at ...>,
 'params': ('all_inputs', 'all_targets'),
 'stage': 'pre_cv',
 'do_validate': <object object at ...>}
-
stringify
(self) → str¶ Make a stringified representation of self, compatible with
EngineerStep.__eq__()
- Returns
- String
String describing all critical attributes of the
EngineerStep
instance. This value is not particularly human-friendly due to both its length and the fact that EngineerStep.f
is represented by its hash
Examples
>>> def dummy_f(train_inputs, non_train_inputs):
...     return train_inputs, non_train_inputs
>>> EngineerStep(dummy_f).stringify()  # doctest: +ELLIPSIS
"EngineerStep(dummy_f, ..., ('train_inputs', 'non_train_inputs'), intra_cv, False)"
>>> EngineerStep(dummy_f, stage="pre_cv").stringify()  # doctest: +ELLIPSIS
"EngineerStep(dummy_f, ..., ('train_inputs', 'non_train_inputs'), pre_cv, False)"
-
classmethod
honorary_step_from_dict
(step_dict:dict, dimension:hyperparameter_hunter.space.dimensions.Categorical)¶ Get an EngineerStep from dimension that is equal to its dict form, step_dict
- Parameters
- step_dict: Dict
Dict of form saved in Experiment description files for EngineerStep. Expected to have following keys, with values of the given types:
“name”: String
“f”: String (SHA256 hash)
“params”: List[str], or Tuple[str, …]
“stage”: String in {“pre_cv”, “intra_cv”}
“do_validate”: Boolean
- dimension: Categorical
Categorical instance expected to contain the EngineerStep equivalent of step_dict in its categories
- Returns
- EngineerStep
From dimension.categories if it is the EngineerStep equivalent of step_dict
- Raises
- ValueError
If dimension.categories does not contain an EngineerStep matching step_dict
-
class
hyperparameter_hunter.feature_engineering.
FeatureEngineer
(steps=None, do_validate=False, **datasets)¶ Bases:
object
Class to organize feature engineering step callables steps (
EngineerStep
instances) and the datasets that the steps request and return.
- Parameters
- steps: List, or None, default=None
List of arbitrary length, containing any of the following values:
EngineerStep
instance,
Function to provide as input to
EngineerStep
, or
Categorical
, with categories comprising a selection of the previous two steps values (optimization only)
The third value can only be used during optimization. The feature_engineer provided to
CVExperiment
, for example, may only contain the first two values. To search a space optionally including an EngineerStep, use the optional kwarg of Categorical.
See
EngineerStep
for information on properly formatted EngineerStep functions. Additional engineering steps may be added via add_step()
- do_validate: Boolean, or “strict”, default=False
… Experimental… Whether to validate the datasets resulting from feature engineering steps. If True, hashes of the new datasets will be compared to those of the originals to ensure they were actually modified. Results will be logged. If do_validate = “strict”, an exception will be raised if any anomalies are found, rather than logging a message. If do_validate = False, no validation will be performed
- **datasets: DFDict
This is not expected to be provided on initialization and is offered primarily for debugging/testing. Mapping of datasets necessary to perform feature engineering steps
See also
EngineerStep
For proper formatting of non-Categorical values of steps
Notes
If steps does include any instances of
hyperparameter_hunter.space.dimensions.Categorical
, this FeatureEngineer instance will not be usable by Experiments. It can only be used by Optimization Protocols. Furthermore, the FeatureEngineer that the Optimization Protocol actually ends up using will not pass identity checks against the original FeatureEngineer that contained Categorical steps
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler, MinMaxScaler, QuantileTransformer
>>> # Define some engineer step functions to play with
>>> def s_scale(train_inputs, non_train_inputs):
...     s = StandardScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> def mm_scale(train_inputs, non_train_inputs):
...     s = MinMaxScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> def q_transform(train_targets, non_train_targets):
...     t = QuantileTransformer(output_distribution="normal")
...     train_targets[train_targets.columns] = t.fit_transform(train_targets.values)
...     non_train_targets[train_targets.columns] = t.transform(non_train_targets.values)
...     return train_targets, non_train_targets, t
>>> def sqr_sum(all_inputs):
...     all_inputs["square_sum"] = all_inputs.agg(
...         lambda row: np.sqrt(np.sum([np.square(_) for _ in row])), axis="columns"
...     )
...     return all_inputs
FeatureEngineer steps wrapped by `EngineerStep` == raw function steps - as long as the `EngineerStep` is using the default parameters
>>> # FeatureEngineer steps wrapped by `EngineerStep` == raw function steps
>>> # ... As long as the `EngineerStep` is using the default parameters
>>> fe_0 = FeatureEngineer([sqr_sum, s_scale])
>>> fe_1 = FeatureEngineer([EngineerStep(sqr_sum), EngineerStep(s_scale)])
>>> fe_0.steps == fe_1.steps
True
>>> fe_2 = FeatureEngineer([sqr_sum, EngineerStep(s_scale), q_transform])
`Categorical` can be used during optimization and placed anywhere in `steps`. `Categorical` can also handle either `EngineerStep` categories or raw functions. Use the `optional` kwarg of `Categorical` to test some questionable steps
>>> fe_3 = FeatureEngineer([sqr_sum, Categorical([s_scale, mm_scale]), q_transform])
>>> fe_4 = FeatureEngineer([Categorical([sqr_sum], optional=True), s_scale, q_transform])
>>> fe_5 = FeatureEngineer([
...     Categorical([sqr_sum], optional=True),
...     Categorical([EngineerStep(s_scale), mm_scale]),
...     q_transform
... ])
- Attributes
steps
Feature engineering steps to execute in sequence on
FeatureEngineer.__call__()
Methods
__call__
(self, stage, \*\*datasets, …)Execute all feature engineering steps in
steps
for stage, with datasets datasets as inputs
add_step
(self, step, …)Add an engineering step to
steps
to be executed with the other contents ofsteps
onFeatureEngineer.__call__()
get_key_data
(self)Produce a dict of critical attributes describing the
FeatureEngineer
instance for use by key-making classesinverse_transform
(self, data)Perform the inverse transformation for all engineer steps in
steps
in sequence on data-
inverse_transform
(self, data)¶ Perform the inverse transformation for all engineer steps in
steps
in sequence on data
- Parameters
- data: Array-like
Data to inverse transform with any inversions present in
steps
- Returns
- Array-like
Result of sequentially calling inverse transformations in
steps
on data. If any step hasEngineerStep.inversion
= None, data is unmodified for that step, and proceeds to next engineer step inversion
-
property
steps
¶ Feature engineering steps to execute in sequence on
FeatureEngineer.__call__()
-
get_key_data
(self) → dict¶ Produce a dict of critical attributes describing the
FeatureEngineer
instance for use by key-making classes- Returns
- Dict
Important attributes describing this
FeatureEngineer
instance
-
add_step
(self, step:Union[Callable, hyperparameter_hunter.space.dimensions.Categorical], stage:str=None, name:str=None, before:str=<class 'hyperparameter_hunter.feature_engineering.EMPTY_SENTINEL'>, after:str=<class 'hyperparameter_hunter.feature_engineering.EMPTY_SENTINEL'>, number:int=<class 'hyperparameter_hunter.feature_engineering.EMPTY_SENTINEL'>)¶ Add an engineering step to
steps
to be executed with the other contents ofsteps
onFeatureEngineer.__call__()
- Parameters
- step: Callable, or `EngineerStep`, or `Categorical`
If EngineerStep instance, will be added directly to
steps
. Otherwise, must be a feature engineering step callable that requests, modifies, and returns datasets, which will be used to instantiate aEngineerStep
to add tosteps
. If Categorical, categories should contain EngineerStep instances or callables- stage: String in {“pre_cv”, “intra_cv”}, or None, default=None
Feature engineering stage during which the callable step will be executed
- name: String, or None, default=None
Identifier for the transformation applied by this engineering step. If None and step is not an EngineerStep, will be inferred during
EngineerStep
instantiation- before: String, default=EMPTY_SENTINEL
… Experimental…
- after: String, default=EMPTY_SENTINEL
… Experimental…
- number: Int, default=EMPTY_SENTINEL
… Experimental…
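A hedged sketch of add_step, reusing the sqr_sum and q_transform step functions defined in the Examples above:
>>> fe = FeatureEngineer()
>>> fe.add_step(sqr_sum)  # Wrapped into an EngineerStep; stage inferred as "pre_cv"
>>> fe.add_step(q_transform, stage="intra_cv")  # Explicit stage bypasses inference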
-
hyperparameter_hunter.feature_engineering.
get_engineering_step_stage
(datasets:Tuple[str, ...]) → str¶ Determine the stage in which a feature engineering step that requests datasets as input should be executed
- Parameters
- datasets: Tuple[str]
Dataset names requested by a feature engineering step callable
- Returns
- stage: {“pre_cv”, “intra_cv”}
“pre_cv” if a step processing the given datasets should be executed in the pre-cross-validation stage. “intra_cv” if the step should be executed for each cross-validation split. If any of the elements in datasets is prefixed with “validation” or “non_train”, stage will be “intra_cv”. Otherwise, it will be “pre_cv”
Notes
Generally, feature engineering conducted in the “pre_cv” stage should treat each sample/row as an independent entity. For example, steps like converting a string day of the week to one-hot encoded columns, or imputing missing values by replacement with -1 might be conducted “pre_cv”, since they are unlikely to introduce information leakage. Conversely, steps like scaling/normalization, whose results for the data in one row are affected by the data in other rows, should be performed “intra_cv” in order to recalculate the final values of the datasets for each cross validation split and avoid information leakage
Technically, the inference of stage=”intra_cv” due to the existence of a “non_train”-prefixed value in datasets could unnecessarily force steps to be executed “intra_cv” if, for example, there is no validation data. However, this is safer than the alternative of executing these steps “pre_cv”, in which validation data would be a subset of train data, probably introducing information leakage. A simple workaround for this is to explicitly provide
EngineerStep
with the desired stage parameter to bypass this inference
Examples
>>> get_engineering_step_stage(("train_inputs", "validation_inputs", "holdout_inputs"))
'intra_cv'
>>> get_engineering_step_stage(("all_data",))
'pre_cv'
>>> get_engineering_step_stage(("all_inputs", "all_targets"))
'pre_cv'
>>> get_engineering_step_stage(("train_data", "non_train_data"))
'intra_cv'
-
class
hyperparameter_hunter.feature_engineering.
ParameterParser
¶ Bases:
ast.NodeVisitor
ast.NodeVisitor subclass that collects the arguments specified in the signature of a callable node, as well as the values returned by the callable, in the attributes args and returns, respectively
Methods
generic_visit
(self, node)Called if no explicit visitor function exists for a node.
visit
(self, node)Visit a node.
visit_Return
visit_arg
-
visit_arg
(self, node)¶
-
visit_Return
(self, node)¶
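A rough sketch of driving the visitor with the standard ast module; treating args and returns as lists of names is an assumption based on the description above:
>>> import ast, inspect
>>> def step_f(train_inputs, non_train_inputs):
...     return train_inputs, non_train_inputs
>>> parser = ParameterParser()
>>> parser.visit(ast.parse(inspect.getsource(step_f)))
>>> requested = parser.args   # Parameter names collected by visit_arg
>>> returned = parser.returns # Returned names collected by visit_Return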
-
-
hyperparameter_hunter.feature_engineering.
get_engineering_step_params
(f:<built-in function callable>) → Tuple[str]¶ Verify that callable f requests valid input parameters, and returns a tuple of the same parameters, with the assumption that the parameters are modified by f
- Parameters
- f: Callable
Feature engineering step function that requests, modifies, and returns datasets
- Returns
- Tuple
Argument/return value names declared by f
Examples
>>> def impute_negative_one(all_inputs):
...     all_inputs.fillna(-1, inplace=True)
...     return all_inputs
>>> get_engineering_step_params(impute_negative_one)
('all_inputs',)
>>> def standard_scale(train_inputs, non_train_inputs):
...     scaler = StandardScaler()
...     train_inputs[train_inputs.columns] = scaler.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = scaler.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> get_engineering_step_params(standard_scale)
('train_inputs', 'non_train_inputs')
>>> def error_invalid_dataset(train_inputs, foo):
...     return train_inputs, foo
>>> get_engineering_step_params(error_invalid_dataset)
Traceback (most recent call last):
    File "feature_engineering.py", line ?, in get_engineering_step_params
ValueError: Invalid dataset name: 'foo'
-
hyperparameter_hunter.feature_engineering.
hash_datasets
(datasets:dict) → dict¶ Describe datasets with dicts of hashes for their values, column names, and column values
- Parameters
- datasets: Dict
Mapping of dataset names to pandas.DataFrame instances
- Returns
- hashes: Dict
Mapping with same keys as datasets, whose values are dicts returned from
_hash_dataset()
that provide hashes for each DataFrame and its column names/values
Examples
>>> df_x = pd.DataFrame(dict(a=[0, 1], b=[2, 3], c=[4, 5]))
>>> df_y = pd.DataFrame(dict(a=[0, 1], b=[6, 7], d=[8, 9]))
>>> hash_datasets(dict(x=df_x, y=df_y)) == dict(x=_hash_dataset(df_x), y=_hash_dataset(df_y))
True
hyperparameter_hunter.importer module¶
This module provides utilities to intercept external imports and load them using custom logic
Related¶
hyperparameter_hunter.__init__
Executes the import hooks to ensure assets are properly imported prior to starting any real work
hyperparameter_hunter.tracers
Defines tracing metaclasses applied by
hyperparameter_hunter.importer
to imports
-
class
hyperparameter_hunter.importer.
Interceptor
(module_name, custom_loader, asset_name=None)¶ Bases:
_frozen_importlib_external.PathFinder
Class to intercept loading of an external module in order to provide custom loading logic
- Parameters
- module_name: String
The path of the module, for which loading should be handled by custom_loader
- custom_loader: Descendant of `importlib.machinery.SourceFileLoader`
Should implement
exec_module()
, which should call its superclass’s exec_module()
, then perform the custom loading logic, and return module
Methods
find_module
(fullname[, path])find the module on sys.path or ‘path’ based on sys.path_hooks and sys.path_importer_cache.
find_spec
(self, full_name[, path, target])Perform custom loading logic if full_name ==
module_name
invalidate_caches
()Call the invalidate_caches() method on all path entry finders stored in sys.path_importer_caches (where implemented).
-
find_spec
(self, full_name, path=None, target=None)¶ Perform custom loading logic if full_name ==
module_name
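Interceptor behaves like any other sys.meta_path finder. The following sketch shows only the general installation pattern; the module path and loader pairing here are illustrative assumptions, not a prescription from the library (hook_keras_layer() below performs the real setup):
>>> import sys
>>> sys.meta_path.insert(0, Interceptor("keras.engine.base_layer", KerasLayerLoader))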
-
class
hyperparameter_hunter.importer.
KerasLayerLoader
(fullname, path)¶ Bases:
_frozen_importlib_external.SourceFileLoader
Cache the module name and the path to the file found by the finder.
Methods
create_module
(self, spec)Use default semantics for module creation.
exec_module
(self, module)Set module.Layer to a traced version of itself via
tracers.ArgumentTracer
get_code
(self, fullname)Concrete implementation of InspectLoader.get_code.
get_data
(self, path)Return the data from path as raw bytes.
get_filename
(self[, name])Return the path to the source file as found by the finder.
get_source
(self, fullname)Concrete implementation of InspectLoader.get_source.
is_package
(self, fullname)Concrete implementation of InspectLoader.is_package by checking if the path returned by get_filename has a filename of ‘__init__.py’.
load_module
(self[, name])Load a module from a file.
path_mtime
(self, path)Optional method that returns the modification time (an int) for the specified path, where path is a str.
path_stats
(self, path)Return the metadata for the path.
set_data
(self, path, data, \*[, _mode])Write bytes data to a file.
source_to_code
(self, data, path, \*[, _optimize])Return the code object compiled from source.
-
exec_module
(self, module)¶ Set module.Layer to a traced version of itself via
tracers.ArgumentTracer
-
-
hyperparameter_hunter.importer.
hook_keras_layer
()¶ If Keras has yet to be imported, modify the inheritance structure of its base Layer class to inject attributes that keep track of the parameters provided to each layer
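Because the hook works by intercepting the import itself, ordering matters. A minimal sketch:
>>> from hyperparameter_hunter.importer import hook_keras_layer
>>> hook_keras_layer()  # Must be called before Keras is first imported
>>> import keras  # The base Layer class is now traced via tracers.ArgumentTracer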
-
class
hyperparameter_hunter.importer.
KerasMultiInitializerLoader
(fullname, path)¶ Bases:
_frozen_importlib_external.SourceFileLoader
Cache the module name and the path to the file found by the finder.
Methods
create_module
(self, spec)Use default semantics for module creation.
exec_module
(self, module)Execute the module.
get_code
(self, fullname)Concrete implementation of InspectLoader.get_code.
get_data
(self, path)Return the data from path as raw bytes.
get_filename
(self[, name])Return the path to the source file as found by the finder.
get_source
(self, fullname)Concrete implementation of InspectLoader.get_source.
is_package
(self, fullname)Concrete implementation of InspectLoader.is_package by checking if the path returned by get_filename has a filename of ‘__init__.py’.
load_module
(self[, name])Load a module from a file.
path_mtime
(self, path)Optional method that returns the modification time (an int) for the specified path, where path is a str.
path_stats
(self, path)Return the metadata for the path.
set_data
(self, path, data, \*[, _mode])Write bytes data to a file.
source_to_code
(self, data, path, \*[, _optimize])Return the code object compiled from source.
-
exec_module
(self, module)¶ Execute the module.
-
-
hyperparameter_hunter.importer.
hook_keras_initializers
()¶
hyperparameter_hunter.leaderboards module¶
This module defines the Leaderboard classes that are saved to the ‘HyperparameterHunterAssets/Leaderboards’ subdirectory. It provides the ability to compare all Experiment results at a glance
Related¶
hyperparameter_hunter.recorders
This module initiates the saving of Experiment entries to Leaderboards
-
class
hyperparameter_hunter.leaderboards.
Leaderboard
(data=None)¶ Bases:
object
The Leaderboard class is used for reading, updating, and saving leaderboard files within the ‘HyperparameterHunterAssets/Leaderboards’ subdirectory
- Parameters
- data: pd.DataFrame, or None, default=None
The starting state of the Leaderboard. If None, an empty DataFrame is used
Methods
add_entry
(self, experiment, \*\*kwargs)Add an entry row for experiment to
data
from_path
(path[, assert_existence])Initialize a Leaderboard from a .csv path
save
(self, path, \*\*kwargs)Save the Leaderboard instance
sort
(self, by[, ascending])Sort the rows in
data
according to the values of a column-
classmethod
from_path
(path, assert_existence=False)¶ Initialize a Leaderboard from a .csv path
- Parameters
- path: str
The path of the file to read in as a DataFrame
- assert_existence: boolean, default=False
If False, and
pandas.read_csv()
raises FileNotFoundError, the Leaderboard will be initialized with None. Else the exception is raised normally
-
abstract
add_entry
(self, experiment, **kwargs)¶ Add an entry row for experiment to
data
- Parameters
- experiment: Instance of `experiments.BaseExperiment`
An instance of a completed Experiment from which to construct a Leaderboard entry
-
save
(self, path, **kwargs)¶ Save the Leaderboard instance
- Parameters
- path: str
The file to which the Leaderboard instance should be saved
- **kwargs: Dict
Additional arguments to supply to
pandas.DataFrame.to_csv()
-
sort
(self, by, ascending=False)¶ Sort the rows in
data
according to the values of a column- Parameters
- by: str, or list of str
The column name(s) by which to sort the rows of
data
- ascending: boolean, default=False
The direction in which to sort the rows of
data
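A hedged sketch tying the three methods together, using the GlobalLeaderboard subclass defined below; the .csv path and metric column name are illustrative, following the ‘HyperparameterHunterAssets/Leaderboards’ layout and the column format produced by evaluations_to_columns():
>>> lb = GlobalLeaderboard.from_path("HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv")
>>> lb.sort(by="oof_roc_auc_score", ascending=False)
>>> lb.save("HyperparameterHunterAssets/Leaderboards/GlobalLeaderboard.csv")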
-
class
hyperparameter_hunter.leaderboards.
GlobalLeaderboard
(data=None)¶ Bases:
hyperparameter_hunter.leaderboards.Leaderboard
The Leaderboard class is used for reading, updating, and saving leaderboard files within the ‘HyperparameterHunterAssets/Leaderboards’ subdirectory
- Parameters
- data: pd.DataFrame, or None, default=None
The starting state of the Leaderboard. If None, an empty DataFrame is used
Methods
add_entry
(self, experiment, \*\*kwargs)Add an entry row to
Leaderboard.data
(pandas.DataFrame).from_path
(path[, assert_existence])Initialize a Leaderboard from a .csv path
save
(self, path, \*\*kwargs)Save the Leaderboard instance
sort
(self, by[, ascending])Sort the rows in
data
according to the values of a column-
add_entry
(self, experiment, **kwargs)¶ Add an entry row to
Leaderboard.data
(pandas.DataFrame). This method also handles column conflicts to an extent- Parameters
- experiment: Instance of `experiments.BaseExperiment` descendant
An Experiment instance for which a leaderboard entry row should be added
- **kwargs: Dict
Extra keyword arguments
-
hyperparameter_hunter.leaderboards.
evaluations_to_columns
(evaluation:Dict[str, Union[collections.OrderedDict, NoneType]], decimals=10) → List[Tuple[str, numbers.Number]]¶ Convert the results of
metrics.ScoringMixIn.evaluate()
to a pd.DataFrame-ready format
- Parameters
- evaluation: Dict[str, OrderedDict]
The result of consecutive calls to
metrics.ScoringMixIn.evaluate()
for all given dataset types
- decimals: Int, default=10
Number of decimal places to which to round. If decimals is negative, it specifies the number of positions to the left of the decimal point
- Returns
- column_metrics: list of pairs
A pair for each data_type-metric combination, where the first item is the key, and the second is the metric value
Examples
>>> evaluations_to_columns({
...     'in_fold': None,
...     'holdout': OrderedDict([('roc_auc_score', 0.9856), ('f1_score', 0.9768)]),
...     'oof': OrderedDict([('roc_auc_score', 0.9634)])
... })
[('oof_roc_auc_score', 0.9634), ('holdout_roc_auc_score', 0.9856), ('holdout_f1_score', 0.9768)]
-
hyperparameter_hunter.leaderboards.
combine_column_order
(df_1, df_2, both_cols=None)¶ Determine the sort order for the combined columns of two DataFrames
- Parameters
- df_1: pd.DataFrame
The first DataFrame, whose columns will be sorted. Columns unique to df_1 will be sorted before those of df_2
- df_2: pd.DataFrame
The second DataFrame, whose columns will be sorted. Columns unique to df_2 will be sorted after those of df_1
- both_cols: list, or None, default=None
If list, the column names that should be common to both DataFrames and placed last in the sort order
- Returns
- combined_cols: list of strings
The result of combining and sorting column names from df_1, and df_2
Examples
>>> df_1 = pd.DataFrame(columns=['A', 'B', 'C', 'Common_1', 'Common_2'])
>>> df_2 = pd.DataFrame(columns=['A', 'D', 'E', 'Common_1', 'Common_2'])
>>> combine_column_order(df_1, df_2, both_cols=['Common_1', 'Common_2'])
['A', 'B', 'C', 'D', 'E', 'Common_1', 'Common_2']
>>> combine_column_order(df_1, df_2, both_cols=None)
['A', 'Common_1', 'Common_2', 'B', 'C', 'D', 'E']
hyperparameter_hunter.metrics module¶
This module defines hyperparameter_hunter.metrics.ScoringMixIn
which enables
hyperparameter_hunter.experiments.BaseExperiment
to score predictions and collect the
results of those evaluations
Related¶
hyperparameter_hunter.experiments
This module uses
hyperparameter_hunter.metrics.ScoringMixIn
as the only explicit parent class to hyperparameter_hunter.experiments.BaseExperiment
(that is, the only parent class that isn’t bestowed upon it by hyperparameter_hunter.experiment_core.ExperimentMeta
)
-
class
hyperparameter_hunter.metrics.
Metric
(name: str, metric_function: Union[callable, str, None] = None, direction: str = 'infer')¶ Bases:
object
Class to encapsulate all necessary information for identifying, calculating, and evaluating metrics results
- Parameters
- name: String
Identifying name of the metric. Should be unique relative to any other metric names that might be provided by the user
- metric_function: Callable, string, None, default=None
If callable, should expect inputs of form (target, prediction), and return a float. If string, will be treated as an attribute in
sklearn.metrics
. If None, name will be treated as an attribute insklearn.metrics
, the value of which will be retrieved and used as metric_function- direction: {“infer”, “max”, “min”}, default=”infer”
How to compare the result of metric_function relative to previous evaluations
“max”: Metric values should be maximized, and higher metric values are better than lower values; it should be used for measures of accuracy
“min”: Metric values should be minimized, and lower metric values are better than higher values; it should be used for measures of error or loss
“infer”: direction will be set to:
“min” if name (or metric_function’s name) contains “error” or “loss”
“max” if name contains neither of the aforementioned strings
Notes
direction = “infer” looks for “error”/”loss” in name first, then in the name of metric_function. This means that name can be an abbreviation/anything for error measures and direction will still be correctly inferred as long as the actual callable for metric_function has “error”/”loss” in its name. For example, direction = “min” is safely inferred when using “mae” for “mean_absolute_error” or “rmsle” for “root_mean_squared_logarithmic_error”. This functions as described whether metric_function is a string naming an SKLearn metric, or a callable whose name includes “error”/”loss”
Examples
>>> Metric("roc_auc_score") # doctest: +ELLIPSIS Metric(roc_auc_score, <function roc_auc_score at 0x...>, max) >>> Metric("roc_auc_score", sk_metrics.roc_auc_score) # doctest: +ELLIPSIS Metric(roc_auc_score, <function roc_auc_score at 0x...>, max) >>> Metric("my_f1_score", "f1_score") # doctest: +ELLIPSIS Metric(my_f1_score, <function f1_score at 0x...>, max) >>> Metric("hamming_loss", sk_metrics.hamming_loss) # doctest: +ELLIPSIS Metric(hamming_loss, <function hamming_loss at 0x...>, min)
Respect explicit `direction` even if it doesn’t make sense for the `metric_function`
>>> Metric("r2_score", sk_metrics.r2_score, direction="min") # doctest: +ELLIPSIS Metric(r2_score, <function r2_score at 0x...>, min)
Direction inference based on `metric_function` name, rather than `name` itself
>>> Metric("mae", "median_absolute_error") # doctest: +ELLIPSIS Metric(mae, <function median_absolute_error at 0x...>, min) >>> Metric("hl", sk_metrics.hamming_loss) # doctest: +ELLIPSIS Metric(hl, <function hamming_loss at 0x...>, min)
Methods
__call__
(self, target, prediction)Call self as a function.
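A small sketch of calling a Metric directly, with the (target, prediction) ordering documented above; the assumption is that __call__ simply delegates to metric_function:
>>> m = Metric("roc_auc_score")
>>> score = m([0, 1, 0, 1], [0.1, 0.9, 0.2, 0.8])  # 1.0 for this perfectly-ranked toy input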
-
hyperparameter_hunter.metrics.
format_metrics
(metrics:Union[Dict, List]) → Dict[str, hyperparameter_hunter.metrics.Metric]¶ Properly format iterable metrics to contain instances of
Metric
- Parameters
- metrics: Dict, List
Iterable describing the metrics to be recorded, along with a means to compute the value of each metric. Should be of one of the two following forms:
List Form:
[“<metric name>”, “<metric name>”, …]: Where each value of the list is a string that names an attribute in
sklearn.metrics
[Metric, Metric, …]: Where each value of the list is an instance of
Metric
[(<*args>), (<*args>), …]: Where each value of the list is a tuple of arguments that will be used to instantiate a
Metric
. Arguments given in tuples must be in order expected byMetric
Dict Form:
{“<metric name>”: <metric_function>, …}: Where each key is a name for the corresponding metric callable, which is used to compute the value of the metric
{“<metric name>”: (<metric_function>, <direction>), …}: Where each key is a name for the corresponding metric callable and direction, all of which are used to instantiate a
Metric
{“<metric name>”: “<sklearn metric name>”, …}: Where each key is a name for the metric, and each value is the name of the attribute in
sklearn.metrics
for which the corresponding key is an alias{“<metric name>”: None, …}: Where each key is the name of the attribute in
sklearn.metrics
{“<metric name>”: Metric, …}: Where each key names an instance of
Metric
. This is the internally-used format to which all other formats will be converted
Metric callable functions should expect inputs of form (target, prediction), and should return floats. See the documentation of
Metric
for information regarding expected parameters and types
- Returns
- metrics_dict: Dict
Cast of metrics to a dict, in which values are instances of
Metric
Examples
>>> format_metrics(["roc_auc_score", "f1_score"])  # doctest: +ELLIPSIS
{'roc_auc_score': Metric(roc_auc_score, <function roc_auc_score at 0x...>, max), 'f1_score': Metric(f1_score, <function f1_score at 0x...>, max)}
>>> format_metrics([Metric("log_loss"), Metric("r2_score", direction="min")])  # doctest: +ELLIPSIS
{'log_loss': Metric(log_loss, <function log_loss at 0x...>, min), 'r2_score': Metric(r2_score, <function r2_score at 0x...>, min)}
>>> format_metrics({"log_loss": Metric("log_loss"), "r2_score": Metric("r2_score", direction="min")})  # doctest: +ELLIPSIS
{'log_loss': Metric(log_loss, <function log_loss at 0x...>, min), 'r2_score': Metric(r2_score, <function r2_score at 0x...>, min)}
>>> format_metrics([("log_loss", None), ("my_r2_score", "r2_score", "min")])  # doctest: +ELLIPSIS
{'log_loss': Metric(log_loss, <function log_loss at 0x...>, min), 'my_r2_score': Metric(my_r2_score, <function r2_score at 0x...>, min)}
>>> format_metrics({"roc_auc": sk_metrics.roc_auc_score, "f1": sk_metrics.f1_score})  # doctest: +ELLIPSIS
{'roc_auc': Metric(roc_auc, <function roc_auc_score at 0x...>, max), 'f1': Metric(f1, <function f1_score at 0x...>, max)}
>>> format_metrics({"log_loss": (None, ), "my_r2_score": ("r2_score", "min")})  # doctest: +ELLIPSIS
{'log_loss': Metric(log_loss, <function log_loss at 0x...>, min), 'my_r2_score': Metric(my_r2_score, <function r2_score at 0x...>, min)}
>>> format_metrics({"roc_auc": "roc_auc_score", "f1": "f1_score"})  # doctest: +ELLIPSIS
{'roc_auc': Metric(roc_auc, <function roc_auc_score at 0x...>, max), 'f1': Metric(f1, <function f1_score at 0x...>, max)}
>>> format_metrics({"roc_auc_score": None, "f1_score": None})  # doctest: +ELLIPSIS
{'roc_auc_score': Metric(roc_auc_score, <function roc_auc_score at 0x...>, max), 'f1_score': Metric(f1_score, <function f1_score at 0x...>, max)}
-
hyperparameter_hunter.metrics.
get_formatted_target_metric
(target_metric:Union[tuple, str, NoneType], metrics:dict, default_dataset:str='oof') → Tuple[str, str]¶ Return a properly formatted target_metric tuple for use with navigating evaluation results
- Parameters
- target_metric: Tuple, String, or None
Path denoting metric to be used. If tuple, the first value should be in [‘oof’, ‘holdout’, ‘in_fold’], and the second value should be the name of a metric supplied in metrics. If str, should be one of the two values from the tuple form. Else, a value will be chosen
- metrics: Dict
Properly formatted metrics as produced by
metrics.format_metrics()
, in which keys are strings identifying metrics, and values are instances of metrics.Metric
. See the documentation of metrics.format_metrics()
for more information on different metrics formats
- default_dataset: {“oof”, “holdout”, “in_fold”}, default=”oof”
The default dataset type value to use if one is not provided
- Returns
- target_metric: Tuple
A formatted target_metric containing two strings: a dataset_type, followed by a metric name
Examples
>>> get_formatted_target_metric(('holdout', 'roc_auc_score'), format_metrics(['roc_auc_score', 'f1_score']))
('holdout', 'roc_auc_score')
>>> get_formatted_target_metric(('holdout',), format_metrics(['roc_auc_score', 'f1_score']))
('holdout', 'roc_auc_score')
>>> get_formatted_target_metric('holdout', format_metrics(['roc_auc_score', 'f1_score']))
('holdout', 'roc_auc_score')
>>> get_formatted_target_metric('holdout', format_metrics({'roc': 'roc_auc_score', 'f1': 'f1_score'}))
('holdout', 'roc')
>>> get_formatted_target_metric('roc_auc_score', format_metrics(['roc_auc_score', 'f1_score']))
('oof', 'roc_auc_score')
>>> get_formatted_target_metric(None, format_metrics(['f1_score', 'roc_auc_score']))
('oof', 'f1_score')
-
class
hyperparameter_hunter.metrics.
ScoringMixIn
(metrics, in_fold='all', oof='all', holdout='all', do_score=True)¶ Bases:
object
MixIn class to manage metrics to record for each dataset type, and perform evaluations
- Parameters
- metrics: Dict, List
Specifies all metrics to be used by their id keys, along with a means to compute the metric. If list, all values must be strings that are attributes in sklearn.metrics. If dict, key/value pairs must be of the form: (<id>, <callable/None/str sklearn.metrics attribute>), where “id” is a str name for the metric. Its corresponding value must be one of: 1) a callable to calculate the metric, 2) None if the “id” key is an attribute in sklearn.metrics and should be used to fetch a callable, 3) a string that is an attribute in sklearn.metrics and should be used to fetch a callable. Metric callable functions should expect inputs of form (target, prediction), and should return floats. See the sketch following the Notes section below
- in_fold: List of strings, None, default=<all ids in `metrics`>
Which metrics (from ids in metrics) should be recorded for in-fold data
- oof: List of strings, None, default=<all ids in `metrics`>
Which metrics (from ids in metrics) should be recorded for out-of-fold data
- holdout: List of strings, None, default=<all ids in `metrics`>
Which metrics (from ids in metrics) should be recorded for holdout data
- do_score: Boolean, default=True
This is experimental. If False, scores will be neither calculated nor recorded for the duration of the experiment
Notes
For each kwarg in [in_fold, oof, holdout], the following must be true: if the value of the kwarg is a list, its contents must be a subset of metrics
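A minimal sketch of the metrics formats described above (the toy target/prediction values are illustrative):

from sklearn.metrics import f1_score
from hyperparameter_hunter.metrics import ScoringMixIn

# Equivalent `metrics` formats: a list of `sklearn.metrics` attribute names,
# or a dict mapping <id> to <callable/None/str sklearn.metrics attribute>
scorer = ScoringMixIn(metrics=["roc_auc_score", "f1_score"])
scorer = ScoringMixIn(metrics={"roc_auc_score": None, "f1": f1_score})

# Apply all "oof" metrics to toy values; see `evaluate` below for details
result = scorer.evaluate("oof", target=[0, 1, 1, 0], prediction=[0, 1, 0, 1])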
Methods
evaluate(self, data_type, target, prediction)
Apply metric(s) to the given data to calculate the value of the prediction
-
evaluate
(self, data_type, target, prediction, return_list=False, dry_run=False)¶ Apply metric(s) to the given data to calculate the value of the prediction
- Parameters
- data_type: {“in_fold”, “oof”, “holdout”}
The type of dataset for which target and prediction arguments are being provided
- target: Array-like
True labels for the data. Should be same shape as prediction
- prediction: Array-like
Predicted labels for the data. Should be same shape as target
- return_list: Boolean, default=False
If True, return list of tuples instead of dict. See “Returns” section below for details
- dry_run: Boolean, default=False
If True, the value of last_evaluation_results will not be updated to include the returned _result. The core library callbacks operate under the assumption that last_evaluation_results will be updated as usual, so restrict usage to debugging or lambda_callback() implementations
- Returns
- _result: OrderedDict, or list
A dict whose keys are all metric keys supplied for data_type, and whose values are the results of each metric. If return_list is True, returns a list of tuples of: (<data_type metric str>, <metric result>)
Notes
The required types of target and prediction are entirely dependent on the metric callable’s expectations
-
hyperparameter_hunter.metrics.
get_clean_prediction
(target:Iterable, prediction:Iterable)¶ Create prediction that is of a form comparable to target
- Parameters
- target: Array-like
True labels for the data. Should be same shape as prediction
- prediction: Array-like
Predicted labels for the data. Should be same shape as target
- Returns
- prediction: Array-like
If target values are ints, and prediction values are not, the given predicted labels are clipped between the min and max of target, then rounded to the nearest integer. Else, the original predicted labels are returned
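For instance (a sketch; the toy arrays are illustrative):

from hyperparameter_hunter.metrics import get_clean_prediction

# Integer targets with continuous predictions: values are clipped to
# [min(target), max(target)] = [0, 1], then rounded to the nearest integer
clean = get_clean_prediction(target=[0, 1, 1, 0], prediction=[0.2, 0.8, 1.7, -0.4])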
-
hyperparameter_hunter.metrics.
classify_output
(target, prediction)¶ Force continuous prediction into the discrete, classified space of target. This is not an output/feature transformer akin to SKLearn’s discretization transformers. This function is intended for use in the very specific case of having a target that is classification-like (“binary”, “multiclass”, etc.), with prediction that resembles a “continuous” target, despite being made for target. The most common reason for this occurrence is that prediction is actually the division-averaged predictions collected along the course of a
CVExperiment
. In this case, the original model predictions should have been classification-like; however, due to disagreement in the division predictions, the resulting average predictions appear to be continuous
- Parameters
- target: Array-like
True labels for the data. Should be same shape as prediction
- prediction: Array-like
Predicted labels for the data, which may appear continuous. Should be same shape as target
- Returns
- numpy.array
prediction forced into the discrete space of target labels
Notes
Target types used by this function are defined by sklearn.utils.multiclass.type_of_target.
If a prediction value is exactly between two target values, it will assume the lower of the two values. For example, given a single prediction of 1.5 and unique labels of [0, 1, 2, 3], the value of that prediction will be 1, rather than 2
Examples
>>> import numpy as np >>> classify_output(np.array([0, 3, 1, 2]), [0.5, 1.51, 0.66, 4.9]) array([0, 2, 1, 3]) >>> classify_output(np.array([0, 1, 2, 3]), [0.5, 1.51, 0.66, 4.9]) array([0, 2, 1, 3]) >>> classify_output(np.array([0, 1, 1, 0]), [0.2, 0.8, 0.51, 0.49]) array([0, 1, 1, 0])
-
hyperparameter_hunter.metrics.
wrap_xgboost_metric
(metric, metric_name)¶ Create a function to use as the eval_metric kwarg for
xgboost.sklearn.XGBModel.fit()
- Parameters
- metric: Function
The function to calculate the value of metric, with signature: (target, prediction)
- metric_name: String
The name of the metric being evaluated
- Returns
- eval_metric: Function
The function to pass to XGBoost’s
fit()
, with signature: (prediction, target). It will return a tuple of (metric_name: str, metric_value: float)
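For example (a sketch; the `model`/data names in the comment are assumed to already exist):

from sklearn.metrics import mean_absolute_error
from hyperparameter_hunter.metrics import wrap_xgboost_metric

# Flip a (target, prediction) metric into XGBoost's (prediction, target)
# eval_metric form, returning a ("mae", <float>) tuple per evaluation
mae_eval = wrap_xgboost_metric(mean_absolute_error, "mae")

# Typically forwarded to `xgboost.sklearn.XGBModel.fit()`, e.g.:
#   model.fit(X_train, y_train, eval_set=[(X_valid, y_valid)], eval_metric=mae_eval)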
hyperparameter_hunter.models module¶
This module provides wrapper classes around the raw algorithms being executed to facilitate use
by hyperparameter_hunter.experiments.BaseExperiment
. The algorithms created by most
libraries can be handled by hyperparameter_hunter.models.Model
, but some need special
attention, hence KerasModel
, and XGBoostModel
. The model classes defined herein
handle algorithm instantiation, as well as fitting and predicting
Related¶
hyperparameter_hunter.experiments
This module is the primary user of the classes defined in
hyperparameter_hunter.models
hyperparameter_hunter.sentinels
This module defines the Sentinel classes that will be converted to the actual values they represent in
hyperparameter_hunter.models.Model.__init__()
-
hyperparameter_hunter.models.
load_model
(_)¶
-
hyperparameter_hunter.models.
model_selector
(model_initializer)¶ Selects the appropriate Model class to use for model_initializer
- Parameters
- model_initializer: callable
The callable used to create an instance of some algorithm
- Returns
Model
, or one of its children
Examples
>>> from keras.wrappers.scikit_learn import KerasClassifier, KerasRegressor >>> model_selector(KerasClassifier) == KerasModel True >>> model_selector(KerasRegressor) == KerasModel True >>> from sklearn.svm import SVC >>> model_selector(SVC) == Model True >>> model_selector(None) == Model True
-
class
hyperparameter_hunter.models.
Model
(model_initializer, initialization_params, extra_params, train_input=None, train_target=None, validation_input=None, validation_target=None, do_predict_proba=False, target_metric=None, metrics=None)¶ Bases:
object
Handles initialization, fitting, and prediction for provided algorithms. Consider documentation for children of
Model
to be identical to that ofModel
, except where noted- Parameters
- model_initializer: Class
Expected to implement at least the following methods: 1) __init__, to which initialization_params will usually be provided unless stated otherwise in a child class’s documentation - like KerasModel. 2) fit, to which train_input, and train_target will be provided, in addition to the contents of extra_params['fit'] in some child classes - like XGBoostModel. 3) predict, or predict_proba if applicable, which should accept any array-like input of shape: (<num_samples>, train_input.shape[1])
- initialization_params: Dict
A dict containing all arguments accepted by __init__() of the class model_initializer, unless stated otherwise in a child class’s documentation - like KerasModel. Arguments pertaining to random seeds will be ignored
- extra_params: Dict, default={}
A dict of special parameters that are passed to a model’s non-initialization methods in special cases (such as fit, predict, predict_proba, and score). extra_params are not used for all models. See the documentation for the appropriate descendant of models.Model for information about how it handles extra_params
- train_input: `pandas.DataFrame`
The model’s training input data
- train_target: `pandas.DataFrame`
The true labels corresponding to the rows of
train_input
- validation_input: `pandas.DataFrame`, or None
The model’s validation input data to evaluate performance during fitting
- validation_target: `pandas.DataFrame`, or None
The true labels corresponding to the rows of
validation_input
- do_predict_proba: Boolean, or int, default=False
If False, models.Model.fit() will call models.Model.model.predict()
If True, it will call models.Model.model.predict_proba(), and the values in all columns will be used as the actual prediction values
If do_predict_proba is an int, models.Model.fit() will call models.Model.model.predict_proba(), as is the case when do_predict_proba is True, but the int supplied as do_predict_proba declares the column index to use as the actual prediction values
For example, for a model to call the predict method, do_predict_proba=False (default). For a model to call the predict_proba method, and use all of the class probabilities, do_predict_proba=True. To call the predict_proba method, and use the class probabilities in the first column, do_predict_proba=0. To use the second column (index 1) of the result, do_predict_proba=1 - This often corresponds to the positive class’s probabilities in binary classification problems. To use the third column do_predict_proba=2, and so on (see the sketch following this parameter list)
See the notes for the do_predict_proba parameter in the documentation of
environment.Environment
for additional usage notes
- target_metric: Tuple
Used by some child classes (like
XGBoostModel
`) to provide validation data to `model.fit()`
- metrics: Dict
Used by some child classes (like
XGBoostModel
`) to provide validation data to `model.fit()`
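The sketch below illustrates these parameters. It is only illustrative: the LogisticRegression initializer and toy frames are assumptions, and Experiments normally construct Model internally, so the explicit initialize_model()/fit() calls simply mirror the Methods listed next:

import pandas as pd
from sklearn.linear_model import LogisticRegression
from hyperparameter_hunter.models import Model

train_input = pd.DataFrame({"a": [0, 1, 2, 3], "b": [1, 0, 1, 0]})
train_target = pd.DataFrame({"target": [0, 1, 0, 1]})

# do_predict_proba=1 -> call `predict_proba` and keep only column index 1
# (usually the positive class in binary classification)
model = Model(
    LogisticRegression,
    initialization_params=dict(),
    extra_params=dict(),
    train_input=train_input,
    train_target=train_target,
    validation_input=train_input,    # toy reuse; normally a separate fold
    validation_target=train_target,
    do_predict_proba=1,
)
model.initialize_model()
model.fit()
prediction = model.predict(train_input)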
Methods
fit(self)
Train model according to extra_params['fit'] (if appropriate) on training data
initialize_model(self)
Create an instance of a model using model_initializer, with initialization_params as input
predict(self, input_data)
Generate model predictions for input_data
-
initialize_model
(self)¶ Create an instance of a model using
model_initializer
, with initialization_params
as input
-
fit
(self)¶ Train model according to
extra_params['fit']
(if appropriate) on training data
-
predict
(self, input_data)¶ Generate model predictions for input_data
- Parameters
- input_data: Array-like
Data containing the same number of features as were trained on, for which the model will predict output values
- Returns
- prediction: Array-like
Output predictions made by the model, using input_data
-
class
hyperparameter_hunter.models.
XGBoostModel
(model_initializer, initialization_params, extra_params, train_input=None, train_target=None, validation_input=None, validation_target=None, do_predict_proba=False, target_metric=None, metrics=None)¶ Bases:
hyperparameter_hunter.models.Model
A special Model class for handling XGBoost algorithms. Consider documentation to be identical to that of
Model
, except where noted- Parameters
- model_initializer: :class:`xgboost.sklearn.XGBClassifier`, or :class:`xgboost.sklearn.XGBRegressor`
See
Model
- initialization_params: See :class:`Model`
- extra_params: Dict, default={}
Useful keys: [‘fit’, ‘predict’]. If ‘fit’ is a key with a dict value, its contents will be provided to xgboost.sklearn.XGBModel.fit(), with the exception of the following: [‘X’, ‘y’]. If any of the aforementioned keys are in extra_params['fit'], or if extra_params['fit'] is provided, but is not a dict, an Exception will be raised. See the sketch following this parameter list
- train_input: See :class:`Model`
- train_target: See :class:`Model`
- validation_input: See :class:`Model`
- validation_target: See :class:`Model`
- do_predict_proba: See :class:`Model`
- target_metric: Tuple
Used to determine the ‘eval_metric’ argument to xgboost.sklearn.XGBModel.fit(). See the documentation for XGBoostModel.extra_params for more information
- metrics: See :class:`Model`
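An illustrative extra_params dict for an XGBoost model (a sketch; the particular fit kwargs are assumptions, not requirements):

# Contents of "fit" are forwarded to `xgboost.sklearn.XGBModel.fit()`.
# Including "X" or "y" here (or making "fit" a non-dict) raises an Exception
extra_params = dict(
    fit=dict(early_stopping_rounds=5, verbose=False),
)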
Methods
fit(self)
Train model according to extra_params['fit'] (if appropriate) on training data
initialize_model(self)
Create an instance of a model using model_initializer, with initialization_params as input
predict(self, input_data)
Generate model predictions for input_data
-
class
hyperparameter_hunter.models.
KerasModel
(model_initializer, initialization_params, extra_params, train_input=None, train_target=None, validation_input=None, validation_target=None, do_predict_proba=False, target_metric=None, metrics=None)¶ Bases:
hyperparameter_hunter.models.Model
A special Model class for handling Keras neural networks. Consider documentation to be identical to that of
Model
, except where noted- Parameters
- model_initializer: :class:`keras.wrappers.scikit_learn.KerasClassifier`, or :class:`keras.wrappers.scikit_learn.KerasRegressor`
Expected to implement at least the following methods: 1) __init__, to which initialization_params will usually be provided unless stated otherwise in a child class’s documentation - like KerasModel. 2) fit, to which train_input, and train_target will be provided, in addition to the contents of extra_params['fit'] in some child classes - like XGBoostModel. 3) predict, or predict_proba if applicable, which should accept any array-like input of shape: (<num_samples>, train_input.shape[1])
- initialization_params: Dict containing `build_fn`
A dictionary containing the single key: build_fn, which is a callable function that returns a compiled Keras model. See the illustrative build_fn sketch following this parameter list
- extra_params: Dict, default={}
The parameters expected to be passed to the extra methods of the compiled Keras model. Such methods include (but are not limited to) fit, predict, and predict_proba. Some of the common parameters given here include epochs, batch_size, and callbacks
- train_input: `pandas.DataFrame`
The model’s training input data
- train_target: `pandas.DataFrame`
The true labels corresponding to the rows of
train_input
- validation_input: `pandas.DataFrame`, or None
The model’s validation input data to evaluate performance during fitting
- validation_target: `pandas.DataFrame`, or None
The true labels corresponding to the rows of
validation_input
- do_predict_proba: Boolean, or int, default=False
If False, models.Model.fit() will call models.Model.model.predict()
If True, it will call models.Model.model.predict_proba(), and the values in all columns will be used as the actual prediction values
If int, models.Model.fit() will call models.Model.model.predict_proba(), as is the case when do_predict_proba is True, but the int supplied as do_predict_proba declares the column index to use as the actual prediction values
For example, for a model to call the predict method, do_predict_proba=False (default). For a model to call the predict_proba method, and use all of the class probabilities, do_predict_proba=True. To call the predict_proba method, and use the class probabilities in the first column, do_predict_proba=0. To use the second column (index 1) of the result, do_predict_proba=1 - This often corresponds to the positive class’s probabilities in binary classification problems. To use the third column do_predict_proba=2, and so on.
See the notes for the do_predict_proba parameter of Environment for additional usage notes
- target_metric: Tuple
Used by some child classes (like
XGBoostModel
`) to provide validation data to `model.fit()`
- metrics: Dict
Used by some child classes (like
XGBoostModel
`) to provide validation data to `model.fit()`
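An illustrative build_fn for initialization_params (a sketch; the layer sizes, optimizer, and the input_shape argument are assumptions, not mandated by the API):

from keras.models import Sequential
from keras.layers import Dense

def build_fn(input_shape=(10,)):
    # Return a compiled Keras model; `input_shape` feeds the first layer
    model = Sequential()
    model.add(Dense(32, activation="relu", input_shape=input_shape))
    model.add(Dense(1, activation="sigmoid"))
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

initialization_params = dict(build_fn=build_fn)
extra_params = dict(epochs=10, batch_size=32)  # forwarded to methods like `fit`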
Methods
fit(self)
Train model according to extra_params['fit'] (if appropriate) on training data
get_input_shape(self[, get_dim])
Calculate the shape of the input that should be expected by the model
initialize_keras_neural_network(self)
Initialize Keras model wrapper (model_initializer) with initialization_params, extra_params, and validation_data if it can be found, as well as the input dimensions for the model
initialize_model(self)
Create an instance of a model using model_initializer, with initialization_params as input
predict(self, input_data)
Generate model predictions for input_data
validate_keras_params(self)
Ensure provided input parameters are properly formatted
-
initialize_model
(self)¶ Create an instance of a model using
model_initializer
, with initialization_params
as input
-
fit
(self)¶ Train model according to
extra_params['fit']
(if appropriate) on training data
-
get_input_shape
(self, get_dim=False)¶ Calculate the shape of the input that should be expected by the model
- Parameters
- get_dim: Boolean, default=False
If True, instead of returning an input_shape tuple, an input_dim scalar will be returned
- Returns
- Tuple, or scalar
If get_dim=False, an input_shape tuple. Else, an input_dim scalar
-
validate_keras_params
(self)¶ Ensure provided input parameters are properly formatted
-
initialize_keras_neural_network
(self)¶ Initialize Keras model wrapper (
model_initializer
) with initialization_params, extra_params
, and validation_data if it can be found, as well as the input dimensions for the model
hyperparameter_hunter.recorders module¶
This module handles recording and properly formatting the various result files requested for a completed Experiment. Coincidentally, if a particular result file was blacklisted by the active Environment, that is also handled here
Related¶
hyperparameter_hunter.experiments
This is the intended user of the contents of
hyperparameter_hunter.recorders
-
class
hyperparameter_hunter.recorders.
BaseRecorder
¶ Bases:
object
Base class for other classes that record various Experiment result files. Critical attributes of the descendants of :class:`recorders.BaseRecorder` are set here, enabling them to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
- Attributes
required_attributes
Return attributes of the current Experiment that are necessary to properly record result.
result_path_key
Return key from environment.Environment.result_paths, corresponding to the target record
Methods
format_result(self)
Set BaseRecorder.result to the final result object to be saved by BaseRecorder.save_result()
save_result(self)
Save BaseRecorder.result to BaseRecorder.result_path, or elsewhere if special case
-
abstract property
result_path_key
¶ Return key from
environment.Environment.result_paths
, corresponding to the target record
-
abstract property
required_attributes
Return attributes of the current Experiment that are necessary to properly record result. Specifically, BaseRecorder fetches the attrs via settings.G.Env.current_task, which can also be regarded as environment.Environment.current_task, but this is an implementation detail. It is simpler to use experiments.BaseExperiment, and its appropriate descendants as a reference for acceptable values of required_attributes
-
abstract
format_result
(self)¶ Set
BaseRecorder.result
to the final result object to be saved by BaseRecorder.save_result()
-
abstract
save_result
(self)¶ Save
BaseRecorder.result
to BaseRecorder.result_path
, or elsewhere if special case
-
class
hyperparameter_hunter.recorders.
RecorderList
(file_blacklist=None, extra_recorders=None)¶ Bases:
object
Collection of
BaseRecorder
subclasses to facilitate executing group methods- Parameters
- file_blacklist: List, or None, default=None
If list, used to reject any elements of RecorderList.recorders whose BaseRecorder.result_path_key is in file_blacklist
- extra_recorders: List, None, default=None
If not None, may be a list whose values are tuples of (<recorders.BaseRecorder descendant>, <str result_path>). The result_path str should be a path relative to results_path, specifying the directory/file in which the product of the custom recorder will be saved. The contents of extra_recorders are appended to the list of default recorders and used to create/update result files for an Experiment. The contents of extra_recorders are blacklisted in the same way as normal recorders. That is, if file_blacklist contains the result_path_key of a recorder in extra_recorders, that recorder is blacklisted. See the custom recorder sketch following this parameter list
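A rough sketch of a custom recorder supplied via extra_recorders (the NotesRecorder name, its result_path_key, and its attribute usage are hypothetical; attributes named in required_attributes are assumed to be set by BaseRecorder, following the pattern of the built-in recorders below):

from hyperparameter_hunter.recorders import BaseRecorder

class NotesRecorder(BaseRecorder):  # hypothetical custom recorder
    result_path_key = "notes"
    required_attributes = ["experiment_id", "notes"]

    def format_result(self):
        # Attributes listed in `required_attributes` come from the Experiment
        self.result = "{}: {}".format(self.experiment_id, self.notes)

    def save_result(self):
        with open(self.result_path, "w") as f:
            f.write(self.result)

# Tuple form expected by `extra_recorders`; "Notes" is relative to `results_path`
extra_recorders = [(NotesRecorder, "Notes")]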
Methods
format_result(self)
Execute format_result() for all classes in recorders
save_result(self)
Execute save_result() for all classes in recorders
-
format_result
(self)¶ Execute format_result() for all classes in recorders
-
save_result
(self)¶ Execute save_result() for all classes in recorders
Notes
When iterating through recorders and calling save_result(), a check is performed for exit_code. Children classes of BaseRecorder are NOT expected to explicitly return a value in their save_result(). However, if a value is returned and exit_code == ‘break’, the result-saving loop will be broken, and no further results will be saved. In practice, this is only performed for the sake of DescriptionRecorder.save_result(), which has the additional quality of being able to prevent any other result files from being saved if the result of DescriptionRecorder.do_full_save() returns False when given the formatted DescriptionRecorder.result. This can be useful when there are storage constraints, because it ensures that essential data - including keys and the results of the experiment - are saved (to ensure the experiment is not duplicated, and to enable optimization protocol learning), while extra results like Predictions are not saved
-
class
hyperparameter_hunter.recorders.
DescriptionRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder for the Experiment description file, saved as a .json file named after experiment_id. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Format an OrderedDict containing the Experiment’s identifying attributes, results, hyperparameters used, and other stats or information that may be useful
save_result(self)
Save the Experiment description as a .json file, named after experiment_id.
-
result_path_key
= 'description'¶
-
required_attributes
= ['experiment_id', 'hyperparameter_key', 'cross_experiment_key', 'last_evaluation_results', 'stat_aggregates', 'source_script', 'notes', 'model_initializer', 'do_full_save', 'model', 'algorithm_name', 'module_name']¶
-
format_result
(self)¶ Format an OrderedDict containing the Experiment’s identifying attributes, results, hyperparameters used, and other stats or information that may be useful
-
save_result
(self)¶ Save the Experiment description as a .json file, named after
experiment_id
. Ifdo_full_save
is a callable and returns False when given the description object, the result recording loop will be broken, and the remaining result files will not be saved- Returns
- ‘break’
This string will be returned if
do_full_save
is a callable and returns False when given the description object. This is the signal forrecorders.RecorderList
to stop recording result files
-
class
hyperparameter_hunter.recorders.
HeartbeatRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder that copies the global Heartbeat log to the results directory as a .log file named for experiment_id. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Do nothing
save_result(self)
Copy global Heartbeat log to results dir as .log file named for experiment_id
-
result_path_key
= 'heartbeat'¶
-
required_attributes
= ['experiment_id']¶
-
format_result
(self)¶ Do nothing
-
save_result
(self)¶ Copy global Heartbeat log to results dir as .log file named for
experiment_id
-
class
hyperparameter_hunter.recorders.
PredictionsHoldoutRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder for the Experiment’s holdout predictions, saved to a .csv file named after experiment_id. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Format predictions according to the callable prediction_formatter
save_result(self)
Save holdout predictions to a .csv file, named after experiment_id
-
result_path_key
= 'predictions_holdout'¶
-
required_attributes
= ['data_holdout', 'holdout_dataset', 'experiment_id', 'prediction_formatter', 'target_column', 'id_column', 'to_csv_params']¶
-
format_result
(self)¶ Format predictions according to the callable
prediction_formatter
-
save_result
(self)¶ Save holdout predictions to a .csv file, named after
experiment_id
-
class
hyperparameter_hunter.recorders.
PredictionsOOFRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder for the Experiment’s out-of-fold predictions, saved to a .csv file named after experiment_id. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Format predictions according to the callable prediction_formatter
save_result(self)
Save out-of-fold predictions to a .csv file, named after experiment_id
-
result_path_key
= 'predictions_oof'¶
-
required_attributes
= ['data_oof', 'train_dataset', 'experiment_id', 'prediction_formatter', 'target_column', 'id_column', 'to_csv_params']¶
-
format_result
(self)¶ Format predictions according to the callable
prediction_formatter
-
save_result
(self)¶ Save out-of-fold predictions to a .csv file, named after
experiment_id
-
class
hyperparameter_hunter.recorders.
PredictionsTestRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder for the Experiment’s test predictions, saved to a .csv file named after experiment_id. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Format predictions according to the callable prediction_formatter
save_result(self)
Save test predictions to a .csv file, named after experiment_id
-
result_path_key
= 'predictions_test'¶
-
required_attributes
= ['data_test', 'test_dataset', 'experiment_id', 'prediction_formatter', 'target_column', 'id_column', 'to_csv_params']¶
-
format_result
(self)¶ Format predictions according to the callable
prediction_formatter
-
save_result
(self)¶ Save test predictions to a .csv file, named after
experiment_id
-
class
hyperparameter_hunter.recorders.
TestedKeyRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder that saves the cross-experiment and hyperparameter keys, and updates their tested keys entries. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Do nothing
save_result(self)
Save cross-experiment, and hyperparameter keys, and update their tested keys entries
-
result_path_key
= 'tested_keys'¶
-
required_attributes
= ['experiment_id', 'hyperparameter_key', 'cross_experiment_key']¶
-
format_result
(self)¶ Do nothing
-
save_result
(self)¶ Save cross-experiment, and hyperparameter keys, and update their tested keys entries
-
class
hyperparameter_hunter.recorders.
LeaderboardEntryRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder that adds the current Experiment to the global leaderboard and saves the updated leaderboard file. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Read existing global leaderboard, add current entry, then sort the updated leaderboard
save_result(self)
Save the updated leaderboard file
-
result_path_key
= 'tested_keys'¶
-
required_attributes
= ['result_paths', 'current_task', 'target_metric', 'metrics']¶
-
format_result
(self)¶ Read existing global leaderboard, add current entry, then sort the updated leaderboard
-
save_result
(self)¶ Save the updated leaderboard file
-
class
hyperparameter_hunter.recorders.
UnsortedIDLeaderboardRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder that adds the current Experiment to the unsorted-ID leaderboard and saves the updated leaderboard file. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Read existing global leaderboard, add current entry, then sort the updated leaderboard
save_result(self)
Save the updated leaderboard file
-
result_path_key
= 'unsorted_id_leaderboard'¶
-
required_attributes
= ['result_paths', 'current_task', 'target_metric', 'metrics']¶
-
format_result
(self)¶ Read existing global leaderboard, add current entry, then sort the updated leaderboard
-
save_result
(self)¶ Save the updated leaderboard file
-
class
hyperparameter_hunter.recorders.
YAMLDescriptionRecorder
¶ Bases:
hyperparameter_hunter.recorders.BaseRecorder
Recorder for the YAML-formatted Experiment description. Critical attributes are set by :class:`recorders.BaseRecorder`, enabling the recorder to function properly
- Returns
- None
If
result_path
is None, which means the present result file was blacklisted by the active Environment
- Raises
- EnvironmentInactiveError
If settings.G.Env is None
- EnvironmentInvalidError
If any of the following occur: 1) settings.G.Env does not have an attribute named ‘result_paths’, 2) settings.G.Env.result_paths does not contain the current result_path_key, 3) settings.G.Env.current_task is None
Methods
format_result(self)
Set BaseRecorder.result to the final result object to be saved by BaseRecorder.save_result()
save_result(self)
Save BaseRecorder.result to BaseRecorder.result_path, or elsewhere if special case
-
result_path_key
= 'yaml_description'¶
-
required_attributes
= ['result_paths', 'experiment_id']¶
-
format_result
(self)¶ Set
BaseRecorder.result
to the final result object to be saved by BaseRecorder.save_result()
-
save_result
(self)¶ Save
BaseRecorder.result
to BaseRecorder.result_path
, or elsewhere if special case
hyperparameter_hunter.reporting module¶
-
class
hyperparameter_hunter.reporting.
ReportingHandler
(heartbeat_path=None, float_format='{:.5f}', console_params=None, heartbeat_params=None, add_frame=False)¶ Bases:
object
Class in control of logging methods, log formatting, and initializing Experiment logging
- Parameters
- heartbeat_path: Str path, or None, default=None
If string and valid heartbeat path, logging messages will also be saved in this file
- float_format: String, default=’{:.5f}’
If not default, must be a valid formatting string for floating point values. If invalid, default will be used
- console_params: Dict, or None, default=None
Parameters passed to
_configure_console_handler()
- heartbeat_params: Dict, or None, default=None
Parameters passed to
_configure_heartbeat_handler()
- add_frame: Boolean, default=False
If True, whenever
log()
is called, the source of the call will be prepended to the content being logged
Methods
debug(self, content, **kwargs)
Placeholder method before proper initialization
log(self, content, **kwargs)
Placeholder method before proper initialization
warn(self, content, **kwargs)
Placeholder method before proper initialization
-
log
(self, content, **kwargs)¶ Placeholder method before proper initialization
-
debug
(self, content, **kwargs)¶ Placeholder method before proper initialization
-
warn
(self, content, **kwargs)¶ Placeholder method before proper initialization
-
hyperparameter_hunter.reporting.
clean_parameter_names
(parameter_names:list) → List[str]¶ Remove unnecessary prefixes or characters from the names of search space dimensions
- Parameters
- parameter_names: List
Names of the dimensions in a hyperparameter search Space object. Values are usually tuples
- Returns
- names: List[str]
Cleaned parameter_names, containing stringified values to facilitate logging
-
hyperparameter_hunter.reporting.
get_param_column_sizes
(space:list, names:List[str]) → List[int]¶ Determine maximum column sizes for displaying values of each hyperparameter in space
- Parameters
- space: List
Hyperparameter search space dimensions for the current Optimization Protocol
- names: List[str]
Cleaned hyperparameter dimension names
- Returns
- sizes: List[int]
Column sizes for each of the hyperparameters in names
-
class
hyperparameter_hunter.reporting.
OptimizationReporter
(space: list, verbose=1, show_experiment_id=8, do_maximize=True)¶ Bases:
object
A MixIn class for reporting the results of hyperparameter optimization rounds
- Parameters
- space: List
Hyperparameter search space dimensions for the current Optimization Protocol
- verbose: Int in [0, 1, 2], default=1
If 0, all but critical logging is silenced. If 1, normal logging is performed. If 2, detailed logging is performed
- show_experiment_id: Int, or Boolean, default=8
If True, the experiment_id will be printed in each result row. If False, it will not. If int, the first show_experiment_id-many characters of each experiment_id will be printed in each row
- do_maximize: Boolean, default=True
If False, smaller metric values will be considered preferred and will be highlighted to stand out. Else larger metric values will be treated as preferred
Methods
print_header(self, header, line)
Utility to perform actual printing of headers given formatted inputs
print_optimization_header(self)
Print a header signifying that Optimization rounds are starting
print_random_points_header(self)
Print a header signifying that random point evaluation rounds are starting
print_result(self, hyperparameters, evaluation)
Print a row containing the results of an Experiment just executed
print_saved_results_header(self)
Print a header signifying that saved Experiment results are being read
print_summary(self)
Print a summary of the results of hyperparameter optimization upon completion
reset_timer(self)
Set start_time, and last_round to the current time
-
print_saved_results_header
(self)¶ Print a header signifying that saved Experiment results are being read
-
print_random_points_header
(self)¶ Print a header signifying that random point evaluation rounds are starting
-
print_optimization_header
(self)¶ Print a header signifying that Optimization rounds are starting
-
print_header
(self, header, line)¶ Utility to perform actual printing of headers given formatted inputs
- Parameters
- header: String
Specifies the stage of optimization being entered, and the type of results to follow
- line: String
The underlining to follow header
-
print_result
(self, hyperparameters, evaluation, experiment_id=None)¶ Print a row containing the results of an Experiment just executed
- Parameters
- hyperparameters: List
List of hyperparameter values in the same order as
parameter_names
- evaluation: Float
An evaluation of the performance of hyperparameters
- experiment_id: Str, or None, default=None
If not None, should be a string that is the UUID of the Experiment
-
reset_timer
(self)¶ Set
start_time
, and last_round
to the current time
-
print_summary
(self)¶ Print a summary of the results of hyperparameter optimization upon completion
-
hyperparameter_hunter.reporting.
format_frame_source
(previous_frame, **kwargs)¶ Construct a string describing the location at which a call was made
- Parameters
- previous_frame: Frame
A frame depicting the location at which a call was made
- **kwargs: Dict
Any additional kwargs to supply to
reporting.stringify_frame_source()
- Returns
- The stringified frame source information of previous_frame
-
hyperparameter_hunter.reporting.
stringify_frame_source
(src_file, src_line_no, src_func, src_class, add_line_no=True, max_line_no_size=4, total_max_size=80)¶ Construct a string that neatly displays the location in the code at which a call was made
- Parameters
- src_file: Str
A filepath
- src_line_no: Int
The line number in src_file at which the call was made
- src_func: Str
The name of the function in src_file in which the call was made
- src_class: Str, or None
If not None, the class in src_file in which the call was made
- add_line_no: Boolean, default=True
If True, the line number will be included in the source_content result
- max_line_no_size: Int, default=4
Total number (including padding) of characters to be occupied by src_line_no. For example, if src_line_no=32, and max_line_no_size=4, src_line_no will be padded to become ‘32  ’ in order to occupy four characters
- total_max_size: Int, default=80
Total number (including padding) of characters to be occupied by the source_content result
- Returns
- source_content: Str
A formatted string containing the location in the code at which a call was made
Examples
>>> stringify_frame_source("reporting.py", 570, "stringify_frame_source", None) '570 - reporting.stringify_frame_source() ' >>> stringify_frame_source("reporting.py", 12, "bar", "Foo") '12 - reporting.Foo.bar() ' >>> stringify_frame_source("reporting.py", 12, "bar", "Foo", add_line_no=False) 'reporting.Foo.bar() ' >>> stringify_frame_source("reporting.py", 12, "bar", "Foo", total_max_size=60) '12 - reporting.Foo.bar() '
-
hyperparameter_hunter.reporting.
add_time_to_content
(content, add_time=False)¶ Construct a string containing the original content, in addition to the current time
- Parameters
- content: Str
The original string, to which the current time will be concatenated
- add_time: Boolean, default=False
If True, the current time will be concatenated onto the end of content
- Returns
- content: Str
Str containing original content, along with current time, and additional formatting
-
hyperparameter_hunter.reporting.
format_fold_run
(rep=None, fold=None, run=None, mode='concise')¶ Construct a string to display the repetition, fold, and run currently being executed
- Parameters
- rep: Int, or None, default=None
The repetition number currently being executed
- fold: Int, or None, default=None
The fold number currently being executed
- run: Int, or None, default=None
The run number currently being executed
- mode: {“concise”, “verbose”}, default=”concise”
If “concise”, the result will contain abbreviations for rep/fold/run
- Returns
- content: Str
A clean display of the current repetition/fold/run
Examples
>>> format_fold_run(rep=0, fold=3, run=2, mode="concise") 'R0-f3-r2' >>> format_fold_run(rep=0, fold=3, run=2, mode="verbose") 'Rep-Fold-Run: 0-3-2' >>> format_fold_run(rep=0, fold=3, run="*", mode="concise") 'R0-f3-r*' >>> format_fold_run(rep=0, fold=3, run=2, mode="foo") Traceback (most recent call last): File "reporting.py", line ?, in format_fold_run ValueError: Received invalid mode value: 'foo'
-
hyperparameter_hunter.reporting.
format_evaluation
(results, separator=' | ', float_format='{:.5f}')¶ Construct a string to neatly display the results of a model evaluation
- Parameters
- results: Dict
The results of a model evaluation, in which keys represent the dataset type evaluated, and values are dicts containing metrics as keys, and metric values as values
- separator: Str, default=’ | ‘
The string used to join all the metric values into a single string
- float_format: Str, default=’{:.5f}’
A python string float formatter, applied to floating metric values
- Returns
- content: Str
The model’s evaluation results
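For instance (a sketch; the metric values are illustrative):

from hyperparameter_hunter.reporting import format_evaluation

results = {"oof": {"roc_auc": 0.83211, "f1": 0.77042}}
# Floating metric values are formatted with `float_format`, then all metric
# strings are joined with `separator`
line = format_evaluation(results, separator=" | ", float_format="{:.5f}")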
hyperparameter_hunter.result_reader module¶
-
hyperparameter_hunter.result_reader.
finder_selector
(module_name)¶ Selects the appropriate
ResultFinder
to use for module_name- Parameters
- module_name: String
Module from whence the algorithm being used came
- Returns
- Uninitialized ResultFinder, or one of its descendants
Examples
>>> assert finder_selector("Keras") == KerasResultFinder >>> assert finder_selector("xgboost") == ResultFinder >>> assert finder_selector("lightgbm") == ResultFinder
-
hyperparameter_hunter.result_reader.
update_match_status
(target_attr='match_status') → callable¶ Build a decorator to apply to class instance methods to record inputs/outputs
- Parameters
- target_attr: String, default=”match_status”
Name of dict attribute in the class instance of the decorated method, in which the decorated method’s inputs and outputs should be recorded. This attribute should be predefined and documented by the class whose method is being decorated
- Returns
- Callable
Decorator that will save the decorated method’s inputs and outputs to the attribute dict named by target_attr. Decorator assumes that the method will receive at least three positional arguments: “exp_id”, “params”, and “score”. “exp_id” is used as the key added to target_attr, with a dict value, which includes the latter two positional arguments. Each time the decorator is invoked with an “exp_id”, an additional key is added to its dict that is the name of the decorated method, and whose value is the decorated method’s output
See also
ResultFinder
Decorates “does_match…” methods using update_match_status in order to keep a detailed record of the full pool of candidate Experiments in
ResultFinder.match_status
Examples
>>> class X: ... def __init__(self): ... self.match_status = dict() ... @update_match_status() ... def method_a(self, exp_id, params, score): ... return True ... @update_match_status() ... def method_b(self, exp_id, params, score): ... return False >>> x = X() >>> x.match_status {} >>> assert x.method_a("foo", None, 0.8) is True >>> x.match_status # doctest: +NORMALIZE_WHITESPACE {'foo': {'params': None, 'score': 0.8, 'method_a': True}} >>> assert x.method_b("foo", None, 0.8) is False >>> x.match_status # doctest: +NORMALIZE_WHITESPACE {'foo': {'params': None, 'score': 0.8, 'method_a': True, 'method_b': False}} >>> assert x.method_b("bar", "some stuff", 0.5) is False >>> x.match_status # doctest: +NORMALIZE_WHITESPACE {'foo': {'params': None, 'score': 0.8, 'method_a': True, 'method_b': False}, 'bar': {'params': 'some stuff', 'score': 0.5, 'method_b': False}}
-
hyperparameter_hunter.result_reader.
does_match_guidelines
(candidate_params:dict, space:hyperparameter_hunter.space.space_core.Space, template_params:dict, visitors=(), dims_to_ignore:List[tuple]=None) → bool¶ Check candidate compatibility with template guideline hyperparameters
- Parameters
- candidate_params: Dict
Candidate Experiment hyperparameters to be compared to template_params after processing
- space: Space
Hyperparameter search space constraints for the current template
- template_params: Dict
Template hyperparameters to which candidate_params will be compared after processing. Although the name of the function implies that these will all be guideline hyperparameters, this is not a requirement, as any non-guideline hyperparameters will be removed during processing by checking space.names
- visitors: Callable, or Tuple[callable] (optional)
Extra visit function(s) invoked when remap()-ing both template_params and candidate_params. Can be used to filter out unwanted values, or to pre-process selected values (and more) prior to performing the final compatibility check between the processed candidate_params and guidelines in template_params
- dims_to_ignore: List[tuple] (optional)
Paths to hyperparameter(s) that should be ignored when comparing candidate_params and template_params. By default, hyperparameters pertaining to verbosity and random states are ignored. Paths should be tuples of the form expected by get_path(). Additionally a path may start with None, which acts as a wildcard, matching any hyperparameters whose terminal path nodes match the value following None. For example, (None, "verbose") would match paths such as ("model_init_params", "a", "verbose") and ("model_extra_params", "b", 2, "verbose")
- Returns
- Boolean
True if the processed version of candidate_params is equal to the extracted and processed guidelines from template_params. Else, False
-
hyperparameter_hunter.result_reader.
validate_feature_engineer
(candidate:Union[dict, hyperparameter_hunter.feature_engineering.FeatureEngineer], template:hyperparameter_hunter.feature_engineering.FeatureEngineer) → Union[bool, dict, hyperparameter_hunter.feature_engineering.FeatureEngineer]¶ Check candidate “feature_engineer” compatibility with template and sanitize candidate. This is mostly a wrapper around
validate_fe_steps()
to ensure different inputs are handled properly and to return False, rather than raising IncompatibleCandidateError
- Parameters
- candidate: Dict, or FeatureEngineer
Candidate “feature_engineer” to compare to template. If compatible with template, a sanitized version of candidate will be returned (described below)
- template: FeatureEngineer
Template “feature_engineer” to which candidate will be compared after processing
- Returns
- Boolean, dict, or FeatureEngineer
False if candidate is deemed incompatible with template. Else, a sanitized candidate with reinitialized
EngineerStep
steps and withRejectedOptional
filling in missingCategorical
steps that were declared asoptional
by the template
-
hyperparameter_hunter.result_reader.
validate_fe_steps
(candidate:Union[list, hyperparameter_hunter.feature_engineering.FeatureEngineer], template:Union[list, hyperparameter_hunter.feature_engineering.FeatureEngineer]) → list¶ Check candidate “feature_engineer” steps compatibility with template and sanitize candidate
- Parameters
- candidate: List, or FeatureEngineer
Candidate “feature_engineer” steps to compare to template. If compatible with template, a sanitized version of candidate will be returned (described below)
- template: List, or FeatureEngineer
Template “feature_engineer” steps to which candidate will be compared. template is also used to sanitize candidate (described below)
- Returns
- List
If candidate is compatible with template, returns a list resembling candidate, with the following changes: 1) all step dicts in candidate are reinitialized to proper EngineerStep instances; and 2) wherever candidate was missing a step that was tagged as optional in template, RejectedOptional is added. In the end, if a list is returned, it is built from candidate, guaranteed to be the same length as template and guaranteed to contain only EngineerStep and RejectedOptional instances
- Raises
- IncompatibleCandidateError
If candidate is incompatible with template. candidate may be incompatible with template for any of the following reasons:
1. candidate has more steps than template
2. candidate has a step that differs from a concrete (non-Categorical) template step
3. candidate has a step that does not fit in a Categorical template step
4. candidate is missing a concrete step in template
5. candidate is missing a non-optional Categorical step in template
-
class
hyperparameter_hunter.result_reader.
ResultFinder
(algorithm_name, module_name, cross_experiment_key, target_metric, space, leaderboard_path, descriptions_dir, model_params, sort=None)¶ Bases:
object
Locate saved Experiments that are compatible with the given constraints
- Parameters
- algorithm_name: String
Name of the algorithm whose hyperparameters are being optimized
- module_name: String
Name of the module from whence the algorithm being used came
- cross_experiment_key: String
hyperparameter_hunter.environment.Environment.cross_experiment_key
produced by the current Environment- target_metric: Tuple
Path denoting the metric to be used. The first value should be one of {“oof”, “holdout”, “in_fold”}, and the second value should be the name of a metric supplied in
hyperparameter_hunter.environment.Environment.metrics_params
- space: Space
Instance of
Space
, defining hyperparameter search space constraints- leaderboard_path: String
Path to a leaderboard file, whose listed Experiments will be tested for compatibility
- descriptions_dir: String
Path to a directory containing the description files of saved Experiments
- model_params: Dict
All hyperparameters for the model, both concrete and choice. Common keys include “model_init_params” and “model_extra_params”, both of which can be pointers to dicts of hyperparameters. Additionally, “feature_engineer” may be included with an instance of
FeatureEngineer
- sort: {“target_asc”, “target_desc”, “chronological”, “reverse_chronological”}, or int
Experimental. How to sort the experiment results that fit within the given constraints:
“target_asc”: Sort from experiments with the lowest value for target_metric to those with the greatest
“target_desc”: Sort from experiments with the highest value for target_metric to those with the lowest
“chronological”: Sort from oldest experiments to newest
“reverse_chronological”: Sort from newest experiments to oldest
int: Random seed with which to shuffle experiments
See also
update_match_status()
Used to decorate “does_match…” methods in order to keep a detailed record of the full pool of candidate Experiments in match_status. Aside from being used to compile the list of finalist similar_experiments, match_status is helpful for debugging purposes, specifically figuring out which aspects of a candidate are incompatible with the template
- Attributes
- similar_experiments: List[Tuple[dict, Number, str]]
Candidate saved Experiment results that are fully compatible with the template hyperparameters. Each value is a tuple triple of (<hyperparameters>, <target_metric value>, <candidate experiment_id>). similar_experiments is composed of the “finalists” from
match_status
- match_status: Dict[str, dict]
Record of the hyperparameters and target_metric values for all discovered Experiments, keyed by values of
experiment_ids
. Each value is a dict containing two keys: “params” (hyperparameter dict), and “score” (target_metric value number). In addition to these two keys, a key may be added for every ResultFinder method decorated by update_match_status()
. The exact key will be the name of the method invoked (which will always start with “does_match”, followed by the name of the hyperparameter group being checked). The value for each “does_match…” key is the value returned by that method, whose truthiness dictates whether the candidate Experiment was a successful match to the template hyperparameters for that group. For example, a match_status entry for one Experiment could look like this:{ "params": <dict of hyperparameters for candidate>, "score": 0.42, # `target_metric` value for candidate Experiment "does_match_init_params_space": True, "does_match_init_params_guidelines": False, "does_match_extra_params_space": False, "does_match_extra_params_guidelines": True, "does_match_feature_engineer": <`FeatureEngineer`>, # Still truthy }
Note that “model_init_params” and “model_extra_params” both check the compatibility of “space” choices and concrete “guidelines” separately. Conversely, “feature_engineer” is checked in its entirety by the single
does_match_feature_engineer()
. Also note that “does_match…” values are not restricted to booleans. For instance, “does_match_feature_engineer” may be set to a reinitialized FeatureEngineer, which is still truthy, even though it’s not True. If all of the “does_match…” keys have truthy values, the candidate is a full match and is added to similar_experiments
Methods
find(self)
Execute full result-finding workflow, populating similar_experiments
-
property
experiment_ids
¶ Experiment IDs in the target Leaderboard that match
algorithm_name
andcross_experiment_key
- Returns
- List[str]
All saved Experiment IDs listed in the Leaderboard at
leaderboard_path
that match thealgorithm_name
andcross_experiment_key
of the template
-
property
mini_spaces
¶ Separate
space
into subspaces based onmodel_params
keys- Returns
- Dict[str, Space]
Dict of subspaces, wherein keys are all keys of
model_params
. Each key’s corresponding value is a filtered subspace, containing all the dimensions inspace
whose name tuples start with that key. Keys will usually be one of the core hyperparameter group names (“model_init_params”, “model_extra_params”, “feature_engineer”, “feature_selector”)
Examples
>>> from hyperparameter_hunter import Integer >>> def es_0(all_inputs): ... return all_inputs >>> def es_1(all_inputs): ... return all_inputs >>> def es_2(all_inputs): ... return all_inputs >>> s = Space([ ... Integer(900, 1500, name=("model_init_params", "max_iter")), ... Categorical(["svd", "cholesky", "lsgr"], name=("model_init_params", "solver")), ... Categorical([es_1, es_2], name=("feature_engineer", "steps", 1)), ... ]) >>> rf = ResultFinder( ... "a", "b", "c", ("oof", "d"), space=s, leaderboard_path="e", descriptions_dir="f", ... model_params=dict( ... model_init_params=dict( ... max_iter=s.dimensions[0], normalize=True, solver=s.dimensions[1], ... ), ... feature_engineer=FeatureEngineer([es_0, s.dimensions[2]]), ... ), ... ) >>> rf.mini_spaces # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE {'model_init_params': Space([Integer(low=900, high=1500), Categorical(categories=('svd', 'cholesky', 'lsgr'))]), 'feature_engineer': Space([Categorical(categories=(<function es_1 at ...>, <function es_2 at ...>))])}
-
find
(self)¶ Execute full result-finding workflow, populating
similar_experiments
See also
update_match_status()
Used to decorate “does_match…” methods in order to keep a detailed record of the full pool of candidate Experiments in match_status. Aside from being used to compile the list of finalist similar_experiments, match_status is helpful for debugging purposes, specifically figuring out which aspects of a candidate are incompatible with the template
does_match_feature_engineer()
Performs special functionality beyond that of the other “does_match…” methods, namely providing an updated “feature_engineer” value for compatible candidates to use. Specifics are documented in does_match_feature_engineer()
-
does_match_feature_engineer
(self, exp_id, params, score) → Union[bool, dict, hyperparameter_hunter.feature_engineering.FeatureEngineer]¶ Check candidate compatibility with feature_engineer template guidelines and space choices. This method is different from the other “does_match…” methods in two important aspects:
It checks both guidelines and choices in a single method
It returns an updated feature_engineer for compatible candidates, rather than True
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “feature_engineer” to compare to the template in
model_params
. This should always be a dict, not an instance of FeatureEngineer; the same cannot be assumed of the template “feature_engineer” in model_params
- score: Number
Value of the candidate Experiment’s target metric
- Returns
- Boolean, dict, or FeatureEngineer
Expanding on the second difference noted in the description, False will still be returned if the candidate is deemed incompatible with the template (as is the case with the other “does_match…” methods). The return value differs with compatible candidates in order to provide a feature_engineer with reinitialized
EngineerStep
steps and to fill in missing Categorical
steps that were declared asoptional
by the template. This updated feature_engineer is the value that then gets included in the candidate’s similar_experiments
entry (assuming the candidate is a full match)
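A hedged illustration of consuming this tri-state return value (`rf`, the Experiment ID, and the candidate dict are hypothetical):

result = rf.does_match_feature_engineer("some-exp-id", candidate_feature_engineer, 0.42)
if result is False:
    pass  # candidate is incompatible with the template
else:
    # Truthy, but not necessarily `True`: may be a reinitialized `FeatureEngineer`
    # with missing `optional` steps filled in, destined for `similar_experiments`
    updated_feature_engineer = result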
-
does_match_init_params_space
(self, exp_id, params, score) → bool¶ Check candidate compatibility with model_init_params template space choices
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “model_init_params” to compare to the template in
model_params
- score: Number
Value of the candidate Experiment’s target metric
- Returns
- Boolean
True if candidate params fit in model_init_params space choices. Else, False
-
does_match_init_params_guidelines
(self, exp_id, params, score, template_params=None) → bool¶ Check candidate compatibility with model_init_params template guidelines
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “model_init_params” to compare to the template in
model_params
- score: Number
Value of the candidate Experiment’s target metric
- template_params: Dict (optional)
If given, used as the template hyperparameters against which to compare candidate params, rather than the standard guideline template of the “model_init_params” in
model_params
. This is used by does_match_init_params_guidelines_multi()
- Returns
- Boolean
True if candidate params match model_init_params guidelines. Else, False
Notes
Template hyperparameters are generally considered “guidelines” if they are declared as concrete values, rather than space choices present in
space
-
does_match_init_params_guidelines_multi
(self, exp_id, params, score, location) → bool¶ Check candidate compatibility with model_init_params template guidelines when a guideline hyperparameter is directly affected by another hyperparameter that is given as a space choice
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “model_init_params” to compare to the template in
model_params
- score: Number
Value of the candidate Experiment’s target metric
- location: Tuple
Location of the hyperparameter space choice that affects the acceptable guideline values of a particular hyperparameter. In other words, this is the path of a hyperparameter, which, if changed, would change the expected default value of another hyperparameter
- Returns
- Boolean
True if candidate params match model_init_params guidelines. Else, False
Notes
This is used for Keras Experiments when the optimizer value in a model’s compile_params is given as a hyperparameter space choice. Each possible value of optimizer prescribes different default values for the optimizer_params argument, so special measures need to be taken to ensure the correct Experiments are declared to fit within the constraints
-
does_match_extra_params_space
(self, exp_id, params, score) → bool¶ Check candidate compatibility with model_extra_params template space choices
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “model_extra_params” to compare to the template in
model_params
- score: Number
Value of the candidate Experiment’s target metric
- Returns
- Boolean
True if candidate params fit in model_extra_params space choices. Else, False
-
does_match_extra_params_guidelines
(self, exp_id, params, score) → bool¶ Check candidate guideline compatibility with model_extra_params template
- Parameters
- exp_id: String
Candidate Experiment ID
- params: Dict
Candidate “model_extra_params” to compare to the template in
model_params
- score: Number
Value of the candidate Experiment’s target metric
- Returns
- Boolean
True if candidate params match model_extra_params guidelines. Else, False
-
class
hyperparameter_hunter.result_reader.
KerasResultFinder
(algorithm_name, module_name, cross_experiment_key, target_metric, space, leaderboard_path, descriptions_dir, model_params, sort=None)¶ Bases:
hyperparameter_hunter.result_reader.ResultFinder
ResultFinder for locating saved Keras Experiments compatible with the given constraints
- Parameters
- algorithm_name: String
Name of the algorithm whose hyperparameters are being optimized
- module_name: String
Name of the module from whence the algorithm being used came
- cross_experiment_key: String
hyperparameter_hunter.environment.Environment.cross_experiment_key
produced by the current Environment
- target_metric: Tuple
Path denoting the metric to be used. The first value should be one of {“oof”, “holdout”, “in_fold”}, and the second value should be the name of a metric supplied in
hyperparameter_hunter.environment.Environment.metrics_params
- space: Space
Instance of
Space
, defining hyperparameter search space constraints
- leaderboard_path: String
Path to a leaderboard file, whose listed Experiments will be tested for compatibility
- descriptions_dir: String
Path to a directory containing the description files of saved Experiments
- model_params: Dict
Concrete hyperparameters for the model. Common keys include “model_init_params” and “model_extra_params”, both of which can be pointers to dicts of hyperparameters. Additionally, “feature_engineer” may be included with an instance of
FeatureEngineer
- sort: {“target_asc”, “target_desc”, “chronological”, “reverse_chronological”}, or int
(Experimental) How to sort the experiment results that fit within the given constraints
“target_asc”: Sort from experiments with the lowest value for target_metric to those with the greatest
“target_desc”: Sort from experiments with the highest value for target_metric to those with the lowest
“chronological”: Sort from oldest experiments to newest
“reverse_chronological”: Sort from newest experiments to oldest
int: Random seed with which to shuffle experiments
- Attributes
experiment_ids
Experiment IDs in the target Leaderboard that match
algorithm_name
and cross_experiment_key
mini_spaces
Separate
space
into subspaces based on model_params
keys
Methods
does_match_extra_params_guidelines
(self, …)Check candidate guideline compatibility with model_extra_params template
does_match_extra_params_space
(self, exp_id, …)Check candidate compatibility with model_extra_params template space choices
does_match_feature_engineer
(self, exp_id, …)Check candidate compatibility with feature_engineer template guidelines and space choices.
does_match_init_params_guidelines
(self, …)Check candidate compatibility with model_init_params template guidelines
does_match_init_params_guidelines_multi
(…)Check candidate compatibility with model_init_params template guidelines when a guideline hyperparameter is directly affected by another hyperparameter that is given as a space choice
does_match_init_params_space
(self, exp_id, …)Check candidate compatibility with model_init_params template space choices
find
(self)Execute full result-finding workflow, populating
similar_experiments
-
hyperparameter_hunter.result_reader.
has_experiment_result_file
(results_dir, experiment_id, result_type=None)¶ Check if the specified result files exist in results_dir for Experiment experiment_id
- Parameters
- results_dir: String
HyperparameterHunterAssets directory in which to search for Experiment result files
- experiment_id: String, or BaseExperiment
ID of the Experiment whose result files should be searched for in results_dir. If not string, should be an instance of a descendant of
BaseExperiment
with an “experiment_id” attribute
- result_type: List, or string (optional)
Result file types for which to check. Valid values include any subdirectory name that can be included in “HyperparameterHunterAssets/Experiments” by default: [“Descriptions”, “Heartbeats”, “PredictionsOOF”, “PredictionsHoldout”, “PredictionsTest”, “ScriptBackups”]. If string, should be one of the aforementioned strings, or “ALL” to use all of the result types. If list, should be a subset of the aforementioned list of valid values. Else, default is [“Descriptions”, “Heartbeats”, “PredictionsOOF”, “ScriptBackups”]. The returned boolean signifies whether ALL of the result_type files were found, not whether ANY were found
- Returns
- Boolean
True if all result files specified by result_type exist in results_dir for the Experiment specified by experiment_id. Else, False
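For instance, a hedged sketch (the results directory and Experiment ID below are hypothetical):

from hyperparameter_hunter.result_reader import has_experiment_result_file

# True only if BOTH the description and OOF prediction files exist for this Experiment
has_experiment_result_file(
    "HyperparameterHunterAssets",
    "0123abcd-4567-89ab-cdef-0123456789ab",
    result_type=["Descriptions", "PredictionsOOF"],
)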
hyperparameter_hunter.sentinels module¶
This module defines Sentinel objects that are used to represent data that is not yet available.
For example, hyperparameter_hunter.sentinels.DatasetSentinel
is used in
hyperparameter_hunter.environment.Environment
to enable a user to pass the fold validation
dataset as an argument on Experiment initialization. At the point that the sentinel is provided, the
training dataset has not yet been split into folds, which is why the Sentinel is necessary
Related¶
hyperparameter_hunter.environment
hyperparameter_hunter.environment.Environment
has the following properties that utilize hyperparameter_hunter.sentinels.DatasetSentinel
: [train_input, train_target, validation_input, validation_target, holdout_input, holdout_target]. These properties can be passed as arguments to Experiment or OptimizationProtocol initialization in order to provide the dataset to a Model’s fit call, for example
hyperparameter_hunter.experiments
This is one of the points at which one might want to use the Sentinels exposed by
hyperparameter_hunter.environment.Environment
, specifically as values in the model_init_params and model_extra_params arguments to a descendant of hyperparameter_hunter.experiments.BaseExperiment
hyperparameter_hunter.optimization.protocol_core
This is a second point at which one might use the Sentinels exposed by
hyperparameter_hunter.environment.Environment
. In this case, they could be provided as values in the model_init_params and model_extra_params arguments in a call to hyperparameter_hunter.optimization.protocol_core.BaseOptPro.forge_experiment()
, the structure of which intentionally mirrors that of hyperparameter_hunter.experiments.BaseExperiment.__init__()
hyperparameter_hunter.models
This is ultimately where Sentinel instances will be converted to the actual values that they represent via calls to
hyperparameter_hunter.sentinels.locate_sentinels()
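As a hedged, end-to-end sketch of this pattern (the dataset, model, and parameter values are illustrative, and xgboost is assumed to be installed):

import pandas as pd
from sklearn.datasets import load_breast_cancer
from xgboost import XGBClassifier
from hyperparameter_hunter import CVExperiment, Environment

# Build a small training DataFrame using the default "target" column name
data = load_breast_cancer()
train_df = pd.DataFrame(data.data, columns=data.feature_names)
train_df["target"] = data.target

env = Environment(
    train_dataset=train_df,
    results_path="HyperparameterHunterAssets",
    metrics=["roc_auc_score"],
    cv_type="KFold",
    cv_params=dict(n_splits=3),
)

# `env.validation_input` and `env.validation_target` are DatasetSentinels here; they
# are resolved to the current fold's validation data only after CV splitting occurs
experiment = CVExperiment(
    model_initializer=XGBClassifier,
    model_init_params=dict(n_estimators=100, subsample=0.5),
    model_extra_params=dict(
        fit=dict(eval_set=[(env.validation_input, env.validation_target)])
    ),
)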
-
class
hyperparameter_hunter.sentinels.
Sentinel
(*args, **kwargs)¶ Bases:
object
Base class for Sentinels representing data that is not yet available. Subclasses should call super().__init__() at the end of their __init__ methods
- Attributes
sentinel
Retrieve
Sentinel._sentinel
Methods
retrieve_by_sentinel
(self)Retrieve the actual object represented by the sentinel
-
property
sentinel
¶ Retrieve
Sentinel._sentinel
- Returns
- Str
The value of
Sentinel._sentinel
-
abstract
retrieve_by_sentinel
(self) → object¶ Retrieve the actual object represented by the sentinel
- Returns
- object
The object for which the sentinel was being used as a placeholder
-
hyperparameter_hunter.sentinels.
locate_sentinels
(parameters)¶ Produce a mirrored parameters dict, wherein Sentinel values are converted to the objects they represent
- Parameters
- parameters: Dict
Dict of parameters, which may contain nested Sentinel values
- Returns
- Dict
Mirror of parameters, except where a Sentinel was found, the value it represents is returned instead
-
class
hyperparameter_hunter.sentinels.
DatasetSentinel
(dataset_type, dataset_hash, cv_type=None, global_random_seed=None, random_seeds=None)¶ Bases:
hyperparameter_hunter.sentinels.Sentinel
Class to create sentinels representing dataset input/target values
- Parameters
- dataset_type: Str
Dataset type, suffixed with ‘_input’, or ‘_target’, for which a sentinel should be created. Acceptable values are as follows: [‘train_input’, ‘train_target’, ‘validation_input’, ‘validation_target’, ‘holdout_input’, ‘holdout_target’]
- dataset_hash: Str
The hash of the dataset for which a sentinel should be created that was generated while creating
hyperparameter_hunter.environment.Environment.cross_experiment_key
- cv_type: Str, or None, default=None
If None, dataset_type should be one of [‘holdout_input’, ‘holdout_target’]. Else, should be a string that is one of the following: 1) a string attribute of sklearn.model_selection._split, or 2) a hash produced while creating
hyperparameter_hunter.environment.Environment.cross_experiment_key
- global_random_seed: Int, or None, default=None
If None, dataset_type should be one of [‘holdout_input’, ‘holdout_target’]. If int, should be
hyperparameter_hunter.environment.Environment.global_random_seed
- random_seeds: List, or None, default=None
If None, dataset_type should be one of [‘holdout_input’, ‘holdout_target’]. If list, should be
hyperparameter_hunter.environment.Environment.random_seeds
- Attributes
sentinel
Retrieve
Sentinel._sentinel
Methods
retrieve_by_sentinel
(self)Retrieve the actual dataset represented by the sentinel
-
retrieve_by_sentinel
(self)¶ Retrieve the actual dataset represented by the sentinel
- Returns
- object
The dataset for which the sentinel was being used as a placeholder
hyperparameter_hunter.settings module¶
This module is the doorway for other modules to access the information set by the active
hyperparameter_hunter.environment.Environment
, and to access the appropriate logging
methods. Specifically, other modules will most often use hyperparameter_hunter.settings.G
to access the aforementioned information. Additionally, this module defines several variables to
assist in navigating the ‘HyperparameterHunterAssets’ directory structure
Related¶
hyperparameter_hunter.environment
This module sets
hyperparameter_hunter.settings.G.Env
to the active Environment instance, creating the primary gateway used by other modules to access the active Environment’s information
-
class
hyperparameter_hunter.settings.
G
¶ Bases:
object
This class defines global attributes that are set upon instantiation of
environment.Environment
. All attributes contained herein are class variables (not instance variables) because the expectation is for the attributes of this class to be set only once, then referenced by operations that may be executed after instantiating an environment.Environment
. This allows functions to be called or classes to be initialized without passing a reference to the currently active Environment, because they check the attributes of this class instead
- Attributes
- Env: None
This is set to “self” in
environment.Environment.__init__()
. This fact allows other modules to check if settings.G.Env
is None. If None, an environment.Environment
has not yet been instantiated. If not None, any attributes or methods of the instantiated Env may be called
- save_transformed_predictions: False
Declares format in which a model’s predictions should be saved, with regard to
feature_engineering.FeatureEngineer
transformations. If no transformation of the target variable takes place (either through feature_engineering.FeatureEngineer
, feature_engineering.EngineerStep
, or otherwise), then this setting can be ignored.

If save_transformed_predictions is True, and target transformation does occur, then experiment predictions are saved in the same form as the transformed target, which is the form returned directly by a fitted model’s predict method. For example, if target data is label-encoded, and an
feature_engineering.EngineerStep
is used to one-hot encode the target, then one-hot-encoded predictions will be saved.

Conversely, if save_transformed_predictions is False (default), and target transformation does occur, then experiment predictions are saved in the inverted form of the transformed target, which is the same form as the original target data. Continuing the example of label-encoded target data, and an
feature_engineering.EngineerStep
to one-hot encode the target, in this case, label-encoded predictions will be saved.
- priority_callbacks: Tuple
Intended for internal use only. The contents of this tuple are inserted at the front of an Experiment’s list of callback bases via
experiment_core.ExperimentMeta
, ahead of even the Experiment’s original base classes. This is used primarily for testing callbacks, but it can also be used if you absolutely need a callback to be placed before the Experiment’s other ancestors in its MRO
- log_: print
…
- debug_: print
…
- warn_: print
…
- import_hooks: List
…
- sentinel_registry: List
…
Methods
debug
(content, *args, **kwargs)Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.debug()
debug_
(value, …[, sep, end, file, flush])Prints the values to a stream, or to sys.stdout by default.
log
(content, *args, **kwargs)Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.log()
log_
(value, …[, sep, end, file, flush])Prints the values to a stream, or to sys.stdout by default.
reset_attributes
()Return the attributes of
settings.G
to their original values
warn
(content, *args, **kwargs)Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.warn()
warn_
()Issue a warning, or maybe ignore it or raise an exception.
-
Env
= None¶
-
save_transformed_predictions
= False¶
-
priority_callbacks
= ()¶
-
static
log
(content, *args, **kwargs)¶ Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.log()
-
static
debug
(content, *args, **kwargs)¶ Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.debug()
-
static
warn
(content, *args, **kwargs)¶ Set in
environment.Environment.initialize_reporting()
to the updated version of reporting.ReportingHandler.warn()
-
log_
(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)¶ Prints the values to a stream, or to sys.stdout by default. Optional keyword arguments: file: a file-like object (stream); defaults to the current sys.stdout. sep: string inserted between values, default a space. end: string appended after the last value, default a newline. flush: whether to forcibly flush the stream.
-
debug_
(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)¶ Prints the values to a stream, or to sys.stdout by default. Optional keyword arguments: file: a file-like object (stream); defaults to the current sys.stdout. sep: string inserted between values, default a space. end: string appended after the last value, default a newline. flush: whether to forcibly flush the stream.
-
warn_
()¶ Issue a warning, or maybe ignore it or raise an exception.
-
import_hooks
= ['keras_layer', 'keras_initializer', 'keras_variance_scaling']¶
-
sentinel_registry
= []¶
-
classmethod
reset_attributes
()¶ Return the attributes of
settings.G
to their original values
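A hedged sketch of the consumer pattern described above (the guard and message are illustrative):

from hyperparameter_hunter.settings import G

# Modules check `G.Env` instead of receiving an Environment reference directly
if G.Env is None:
    raise RuntimeError("Activate an Environment before running Experiments")

# `G.log` is rebound during `Environment.initialize_reporting()`, so messages are
# routed through the active ReportingHandler
G.log("Environment is active; proceeding")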
hyperparameter_hunter.tracers module¶
This module defines metaclasses used to trace the parameters passed through operation-critical classes that are members of other libraries. These are only used in cases where it is impractical or impossible to effectively retrieve the arguments explicitly provided by a user, as well as the default arguments for the classes being traced. Generally, tracer metaclasses aim to add attributes to the traced class that collect its default values and explicitly provided arguments at class creation and on instance calls
Related¶
hyperparameter_hunter.importer
This module handles the interception of certain imports in order to inject the tracer metaclasses defined in
hyperparameter_hunter.tracers
into the inheritance structure of objects that need to be traced
-
class
hyperparameter_hunter.tracers.
ArgumentTracer
¶ Bases:
type
Metaclass to trace the default arguments and explicitly provided arguments of its descendants. It also has special provisions for instantiating dummy models if directed to
Methods
__call__
(cls, *args, **kwargs)Call self as a function.
mro
()Return a type’s method resolution order
-
class
hyperparameter_hunter.tracers.
LocationTracer
¶ Bases:
hyperparameter_hunter.tracers.ArgumentTracer
Metaclass to trace the origin of the call to initialize the descending class
Methods
__call__
(cls, *args, **kwargs)Call self as a function.
mro
()Return a type’s method resolution order
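As a rough sketch of the mechanism (the traced class below is hypothetical, and exactly which attributes ArgumentTracer records is an implementation detail of this module):

from hyperparameter_hunter.tracers import ArgumentTracer

class TracedModel(metaclass=ArgumentTracer):
    """Hypothetical stand-in for a third-party class that needs tracing"""

    def __init__(self, alpha=1.0, beta=None):
        self.alpha = alpha
        self.beta = beta

# On instantiation, the metaclass's __call__ intercepts the call, so both the
# declared defaults and the explicitly provided arguments can be recorded
model = TracedModel(alpha=0.5)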
Module contents¶
-
class
hyperparameter_hunter.
Environment
(train_dataset, environment_params_path=None, *, results_path=None, metrics=None, holdout_dataset=None, test_dataset=None, target_column=None, id_column=None, do_predict_proba=None, prediction_formatter=None, metrics_params=None, cv_type=None, runs=None, global_random_seed=None, random_seeds=None, random_seed_bounds=None, cv_params=None, verbose=None, file_blacklist=None, reporting_params=None, to_csv_params=None, do_full_save=None, experiment_callbacks=None, experiment_recorders=None, save_transformed_metrics=None)¶ Bases:
object
Class to organize the parameters that allow Experiments/OptPros to be fairly compared
Environment is the collective starting point for all of HyperparameterHunter’s biggest and best toys: Experiments and OptimizationProtocols. Without an Environment, neither of these will work.
The Environment is where we declare all the parameters that transcend traditional “hyperparameters”. It houses the stuff without which machine learning can’t even really start. Specifically, Environment cares about 1) the data used for fitting/predicting, 2) the cross-validation scheme used to split the data and fit models, and 3) how to evaluate the predictions made on that data. There are plenty of other goodies documented below, but the absolutely mission-critical parameters concerned with the above tasks are train_dataset, cv_type, cv_params, and metrics. Additionally, it’s important to provide results_path, so Experiment/OptPro results can be saved, which is kind of what HyperparameterHunter is all about
- Parameters
- train_dataset: Pandas.DataFrame, or str path
The training data for the experiment. Will be split into train/holdout data, if applicable, and train/validation data if cross-validation is to be performed. If str, will attempt to read file at path via
pandas.read_csv()
. For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below- environment_params_path: String path, or None, default=None
If not None and is valid .json filepath containing an object (dict), the file’s contents are treated as the default values for all keys that match any of the below kwargs used to initialize
Environment
- results_path: String path, or None, default=None
If valid directory path and the results directory has not yet been created, it will be created here. If this does not end with <ASSETS_DIRNAME>, it will be appended. If <ASSETS_DIRNAME> already exists at this path, new results will also be stored here. If None or invalid, results will not be stored
- metrics: Dict, List, or None, default=None
Iterable describing the metrics to be recorded, along with a means to compute the value of each metric. Should be of one of the two following forms:
List Form:
[“<metric name>”, “<metric name>”, …]: Where each value is a string that names an attribute in
sklearn.metrics
[Metric, Metric, …]: Where each value of the list is an instance of
metrics.Metric
[(<name>, <metric_function>, [<direction>]), (<*args>), …]: Where each value of the list is a tuple of arguments that will be used to instantiate a
metrics.Metric
. Arguments given in tuples must be in order expected by metrics.Metric
: (name, metric_function, direction)
Dict Form:
{“<metric name>”: <metric_function>, …}: Where each key is a name for the corresponding metric callable, which is used to compute the value of the metric
{“<metric name>”: (<metric_function>, <direction>), …}: Where each key is a name for the corresponding metric callable and direction, all of which are used to instantiate a
metrics.Metric
{“<metric name>”: “<sklearn metric name>”, …}: Where each key is a name for the metric, and each value is the name of the attribute in
sklearn.metrics
for which the corresponding key is an alias
{“<metric name>”: None, …}: Where each key is the name of the attribute in
sklearn.metrics
{“<metric name>”: Metric, …}: Where each key names an instance of
metrics.Metric
. This is the internally-used format to which all other formats will be converted
Metric callable functions should expect inputs of form (target, prediction), and should return floats. For a concrete metrics declaration, see the sketch under “Examples” below. See the documentation of
metrics.Metric
for information regarding expected parameters and types- holdout_dataset: Pandas.DataFrame, callable, str path, or None, default=None
If pd.DataFrame, this is the holdout dataset. If callable, expects a function that takes (self.train: DataFrame, self.target_column: str) as input and returns the new (self.train: DataFrame, self.holdout: DataFrame). If str, will attempt to read file at path via
pandas.read_csv()
. Else, there is no holdout set. For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below- test_dataset: Pandas.DataFrame, str path, or None, default=None
The testing data for the experiment. Structure should be identical to that of train_dataset, except its target_column column can be empty or non-existent, because test_dataset predictions will never be evaluated. If str, will attempt to read file at path via
pandas.read_csv()
. For more information on which columns will be used during fitting/predicting, see the “Dataset columns” note in the “Notes” section below- target_column: Str, or list, default=’target’
If str, denotes the column name in all provided datasets (except test) that contains the target output. If list, should be a list of strs designating multiple target columns. For example, in a multi-class classification dataset like UCI’s hand-written digits, target_column would be a list containing ten strings. In this example, the target_column data would be sparse, with a 1 to signify that a sample is a written example of a digit (0-9). For a working example, see ‘hyperparameter_hunter/examples/lib_keras_multi_classification_example.py’
- id_column: Str, or None, default=None
If not None, str denoting the column name in all provided datasets containing sample IDs
- do_predict_proba: Boolean, or int, default=False
If False,
models.Model.fit()
will call models.Model.model.predict()
If True, it will call
models.Model.model.predict_proba()
, and the values in all columns will be used as the actual prediction values

If do_predict_proba is an int,
models.Model.fit()
will call models.Model.model.predict_proba()
, as is the case when do_predict_proba is True, but the int supplied as do_predict_proba declares the column index to use as the actual prediction values

For example, for a model to call the predict method, do_predict_proba=False (default). For a model to call the predict_proba method, and use all of the class probabilities, do_predict_proba=True. To call the predict_proba method, and use the class probabilities in the first column, do_predict_proba=0. To use the second column (index 1) of the result, do_predict_proba=1 - this often corresponds to the positive class’s probabilities in binary classification problems. To use the third column, do_predict_proba=2, and so on
- prediction_formatter: Callable, or None, default=None
If callable, expected to have same signature as
utils.result_utils.format_predictions()
. That is, the callable will receive (raw_predictions: np.array, dataset_df: pd.DataFrame, target_column: str, id_column: str or None) as input and should return a properly formatted prediction DataFrame. The callable uses raw_predictions as the content, dataset_df to provide any id column, and target_column to identify the column in which to place raw_predictions- metrics_params: Dict, or None, default=dict()
Dictionary of extra parameters to provide to
metrics.ScoringMixIn.__init__()
. metrics must be provided either 1) as an input kwarg to Environment.__init__()
(see metrics), or 2) as a key in metrics_params, but not both. An Exception will be raised if both are given, or if neither is given- cv_type: Class or str, default=’KFold’
The class to define cross-validation splits. If str, it must be an attribute of sklearn.model_selection._split, and it must be a cross-validation class that inherits one of the following sklearn classes: BaseCrossValidator, or _RepeatedSplits. Valid str values include ‘KFold’, and ‘RepeatedKFold’, although there are many more. It must implement the following methods: [__init__, split]. If using a custom class, see the following tested sklearn classes for proper implementations: [KFold, StratifiedKFold, RepeatedKFold, RepeatedStratifiedKFold]. The arguments provided to
cv_type.__init__()
will be Environment.cv_params
, which should include the following: [‘n_splits’ <int>, ‘n_repeats’ <int> (if applicable)].
cv_type.split()
will receive the following arguments: [BaseExperiment.train_input_data
, BaseExperiment.train_target_data
]
- runs: Int, default=1
The number of times to fit a model within each fold to perform multiple-run-averaging with different random seeds
- global_random_seed: Int, default=32
The initial random seed used just before generating an Experiment’s random_seeds. This ensures consistency for random_seeds between Experiments, without having to explicitly provide it here
- random_seeds: None, or List, default=None
If None, random_seeds of the appropriate shape will be created automatically. Else, must be a list of ints of shape (cv_params[‘n_repeats’], cv_params[‘n_splits’], runs). If cv_params does not have the key n_repeats (because standard cross-validation is being used), the value will default to 1. See
experiments.BaseExperiment._random_seed_initializer()
for info on expected shape
- random_seed_bounds: List, default=[0, 100000]
A list containing two integers: the lower and upper bounds, respectively, for generating an Experiment’s random seeds in
experiments.BaseExperiment._random_seed_initializer()
. Generally, leave this kwarg alone
- cv_params: dict, or None, default=dict()
Parameters provided upon initialization of cv_type. Keys may be any args accepted by
cv_type.__init__()
. Number of fold splits must be provided via “n_splits”, and number of repeats (if applicable for cv_type) must be provided via “n_repeats”- verbose: Int, boolean, default=3
Verbosity of printing for any experiments performed while this Environment is active
Higher values indicate more frequent logging. Logs are still recorded in the heartbeat file regardless of verbosity level. verbose only dictates which logs are visible in the console. The following table illustrates which types of logging messages will be visible with each verbosity level:
| Verbosity | Keys/IDs | Final Score | Repetitions* | Folds | Runs* | Run Starts* | Result Files | Other |
|:---------:|:--------:|:-----------:|:------------:|:-----:|:-----:|:-----------:|:------------:|:-----:|
|     0     |          |             |              |       |       |             |              |       |
|     1     |   Yes    |     Yes     |              |       |       |             |              |       |
|     2     |   Yes    |     Yes     |     Yes      |  Yes  |       |             |              |       |
|     3     |   Yes    |     Yes     |     Yes      |  Yes  |  Yes  |             |              |       |
|     4     |   Yes    |     Yes     |     Yes      |  Yes  |  Yes  |     Yes     |     Yes      |  Yes  |
*: If such logging is deemed appropriate with the given cross-validation parameters. In other words, repetition/run logging will only be verbose if Environment was given more than one repetition/run, respectively
- file_blacklist: List of str, or None, or ‘ALL’, default=None
If list of str, the result files named within are not saved to their respective directory in “<ASSETS_DIRNAME>/Experiments”. If None, all result files are saved. If ‘ALL’, nothing at all will be saved for the Experiments. If the path of the file that initializes an Experiment does not end with a “.py” extension, the Experiment proceeds as if “script_backup” had been added to file_blacklist. This means that backup files will not be created for Jupyter notebooks (or any other non-“.py” files). For info on acceptable values, see
validate_file_blacklist()
- reporting_params: Dict, default=dict()
Parameters passed to initialize
reporting.ReportingHandler
- to_csv_params: Dict, default=dict()
Parameters passed to the calls to
pandas.frame.DataFrame.to_csv()
inrecorders
. In particular, this is where an Experiment’s final prediction files are saved, so the values here will affect the format of the .csv prediction files. Warning: If to_csv_params contains the key “path_or_buf”, it will be removed. Otherwise, all items are supplied directly to to_csv()
, including kwargs it might not be expecting if they are given
- do_full_save: None, or callable, default=utils.result_utils.default_do_full_save
If callable, expected to take an Experiment’s result description dict as input and return a boolean. If None, treated as a callable that returns True. This parameter is used by
recorders.DescriptionRecorder
to determine whether the Experiment result files following the description should also be created. If do_full_save returns False, result file-saving is stopped early, and only the description is saved. If do_full_save returns True, all files not in file_blacklist are saved normally. This allows you to skip creation of an Experiment’s predictions, logs, and heartbeats if its score does not meet some threshold you set, for example. do_full_save receives the Experiment description dict as input, so for help setting do_full_save, just look into one of your Experiment descriptions- experiment_callbacks: `LambdaCallback`, or list of `LambdaCallback` (optional)
Callbacks injected directly into Experiments, adding new functionality, or customizing existing processes. Should be a
LambdaCallback
or a list of such classes. LambdaCallback can be created using callbacks.bases.lambda_callback()
, which documents the options for creating callbacks. experiment_callbacks will be added to the MRO of the executed Experiment class by experiment_core.ExperimentMeta
at __call__ time, making experiment_callbacks new base classes of the Experiment. See callbacks.bases.lambda_callback()
for more information. Note that the Experiments conducted by OptPros will still benefit from experiment_callbacks. The presence of LambdaCallbacks will affect neither Environment keys, nor Experiment keys. In other words, for the purposes of Experiment matching/recording, all other factors being equal, an Experiment with experiment_callbacks is considered identical to an Experiment without, despite whatever custom functionality was added by the LambdaCallbacks- experiment_recorders: List, None, default=None
If not None, may be a list whose values are tuples of (<
recorders.BaseRecorder
descendant>, <str result_path>). The result_path str should be a path relative to results_path that specifies the directory/file in which the product of the custom recorder should be saved. The contents of experiment_recorders will be provided to recorders.RecorderList upon completion of an Experiment, and, if the subclassing documentation in recorders is followed properly, will create or update a result file for the just-executed Experiment- save_transformed_metrics: Boolean (optional)
Declares manner in which a model’s predictions should be evaluated through the provided metrics, with regard to target data transformations. This setting can be ignored if no transformation of the target variable takes place (either through
FeatureEngineer
, EngineerStep
, or otherwise).

The default value of save_transformed_metrics depends on the dtype of the target data in train_dataset. If all target columns are numeric, save_transformed_metrics=False, meaning metric evaluation should use the original/inverted targets and predictions. Else, if any target column is non-numeric, save_transformed_metrics=True, meaning evaluation should use the transformed targets and predictions, because most metrics require numeric inputs. This is described further in save_transformed_metrics. A more descriptive name for this may be “calculate_metrics_using_transformed_predictions”, but that’s a bit verbose, even by my standards
- Other Parameters
- cross_validation_type: …
Alias for cv_type
- cross_validation_params: …
Alias for cv_params
- metrics_map: …
Alias for metrics
- reporting_handler_params: …
Alias for reporting_params
- root_results_path: …
Alias for results_path
Notes
Dataset columns: In order to specify the columns to be used by the three dataset kwargs (train_dataset, holdout_dataset, test_dataset) during fitting and predicting, a few attributes can be used. On Environment initialization, the columns specified by the following kwargs will be separated from the rest of the dataset during training/predicting: 1) target_column, which names the column containing the target output labels for the input data; and 2) id_column, which (if given) represents the name of the column that contains identifying information for each data sample, and should otherwise have no relation to the actual data. Additionally, the feature_selector kwarg of the descendants of
hyperparameter_hunter.experiments.BaseExperiment
(like hyperparameter_hunter.experiments.CVExperiment
) is used to filter out columns of the given datasets prior to fitting. See its documentation for more information, but it can effectively be used to remove any columns from the datasets

Overriding default kwargs at environment_params_path: If you have any of the above kwargs specified in the .json file at environment_params_path (except environment_params_path, which will be ignored), you can override its value by passing it as a kwarg when initializing
Environment
. The contents at environment_params_path are only used when the matching kwarg supplied at initialization is None. See “/examples/environment_params_path_example.py” for details

The order of precedence for determining the value of each parameter is as follows, with items at the top having the highest priority, and deferring only to the items below if their own value is None:
1) kwargs passed directly to
Environment.__init__()
on initialization,
2) keys of the file at environment_params_path (if valid .json object),
3) keys of
hyperparameter_hunter.environment.Environment.DEFAULT_PARAMS
do_predict_proba: Because this parameter can be either a boolean or an integer, it is important to explicitly pass booleans rather than truthy or falsey values. Similarly, only pass integers if you intend for the value to be used as a column index. Do not pass 0 to mean False, or 1 to mean True
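Examples

A minimal sketch of declaring an Environment (the dataset and metric choices are illustrative, not prescribed by this API):

import pandas as pd
from sklearn.datasets import load_breast_cancer
from hyperparameter_hunter import Environment

data = load_breast_cancer()
train_df = pd.DataFrame(data.data, columns=data.feature_names)
train_df["target"] = data.target  # matches the default `target_column`

env = Environment(
    train_dataset=train_df,
    results_path="HyperparameterHunterAssets",
    # Dict form of `metrics`: each value names an attribute of `sklearn.metrics`
    metrics=dict(roc_auc="roc_auc_score"),
    cv_type="StratifiedKFold",
    cv_params=dict(n_splits=5, shuffle=True, random_state=32),
)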
- Attributes
- train_input: DatasetSentinel
Sentinel replaced with current train input data during Model fitting/predicting. Commonly given in the model_extra_params kwargs of
hyperparameter_hunter.experiments.BaseExperiment
orhyperparameter_hunter.optimization.protocol_core.BaseOptPro.forge_experiment()
for eval_set-like hyperparameters. Importantly, the actual value of this Sentinel is determined after performing cross-validation data splitting, and after executing FeatureEngineer
- train_target: DatasetSentinel
Like
train_input
, except for current train target data- validation_input: DatasetSentinel
Like
train_input
, except for current validation input data- validation_target: DatasetSentinel
Like
train_input
, except for current validation target data- holdout_input: DatasetSentinel
Like
train_input
, except for current holdout input data- holdout_target: DatasetSentinel
Like
train_input
, except for current holdout target data
Methods
environment_workflow
(self)Execute all methods required to validate the environment and run Experiments
format_result_paths
(self)Remove paths contained in file_blacklist, and format others to prepare for saving results
generate_cross_experiment_key
(self)Generate a key to describe the current Environment’s cross-experiment parameters
initialize_reporting
(self)Initialize reporting for the Environment and Experiments conducted during its lifetime
update_custom_environment_params
(self)Try to update null parameters from environment_params_path, or DEFAULT_PARAMS
validate_parameters
(self)Ensure the provided parameters are valid and properly formatted
-
DEFAULT_PARAMS
= {'cv_params': {}, 'cv_type': 'KFold', 'do_full_save': <function default_do_full_save>, 'do_predict_proba': False, 'environment_params_path': None, 'file_blacklist': None, 'global_random_seed': 32, 'id_column': None, 'metrics': None, 'metrics_params': {}, 'prediction_formatter': <function format_predictions>, 'random_seed_bounds': [0, 100000], 'random_seeds': None, 'reporting_params': {'console_params': None, 'float_format': '{:.5f}', 'heartbeat_params': None, 'heartbeat_path': None}, 'results_path': None, 'runs': 1, 'save_transformed_metrics': None, 'target_column': 'target', 'to_csv_params': {}, 'verbose': 3}¶
-
property
results_path
¶
-
property
target_column
¶
-
property
train_dataset
¶
-
property
test_dataset
¶
-
property
holdout_dataset
¶
-
property
file_blacklist
¶
-
property
cv_type
¶
-
property
to_csv_params
¶
-
property
cross_experiment_params
¶
-
property
experiment_callbacks
¶
-
property
save_transformed_metrics
¶ If save_transformed_metrics is True, and target transformation does occur, then experiment metrics are calculated using the transformed targets and predictions, which is the form returned directly by a fitted model’s predict method. For example, if target data is label-encoded, and an
feature_engineering.EngineerStep
is used to one-hot encode the target, then metrics functions will receive the following as input: (one-hot-encoded targets, one-hot-encoded predictions).

Conversely, if save_transformed_metrics is False, and target transformation does occur, then experiment metrics are calculated using the inverse of the transformed targets and predictions, which is the same form as the original target data. Continuing the example of label-encoded target data, and an
feature_engineering.EngineerStep
to one-hot encode the target, in this case, metrics functions will receive the following as input: (label-encoded targets, label-encoded predictions)
-
environment_workflow
(self)¶ Execute all methods required to validate the environment and run Experiments
-
validate_parameters
(self)¶ Ensure the provided parameters are valid and properly formatted
-
format_result_paths
(self)¶ Remove paths contained in file_blacklist, and format others to prepare for saving results
-
update_custom_environment_params
(self)¶ Try to update null parameters from environment_params_path, or DEFAULT_PARAMS
-
generate_cross_experiment_key
(self)¶ Generate a key to describe the current Environment’s cross-experiment parameters
-
initialize_reporting
(self)¶ Initialize reporting for the Environment and Experiments conducted during its lifetime
-
property
train_input
¶ Get a DatasetSentinel representing an Experiment’s fold_train_input
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.fold_train_input
upon Model initialization
-
property
train_target
¶ Get a DatasetSentinel representing an Experiment’s fold_train_target
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.fold_train_target
upon Model initialization
-
property
validation_input
¶ Get a DatasetSentinel representing an Experiment’s fold_validation_input
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.fold_validation_input
upon Model initialization
-
property
validation_target
¶ Get a DatasetSentinel representing an Experiment’s fold_validation_target
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.fold_validation_target
upon Model initialization
-
property
holdout_input
¶ Get a DatasetSentinel representing an Experiment’s holdout_input_data
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.holdout_input_data
upon Model initialization
-
property
holdout_target
¶ Get a DatasetSentinel representing an Experiment’s holdout_target_data
- Returns
- DatasetSentinel:
A Sentinel that will be converted to
hyperparameter_hunter.experiments.BaseExperiment.holdout_target_data
upon Model initialization
-
class
hyperparameter_hunter.
CVExperiment
(model_initializer, model_init_params=None, model_extra_params=None, feature_engineer=None, feature_selector=None, notes=None, do_raise_repeated=False, auto_start=True, target_metric=None, callbacks=None)¶ Bases:
hyperparameter_hunter.experiments.BaseCVExperiment
- Attributes
- source_script
Methods
cross_validation_workflow
(self)Execute workflow for cross-validation process, consisting of the following tasks: 1) Create train and validation split indices for all folds, 2) Iterate through folds, performing cv_fold_workflow for each, 3) Average accumulated predictions over fold splits, 4) Evaluate final predictions, 5) Format final predictions to prepare for saving
cv_fold_workflow
(self)Execute workflow for individual fold, consisting of the following tasks: 1) Execute overridden
on_fold_start()
tasks, 2) Perform cv_run_workflow for each run, 3) Execute overriddenon_fold_end()
taskscv_run_workflow
(self)Execute run workflow, consisting of: 1) Execute overridden
on_run_start()
tasks, 2) Initialize and fit Model, 3) Execute overriddenon_run_end()
tasksevaluate
(self, data_type, target, prediction)Apply metric(s) to the given data to calculate the value of the prediction
execute
(self)Execute the fitting protocol for the Experiment, comprising the following: instantiation of learners for each run, preprocessing of data as appropriate, training learners, making predictions, and evaluating and aggregating those predictions and other stats/metrics for later use
experiment_workflow
(self)Define the actual experiment process, including execution, result saving, and cleanup
on_exp_start
(self)Prepare data prior to executing fitting protocol (cross-validation), by 1) Initializing formal
datasets
attributes, 2) Invoking feature_engineer to perform “pre_cv”-stage preprocessing, and 3) Updating datasets to include their (transformed) counterparts in feature_engineeron_fold_start
(self)Override
on_fold_start()
tasks set byexperiment_core.ExperimentMeta
, consisting of: 1) Split train/validation data, 2) Make copies of holdout/test data for current fold (for feature engineering), 3) Log start, 4) Execute original taskson_run_start
(self)Override
on_run_start()
tasks organized byexperiment_core.ExperimentMeta
, consisting of: 1) Set random seed and update model parameters according to current seed, 2) Log run start, 3) Execute original taskspreparation_workflow
(self)Execute all tasks that must take place before the experiment is actually started.
-
source_script
= None¶
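A hedged usage sketch (assumes an Environment like the one shown above is already active, and that scikit-learn is installed):

from sklearn.ensemble import RandomForestClassifier
from hyperparameter_hunter import CVExperiment

# The active Environment supplies the data, CV scheme, and metrics; only the
# model-specific hyperparameters are declared here
experiment = CVExperiment(
    model_initializer=RandomForestClassifier,
    model_init_params=dict(n_estimators=200, max_depth=5),
    notes="Baseline random forest",
)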
-
class
hyperparameter_hunter.
BayesianOptPro
(target_metric=None, iterations=1, verbose=1, read_experiments=True, reporter_parameters=None, warn_on_re_ask=False, base_estimator='GP', n_initial_points=10, acquisition_function='gp_hedge', acquisition_optimizer='auto', random_state=32, acquisition_function_kwargs=None, acquisition_optimizer_kwargs=None, n_random_starts='DEPRECATED', callbacks=None, base_estimator_kwargs=None)¶ Bases:
hyperparameter_hunter.optimization.protocol_core.SKOptPro
Bayesian optimization with Gaussian Processes
- Attributes
search_space_size
The number of different hyperparameter permutations possible given the current
- source_script
Methods
forge_experiment
(self, model_initializer[, …])Define hyperparameter search scaffold for building Experiments during optimization
get_ready
(self)Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go
(self[, force_ready])Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions
(self)Locate given hyperparameters that are space choice declarations and add them to
dimensions
set_experiment_guidelines
(self, *args, …)Deprecated since version 3.0.0a2.
-
source_script
= None¶
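A hedged usage sketch (assumes an active Environment; the model and search bounds are illustrative):

from sklearn.ensemble import RandomForestClassifier
from hyperparameter_hunter import BayesianOptPro, Integer, Real

opt = BayesianOptPro(iterations=10, random_state=32)
opt.forge_experiment(
    model_initializer=RandomForestClassifier,
    model_init_params=dict(
        n_estimators=Integer(50, 300),  # space choice to be optimized
        max_features=Real(0.2, 0.9),    # space choice to be optimized
        max_depth=5,                    # concrete "guideline" value
    ),
)
opt.go()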
-
class
hyperparameter_hunter.
GradientBoostedRegressionTreeOptPro
(target_metric=None, iterations=1, verbose=1, read_experiments=True, reporter_parameters=None, warn_on_re_ask=False, base_estimator='GBRT', n_initial_points=10, acquisition_function='EI', acquisition_optimizer='sampling', random_state=32, acquisition_function_kwargs=None, acquisition_optimizer_kwargs=None, n_random_starts='DEPRECATED', callbacks=None, base_estimator_kwargs=None)¶ Bases:
hyperparameter_hunter.optimization.protocol_core.SKOptPro
Sequential optimization with gradient boosted regression trees
- Attributes
search_space_size
The number of different hyperparameter permutations possible given the current
- source_script
Methods
forge_experiment
(self, model_initializer[, …])Define hyperparameter search scaffold for building Experiments during optimization
get_ready
(self)Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go
(self[, force_ready])Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions
(self)Locate given hyperparameters that are space choice declarations and add them to
dimensions
set_experiment_guidelines
(self, *args, …)Deprecated since version 3.0.0a2.
-
source_script
= None¶
-
hyperparameter_hunter.
GBRT
¶ alias of
hyperparameter_hunter.optimization.backends.skopt.protocols.GradientBoostedRegressionTreeOptPro
-
class
hyperparameter_hunter.
RandomForestOptPro
(target_metric=None, iterations=1, verbose=1, read_experiments=True, reporter_parameters=None, warn_on_re_ask=False, base_estimator='RF', n_initial_points=10, acquisition_function='EI', acquisition_optimizer='sampling', random_state=32, acquisition_function_kwargs=None, acquisition_optimizer_kwargs=None, n_random_starts='DEPRECATED', callbacks=None, base_estimator_kwargs=None)¶ Bases:
hyperparameter_hunter.optimization.protocol_core.SKOptPro
Sequential optimization with random forest regressor decision trees
- Attributes
search_space_size
The number of different hyperparameter permutations possible given the current
- source_script
Methods
forge_experiment
(self, model_initializer[, …])Define hyperparameter search scaffold for building Experiments during optimization
get_ready
(self)Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go
(self[, force_ready])Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions
(self)Locate given hyperparameters that are space choice declarations and add them to
dimensions
set_experiment_guidelines
(self, *args, …)Deprecated since version 3.0.0a2.
-
source_script
= None¶
-
hyperparameter_hunter.
RF
¶ alias of
hyperparameter_hunter.optimization.backends.skopt.protocols.RandomForestOptPro
-
class
hyperparameter_hunter.
ExtraTreesOptPro
(target_metric=None, iterations=1, verbose=1, read_experiments=True, reporter_parameters=None, warn_on_re_ask=False, base_estimator='ET', n_initial_points=10, acquisition_function='EI', acquisition_optimizer='sampling', random_state=32, acquisition_function_kwargs=None, acquisition_optimizer_kwargs=None, n_random_starts='DEPRECATED', callbacks=None, base_estimator_kwargs=None)¶ Bases:
hyperparameter_hunter.optimization.protocol_core.SKOptPro
Sequential optimization with extra trees regressor decision trees
- Attributes
search_space_size
The number of different hyperparameter permutations possible given the current
- source_script
Methods
forge_experiment
(self, model_initializer[, …])Define hyperparameter search scaffold for building Experiments during optimization
get_ready
(self)Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go
(self[, force_ready])Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions
(self)Locate given hyperparameters that are space choice declarations and add them to
dimensions
set_experiment_guidelines
(self, *args, …)Deprecated since version 3.0.0a2.
-
source_script
= None¶
-
hyperparameter_hunter.
ET
¶ alias of
hyperparameter_hunter.optimization.backends.skopt.protocols.ExtraTreesOptPro
-
class
hyperparameter_hunter.
DummyOptPro
(target_metric=None, iterations=1, verbose=1, read_experiments=True, reporter_parameters=None, warn_on_re_ask=False, base_estimator='DUMMY', n_initial_points=10, acquisition_function='EI', acquisition_optimizer='sampling', random_state=32, acquisition_function_kwargs=None, acquisition_optimizer_kwargs=None, n_random_starts='DEPRECATED', callbacks=None, base_estimator_kwargs=None)¶ Bases:
hyperparameter_hunter.optimization.protocol_core.SKOptPro
Random search by uniform sampling
- Attributes
search_space_size
The number of different hyperparameter permutations possible given the current
- source_script
Methods
forge_experiment
(self, model_initializer[, …])Define hyperparameter search scaffold for building Experiments during optimization
get_ready
(self)Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go
(self[, force_ready])Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions
(self)Locate given hyperparameters that are space choice declarations and add them to
dimensions
set_experiment_guidelines
(self, *args, …)Deprecated since version 3.0.0a2.
-
source_script
= None¶
-
class
hyperparameter_hunter.
Real
(low, high, prior='uniform', transform='identity', name=None)¶ Bases:
hyperparameter_hunter.space.dimensions.NumericalDimension
Search space dimension that can assume any real value in a given range
- Parameters
- low: Float
Lower bound (inclusive)
- high: Float
Upper bound (inclusive)
- prior: {“uniform”, “log-uniform”}, default=”uniform”
Distribution to use when sampling random points for this dimension. If “uniform”, points are sampled uniformly between the lower and upper bounds. If “log-uniform”, points are sampled uniformly between log10(lower) and log10(upper)
- transform: {“identity”, “normalize”}, default=”identity”
Transformation to apply to the original space. If “identity”, the transformed space is the same as the original space. If “normalize”, the transformed space is scaled between 0 and 1
- name: String, tuple, or None, default=None
A name associated with the dimension
- Attributes
- distribution: rv_generic
See documentation of
_make_distribution()
or distribution()
- transform_: String
Original value passed through the transform kwarg - Because
transform()
exists
- transformer: Transformer
See documentation of
_make_transformer()
or transformer()
Methods
distance
(self, a, b)Calculate distance between two points in the dimension’s bounds
get_params
(self)Get dict of parameters used to initialize the Real, or their defaults
inverse_transform
(self, data_t)Inverse transform samples from the warped space back to the original space
rvs
(self[, n_samples, random_state])Draw random samples.
transform
(self, data)Transform samples from the original space into a warped space
-
inverse_transform
(self, data_t)¶ Inverse transform samples from the warped space back to the original space
- Parameters
- data_t: List
Samples to inverse transform. Should be of shape (<# samples>,
transformed_size
)
- Returns
- List
Samples transformed back to original space. Will be shape (<# samples>,
size
)
-
property
transformed_bounds
¶ Dimension bounds in the warped space
- Returns
- low: Float
0.0 if
transform_="normalize". If transform_="identity" and prior="uniform", then low
. Else log10(low)
- high: Float
1.0 if
transform_="normalize". If transform_="identity" and prior="uniform", then high
. Else log10(high)
-
get_params
(self) → dict¶ Get dict of parameters used to initialize the Real, or their defaults
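For instance, a small sampling sketch (the dimension below is illustrative):

>>> from hyperparameter_hunter import Real
>>> dim = Real(0.001, 0.5, prior="log-uniform", name=("model_init_params", "learning_rate"))
>>> samples = dim.rvs(n_samples=3, random_state=32)
>>> all(0.001 <= x <= 0.5 for x in samples)
True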
-
class
hyperparameter_hunter.
Integer
(low, high, transform='identity', name=None)¶ Bases:
hyperparameter_hunter.space.dimensions.NumericalDimension
Search space dimension that can assume any integer value in a given range
- Parameters
- low: Int
Lower bound (inclusive)
- high: Int
Upper bound (inclusive)
- transform: {“identity”, “normalize”}, default=”identity”
Transformation to apply to the original space. If “identity”, the transformed space is the same as the original space. If “normalize”, the transformed space is scaled between 0 and 1
- name: String, tuple, or None, default=None
A name associated with the dimension
- Attributes
- distribution: rv_generic
See documentation of _make_distribution() or distribution()
- transform_: String
Original value passed through the transform kwarg (trailing underscore because the transform() method already exists)
- transformer: Transformer
See documentation of _make_transformer() or transformer()
Methods
distance(self, a, b): Calculate distance between two points in the dimension’s bounds
get_params(self): Get dict of parameters used to initialize the Integer, or their defaults
inverse_transform(self, data_t): Inverse transform samples from the warped space back to the original space
rvs(self[, n_samples, random_state]): Draw random samples.
transform(self, data): Transform samples from the original space into a warped space
-
inverse_transform
(self, data_t)¶ Inverse transform samples from the warped space back to the original space
- Parameters
- data_t: List
Samples to inverse transform. Should be of shape (<# samples>,
transformed_size
)
- Returns
- List
Samples transformed back to original space. Will be shape (<# samples>,
size
)
-
property
transformed_bounds
¶ Dimension bounds in the warped space
- Returns
- low: Int
0 if transform_="normalize", else low
- high: Int
1 if transform_="normalize", else high
-
get_params
(self) → dict¶ Get dict of parameters used to initialize the Integer, or their defaults
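A similar hedged sketch for Integer; the distance value assumes the usual absolute-difference behavior for numerical dimensions:
>>> max_depth = Integer(2, 10, name="max_depth")
>>> max_depth.transformed_bounds  # default "identity" transform keeps the original bounds
(2, 10)
>>> max_depth.distance(3, 7)      # assumed |a - b| for numerical dimensions
4
>>> picks = max_depth.rvs(n_samples=5, random_state=32)  # five ints in [2, 10]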
-
class
hyperparameter_hunter.
Categorical
(categories: list, prior: list = None, transform='onehot', optional=False, name=None)¶ Bases:
hyperparameter_hunter.space.dimensions.Dimension
Search space dimension that can assume any categorical value in a given list
- Parameters
- categories: List
Sequence of possible categories of shape (n_categories,)
- prior: List, or None, default=None
If list, prior probabilities for each category of shape (categories,). By default all categories are equally likely
- transform: {“onehot”, “identity”}, default=”onehot”
Transformation to apply to the original space. If “identity”, the transformed space is the same as the original space. If “onehot”, the transformed space is a one-hot encoded representation of the original space
- optional: Boolean, default=False
Intended for use by FeatureEngineer when optimizing an EngineerStep. Specifically, this enables searching through a space in which an EngineerStep either may or may not be used. This is contrary to Categorical’s usual function of creating a space comprising multiple categories. When optional = True, the space created will represent any of the values in categories either being included in the entire FeatureEngineer process, or being skipped entirely. Internally, a value excluded by optional is represented by a sentinel value that signals it should be removed from the containing list, so optional will not work for choosing between a single value and None, for example
- name: String, tuple, or None, default=None
A name associated with the dimension
- Attributes
- categories: Tuple
Original value passed through the categories kwarg, cast to a tuple. If optional is True, then an instance of RejectedOptional will be appended to categories
- distribution: rv_generic
See documentation of _make_distribution() or distribution()
- optional: Boolean
Original value passed through the optional kwarg
- prior: List, or None
Original value passed through the prior kwarg
- prior_actual: List
Calculated prior value, initially equivalent to prior, but then set to a default array if None
- transform_: String
Original value passed through the transform kwarg (trailing underscore because the transform() method already exists)
- transformer: Transformer
See documentation of _make_transformer() or transformer()
Methods
distance(self, a, b): Calculate distance between two points in the dimension’s bounds
get_params(self): Get dict of parameters used to initialize the Categorical, or their defaults
inverse_transform(self, data_t): Inverse transform samples from the warped space back to the original space
rvs(self[, n_samples, random_state]): Draw random samples.
transform(self, data): Transform samples from the original space into a warped space
-
rvs
(self, n_samples=None, random_state=None)¶ Draw random samples. Samples are in the original (untransformed) space. They must be transformed before being passed to a model or minimizer via
transform()
- Parameters
- n_samples: Int (optional)
Number of samples to be drawn. If not given, a single sample will be returned
- random_state: Int, RandomState, or None, default=None
Set random state to something other than None for reproducible results
- Returns
- List
Randomly drawn samples from the original space
-
property
transformed_size
¶ Size of the transformed space for the dimension
- Returns
- Int
1 if transform_ == “identity”
1 if transform_ == “onehot” and length of categories is 1 or 2
Length of categories in all other cases
-
property
bounds
¶ Dimension bounds in the original space
- Returns
- Tuple
categories
-
property
transformed_bounds
¶ Dimension bounds in the warped space
- Returns
- Tuple, or list
If transformed_size == 1, then a tuple of (0.0, 1.0). Otherwise, returns a list containing transformed_size-many tuples of (0.0, 1.0)
Notes
transformed_size == 1 when the length of categories == 2, so if there are two items in categories, (0.0, 1.0) is returned. If there are three items in categories, [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)] is returned, and so on.
Because transformed_bounds uses transformed_size, it is affected by transform_. Specifically, the returns described above are for transform_ == “onehot” (default).
Examples
>>> Categorical(["a", "b"]).transformed_bounds
(0.0, 1.0)
>>> Categorical(["a", "b", "c"]).transformed_bounds
[(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
>>> Categorical(["a", "b", "c", "d"]).transformed_bounds
[(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
-
distance
(self, a, b) → int¶ Calculate distance between two points in the dimension’s bounds
- Parameters
- a
First category
- b
Second category
- Returns
- Int
0 if a == b. Else 1 (because categories have no order)
-
get_params
(self) → dict¶ Get dict of parameters used to initialize the Categorical, or their defaults
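A short, hedged sketch of Categorical’s behavior, following the distance and transformed_size rules documented above:
>>> booster = Categorical(["gbtree", "dart"], name="booster")
>>> booster.distance("gbtree", "dart")  # categories are unordered: 0 if equal, else 1
1
>>> booster.transformed_size  # two categories -> one-hot still fits in a single column
1
>>> samples = booster.rvs(n_samples=2, random_state=32)  # drawn from the original categories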
-
hyperparameter_hunter.
lambda_callback
(on_exp_start=None, on_exp_end=None, on_rep_start=None, on_rep_end=None, on_fold_start=None, on_fold_end=None, on_run_start=None, on_run_end=None, agg_name=None, do_reshape_aggs=True, method_agg_keys=False, on_experiment_start=&lt;sentinel&gt;, on_experiment_end=&lt;sentinel&gt;, on_repetition_start=&lt;sentinel&gt;, on_repetition_end=&lt;sentinel&gt;)¶ Utility for creating custom callbacks to be declared by
Environment
and used by Experiments. The callable “on_<…>_<start/end>” parameters provided will receive as input whichever attributes of the Experiment are included in the signature of the given callable. If **kwargs is given in the callable’s signature, a dict of all of the Experiment’s attributes will be provided. This can be helpful for trying to figure out how to build a custom callback, but should not be used unless absolutely necessary. If the Experiment does not have an attribute specified in the callable’s signature, the following placeholder will be given: “INVALID KWARG”
- Parameters
- on_exp_start: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at Experiment start
- on_exp_end: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at Experiment end
- on_rep_start: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at repetition start
- on_rep_end: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at repetition end
- on_fold_start: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at fold start
- on_fold_end: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at fold end
- on_run_start: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at run start
- on_run_end: Callable, or None, default=None
Callable that receives Experiment’s values for parameters in the signature at run end
- agg_name: Str, default=uuid.uuid4
This parameter is only used if the callables are behaving like AggregatorCallbacks by returning values (see the “Notes” section below for details on this). If the callables do return values, they will be stored under a key named (“_” + agg_name) in a dict in
hyperparameter_hunter.experiments.BaseExperiment.stat_aggregates
. The purpose of this parameter is to make it easier to understand an Experiment’s description file, as agg_name will default to a UUID if it is not given
- do_reshape_aggs: Boolean, default=True
Whether to reshape the aggregated values to reflect the nested repetitions/folds/runs structure used for other aggregated values. If False, lists of aggregated values are left in their original shapes. This parameter is only used if the callables are behaving like AggregatorCallbacks (see the “Notes” section below and agg_name for details on this)
- method_agg_keys: Boolean, default=False
If True, the aggregate keys for the items added to the dict at agg_name are equivalent to the names of the “on_<…>_<start/end>” pseudo-methods whose values are being aggregated. In other words, the pool of all possible aggregate keys goes from [“runs”, “folds”, “reps”, “final”] to the names of the eight “on_<…>_<start/end>” kwargs of
lambda_callback()
. See the “Notes” section below for further details and a rough outline
- on_experiment_start: …
Deprecated since version 3.0.0: Renamed to on_exp_start. Will be removed in 3.2.0
- on_experiment_end: …
Deprecated since version 3.0.0: Renamed to on_exp_end. Will be removed in 3.2.0
- on_repetition_start: …
Deprecated since version 3.0.0: Renamed to on_rep_start. Will be removed in 3.2.0
- on_repetition_end: …
Deprecated since version 3.0.0: Renamed to on_rep_end. Will be removed in 3.2.0
- Returns
- LambdaCallback: LambdaCallback
Uninitialized class, whose methods are the callables of the corresponding “on…” kwarg
Notes
For all of the “on_<…>_<start/end>” callables provided as input to lambda_callback, consider the following guidelines (for example function “f”, which can represent any of the callables):
All input parameters in the signature of “f” are attributes of the Experiment being executed
If “**kwargs” is a parameter, a dict of all the Experiment’s attributes will be provided
“f” will be treated as a method of a parent class of the Experiment
Take care when modifying attributes, as changes are reflected in the Experiment itself
If “f” returns something, it will automatically behave like an AggregatorCallback (see hyperparameter_hunter.callbacks.aggregators). Specifically, the following will occur:
A new key (named by agg_name if given, else a UUID) with a dict value is added to hyperparameter_hunter.experiments.BaseExperiment.stat_aggregates
This new dict can have up to four keys: “runs” (list), “folds” (list), “reps” (list), and “final” (object)
If “f” is an “on_run…” function, the returned value is appended to the “runs” list in the new dict
Similarly, if “f” is an “on_fold…” or “on_rep…” function, the returned value is appended to the “folds”, or “reps” list, respectively
If “f” is an “on_exp…” function, the “final” key in the new dict is set to the returned value
If values were aggregated in the aforementioned manner, the lists of collected values will be reshaped according to runs/folds/reps on Experiment end
The aggregated values will be saved in the Experiment’s description file
This is because
hyperparameter_hunter.experiments.BaseExperiment.stat_aggregates
is saved in its entirety
What follows is a rough outline of the structure produced when using an aggregator-like callback that automatically populates
experiments.BaseExperiment.stat_aggregates
with results of the functions used as arguments to lambda_callback():

BaseExperiment.stat_aggregates = dict(
    ...,
    <`agg_name`>=dict(
        <agg_key "runs"> = [...],
        <agg_key "folds"> = [...],
        <agg_key "reps"> = [...],
        <agg_key "final"> = object(),
        ...
    ),
    ...
)
In the above outline, the actual agg_keys included in the dict at agg_name depend on which “on_<…>_<start/end>” callables are behaving like aggregators. For example, if neither on_run_start nor on_run_end explicitly returns something, then the “runs” agg_key is not included in the agg_name dict. Similarly, if, for example, neither on_exp_start nor on_exp_end is provided, then the “final” agg_key is not included. If method_agg_keys=True, then the agg keys used in the dict are modified to be named after the method called. For example, if method_agg_keys=True and on_fold_start and on_fold_end are both callables returning values to be aggregated, then the agg_keys used for each will be “on_fold_start” and “on_fold_end”, respectively. In this example, if method_agg_keys=False (default) and do_reshape_aggs=False, then the single “folds” agg_key would contain the combined contents returned by both methods in the order in which they were returned
For examples using lambda_callback to create custom callbacks, see
hyperparameter_hunter.callbacks.recipes
Examples
>>> from hyperparameter_hunter.environment import Environment
>>> def printer_helper(_rep, _fold, _run, last_evaluation_results):
...     print(f"{_rep}.{_fold}.{_run} {last_evaluation_results}")
>>> my_lambda_callback = lambda_callback(
...     on_exp_end=printer_helper,
...     on_rep_end=printer_helper,
...     on_fold_end=printer_helper,
...     on_run_end=printer_helper,
... )
>>> # env = Environment(
>>> #     train_dataset="i am a dataset",
>>> #     results_path="path/to/HyperparameterHunterAssets",
>>> #     metrics=["roc_auc_score"],
>>> #     experiment_callbacks=[my_lambda_callback]
>>> # )
>>> # ... Now execute an Experiment, or an Optimization Protocol...
See
hyperparameter_hunter.examples.lambda_callback_example
for more information
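As a further hedged sketch of the aggregator behavior described in the Notes above (the collect_score callable and the agg_name value here are illustrative, not part of the library):
>>> def collect_score(last_evaluation_results):
...     return last_evaluation_results  # returned values are aggregated automatically
>>> score_collector = lambda_callback(
...     on_run_end=collect_score,     # returned values are appended to the "runs" list
...     on_exp_end=collect_score,     # returned value is stored under the "final" key
...     agg_name="collected_scores",  # stored at stat_aggregates["_collected_scores"]
... )
>>> # Provide `score_collector` via `Environment`'s `experiment_callbacks`, then run an
>>> # Experiment; its description file will include the aggregated values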
-
class
hyperparameter_hunter.
FeatureEngineer
(steps=None, do_validate=False, **datasets)¶ Bases:
object
Class to organize feature engineering step callables (steps, given as EngineerStep instances) and the datasets that the steps request and return.
- Parameters
- steps: List, or None, default=None
List of arbitrary length, containing any of the following values:
EngineerStep
instance,Function to provide as input to
EngineerStep
, orCategorical
, with categories comprising a selection of the previous two steps values (optimization only)
The third value can only be used during optimization. The feature_engineer provided to
CVExperiment
, for example, may only contain the first two values. To search a space optionally including an EngineerStep, use the optional kwarg ofCategorical
.See
EngineerStep
for information on properly formatted EngineerStep functions. Additional engineering steps may be added viaadd_step()
- do_validate: Boolean, or “strict”, default=False
… Experimental… Whether to validate the datasets resulting from feature engineering steps. If True, hashes of the new datasets will be compared to those of the originals to ensure they were actually modified. Results will be logged. If do_validate = “strict”, an exception will be raised if any anomalies are found, rather than logging a message. If do_validate = False, no validation will be performed
- **datasets: DFDict
This is not expected to be provided on initialization and is offered primarily for debugging/testing. Mapping of datasets necessary to perform feature engineering steps
See also
EngineerStep
For proper formatting of non-Categorical values of steps
Notes
If steps does include any instances of
hyperparameter_hunter.space.dimensions.Categorical
, this FeatureEngineer instance will not be usable by Experiments. It can only be used by Optimization Protocols. Furthermore, the FeatureEngineer that the Optimization Protocol actually ends up using will not pass identity checks against the original FeatureEngineer that contained Categorical steps
Examples
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler, MinMaxScaler, QuantileTransformer
>>> # Define some engineer step functions to play with
>>> def s_scale(train_inputs, non_train_inputs):
...     s = StandardScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> def mm_scale(train_inputs, non_train_inputs):
...     s = MinMaxScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> def q_transform(train_targets, non_train_targets):
...     t = QuantileTransformer(output_distribution="normal")
...     train_targets[train_targets.columns] = t.fit_transform(train_targets.values)
...     non_train_targets[train_targets.columns] = t.transform(non_train_targets.values)
...     return train_targets, non_train_targets, t
>>> def sqr_sum(all_inputs):
...     all_inputs["square_sum"] = all_inputs.agg(
...         lambda row: np.sqrt(np.sum([np.square(_) for _ in row])), axis="columns"
...     )
...     return all_inputs
FeatureEngineer steps wrapped by `EngineerStep` == raw function steps - as long as the `EngineerStep` is using the default parameters
>>> # FeatureEngineer steps wrapped by `EngineerStep` == raw function steps
>>> # ... As long as the `EngineerStep` is using the default parameters
>>> fe_0 = FeatureEngineer([sqr_sum, s_scale])
>>> fe_1 = FeatureEngineer([EngineerStep(sqr_sum), EngineerStep(s_scale)])
>>> fe_0.steps == fe_1.steps
True
>>> fe_2 = FeatureEngineer([sqr_sum, EngineerStep(s_scale), q_transform])
`Categorical` can be used during optimization and placed anywhere in `steps`. `Categorical` can also handle either `EngineerStep` categories or raw functions. Use the `optional` kwarg of `Categorical` to test some questionable steps
>>> fe_3 = FeatureEngineer([sqr_sum, Categorical([s_scale, mm_scale]), q_transform])
>>> fe_4 = FeatureEngineer([Categorical([sqr_sum], optional=True), s_scale, q_transform])
>>> fe_5 = FeatureEngineer([
...     Categorical([sqr_sum], optional=True),
...     Categorical([EngineerStep(s_scale), mm_scale]),
...     q_transform
... ])
- Attributes
steps
Feature engineering steps to execute in sequence on
FeatureEngineer.__call__()
Methods
__call__(self, stage, **datasets, …): Execute all feature engineering steps in steps for stage, with the given datasets as inputs
add_step(self, step, …): Add an engineering step to steps to be executed with the other contents of steps on FeatureEngineer.__call__()
get_key_data(self): Produce a dict of critical attributes describing the FeatureEngineer instance for use by key-making classes
inverse_transform(self, data): Perform the inverse transformation for all engineer steps in steps in sequence on data
inverse_transform
(self, data)¶ Perform the inverse transformation for all engineer steps in
steps
in sequence on data- Parameters
- data: Array-like
Data to inverse transform with any inversions present in
steps
- Returns
- Array-like
Result of sequentially calling inverse transformations in steps on data. If any step has EngineerStep.inversion = None, data is unmodified for that step, and proceeds to the next engineer step inversion
-
property
steps
¶ Feature engineering steps to execute in sequence on
FeatureEngineer.__call__()
-
get_key_data
(self) → dict¶ Produce a dict of critical attributes describing the
FeatureEngineer
instance for use by key-making classes- Returns
- Dict
Important attributes describing this
FeatureEngineer
instance
-
add_step
(self, step: Union[Callable, hyperparameter_hunter.space.dimensions.Categorical], stage: str = None, name: str = None, before: str = EMPTY_SENTINEL, after: str = EMPTY_SENTINEL, number: int = EMPTY_SENTINEL)¶ Add an engineering step to
steps
to be executed with the other contents ofsteps
onFeatureEngineer.__call__()
- Parameters
- step: Callable, or `EngineerStep`, or `Categorical`
If EngineerStep instance, will be added directly to
steps
. Otherwise, must be a feature engineering step callable that requests, modifies, and returns datasets, which will be used to instantiate an EngineerStep to add to steps. If Categorical, categories should contain EngineerStep instances or callables
- stage: String in {“pre_cv”, “intra_cv”}, or None, default=None
Feature engineering stage during which the callable step will be executed
- name: String, or None, default=None
Identifier for the transformation applied by this engineering step. If None and step is not an EngineerStep, will be inferred during
EngineerStep
instantiation- before: String, default=EMPTY_SENTINEL
… Experimental…
- after: String, default=EMPTY_SENTINEL
… Experimental…
- number: String, default=EMPTY_SENTINEL
… Experimental…
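A hedged usage sketch of add_step, reusing the s_scale, q_transform, and sqr_sum functions defined in the class-level Examples above:
>>> fe = FeatureEngineer([s_scale])
>>> fe.add_step(q_transform)              # bare function is wrapped in an `EngineerStep`
>>> fe.add_step(sqr_sum, stage="pre_cv")  # declare the stage explicitly when needed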
-
class
hyperparameter_hunter.
EngineerStep
(f: Callable, stage=None, name=None, params=None, do_validate=False)¶ Bases:
object
Container for individual
FeatureEngineer
step functions
Compartmentalizes functions of singular engineer steps and allows for greater customization than a raw engineer step function
- Parameters
- f: Callable
Feature engineering step function that requests, modifies, and returns datasets (see params below)
Step functions should follow these guidelines:
Request as input a subset of the 11 data strings listed in params
Do whatever you want to the DataFrames given as input
Return new DataFrame values of the input parameters in same order as requested
If performing a task like target transformation, causing predictions to be transformed, it is often desirable to inverse-transform the predictions to be of the expected form. This can easily be done by returning an extra value from f (after the datasets) that is either a callable, or a transformer class that was fitted during the execution of f and implements an inverse_transform method. This is the only instance in which it is acceptable for f to return values that don’t mimic its input parameters. See the engineer function definition using SKLearn’s QuantileTransformer in the Examples section below for an actual inverse-transformation-compatible implementation
- stage: String in {“pre_cv”, “intra_cv”}, or None, default=None
Feature engineering stage during which the callable f will be given the datasets params to modify and return. If None, will be inferred based on params.
“pre_cv” functions are applied only once in the experiment: when it starts
“intra_cv” functions are reapplied for each fold in the cross-validation splits
If stage is left to be inferred, “pre_cv” will usually be selected. However, if any params (or parameters in the signature of f) are prefixed with “validation…” or “non_train…”, then stage will be inferred as “intra_cv”. See the Notes section below for suggestions on the stage to use for different functions
- name: String, or None, default=None
Identifier for the transformation applied by this engineering step. If None, f.__name__ will be used
- params: Tuple[str], or None, default=None
Dataset names requested by feature engineering step callable f. If None, will be inferred by parsing the signature of f. Must be a subset of the following 11 strings:
Input Data
“train_inputs”
“validation_inputs”
“holdout_inputs”
“test_inputs”
- “all_inputs”
("train_inputs" + ["validation_inputs"] + "holdout_inputs" + "test_inputs")
- “non_train_inputs”
(["validation_inputs"] + "holdout_inputs" + "test_inputs")
Target Data
“train_targets”
“validation_targets”
“holdout_targets”
“all_targets”
("train_targets" + ["validation_targets"] + "holdout_targets")
“non_train_targets”
(["validation_targets"] + "holdout_targets")
As an alternative to the above list, just remember that the first half of all parameter names should be one of {“train”, “validation”, “holdout”, “test”, “all”, “non_train”}, and the second half should be either “inputs” or “targets”. The only exception to this rule is “test_targets”, which doesn’t exist.
Inference of “validation” params is affected by stage. During the “pre_cv” stage, the validation dataset has not yet been created and is still a part of the train dataset. During the “intra_cv” stage, the validation dataset is created by removing a portion of the train dataset, and their values passed to f reflect this fact. This also means that the values of the merged (“all”/”non_train”-prefixed) datasets may or may not contain “validation” data depending on the stage; however, this is all handled internally, so you probably don’t need to worry about it.
params may not include multiple references to the same dataset, either directly or indirectly. This means (“train_inputs”, “train_inputs”) is invalid due to duplicate direct references. Less obviously, (“train_inputs”, “all_inputs”) is invalid because “all_inputs” includes “train_inputs”
- do_validate: Boolean, or “strict”, default=False
… Experimental… Whether to validate the datasets resulting from feature engineering steps. If True, hashes of the new datasets will be compared to those of the originals to ensure they were actually modified. Results will be logged. If do_validate = “strict”, an exception will be raised if any anomalies are found, rather than logging a message. If do_validate = False, no validation will be performed
See also
FeatureEngineer
The container for EngineerStep instances. EngineerSteps should always be provided to HyperparameterHunter through a FeatureEngineer
Categorical
Can be used during optimization to search through a group of EngineerSteps given as categories. The optional kwarg of Categorical designates a FeatureEngineer step that may be one of the EngineerSteps in categories, or may be omitted entirely
get_engineering_step_stage()
More information on stage inference and situations where overriding it may be prudent
Notes
stage: Generally, feature engineering conducted in the “pre_cv” stage should regard each sample/row as an independent entity. For example, steps like converting a string day of the week to one-hot encoded columns, or imputing missing values by replacement with -1, might be conducted “pre_cv”, since they are unlikely to introduce information leakage. Conversely, steps like scaling/normalization, whose results for the data in one row are affected by the data in other rows, should be performed “intra_cv” in order to recalculate the final values of the datasets for each cross-validation split and avoid information leakage.
params: In the list of the 11 valid params strings, “test_inputs” is notably missing the “…_targets” counterpart accompanying the other datasets. The “targets” suffix is missing because test data targets are never given. Note that although “test_inputs” is still included in both “all_inputs” and “non_train_inputs”, its lack of a target column means that “all_targets” and “non_train_targets” may have different lengths than their “inputs”-suffixed counterparts
Examples
>>> from sklearn.preprocessing import StandardScaler, QuantileTransformer
>>> def s_scale(train_inputs, non_train_inputs):
...     s = StandardScaler()
...     train_inputs[train_inputs.columns] = s.fit_transform(train_inputs.values)
...     non_train_inputs[train_inputs.columns] = s.transform(non_train_inputs.values)
...     return train_inputs, non_train_inputs
>>> # Sensible parameter defaults inferred based on `f`
>>> es_0 = EngineerStep(s_scale)
>>> es_0.stage
'intra_cv'
>>> es_0.name
's_scale'
>>> es_0.params
('train_inputs', 'non_train_inputs')
>>> # Override `stage` if you want to fit your scaler on OOF data like a crazy person
>>> es_1 = EngineerStep(s_scale, stage="pre_cv")
>>> es_1.stage
'pre_cv'
Watch out for multiple requests to the same data
>>> es_2 = EngineerStep(s_scale, params=("train_inputs", "all_inputs"))
Traceback (most recent call last):
    File "feature_engineering.py", line ? in validate_dataset_names
ValueError: Requested params include duplicate references to `train_inputs` by way of:
   - ('all_inputs', 'train_inputs')
   - ('train_inputs',)
Each dataset may only be requested by a single param for each function
Error is the same if `(train_inputs, all_inputs)` is in the actual function signature
EngineerStep functions aren’t just limited to transformations. Make your own features!
>>> import numpy as np
>>> def sqr_sum(all_inputs):
...     all_inputs["square_sum"] = all_inputs.agg(
...         lambda row: np.sqrt(np.sum([np.square(_) for _ in row])), axis="columns"
...     )
...     return all_inputs
>>> es_3 = EngineerStep(sqr_sum)
>>> es_3.stage
'pre_cv'
>>> es_3.name
'sqr_sum'
>>> es_3.params
('all_inputs',)
Inverse-transformation Implementation:
>>> def q_transform(train_targets, non_train_targets):
...     t = QuantileTransformer(output_distribution="normal")
...     train_targets[train_targets.columns] = t.fit_transform(train_targets.values)
...     non_train_targets[train_targets.columns] = t.transform(non_train_targets.values)
...     return train_targets, non_train_targets, t
>>> # Note that `train_targets` and `non_train_targets` must still be returned in order,
>>> # but they are followed by `t`, an instance of `QuantileTransformer` we just fitted,
>>> # whose `inverse_transform` method will be called on predictions
>>> es_4 = EngineerStep(q_transform)
>>> es_4.stage
'intra_cv'
>>> es_4.name
'q_transform'
>>> es_4.params
('train_targets', 'non_train_targets')
>>> # `params` does not include any returned transformers - Only data requested as input
- Attributes
Methods
__call__(self, **datasets, …): Apply f to datasets to produce updated datasets
get_comparison_attrs(step_obj: Union[EngineerStep, dict]): Build a dict of critical EngineerStep attributes
get_datasets_for_f(self, datasets, …): Produce a dict of DataFrames containing only the merged datasets and standard datasets requested in params
get_key_data(self): Produce a dict of critical attributes describing the EngineerStep instance for use by key-making classes
honorary_step_from_dict(step_dict, dimension): Get an EngineerStep from dimension that is equal to its dict form, step_dict
inverse_transform(self, data): Perform the inverse transformation for this engineer step (if it exists)
stringify(self): Make a stringified representation of self, compatible with EngineerStep.__eq__()
-
inverse_transform
(self, data)¶ Perform the inverse transformation for this engineer step (if it exists)
- Parameters
- data: Array-like
Data to inverse transform with
inversion
orinversion.inverse_transform
- Returns
- Array-like
If
inversion
is None, return data unmodified. Else, return the result ofinversion
orinversion.inverse_transform
, given data
-
get_datasets_for_f
(self, datasets:Dict[str, pandas.core.frame.DataFrame]) → Dict[str, pandas.core.frame.DataFrame]¶ Produce a dict of DataFrames containing only the merged datasets and standard datasets requested in
params
. In other words, add the requested merged datasets and remove unnecessary standard datasets- Parameters
- datasets: DFDict
Original dict of datasets, containing all datasets provided to
EngineerStep.__call__()
, some of which may be superfluous, or may require additional processing to resolve merged/coupled datasets
- Returns
- DFDict
Updated version of datasets, in which unnecessary datasets have been filtered out, and the requested merged datasets have been added
-
get_key_data
(self) → dict¶ Produce a dict of critical attributes describing the
EngineerStep
instance for use by key-making classes- Returns
- Dict
Important attributes describing this
EngineerStep
instance
-
property
f
¶ Feature engineering step callable that requests, modifies, and returns datasets
-
property
name
¶ Identifier for the transformation applied by this engineering step
-
property
params
¶ Dataset names requested by feature engineering step callable
f
. See documentation inEngineerStep.__init__()
for more information/restrictions
-
property
stage
¶ Feature engineering stage during which the EngineerStep will be executed
-
static
get_comparison_attrs
(step_obj: Union[EngineerStep, dict]) → dict¶ Build a dict of critical
EngineerStep
attributes- Parameters
- step_obj: EngineerStep, dict
Object for which critical
EngineerStep
attributes should be collected
- Returns
- attr_vals: Dict
Critical
EngineerStep
attributes. If step_obj does not have a necessary attribute (for EngineerStep) or a necessary key (for dict), its value in attr_vals will be a placeholder object. This is to facilitate comparison, while also ensuring missing values will always be considered unequal to other values
Examples
>>> def dummy_f(train_inputs, non_train_inputs):
...     return train_inputs, non_train_inputs
>>> es_0 = EngineerStep(dummy_f)
>>> EngineerStep.get_comparison_attrs(es_0)  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
{'name': 'dummy_f',
 'f': <function dummy_f at ...>,
 'params': ('train_inputs', 'non_train_inputs'),
 'stage': 'intra_cv',
 'do_validate': False}
>>> EngineerStep.get_comparison_attrs(
...     dict(foo="hello", f=dummy_f, params=["all_inputs", "all_targets"], stage="pre_cv")
... )  # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE
{'name': <object object at ...>,
 'f': <function dummy_f at ...>,
 'params': ('all_inputs', 'all_targets'),
 'stage': 'pre_cv',
 'do_validate': <object object at ...>}
-
stringify
(self) → str¶ Make a stringified representation of self, compatible with
EngineerStep.__eq__()
- Returns
- String
String describing all critical attributes of the
EngineerStep
instance. This value is not particularly human-friendly due to both its length and the fact thatEngineerStep.f
is represented by its hash
Examples
>>> def dummy_f(train_inputs, non_train_inputs):
...     return train_inputs, non_train_inputs
>>> EngineerStep(dummy_f).stringify()  # doctest: +ELLIPSIS
"EngineerStep(dummy_f, ..., ('train_inputs', 'non_train_inputs'), intra_cv, False)"
>>> EngineerStep(dummy_f, stage="pre_cv").stringify()  # doctest: +ELLIPSIS
"EngineerStep(dummy_f, ..., ('train_inputs', 'non_train_inputs'), pre_cv, False)"
-
classmethod
honorary_step_from_dict
(step_dict: dict, dimension: hyperparameter_hunter.space.dimensions.Categorical)¶ Get an EngineerStep from dimension that is equal to its dict form, step_dict
- Parameters
- step_dict: Dict
Dict of form saved in Experiment description files for EngineerStep. Expected to have following keys, with values of the given types:
“name”: String
“f”: String (SHA256 hash)
“params”: List[str], or Tuple[str, …]
“stage”: String in {“pre_cv”, “intra_cv”}
“do_validate”: Boolean
- dimension: Categorical
Categorical instance expected to contain the EngineerStep equivalent of step_dict in its categories
- Returns
- EngineerStep
From dimension.categories if it is the EngineerStep equivalent of step_dict
- Raises
- ValueError
If dimension.categories does not contain an EngineerStep matching step_dict
-
class
hyperparameter_hunter.
BayesianOptimization
(**kwargs)¶ Bases:
hyperparameter_hunter.optimization.backends.skopt.protocols.BayesianOptPro
Deprecated since version 3.0.0a2: Will be removed in 3.2.0. Renamed to BayesianOptPro
- Attributes
search_space_size: The number of different hyperparameter permutations possible given the current…
- source_script
Methods
forge_experiment(self, model_initializer[, …]): Define hyperparameter search scaffold for building Experiments during optimization
get_ready(self): Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go(self[, force_ready]): Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions(self): Locate given hyperparameters that are space choice declarations and add them to dimensions
set_experiment_guidelines(self, *args, …): Deprecated since version 3.0.0a2.
- source_script = None¶
-
class
hyperparameter_hunter.
GradientBoostedRegressionTreeOptimization
(**kwargs)¶ Bases:
hyperparameter_hunter.optimization.backends.skopt.protocols.GradientBoostedRegressionTreeOptPro
Deprecated since version 3.0.0a2: Will be removed in 3.2.0. Renamed to GradientBoostedRegressionTreeOptPro
- Attributes
search_space_size: The number of different hyperparameter permutations possible given the current…
- source_script
Methods
forge_experiment(self, model_initializer[, …]): Define hyperparameter search scaffold for building Experiments during optimization
get_ready(self): Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go(self[, force_ready]): Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions(self): Locate given hyperparameters that are space choice declarations and add them to dimensions
set_experiment_guidelines(self, *args, …): Deprecated since version 3.0.0a2.
- source_script = None¶
-
class
hyperparameter_hunter.
RandomForestOptimization
(**kwargs)¶ Bases:
hyperparameter_hunter.optimization.backends.skopt.protocols.RandomForestOptPro
Deprecated since version 3.0.0a2: Will be removed in 3.2.0. Renamed to RandomForestOptPro
- Attributes
search_space_size: The number of different hyperparameter permutations possible given the current…
- source_script
Methods
forge_experiment(self, model_initializer[, …]): Define hyperparameter search scaffold for building Experiments during optimization
get_ready(self): Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go(self[, force_ready]): Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions(self): Locate given hyperparameters that are space choice declarations and add them to dimensions
set_experiment_guidelines(self, *args, …): Deprecated since version 3.0.0a2.
- source_script = None¶
-
class
hyperparameter_hunter.
ExtraTreesOptimization
(**kwargs)¶ Bases:
hyperparameter_hunter.optimization.backends.skopt.protocols.ExtraTreesOptPro
Deprecated since version 3.0.0a2: Will be removed in 3.2.0. Renamed to ExtraTreesOptPro
- Attributes
search_space_size: The number of different hyperparameter permutations possible given the current…
- source_script
Methods
forge_experiment(self, model_initializer[, …]): Define hyperparameter search scaffold for building Experiments during optimization
get_ready(self): Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go(self[, force_ready]): Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions(self): Locate given hyperparameters that are space choice declarations and add them to dimensions
set_experiment_guidelines(self, *args, …): Deprecated since version 3.0.0a2.
- source_script = None¶
-
class
hyperparameter_hunter.
DummySearch
(**kwargs)¶ Bases:
hyperparameter_hunter.optimization.backends.skopt.protocols.DummyOptPro
Deprecated since version 3.0.0a2: Will be removed in 3.2.0. Renamed to DummyOptPro
- Attributes
search_space_size: The number of different hyperparameter permutations possible given the current…
- source_script
Methods
forge_experiment(self, model_initializer[, …]): Define hyperparameter search scaffold for building Experiments during optimization
get_ready(self): Prepare for optimization by finalizing hyperparameter space and identifying similar Experiments.
go(self[, force_ready]): Execute hyperparameter optimization, building an Experiment for each iteration
set_dimensions(self): Locate given hyperparameters that are space choice declarations and add them to dimensions
set_experiment_guidelines(self, *args, …): Deprecated since version 3.0.0a2.
- source_script = None¶
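All five classes above are thin, deprecated aliases for their renamed OptPro counterparts, so migration is a one-line rename. A minimal sketch, assuming the renamed protocols are importable from the package root as the deprecation notices indicate:
>>> # Before (deprecated since 3.0.0a2; scheduled for removal in 3.2.0):
>>> # opt = BayesianOptimization(iterations=10)
>>> # After (identical usage; only the name changed):
>>> from hyperparameter_hunter import BayesianOptPro
>>> opt = BayesianOptPro(iterations=10)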