hyperparameter_hunter.data package

Submodules

hyperparameter_hunter.data.data_core module

This module defines mechanisms for managing an experiment’s various datasets, and each dataset’s inputs, targets, and predictions.

Important Contents

In order to maintain the states of different datasets across all divisions of an experiment and amid transformations that may be applied to the data via feature_engineering, two main classes are defined herein:

  1. BaseDataChunk:

    • Logical separations between “columns” of data for a given BaseDataset

    • Held and maintained by BaseDataset and its descendants

    • Three primary descendants of BaseDataChunk:

      1. InputChunk: Maintains a dataset’s input data (and transformations)

      2. TargetChunk: Maintains a dataset’s target data (and transformations)

      3. PredictionChunk: Maintains a dataset’s predictions (and transformations)

    • Descendants of BaseDataChunk should implement the eight “on_<division>_<point>” callback methods defined by BaseCallback

      • Because BaseDataChunk subclasses are isolated from the experiment, these methods need not invoke their super methods, although they are allowed to if necessary

    • NullDataChunk does nothing but mimic the normal BaseDataChunk child structure

      • Used for BaseDataset subclasses lacking a particular data chunk, such as:

        1. TestDataset’s TargetChunk, because the targets for a test dataset are unknown, or

        2. TrainDataset’s PredictionChunk, because predictions are not made on training data

  2. BaseDataset:

    # TODO: …
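The chunk structure described above can be illustrated with a minimal sketch. None of the classes below are the library's own: SketchBaseChunk, SketchInputChunk, and SketchNullChunk are hypothetical stand-ins, and the fold-handling behavior is an assumption chosen only to show how the eight “on_<division>_<point>” methods and a do-nothing chunk fit together.

```python
# The eight callback names: four divisions, each with a start and an end point
CALLBACK_NAMES = [
    f"on_{division}_{point}"
    for division in ("exp", "rep", "fold", "run")
    for point in ("start", "end")
]


class SketchBaseChunk:
    """Hypothetical stand-in for BaseDataChunk (not the library class)."""

    def __init__(self, d=None):
        self.d = d  # initial data held by this chunk
        self.fold = None  # data for the current fold, if any

    # All eight hooks default to no-ops; subclasses override what they need
    def on_exp_start(self, *args, **kwargs):
        pass

    def on_exp_end(self, *args, **kwargs):
        pass

    def on_rep_start(self, *args, **kwargs):
        pass

    def on_rep_end(self, *args, **kwargs):
        pass

    def on_fold_start(self, *args, **kwargs):
        pass

    def on_fold_end(self, *args, **kwargs):
        pass

    def on_run_start(self, *args, **kwargs):
        pass

    def on_run_end(self, *args, **kwargs):
        pass


class SketchInputChunk(SketchBaseChunk):
    """Illustrative chunk that exposes its data while a fold is active."""

    def on_fold_start(self, *args, **kwargs):
        # No super() call: the chunk is isolated from the experiment
        self.fold = self.d

    def on_fold_end(self, *args, **kwargs):
        self.fold = None


class SketchNullChunk(SketchBaseChunk):
    """Same interface as any other chunk, but never holds data."""

    def __init__(self, *args, **kwargs):
        super().__init__(d=None)


chunks = [SketchInputChunk(d=[1, 2, 3]), SketchNullChunk()]
for chunk in chunks:
    chunk.on_fold_start()  # safe to call on every chunk, including the null one
```

Because the null chunk keeps the full callback interface, a dataset can invoke the same hooks on all of its chunks without checking which ones actually hold data.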

Dataset Attribute Syntax

The network of subclasses behind the module’s main BaseDataset subclasses may be intimidating at first, but there is a shortcut. To access data from a CVExperiment with proper syntax and a valid result, pick one value from each of the following steps, in order:

  1. {data_train, data_oof, data_holdout, data_test} - Dataset attribute

  2. {input, target, prediction} - Data chunk

  3. [T] - Optional transformation

  4. {d, run, fold, rep, final} - Division (run, fold, rep), or the initial (d) or final data

By stacking three values (four if following optional step “3”) from the above formula, you can access all of the interesting stuff stored in the datasets from the comfort of your experiment or lambda_callback().
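As a rough sketch of the stacking formula, the helper below joins one value from each step into a dotted attribute path. The helper name data_path and the mock experiment object are assumptions made for illustration; a real CVExperiment is only assumed to expose the same attribute layout.

```python
from operator import attrgetter
from types import SimpleNamespace

DATASETS = ("data_train", "data_oof", "data_holdout", "data_test")
CHUNKS = ("input", "target", "prediction")
DIVISIONS = ("d", "run", "fold", "rep", "final")


def data_path(dataset, chunk, division, transformed=False):
    """Build a dotted path from steps 1-4 (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset attribute: {dataset!r}")
    if chunk not in CHUNKS:
        raise ValueError(f"Unknown data chunk: {chunk!r}")
    if division not in DIVISIONS:
        raise ValueError(f"Unknown division: {division!r}")
    # Step 3 is optional: insert "T" only when transformed data is wanted
    parts = [dataset, chunk] + (["T"] if transformed else []) + [division]
    return ".".join(parts)


# Mock object standing in for an experiment; the values are placeholders
experiment = SimpleNamespace(
    data_train=SimpleNamespace(
        input=SimpleNamespace(
            fold="fold inputs",
            T=SimpleNamespace(fold="transformed fold inputs"),
        )
    )
)

path = data_path("data_train", "input", "fold", transformed=True)
value = attrgetter(path)(experiment)  # resolves data_train.input.T.fold
```

In everyday use the path would simply be written out as attribute access on the experiment; the helper only makes the three-or-four-part structure explicit.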

hyperparameter_hunter.data.datasets module

class hyperparameter_hunter.data.datasets.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
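The override pattern described in the note can be sketched as below. This is not the library's implementation: every class here is a hypothetical stand-in, a plain dict of column lists replaces the pandas DataFrame to keep the example dependency-free, and the constructor logic is an assumption about how the chunk initializers might be used.

```python
class SketchChunk:
    """Hypothetical stand-in for a BaseDataChunk subclass."""

    def __init__(self, d=None):
        self.d = d  # initial data for this chunk


class SketchNullChunk(SketchChunk):
    """Hypothetical stand-in for NullDataChunk: never holds data."""

    def __init__(self, *args, **kwargs):
        super().__init__(d=None)


class SketchBaseDataset:
    """Hypothetical stand-in for BaseDataset."""

    # Chunk initializer attributes; subclasses override these
    _input_type = SketchChunk
    _target_type = SketchChunk
    _prediction_type = SketchChunk

    def __init__(self, data=None, feature_selector=None, target_column=None,
                 require_data=False):
        if require_data and data is None:
            raise ValueError("`data` must be provided when require_data=True")
        inputs = targets = None
        if data is not None:
            # `data` is a dict of column lists here, not a DataFrame
            inputs = {col: data[col] for col in (feature_selector or [])}
            targets = {col: data[col] for col in (target_column or [])}
        self.input = self._input_type(inputs)
        self.target = self._target_type(targets)
        self.prediction = self._prediction_type(None)


class SketchTestDataset(SketchBaseDataset):
    """Test data has no known targets, so its target chunk is the null chunk."""

    _target_type = SketchNullChunk


test_ds = SketchTestDataset(data={"x": [1, 2]}, feature_selector=["x"])
```

Declaring the chunk types as class attributes keeps the constructor identical across dataset subclasses; only the three attributes change.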

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.datasets.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.datasets.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.datasets.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

Module contents

class hyperparameter_hunter.data.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start

class hyperparameter_hunter.data.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters
data: pd.DataFrame, or None, default=None

Initial whole dataset, comprising both input and target data

feature_selector: List, or None, default=None

Column names to include as input data for the dataset

target_column: List, or None, default=None

Column name(s) in the dataset that contain the target output data

require_data: Boolean, default=False

If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end

on_fold_end

on_fold_start

on_rep_end

on_rep_start

on_run_end

on_run_start