hyperparameter_hunter.data package

Subpackages

hyperparameter_hunter.data.data_chunks package

Submodules

hyperparameter_hunter.data.data_core module

This module defines mechanisms for managing an experiment’s various datasets, and each dataset’s inputs, targets, and predictions.

Important Contents

To maintain the state of each dataset across all divisions of an experiment, and amid any transformations applied to the data via feature_engineering, two main classes are defined here:

BaseDataChunk:
- Logical separations between “columns” of data for a given BaseDataset
- Held and maintained by BaseDataset and its descendants
- Three primary descendants of BaseDataChunk:
  - InputChunk: Maintains a dataset’s input data (and transformations)
  - TargetChunk: Maintains a dataset’s target data (and transformations)
  - PredictionChunk: Maintains a dataset’s predictions (and transformations)
- Descendants of BaseDataChunk should implement the eight “on_<division>_<point>” callback methods defined by BaseCallback.
  Because BaseDataChunk subclasses are isolated from the experiment, these methods need not invoke their super methods, although they may do so if necessary.
- NullDataChunk does nothing but mimic the normal BaseDataChunk child structure. It is used for BaseDataset subclasses lacking a particular data chunk, such as:
  - TestDataset’s TargetChunk, because the targets for a test dataset are unknown, or
  - TrainDataset’s PredictionChunk, because predictions are not made on training data
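The chunk structure above can be sketched in plain Python. This self-contained illustration does not import hyperparameter_hunter: the class names mirror the ones listed, but the `data` and `log` attributes, the `CALLBACK_NAMES` list, and the method bodies are hypothetical. Each callback only records its own name, without calling `super()`, and NullDataChunk keeps the same interface while doing nothing.

```python
from typing import Any, List

# The eight "on_<division>_<point>" names: four divisions times two points
CALLBACK_NAMES = [
    f"on_{division}_{point}"
    for division in ("exp", "rep", "fold", "run")
    for point in ("start", "end")
]


class BaseDataChunk:
    """Hypothetical stand-in for one "column" of a dataset (input, target, or prediction)."""

    def __init__(self, data: Any = None) -> None:
        self.data = data
        self.log: List[str] = []  # Illustration only: which callbacks have run

    # A real chunk would update or transform self.data here; no super() calls are needed
    def on_exp_start(self, *args, **kwargs): self.log.append("on_exp_start")
    def on_exp_end(self, *args, **kwargs): self.log.append("on_exp_end")
    def on_rep_start(self, *args, **kwargs): self.log.append("on_rep_start")
    def on_rep_end(self, *args, **kwargs): self.log.append("on_rep_end")
    def on_fold_start(self, *args, **kwargs): self.log.append("on_fold_start")
    def on_fold_end(self, *args, **kwargs): self.log.append("on_fold_end")
    def on_run_start(self, *args, **kwargs): self.log.append("on_run_start")
    def on_run_end(self, *args, **kwargs): self.log.append("on_run_end")


class InputChunk(BaseDataChunk):
    pass


class TargetChunk(BaseDataChunk):
    pass


class PredictionChunk(BaseDataChunk):
    pass


class NullDataChunk(BaseDataChunk):
    """Same callback methods as any other chunk, but each one is a no-op."""


def _do_nothing(self, *args, **kwargs) -> None:
    return None


# Replace all eight callbacks on NullDataChunk with the no-op
for _name in CALLBACK_NAMES:
    setattr(NullDataChunk, _name, _do_nothing)


# Fire every callback on a real chunk and on a null chunk
target, null = TargetChunk(data=[0, 1, 1]), NullDataChunk()
for chunk in (target, null):
    for name in CALLBACK_NAMES:
        getattr(chunk, name)()
```

After the loop, `target.log` lists all eight callbacks, while `null.log` stays empty even though `null` accepted every call.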
BaseDataset:

# TODO: …

Dataset Attribute Syntax

The intricate subclass network behind the module’s main BaseDataset subclasses may be intimidating at first, but there is a shortcut. To access data from a CVExperiment with valid syntax, pick one value from each of the following steps, in order:

1. {data_train, data_oof, data_holdout, data_test} - Dataset attribute
2. {input, target, prediction} - Data chunk
3. [T] - Optional transformation
4. {d, run, fold, rep, final} - Division, initial (d) or final data

By chaining three values with dots (four if you include optional step 3), for example data_oof.prediction.final or data_train.input.T.d, you can access all of the interesting data stored in the datasets from your experiment or from a lambda_callback().
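As a sketch of the syntax above, the hypothetical helper `data_path` assembles a dotted attribute path from one value per step, and `operator.attrgetter` resolves it. It assumes that data_train holds a TrainDataset and data_test a TestDataset, so it rejects the two chunks described earlier as NullDataChunk placeholders. The helper, that rejection policy, and the stand-in experiment object are illustrative only and are not part of hyperparameter_hunter.

```python
from operator import attrgetter
from types import SimpleNamespace

# One allowed value per step of the syntax above
DATASETS = ("data_train", "data_oof", "data_holdout", "data_test")  # step 1
CHUNKS = ("input", "target", "prediction")  # step 2
DIVISIONS = ("d", "run", "fold", "rep", "final")  # step 4

# Assumption: these chunks are NullDataChunk placeholders, so they hold no data
NULL_CHUNKS = {("data_test", "target"), ("data_train", "prediction")}


def data_path(dataset: str, chunk: str, division: str, transformed: bool = False) -> str:
    """Hypothetical helper: build a dotted path such as "data_oof.prediction.T.final"."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset attribute: {dataset!r}")
    if chunk not in CHUNKS:
        raise ValueError(f"unknown data chunk: {chunk!r}")
    if division not in DIVISIONS:
        raise ValueError(f"unknown division: {division!r}")
    if (dataset, chunk) in NULL_CHUNKS:
        raise ValueError(f"{dataset}.{chunk} is a NullDataChunk and holds no data")
    # Step 3 is optional: insert "T" only when the transformed data is wanted
    parts = [dataset, chunk] + (["T"] if transformed else []) + [division]
    return ".".join(parts)


# Stand-in for a finished experiment; only the attributes used below exist on it
fake_experiment = SimpleNamespace(
    data_oof=SimpleNamespace(prediction=SimpleNamespace(final=[0.2, 0.7, 0.9]))
)
oof_predictions = attrgetter(data_path("data_oof", "prediction", "final"))(fake_experiment)
```

`attrgetter` accepts dotted names, so the same path string could presumably be applied to a real experiment object that exposes this attribute chain.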

hyperparameter_hunter.data.datasets module

class hyperparameter_hunter.data.datasets.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters

data: pd.DataFrame, or None, default=None: Initial whole dataset, comprising both input and target data
feature_selector: List, or None, default=None: Column names to include as input data for the dataset
target_column: List, or None, default=None: Column name(s) in the dataset that contain the target output data
require_data: Boolean, default=False: If True, data must be provided as a pandas DataFrame
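To make the roles of feature_selector, target_column, and require_data concrete, here is a small pandas sketch of how a whole dataset could be partitioned into input and target data by column name. The function `split_dataset` is hypothetical, and two behaviors are assumptions rather than documented behavior of these classes: feature_selector=None selects every non-target column, and a non-DataFrame value for data is always rejected.

```python
from typing import List, Optional, Tuple

import pandas as pd


def split_dataset(
    data: Optional[pd.DataFrame],
    feature_selector: Optional[List[str]] = None,
    target_column: Optional[List[str]] = None,
    require_data: bool = False,
) -> Tuple[Optional[pd.DataFrame], Optional[pd.DataFrame]]:
    """Hypothetical helper: split a whole dataset into (input, target) frames."""
    if data is None and not require_data:
        return None, None  # Missing data is allowed unless require_data is True
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data must be provided as a pandas DataFrame")
    target_column = list(target_column or [])
    if feature_selector is None:
        # Assumption: default to every column that is not a target column
        feature_selector = [c for c in data.columns if c not in target_column]
    inputs = data[feature_selector]
    targets = data[target_column] if target_column else None
    return inputs, targets


df = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0], "y": [0, 1]})
inputs, targets = split_dataset(df, target_column=["y"])
```

Here `inputs` holds columns a and b, and `targets` holds column y.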

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
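The override pattern described in these notes might look like the following pure-Python sketch. The chunk classes are empty stand-ins, and giving BaseDataset a NullDataChunk default for all three initializers is an assumption; only the attribute names _input_type, _target_type, and _prediction_type come from the notes. The TrainDataset and TestDataset choices follow the module overview: predictions are not made on training data, and test targets are unknown.

```python
class BaseDataChunk:
    """Stand-in chunk; a real one would hold data and implement the callbacks."""


class InputChunk(BaseDataChunk):
    pass


class TargetChunk(BaseDataChunk):
    pass


class PredictionChunk(BaseDataChunk):
    pass


class NullDataChunk(BaseDataChunk):
    """Placeholder for a chunk that a dataset does not have."""


class BaseDataset:
    # Chunk initializers; assumption: every chunk defaults to NullDataChunk
    _input_type = NullDataChunk
    _target_type = NullDataChunk
    _prediction_type = NullDataChunk

    def __init__(self) -> None:
        # Each chunk attribute is built from the (possibly overridden) initializer
        self.input = self._input_type()
        self.target = self._target_type()
        self.prediction = self._prediction_type()


class TrainDataset(BaseDataset):
    _input_type = InputChunk
    _target_type = TargetChunk
    # _prediction_type stays NullDataChunk: predictions are not made on training data


class TestDataset(BaseDataset):
    _input_type = InputChunk
    _prediction_type = PredictionChunk
    # _target_type stays NullDataChunk: targets for a test dataset are unknown
```

Because the initializers are class attributes, a subclass changes which chunk types it gets without touching `__init__`.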

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
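The eight callbacks listed above correspond to the divisions of an experiment. Assuming the nesting experiment > repetitions > folds > runs (consistent with the division names in the module overview, but not spelled out on this page), a driver would fire them in the order sketched below. `RecordingDataset` and `run_experiment` are hypothetical names used for illustration only.

```python
from typing import Callable, List

CALLBACKS = {
    f"on_{division}_{point}"
    for division in ("exp", "rep", "fold", "run")
    for point in ("start", "end")
}


class RecordingDataset:
    """Hypothetical stand-in that logs every callback it receives."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def __getattr__(self, name: str) -> Callable[..., None]:
        # Only called for attributes not found normally, i.e. the eight callbacks
        if name in CALLBACKS:
            return lambda *args, **kwargs: self.events.append(name)
        raise AttributeError(name)


def run_experiment(dataset, n_reps: int, n_folds: int, n_runs: int) -> None:
    """Fire the callbacks in nested order: experiment > reps > folds > runs."""
    dataset.on_exp_start()
    for _ in range(n_reps):
        dataset.on_rep_start()
        for _ in range(n_folds):
            dataset.on_fold_start()
            for _ in range(n_runs):
                dataset.on_run_start()
                dataset.on_run_end()
            dataset.on_fold_end()
        dataset.on_rep_end()
    dataset.on_exp_end()


recorder = RecordingDataset()
run_experiment(recorder, n_reps=1, n_folds=2, n_runs=1)
```

Each start callback is matched by an end callback at the same nesting level, so `recorder.events` always begins with on_exp_start and ends with on_exp_end.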

class hyperparameter_hunter.data.datasets.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters

data: pd.DataFrame, or None, default=None: Initial whole dataset, comprising both input and target data
feature_selector: List, or None, default=None: Column names to include as input data for the dataset
target_column: List, or None, default=None: Column name(s) in the dataset that contain the target output data
require_data: Boolean, default=False: If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start

class hyperparameter_hunter.data.datasets.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters

data: pd.DataFrame, or None, default=None: Initial whole dataset, comprising both input and target data
feature_selector: List, or None, default=None: Column names to include as input data for the dataset
target_column: List, or None, default=None: Column name(s) in the dataset that contain the target output data
require_data: Boolean, default=False: If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start

class hyperparameter_hunter.data.datasets.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

Bases: hyperparameter_hunter.data.data_core.BaseDataset

Base class for organizing entire datasets into three BaseDataChunk subclasses

Parameters

data: pd.DataFrame, or None, default=None: Initial whole dataset, comprising both input and target data
feature_selector: List, or None, default=None: Column names to include as input data for the dataset
target_column: List, or None, default=None: Column name(s) in the dataset that contain the target output data
require_data: Boolean, default=False: If True, data must be provided as a pandas DataFrame

Notes

Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.

Methods

on_exp_start(self, *args, **kwargs)

on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start

Module contents

The four dataset classes are also exposed at the package level, so they can be imported directly from hyperparameter_hunter.data. Each has Bases: hyperparameter_hunter.data.data_core.BaseDataset, and its documentation is identical to that of the same-named class in the hyperparameter_hunter.data.datasets module above:

class hyperparameter_hunter.data.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

class hyperparameter_hunter.data.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

class hyperparameter_hunter.data.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)

class hyperparameter_hunter.data.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
