hyperparameter_hunter.data package
Subpackages
Submodules
hyperparameter_hunter.data.data_core module
This module defines mechanisms for managing an experiment’s various datasets, and each dataset’s inputs, targets, and predictions.
Important Contents
In order to maintain the states of different datasets across all divisions of an experiment and
amid transformations that may be applied to the data via
feature_engineering, two main classes are defined herein:
- BaseDataChunk:
  - Logical separations between “columns” of data for a given BaseDataset
  - Held and maintained by BaseDataset and its descendants
  - Three primary descendants of BaseDataChunk:
    - InputChunk: Maintains a dataset’s input data (and transformations)
    - TargetChunk: Maintains a dataset’s target data (and transformations)
    - PredictionChunk: Maintains a dataset’s predictions (and transformations)
  - Descendants of BaseDataChunk should implement the eight “on_<division>_<point>” callback methods defined by BaseCallback
    - Because BaseDataChunk subclasses are isolated from the experiment, these methods need not invoke their super methods, although they are allowed to if necessary
  - NullDataChunk does nothing but mimic the normal BaseDataChunk child structure
    - Used for BaseDataset subclasses lacking a particular data chunk, such as:
      - TestDataset’s TargetChunk, because the targets for a test dataset are unknown, or
      - TrainDataset’s PredictionChunk, because predictions are not made on training data
- BaseDataset:
# TODO: …
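The callback contract described for BaseDataChunk can be sketched in plain Python. The names below (CALLBACK_NAMES, SketchDataChunk, SketchNullDataChunk) are hypothetical stand-ins, not hyperparameter_hunter’s real classes; they only show a chunk exposing the eight “on_<division>_<point>” methods (built from the exp, rep, fold, and run divisions listed under each dataset’s Methods) and a null chunk that offers the same surface while doing nothing.

```python
# Hypothetical stand-ins for BaseDataChunk and NullDataChunk; NOT the library's classes.
# The eight callback names combine four divisions with two points.
CALLBACK_NAMES = [
    f"on_{division}_{point}"
    for division in ("exp", "rep", "fold", "run")
    for point in ("start", "end")
]


class SketchDataChunk:
    """Holds one "column" of a dataset (e.g. its inputs) and records callbacks."""

    def __init__(self, d=None):
        self.d = d  # initial data held by this chunk
        self.calls = []  # names of the callbacks invoked so far

    # No super() calls: chunks are isolated from the experiment
    def on_exp_start(self, *args, **kwargs):
        self.calls.append("on_exp_start")

    def on_exp_end(self, *args, **kwargs):
        self.calls.append("on_exp_end")

    def on_rep_start(self, *args, **kwargs):
        self.calls.append("on_rep_start")

    def on_rep_end(self, *args, **kwargs):
        self.calls.append("on_rep_end")

    def on_fold_start(self, *args, **kwargs):
        self.calls.append("on_fold_start")

    def on_fold_end(self, *args, **kwargs):
        self.calls.append("on_fold_end")

    def on_run_start(self, *args, **kwargs):
        self.calls.append("on_run_start")

    def on_run_end(self, *args, **kwargs):
        self.calls.append("on_run_end")


class SketchNullDataChunk:
    """Mimics the chunk structure for a dataset that lacks this chunk."""

    d = None  # no data is ever held

    def __getattr__(self, name):
        # Called only when normal lookup fails: answer every callback with a no-op
        if name in CALLBACK_NAMES:
            return lambda *args, **kwargs: None
        raise AttributeError(name)


chunk = SketchDataChunk(d=[1, 2, 3])
for name in CALLBACK_NAMES:
    getattr(chunk, name)()
print(chunk.calls)

null_chunk = SketchNullDataChunk()
null_chunk.on_fold_start()  # accepted, does nothing
```

Because the null chunk answers all eight callbacks, a dataset can treat a missing chunk (such as a test dataset’s targets) the same way as a real one.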
Dataset Attribute Syntax
The intricate subclass network bolstering the module’s predominant BaseDataset subclasses
may be intimidating at first, but don’t worry; there’s a shortcut. Follow these steps to ensure
proper syntax and a valid result when accessing data from a
CVExperiment:
1. {data_train, data_oof, data_holdout, data_test} - Dataset attribute
2. {input, target, prediction} - Data chunk
3. [T] - Optional transformation
4. {d, run, fold, rep, final} - Division, initial (d) or final data
By stacking three values (four if following optional step “3”) from the above formula, you can
access all of the interesting stuff stored in the datasets from the comfort of your experiment or
lambda_callback().
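The step formula can be tried out without running an experiment. The sketch below builds a hypothetical stand-in for a CVExperiment from types.SimpleNamespace objects whose leaf values are just the attribute path as a string. It does not hold real data and does not model NullDataChunk (so, for example, data_test.target would behave differently in the library); it only shows which chains follow the syntax.

```python
from types import SimpleNamespace

DIVISIONS = ("d", "run", "fold", "rep", "final")


def sketch_chunk(label):
    """Hypothetical chunk: one placeholder per division, raw and under T."""
    raw = {div: f"{label}.{div}" for div in DIVISIONS}
    transformed = {div: f"{label}.T.{div}" for div in DIVISIONS}
    return SimpleNamespace(**raw, T=SimpleNamespace(**transformed))


def sketch_dataset(name):
    """Hypothetical dataset: one chunk for each of input, target, prediction."""
    return SimpleNamespace(
        **{chunk: sketch_chunk(f"{name}.{chunk}") for chunk in ("input", "target", "prediction")}
    )


# Stand-in for an experiment exposing the four dataset attributes
exp = SimpleNamespace(
    **{name: sketch_dataset(name) for name in ("data_train", "data_oof", "data_holdout", "data_test")}
)

print(exp.data_oof.prediction.final)  # three values: dataset, chunk, division
print(exp.data_train.input.T.d)  # four values: includes the optional T
```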
hyperparameter_hunter.data.datasets module
class hyperparameter_hunter.data.datasets.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
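The four parameters above can be read as a recipe for splitting one whole dataset into input and target data. The helper split_dataset below is hypothetical and uses a dict of column lists in place of a pandas DataFrame so that it needs only the standard library. Falling back to every non-target column when feature_selector is None is an assumption, not documented behavior.

```python
def split_dataset(data=None, feature_selector=None, target_column=None, require_data=False):
    """Hypothetical helper: split a whole dataset into (input, target) column dicts.

    `data` maps column name -> list of values, standing in for a DataFrame.
    """
    if data is None:
        if require_data:
            raise TypeError("data must be provided when require_data=True")
        return None, None
    target_column = list(target_column or [])
    if feature_selector is None:
        # Assumption: default to every column that is not a target column
        feature_selector = [col for col in data if col not in target_column]
    inputs = {col: data[col] for col in feature_selector}
    targets = {col: data[col] for col in target_column}
    return inputs, targets


whole = {"age": [31, 45], "income": [50, 72], "churned": [0, 1]}
inputs, targets = split_dataset(whole, target_column=["churned"])
print(list(inputs))
print(targets)
```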
class hyperparameter_hunter.data.datasets.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
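Each dataset class lists the same on_* methods. A plausible reading, assumed here rather than stated in the text, is that a dataset relays each callback to its input, target, and prediction chunks, and that an experiment fires them in the nesting order experiment > repetition > fold > run. The names RecordingChunk, SketchDataset, and run_experiment are hypothetical and only illustrate that reading.

```python
# Hypothetical sketch: assumes a dataset forwards callbacks to its chunks and
# that divisions nest as experiment > repetition > fold > run.
CALLBACK_NAMES = [
    f"on_{division}_{point}"
    for division in ("exp", "rep", "fold", "run")
    for point in ("start", "end")
]


class RecordingChunk:
    """Chunk stand-in that records every on_* callback it receives."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Reached only for attributes not found normally, i.e. the callback names
        if name in CALLBACK_NAMES:
            return lambda *args, **kwargs: self.calls.append(name)
        raise AttributeError(name)


class SketchDataset:
    """Dataset stand-in that relays each callback to its three chunks."""

    def __init__(self):
        self.input = RecordingChunk()
        self.target = RecordingChunk()
        self.prediction = RecordingChunk()

    def dispatch(self, callback_name, *args, **kwargs):
        for chunk in (self.input, self.target, self.prediction):
            getattr(chunk, callback_name)(*args, **kwargs)


def run_experiment(dataset, n_reps=1, n_folds=2, n_runs=1):
    """Fire callbacks on `dataset` in the assumed nesting order."""
    dataset.dispatch("on_exp_start")
    for _ in range(n_reps):
        dataset.dispatch("on_rep_start")
        for _ in range(n_folds):
            dataset.dispatch("on_fold_start")
            for _ in range(n_runs):
                dataset.dispatch("on_run_start")
                dataset.dispatch("on_run_end")
            dataset.dispatch("on_fold_end")
        dataset.dispatch("on_rep_end")
    dataset.dispatch("on_exp_end")


dataset = SketchDataset()
run_experiment(dataset, n_reps=1, n_folds=2, n_runs=1)
print(dataset.input.calls)
```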
class hyperparameter_hunter.data.datasets.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
class hyperparameter_hunter.data.datasets.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
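The Notes describe a class-attribute pattern: each BaseDataset subclass names the chunk classes it uses through _input_type, _target_type, and _prediction_type, with NullDataChunk allowed for any of them. Combined with the module overview (a TestDataset has no real TargetChunk and a TrainDataset has no real PredictionChunk), the pattern can be sketched as follows. Every Sketch* class is a hypothetical stand-in, and the constructor arguments input_data and target_data are illustrative, not the library’s signature.

```python
# Hypothetical stand-ins illustrating the chunk-initializer attributes from the Notes;
# these are NOT hyperparameter_hunter's classes.
class SketchDataChunk:
    """Base for the sketch chunks: simply stores its initial data as `d`."""

    def __init__(self, d=None):
        self.d = d


class SketchInputChunk(SketchDataChunk):
    pass


class SketchTargetChunk(SketchDataChunk):
    pass


class SketchPredictionChunk(SketchDataChunk):
    pass


class SketchNullDataChunk:
    """Accepts the same constructor call as a real chunk but keeps nothing."""

    d = None

    def __init__(self, *args, **kwargs):
        pass


class SketchBaseDataset:
    # Chunk initializers: each subclass overrides all three
    _input_type = None
    _target_type = None
    _prediction_type = None

    def __init__(self, input_data=None, target_data=None):
        self.input = self._input_type(input_data)
        self.target = self._target_type(target_data)
        self.prediction = self._prediction_type()  # predictions start empty


class SketchTrainDataset(SketchBaseDataset):
    _input_type = SketchInputChunk
    _target_type = SketchTargetChunk
    _prediction_type = SketchNullDataChunk  # predictions are not made on training data


class SketchTestDataset(SketchBaseDataset):
    _input_type = SketchInputChunk
    _target_type = SketchNullDataChunk  # targets for a test dataset are unknown
    _prediction_type = SketchPredictionChunk


train = SketchTrainDataset(input_data=[[0.1], [0.2]], target_data=[0, 1])
test = SketchTestDataset(input_data=[[0.3]])
print(type(train.prediction).__name__, type(test.target).__name__)
```

Keeping the chunk choice in class attributes means a new dataset type only declares which chunks it has; the construction logic in the base class stays the same.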
Module contents
class hyperparameter_hunter.data.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
class hyperparameter_hunter.data.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
class hyperparameter_hunter.data.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start
class hyperparameter_hunter.data.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
Column names to include as input data for the dataset
- target_column: List, or None, default=None
Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) with a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers.
Methods
on_exp_start(self, *args, **kwargs)
on_exp_end
on_fold_end
on_fold_start
on_rep_end
on_rep_start
on_run_end
on_run_start