hyperparameter_hunter.data package
Subpackages
Submodules
hyperparameter_hunter.data.data_core module
This module defines mechanisms for managing an experiment’s various datasets, and each dataset’s inputs, targets, and predictions.
Important Contents
In order to maintain the states of different datasets across all divisions of an experiment and
amid transformations that may be applied to the data via feature_engineering, two main classes
are defined herein:
- BaseDataChunk:
  - Logical separations between “columns” of data for a given BaseDataset
  - Held and maintained by BaseDataset and its descendants
  - Three primary descendants of BaseDataChunk:
    - InputChunk: Maintains a dataset’s input data (and transformations)
    - TargetChunk: Maintains a dataset’s target data (and transformations)
    - PredictionChunk: Maintains a dataset’s predictions (and transformations)
  - Descendants of BaseDataChunk should implement the eight “on_<division>_<point>” callback methods defined by BaseCallback
    - Because BaseDataChunk subclasses are isolated from the experiment, these methods need not invoke their super methods, although they are allowed to if necessary
  - NullDataChunk does nothing but mimic the normal BaseDataChunk child structure
    - Used for BaseDataset subclasses lacking a particular data chunk, such as:
      - TestDataset’s TargetChunk, because the targets for a test dataset are unknown, or
      - TrainDataset’s PredictionChunk, because predictions are not made on training data
- BaseDataset:
  - # TODO: …
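The chunk design described above can be illustrated with a rough sketch. It uses stand-in classes rather than the library's own: `SketchChunk`, `SketchNullChunk`, `CALLBACK_NAMES`, and the `log` attribute are hypothetical names chosen for illustration. Only the eight “on_<division>_<point>” method names are taken from this page.

```python
from itertools import product

# The eight callback names: four divisions crossed with two points
DIVISIONS = ("exp", "rep", "fold", "run")
POINTS = ("start", "end")
CALLBACK_NAMES = [f"on_{division}_{point}" for division, point in product(DIVISIONS, POINTS)]


class SketchChunk:
    """Hypothetical stand-in for a data chunk; it only logs which callbacks fired."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def _record(self, event):
        self.log.append(f"{self.name}:{event}")

    # No super() calls below: the chunk is isolated from the experiment
    def on_exp_start(self, *args, **kwargs):
        self._record("on_exp_start")

    def on_exp_end(self, *args, **kwargs):
        self._record("on_exp_end")

    def on_rep_start(self, *args, **kwargs):
        self._record("on_rep_start")

    def on_rep_end(self, *args, **kwargs):
        self._record("on_rep_end")

    def on_fold_start(self, *args, **kwargs):
        self._record("on_fold_start")

    def on_fold_end(self, *args, **kwargs):
        self._record("on_fold_end")

    def on_run_start(self, *args, **kwargs):
        self._record("on_run_start")

    def on_run_end(self, *args, **kwargs):
        self._record("on_run_end")


class SketchNullChunk(SketchChunk):
    """Same callback surface as SketchChunk, but every callback is a no-op."""

    def _record(self, event):
        pass


chunk = SketchChunk("input")
null_chunk = SketchNullChunk("target")
for current in (chunk, null_chunk):
    current.on_exp_start()
    current.on_fold_end()

print(chunk.log)       # ['input:on_exp_start', 'input:on_fold_end']
print(null_chunk.log)  # []
```

Because the null chunk exposes the same methods, code that drives the callbacks does not need to check whether a dataset actually has a given chunk.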
Dataset Attribute Syntax
The intricate subclass network bolstering the module’s predominant BaseDataset subclasses
may be intimidating at first, but don’t worry; there’s a shortcut. Follow these steps to ensure
proper syntax and a valid result when accessing data from a CVExperiment:
1. {data_train, data_oof, data_holdout, data_test} - Dataset attribute
2. {input, target, prediction} - Data chunk
3. [T] - Optional transformation
4. {d, run, fold, rep, final} - Division, initial (d) or final data
By stacking three values (four if following optional step “3”) from the above formula, you can
access all of the interesting stuff stored in the datasets from the comfort of your experiment or
lambda_callback().
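The stacking formula above can be sketched as a small path builder. This is an illustration under assumptions, not part of the library: `data_path`, `resolve`, and `fake_experiment` are hypothetical names, and the nested `SimpleNamespace` only imitates the attribute layout so the example runs without constructing a real CVExperiment.

```python
from functools import reduce
from types import SimpleNamespace

# Allowed values for each step of the formula, as listed on this page
DATASETS = ("data_train", "data_oof", "data_holdout", "data_test")
CHUNKS = ("input", "target", "prediction")
DIVISIONS = ("d", "run", "fold", "rep", "final")


def data_path(dataset, chunk, division, transformed=False):
    """Build the dotted attribute path for one stored value, validating each step."""
    if dataset not in DATASETS:
        raise ValueError(f"Unknown dataset attribute: {dataset!r}")
    if chunk not in CHUNKS:
        raise ValueError(f"Unknown data chunk: {chunk!r}")
    if division not in DIVISIONS:
        raise ValueError(f"Unknown division: {division!r}")
    parts = [dataset, chunk]
    if transformed:
        parts.append("T")  # optional step 3
    parts.append(division)
    return ".".join(parts)


def resolve(obj, path):
    """Follow a dotted attribute path on `obj`, one getattr per step."""
    return reduce(getattr, path.split("."), obj)


# Stand-in with made-up values; a real experiment would hold actual data here
fake_experiment = SimpleNamespace(
    data_oof=SimpleNamespace(
        prediction=SimpleNamespace(
            fold=[0.2, 0.8],
            T=SimpleNamespace(fold=[0.1, 0.9]),
        )
    )
)

path = data_path("data_oof", "prediction", "fold", transformed=True)
print(path)                            # data_oof.prediction.T.fold
print(resolve(fake_experiment, path))  # [0.1, 0.9]
```

In an experiment or callback you would normally write the attribute chain directly; the builder is only meant to make the three (or four) steps explicit.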
hyperparameter_hunter.data.datasets module
class hyperparameter_hunter.data.datasets.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.datasets.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.datasets.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.datasets.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
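The Notes for the dataset classes above ask BaseDataset subclasses to override the three chunk initializer attributes. The sketch below shows one way that pattern could look. It is an assumption-laden illustration, not the library's implementation: every class and helper here (`SketchChunk`, `SketchNullChunk`, `SketchBaseDataset`, `SketchTestDataset`, `_select`) is hypothetical, and a dict of lists stands in for a pandas DataFrame so the example has no dependencies. Only the attribute names `_input_type`, `_target_type`, and `_prediction_type` come from this page.

```python
class SketchChunk:
    """Hypothetical stand-in for a BaseDataChunk-like class; holds initial data in `d`."""

    def __init__(self, data=None):
        self.d = data


class SketchNullChunk(SketchChunk):
    """Stand-in for a chunk the dataset does not have; discards whatever it is given."""

    def __init__(self, data=None):
        super().__init__(None)


def _select(data, columns):
    """Pick the named columns from a dict-of-lists stand-in for a DataFrame."""
    if data is None:
        return None
    return {name: data[name] for name in columns}


class SketchBaseDataset:
    # Chunk initializer attributes; subclasses override these to change chunk behavior
    _input_type = SketchChunk
    _target_type = SketchChunk
    _prediction_type = SketchChunk

    def __init__(self, data=None, feature_selector=None, target_column=None):
        self.input = self._input_type(_select(data, feature_selector or []))
        self.target = self._target_type(_select(data, target_column or []))
        self.prediction = self._prediction_type(None)


class SketchTestDataset(SketchBaseDataset):
    # Targets for a test dataset are unknown, so the null chunk takes their place
    _target_type = SketchNullChunk


rows = {"x1": [1, 2], "y": [0, 1]}
base_set = SketchBaseDataset(rows, feature_selector=["x1"], target_column=["y"])
test_set = SketchTestDataset(rows, feature_selector=["x1"], target_column=["y"])

print(base_set.target.d)  # {'y': [0, 1]}
print(test_set.input.d)   # {'x1': [1, 2]}
print(test_set.target.d)  # None
```

Keeping the chunk types as class attributes means a subclass changes behavior with a single assignment and does not need to reimplement `__init__`.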
Module contents
class hyperparameter_hunter.data.TrainDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.OOFDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.HoldoutDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start
class hyperparameter_hunter.data.TestDataset(data: Optional[pandas.core.frame.DataFrame] = None, feature_selector: List[str] = None, target_column: List[str] = None, require_data: bool = False)
Bases: hyperparameter_hunter.data.data_core.BaseDataset
Base class for organizing entire datasets into three BaseDataChunk subclasses
Parameters
- data: pd.DataFrame, or None, default=None
  Initial whole dataset, comprising both input and target data
- feature_selector: List, or None, default=None
  Column names to include as input data for the dataset
- target_column: List, or None, default=None
  Column name(s) in the dataset that contain the target output data
- require_data: Boolean, default=False
  If True, data must be provided as a pandas DataFrame
Notes
Subclasses of BaseDataset should override the three chunk initializer attributes (_input_type, _target_type, _prediction_type) to a BaseDataChunk subclass in order to establish callback method behavior for the data chunk attributes. Note that NullDataChunk is also an acceptable value for any of the chunk initializers
Methods
- on_exp_start(self, *args, **kwargs)
- on_exp_end
- on_fold_end
- on_fold_start
- on_rep_end
- on_rep_start
- on_run_end
- on_run_start