ML-ENSEMBLE
Author: Sebastian Flennerhag
Copyright: 2017-2018
License: MIT
Computational graph module for memory-neutral parallel processing of deep general-purpose ensembles.
Implements backend graph managers, base classes for interacting with graph managers, and job managers for preprocessing pipelines and estimators, as well as handles for multiple instances and wrappers for standard parallel job calls.
mlens.parallel¶
Graph Nodes¶
Layer¶
-
class
mlens.parallel.layer.
Layer
(name=None, propagate_features=None, shuffle=False, random_state=None, verbose=False, stack=None, **kwargs)[source]¶ Bases:
mlens.parallel.base.OutputMixin
,mlens.parallel.base.IndexMixin
,mlens.parallel.base.BaseStacker
Layer of preprocessing pipes and estimators.
Layer is an internal class that holds a layer and its associated data, including an estimation procedure. It behaves as an estimator from a Scikit-learn API point of view.
Parameters: - propagate_features (list, range, optional) – Features to propagate from the input array to the output array. Carries input features to the output of the layer, useful for propagating original data through several stacked layers. Propagated features are stored in the left-most columns.
- verbose (int or bool (default = False)) – level of verbosity.
  verbose = 0: silent (same as verbose = False)
  verbose = 1: messages at start and finish (same as verbose = True)
  verbose = 2: messages for preprocessing and estimators
  verbose = 3: messages for completed job
  If verbose >= 10, prints to sys.stderr, else sys.stdout.
- shuffle (bool (default = False)) – Whether to shuffle data before fitting layer.
- random_state (obj, int, optional) – Random seed number to use for shuffling inputs
- **kwargs (optional) – optional arguments to
BaseParallel
.
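The propagate_features behaviour described above can be pictured with a small sketch. This is not the library's implementation: the helper name propagate and the use of plain nested lists are assumptions made for illustration. It only shows the documented layout, where propagated input columns sit in the left-most output columns.

```python
def propagate(X, P, propagate_features):
    """Return rows laid out as [propagated input columns | predictions].

    Hypothetical helper: X and P are lists of rows, propagate_features
    is a list of column indices of X to carry over.
    """
    out = []
    for x_row, p_row in zip(X, P):
        carried = [x_row[j] for j in propagate_features]
        out.append(carried + list(p_row))
    return out


X = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # input array with 3 features
P = [[0.1], [0.9]]                      # layer predictions, 1 column
out = propagate(X, P, propagate_features=[0, 2])
print(out)  # [[1.0, 3.0, 0.1], [4.0, 6.0, 0.9]]
```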
-
data
¶ Cross validated scores
-
indexers
¶ Check indexer
-
learners
¶ Generator for learners in layer
-
raw_data
¶ Cross validated scores
-
set_output_columns
(X, y, job, n_left_concats=0)[source]¶ Compatibility method for setting learner output columns
-
transformers
¶ Generator for transformers in layer
Learner¶
-
class
mlens.parallel.learner.
Learner
(estimator, indexer=None, name=None, preprocess=None, attr=None, scorer=None, proba=False, **kwargs)[source]¶ Bases:
mlens.parallel.base.ProbaMixin
,mlens.parallel.learner.BaseNode
Wrapper for base learners.
Parameters: - estimator (obj) – estimator to construct learner from
- preprocess (str, obj) – preprocess transformer. Pass either the string cache reference or the transformer instance. If the latter, the preprocess will refer to the transformer name.
- name (str) – name of learner. If preprocess is not None, the name will be prepended to preprocess__name.
- attr (str (default='predict')) – predict attribute, typically one of ‘predict’ and ‘predict_proba’
- scorer (func) – function to use for scoring predictions during cross-validated fitting.
- output_columns (dict, optional) – mapping of prediction feature columns from learner to columns in output array. Normally, this map is {0: x}, but if the indexer creates partitions, each partition needs to be mapped: {0: x, 1: x + 1}. Note that if output_columns are not given at initialization, the set_output_columns method must be called before running estimations.
- verbose (bool, int (default = False)) – whether to report completed fits.
- **kwargs (bool (default=True)) – Optional ParallelProcessing arguments. See BaseParallel.
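The output_columns mapping described above can be sketched as follows. The helper make_output_columns and its arguments are hypothetical and only reproduce the two shapes quoted in the parameter description: {0: x} without partitions and {0: x, 1: x + 1} with two partitions.

```python
def make_output_columns(start, n_partitions=1):
    """Map each partition index to a column of the output array.

    Hypothetical helper: `start` plays the role of `x` in the text,
    i.e. the first output column reserved for this learner.
    """
    return {partition: start + partition for partition in range(n_partitions)}


single = make_output_columns(start=3)                       # {0: 3}
partitioned = make_output_columns(start=3, n_partitions=2)  # {0: 3, 1: 4}
print(single, partitioned)
```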
-
scorer
¶ Copy of scorer
Transformer¶
-
class
mlens.parallel.learner.
Transformer
(estimator, indexer=None, name=None, **kwargs)[source]¶ Bases:
mlens.parallel.learner.BaseNode
Preprocessing handler.
Wrapper for transformation pipeline.
Parameters: - indexer (obj, None) – indexer to use for generating fits. Set to None to fit only on all data.
- estimator (obj) – transformation pipeline to construct learner from
- name (str) – name of learner. If preprocess is not None, the name will be prepended to preprocess__name.
- output_columns (dict, optional) – If the transformer is to be used to output data, output_columns must be set. Normally, this map is {0: x}, but if the indexer creates partitions, each partition needs to be mapped: {0: x, 1: x + 1}.
- verbose (bool, int (default = False)) – whether to report completed fits.
- raise_on_exception (bool (default=True)) – whether to warn on non-fatal exceptions or raise an error.
EvalLearner¶
-
class
mlens.parallel.learner.
EvalLearner
(estimator, preprocess, name, attr, scorer, error_score=None, verbose=False, **kwargs)[source]¶ Bases:
mlens.parallel.learner.Learner
EvalLearner is a derived class from Learner used for cross-validated scoring of an estimator.
Parameters: - estimator (obj) – estimator to construct learner from
- preprocess (str) – preprocess cache reference
- indexer (obj, None) – indexer to use for generating fits. Set to None to fit only on all data.
- name (str) – name of learner. If preprocess is not None, the name will be prepended to preprocess__name.
- attr (str (default='predict')) – predict attribute, typically one of ‘predict’ and ‘predict_proba’
- scorer (func) – function to use for scoring predictions during cross-validated fitting.
- error_score (int, float, None (default = None)) – score to set if cross-validation fails. Set to None to raise error.
- verbose (bool, int (default = False)) – whether to report completed fits.
- raise_on_exception (bool (default=True)) – whether to warn on non-fatal exceptions or raise an error.
EvalTransformer¶
-
class
mlens.parallel.learner.
EvalTransformer
(estimator, indexer=None, name=None, **kwargs)[source]¶ Bases:
mlens.parallel.learner.Transformer
Evaluator version of the Transformer.
Derived class from Transformer adapted to cross-validated grid-search. See
Transformer
for more details.
BaseNode¶
-
class
mlens.parallel.learner.
BaseNode
(name, estimator, indexer=None, verbose=False, **kwargs)[source]¶ Bases:
mlens.parallel.base.OutputMixin
,mlens.parallel.base.IndexMixin
,mlens.parallel.base.BaseEstimator
Base computational node inherited by job generators.
Common API for job generators. A class that inherits the base needs to set a __subtype__ in the constructor. The sub-type should be the class that runs estimations and must implement __call__, fit, transform and predict methods.
-
cloned_estimator
¶ Copy of estimator
-
collect
(path=None)[source]¶ Load fitted estimator from cache
Parameters: path (str, list, optional) – path to cache.
-
data
¶ Dictionary with aggregated data from fitting sub-learners.
-
gen_fit
(X, y, P=None)[source]¶ Routine for generating fit jobs conditional on refit
Parameters: - X (array-like of shape [n_samples, n_features]) – input array
- y (array-like of shape [n_samples,]) – targets
- P (array-like of shape [n_samples, n_prediction_features], optional) – output array to populate. Must be writeable. Only pass if predictions are desired.
-
gen_predict
(X, P=None)[source]¶ Generate predicting jobs
Parameters: - X (array-like of shape [n_samples, n_features]) – input array
- y (array-like of shape [n_samples,]) – targets
- P (array-like of shape [n_samples, n_prediction_features], optional) – output array to populate. Must be writeable. Only pass if predictions are desired.
-
gen_transform
(X, P=None)[source]¶ Generate cross-validated predict jobs
Parameters: - X (array-like of shape [n_samples, n_features]) – input array
- y (array-like of shape [n_samples,]) – targets
- P (array-like of shape [n_samples, n_prediction_features], optional) – output array to populate. Must be writeable. Only pass if predictions are desired.
-
learner
¶ Generator for learner fitted on full data
-
raw_data
¶ List of data collected from each sub-learner during fitting.
-
set_indexer
(indexer)[source]¶ Set indexer and auxiliary attributes
Parameters: indexer (obj) – indexer to build instance with.
-
set_output_columns
(X=None, y=None, job=None, n_left_concats=0)[source]¶ Set the output_columns attribute
-
sublearners
¶ Generator for learner fitted on folds
-
times
¶ Fit and predict times for the final learners
-
SubLearner¶
SubTransformer¶
EvalSubTransformer¶
IndexedEstimator¶
Handles¶
Group¶
-
class
mlens.parallel.handles.
Group
(indexer=None, learners=None, transformers=None, name=None, **kwargs)[source]¶ Bases:
mlens.parallel.base.BaseEstimator
A handle for learners and transformers that share a common indexer.
Lightweight class for pairing a set of independent learners with a set of transformers that all share the same cross-validation strategy. A Group instance is an acceptable caller to ParallelProcessing.
New in version 0.2.0.
Note
All instances will share the same indexer. If instances have a different indexer, that indexer will be replaced.
Parameters: - indexer (inst, optional) – An index indexer to build learner and transformers on. If not passed, the first indexer of the learners will be enforced on all instances.
- learners (list, inst, optional) – Learner instance(s) attached to indexer. Note that Group overrides previous indexer parameter settings.
- transformers (list, inst, optional) – Transformer instance(s) attached to indexer. Note that Group overrides previous indexer parameter settings.
- name (str, optional) – name of group
- **kwargs (optional) – Optional keyword arguments to the BaseParallel backend.
Pipeline¶
-
class
mlens.parallel.handles.
Pipeline
(pipeline, name=None, return_y=False)[source]¶ Bases:
mlens.externals.sklearn.base.BaseEstimator
Transformer pipeline
Pipeline class for wrapping a preprocessing pipeline of transformers.
Parameters: - pipeline (list, instance) – A Transformer instance or a list of Transformer instances. Accepted input formats:
    option_1 = transformer_1
    option_2 = [transformer_1, transformer_2]
    option_3 = [("tr-1", transformer_1), ("tr-2", transformer_2)]
    option_4 = [transformer_1, ("tr-2", transformer_2)]
- name (str, optional) – name of pipeline.
- return_y (bool, default = False) – If True, both X and y will be returned in a transform() call.
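The four accepted input formats can all be reduced to one list of (name, transformer) pairs. The sketch below shows one way to do that; the function name_steps and the automatic naming scheme (lower-cased class name plus position) are assumptions for illustration, not the naming rule the library applies.

```python
def name_steps(pipeline):
    """Normalize a pipeline spec to a list of (name, transformer) tuples."""
    if not isinstance(pipeline, list):
        pipeline = [pipeline]  # option_1: a single instance
    steps = []
    for position, item in enumerate(pipeline, start=1):
        if isinstance(item, tuple):
            name, transformer = item  # already a named tuple
        else:
            transformer = item
            # Hypothetical default name: class name and position.
            name = "%s-%d" % (type(item).__name__.lower(), position)
        steps.append((name, transformer))
    return steps


class Scale(object):
    """Stand-in transformer used only for this example."""


class Center(object):
    """Stand-in transformer used only for this example."""


scale, center = Scale(), Center()
mixed = name_steps([scale, ("tr-2", center)])  # option_4
print([name for name, _ in mixed])             # ['scale-1', 'tr-2']
```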
-
fit
(X, y=None)[source]¶ Fit pipeline.
Note that the Pipeline accepts both X and y arguments, and can return both X and y, depending on the transformers. The pipeline itself does no checks on the input.
Parameters: - X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples, ]) – Targets
Returns: self – Fitted pipeline
Return type: instance
-
fit_transform
(X, y=None)[source]¶ Fit and transform pipeline.
Note that the Pipeline accepts both X and y arguments, and can return both X and y, depending on the transformers. The pipeline itself does no checks on the input.
Parameters: - X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples, ]) – Targets
Returns: - X_processed (array-like of shape [n_samples, n_preprocessed_features]) – Preprocessed input data
- y (array-like of shape [n_samples, ], optional) – Preprocessed targets
-
get_params
(deep=True)[source]¶ Get parameters for this estimator.
Parameters: deep (boolean, optional) – If True, will return the parameters for this estimator and contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: mapping of string to any
-
transform
(X, y=None)[source]¶ Transform pipeline.
Note that the Pipeline accepts both X and y arguments, and can return both X and y, depending on the transformers. The pipeline itself does no checks on the input.
Parameters: - X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples, ]) – Targets
Returns: - X_processed (array-like of shape [n_samples, n_preprocessed_features]) – Preprocessed input data
- y (array-like of shape [n_samples, ], optional) – Original or preprocessed targets, depending on the transformers.
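To illustrate how y can come back "original or preprocessed, depending on the transformers", here is a toy chain in which one transformer alters both X and y and another alters only X. The classes and the run_transformers function are invented for this sketch, and the tuple check is an assumed convention rather than the library's own dispatch logic.

```python
class DropFirstRow(object):
    """Toy transformer that changes both X and y."""

    def transform(self, X, y=None):
        return X[1:], (y[1:] if y is not None else None)


class Double(object):
    """Toy transformer that changes X only."""

    def transform(self, X, y=None):
        return [[2 * value for value in row] for row in X]


def run_transformers(transformers, X, y=None, return_y=False):
    """Apply transformers in order, threading y through when returned."""
    for transformer in transformers:
        result = transformer.transform(X, y)
        if isinstance(result, tuple):
            X, y = result  # transformer returned (X, y)
        else:
            X = result     # transformer returned X only
    return (X, y) if return_y else X


X = [[1], [2], [3]]
y = [0, 1, 0]
X_out, y_out = run_transformers([DropFirstRow(), Double()], X, y, return_y=True)
print(X_out, y_out)  # [[4], [6]] [1, 0]
```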
make_group¶
-
mlens.parallel.handles.
make_group
(indexer, estimators, preprocessing, learner_kwargs=None, transformer_kwargs=None, name=None)[source]¶ Create a Group from a set of learners and transformers.
Utility function for mapping a set of estimators and preprocessing pipelines to a Group of Learner and Transformer instances.
Parameters: - indexer (instance or None, default = None) – Indexer instance to use. See index for details.
- estimators (dict of lists or list of estimators) – If preprocessing is None or list, estimators should be a list. The list can either contain estimator instances, named tuples of estimator instances, or a combination of both.
    option_1 = [estimator_1, estimator_2]
    option_2 = [("est-1", estimator_1), ("est-2", estimator_2)]
    option_3 = [estimator_1, ("est-2", estimator_2)]
If different preprocessing pipelines are desired, a dictionary that maps estimators to preprocessing pipelines must be passed. The names of the estimator dictionary must correspond to the names of the preprocessing dictionary.
    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}
The lists for each dictionary entry can be any of option_1, option_2 and option_3.
- preprocessing (dict of lists or list, optional, default = None) – preprocessing pipelines for given layer. If the same preprocessing applies to all estimators, preprocessing should be a list of transformer instances. The list can contain the instances directly, named tuples of transformers, or a combination of both.
    option_1 = [transformer_1, transformer_2]
    option_2 = [("trans-1", transformer_1), ("trans-2", transformer_2)]
    option_3 = [transformer_1, ("trans-2", transformer_2)]
If different preprocessing pipelines are desired, a dictionary that maps preprocessing pipelines must be passed. The names of the preprocessing dictionary must correspond to the names of the estimator dictionary.
    preprocessing_cases = {"case-1": [trans_1, trans_2], "case-2": [alt_trans_1, alt_trans_2]}
    estimators = {"case-1": [est_a, est_b], "case-2": [est_c, est_d]}
The lists for each dictionary entry can be any of option_1, option_2 and option_3.
- transformer_kwargs (dict, optional) – Keyword arguments to pass to the Transformer instances.
- learner_kwargs (dict, optional) – Keyword arguments to pass to the Learner instances.
- name (str, optional) – Name of group. Should be unique.
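The requirement that the estimator and preprocessing dictionaries share the same case names can be checked up front. The sketch below is a stand-alone illustration with a made-up helper, pair_cases, and string placeholders instead of real estimators; it does not call make_group.

```python
def pair_cases(estimators, preprocessing):
    """Pair each preprocessing case with the estimators of the same name.

    Hypothetical helper: raises if the two dictionaries disagree on names.
    """
    mismatched = set(estimators) ^ set(preprocessing)
    if mismatched:
        raise ValueError("Case names do not match: %s" % sorted(mismatched))
    return {case: (preprocessing[case], estimators[case])
            for case in sorted(estimators)}


preprocessing_cases = {"case-1": ["trans_1", "trans_2"],
                       "case-2": ["alt_trans_1", "alt_trans_2"]}
estimators = {"case-1": ["est_a", "est_b"],
              "case-2": ["est_c", "est_d"]}

pairs = pair_cases(estimators, preprocessing_cases)
print(pairs["case-2"])  # (['alt_trans_1', 'alt_trans_2'], ['est_c', 'est_d'])
```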
Wrappers¶
EstimatorMixin¶
-
class
mlens.parallel.wrapper.
EstimatorMixin
[source]¶ Bases:
object
Estimator mixin
Mixin class to build an estimator from a mlens.parallel backend class. The backend class should be set as the _backend attribute of the estimator during a fit call via a _build method. E.g.:
    class Foo(EstimatorMixin, Learner):
        def __init__(self, ...):
            self._backend = None
        def _build(self):
            self._backend = Learner(...)
It is recommended to combine EstimatorMixin with parallel.base.ParamMixin.
-
fit
(X, y, proba=False, refit=True)[source]¶ Fit
Fit estimator.
Parameters: - X (array of size [n_samples, n_features]) – input data
- y (array of size [n_samples,]) – targets
- proba (bool, optional) – whether to fit for later predict_proba calls. Will register number of classes to expect in later predict and transform calls.
- refit (bool (default = True)) – Whether to refit already fitted sub-learners.
Returns: self – fitted estimator.
Return type: instance
-
fit_transform
(X, y, proba=False, refit=True)[source]¶ Fit
Fit estimator and return cross-validated predictions.
Parameters: - X (array of size [n_samples, n_features]) – input data
- y (array of size [n_samples,]) – targets
- proba (bool, optional) – whether to fit for later predict_proba calls. Will register number of classes to expect in later predict and transform calls.
- refit (bool (default = True)) – Whether to refit already fitted sub-learners.
Returns: P – prediction generated by cross-validation.
Return type: array of size [n_samples, n_prediction_features]
-
predict
(X, proba=False)[source]¶ Predict
Predict using full-fold estimator (fitted on all data).
Parameters: - X (array of size [n_samples, n_features]) – input data
- proba (bool, optional) – whether to predict class probabilities
Returns: P – prediction with full-fold estimator.
Return type: array of size [n_samples, n_prediction_features]
-
transform
(X, proba=False)[source]¶ Transform
Use cross-validated estimators to generate predictions.
Parameters: - X (array of size [n_samples, n_features]) – input data
- proba (bool, optional) – whether to predict class probabilities
Returns: P – prediction generated by cross-validation.
Return type: array of size [n_samples, n_prediction_features]
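The difference between the cross-validated predictions returned by fit_transform and the full-fold estimator used by predict can be seen with a toy "mean predictor". Everything below is an assumption made for the example (the fold assignment by index modulo, the function names, the estimator itself); it only mirrors the documented split between fold-wise estimators and one estimator fitted on all data.

```python
def mean(values):
    return sum(values) / float(len(values))


def toy_fit_transform(y, n_folds):
    """Return out-of-fold predictions and the estimator fitted on all data.

    The 'estimator' is just the mean of the training targets.
    """
    n = len(y)
    P = [None] * n
    for fold in range(n_folds):
        train = [y[i] for i in range(n) if i % n_folds != fold]
        fold_model = mean(train)       # fitted without the held-out rows
        for i in range(n):
            if i % n_folds == fold:
                P[i] = fold_model      # prediction for a held-out row
    full_model = mean(y)               # what predict() would rely on
    return P, full_model


y = [1.0, 2.0, 3.0, 4.0]
P, full_model = toy_fit_transform(y, n_folds=2)
print(P)           # [3.0, 2.0, 3.0, 2.0]
print(full_model)  # 2.5
```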
-
run¶
-
mlens.parallel.wrapper.
run
(caller, job, X, y=None, map=True, **kwargs)[source]¶ Utility for running a ParallelProcessing job on a set of callers.
Run is a utility mapping for setting up a ParallelProcessing job and executing across a set of callers. By default run executes:
    out = mgr.map(caller, job, X, y, **kwargs)
run() handles temporary parameter changes, for instance running a learner with proba=True that has proba=False as default. Similarly, instances destined to not produce output can be forced to yield predictions by passing return_preds=True as a keyword argument.
Note
To run a learner with a preprocessing dependency, the instances need to be wrapped in a Group:
    run(Group(learners=learner, transformers=transformer), 'predict', X, y)
Parameters: - caller (instance, list) – A runnable instance, or a list of instances.
- job (str) – type of job to run. One of 'fit', 'transform', 'predict'.
- X (array-like) – input
- y (array-like, optional) – targets
- map (bool (default=True)) – whether to run a ParallelProcessing.map() job. If False, will instead run a ParallelProcessing.stack() job.
- **kwargs (optional) – Keyword arguments. run() searches for proba and return_preds to temporarily update callers to run desired job and return desired output. Other kwargs are passed to either map or stack.
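A temporary parameter change of the kind run() is described as making can be written as a context manager that restores the old value afterwards. This is a generic pattern, not the code run() uses; ToyLearner and temporary_attr are names invented for the sketch.

```python
from contextlib import contextmanager


@contextmanager
def temporary_attr(obj, name, value):
    """Set obj.<name> to value inside the block, then restore it."""
    previous = getattr(obj, name)
    setattr(obj, name, value)
    try:
        yield obj
    finally:
        setattr(obj, name, previous)  # restored even if the job fails


class ToyLearner(object):
    """Stand-in for a learner whose default is proba=False."""

    def __init__(self):
        self.proba = False

    def job_attr(self):
        return "predict_proba" if self.proba else "predict"


learner = ToyLearner()
with temporary_attr(learner, "proba", True):
    during = learner.job_attr()
after = learner.job_attr()
print(during, after)  # predict_proba predict
```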
Backend¶
BaseProcessor¶
-
class
mlens.parallel.backend.
BaseProcessor
(backend=None, n_jobs=None, verbose=None)[source]¶ Bases:
object
Parallel processing base class.
Base class for parallel processing engines.
Parameters: -
initialize
(job, X, y, path, warm_start=False, return_preds=False, **kwargs)[source]¶ Initialize processing engine.
Set up the job parameters before an estimation call. Calling clear() undoes initialization.
Parameters: - job (str) – type of job to complete with each task. One of 'fit', 'predict' and 'transform'.
- X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples,], optional.) – targets. Required for fit, should not be passed to predict or transform jobs.
- path (str or dict, optional) – Custom estimation cache. Pass a string to force use of persistent cache on disk. Pass a dict for in-memory cache (requires backend != 'multiprocessing').
- return_preds (bool or list, optional) – whether to return prediction output. If True, final prediction is returned. Alternatively, pass a list of task names for which output should be returned.
- warm_start (bool, optional) – whether to re-use previous input data initialization. Useful if repeated jobs are made on the same input arrays.
- **kwargs (optional) – optional keyword arguments to pass onto the task’s call method.
Returns: out – An output parameter dictionary to pass to an estimation method. Either None (no output), or {'final': True} for only final prediction, or {'final': False, 'return_names': return_preds} if a list of task-specific output was passed.
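The three documented shapes of the output parameter dictionary can be written down as a small function of return_preds. The function output_params is a hypothetical stand-in that only encodes the mapping stated in the Returns description; it is not taken from the library source.

```python
def output_params(return_preds):
    """Map a return_preds argument to the documented output dictionary."""
    if not return_preds:
        return None                      # no output requested
    if return_preds is True:
        return {'final': True}           # only the final prediction
    # A list of task names: return output for those tasks.
    return {'final': False, 'return_names': return_preds}


print(output_params(False))           # None
print(output_params(True))            # {'final': True}
print(output_params(['layer-1']))     # {'final': False, 'return_names': ['layer-1']}
```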
-
ParallelProcessing¶
-
class
mlens.parallel.backend.
ParallelProcessing
(*args, **kwargs)[source]¶ Bases:
mlens.parallel.backend.BaseProcessor
Parallel processing engine.
Engine for running computational graph.
ParallelProcessing is a manager for executing a sequence of tasks in a given caller, where each task is run sequentially, but assumed to be parallelized internally. The main responsibility of ParallelProcessing is to handle memory-mapping, estimation cache updates, input and output array updates and output collection.
Parameters: - caller (obj) – the caller of the job. Either a Layer or a meta layer class such as Sequential.
- *args (optional) – Optional arguments to BaseProcessor.
- **kwargs (optional) – Optional keyword arguments to BaseProcessor.
-
get_preds
(dtype=None, order='C')[source]¶ Return prediction matrix.
Parameters: - dtype (numpy dtype object, optional) – data type to return
- order (str (default = 'C')) – data order. See
numpy.asarray
for details.
Returns: P – Prediction array
Return type: array-like
-
map
(caller, job, X, y=None, path=None, return_preds=False, wart_start=False, split=False, **kwargs)[source]¶ Parallel task mapping.
Run independent tasks in caller in parallel.
Warning
By default, map() runs on a shallow cache, where all tasks share the same cache. As such, the user must ensure that each task has a unique name, or cache retrieval will be corrupted. To commit a separate sub-cache to each task, set split=True.
Parameters: - caller (iterable) – Iterable that generates accepted task instances. Caller should be a child of the BaseBackend class, and tasks need to implement an appropriate call method.
- job (str) – type of job to complete with each task. One of 'fit', 'predict' and 'transform'.
- X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples,], optional.) – targets. Required for fit, should not be passed to predict or transform jobs.
- path (str or dict, optional) – Custom estimation cache. Pass a string to force use of persistent cache on disk. Pass a dict for in-memory cache (requires backend != 'multiprocessing').
- return_preds (bool or list, optional) – whether to return prediction output. If True, final prediction is returned. Alternatively, pass a list of task names for which output should be returned.
- warm_start (bool, optional) – whether to re-use previous input data initialization. Useful if repeated jobs are made on the same input arrays.
- split (bool, default = False) – whether to commit a separate sub-cache to each task.
- **kwargs (optional) – optional keyword arguments to pass onto each task.
Returns: out – Prediction array(s).
Return type: array-like, list, optional
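Why unique task names matter on a shallow cache can be shown with a plain dictionary. The cache layout below (keys are task names, sub-caches keyed by position) is an assumption chosen to make the collision visible; the real estimation cache may be organized differently.

```python
def fill_cache(task_names, split=False):
    """Store one fitted 'result' per task in a toy cache."""
    cache = {}
    for position, name in enumerate(task_names):
        result = "fit of task %d" % position
        if split:
            # Separate sub-cache per task: equal names cannot collide.
            cache["sub-cache-%d" % position] = {name: result}
        else:
            # Shared cache: a repeated name overwrites the earlier entry.
            cache[name] = result
    return cache


shared = fill_cache(["lr", "lr"])
separate = fill_cache(["lr", "lr"], split=True)
print(shared)         # {'lr': 'fit of task 1'}
print(len(separate))  # 2
```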
-
process
(caller, out, **kwargs)[source]¶ Process job.
Main method for processing a caller. Requires the instance to be set up by a prior call to initialize().
Parameters: - caller (iterable) – Iterable that generates accepted task instances. Caller should be a child of the BaseBackend class, and tasks need to implement an appropriate call method.
- out (dict) – A dictionary with output parameters. Pass an empty dict for no output. See initialize() for more details.
Returns: out – Prediction array(s).
Return type: array-like, list, optional
-
stack
(caller, job, X, y=None, path=None, return_preds=False, warm_start=False, split=True, **kwargs)[source]¶ Stacked parallel task mapping.
Run stacked tasks in caller in parallel.
This method runs a stack of tasks as a stack, where the output of each task is the input to the next.
Warning
By default, the stack() method runs on a deep cache, where each task has a separate cache. As such, the user must ensure that tasks don’t depend on data cached by previous tasks. To run all tasks in a single sub-cache, set split=False.
Parameters: - caller (iterable) – Iterable that generates accepted task instances. Caller should be a child of the BaseBackend class, and tasks need to implement an appropriate call method.
- job (str) – type of job to complete with each task. One of 'fit', 'predict' and 'transform'.
- X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples,], optional.) – targets. Required for fit, should not be passed to predict or transform jobs.
- path (str or dict, optional) – Custom estimation cache. Pass a string to force use of persistent cache on disk. Pass a dict for in-memory cache (requires backend != 'multiprocessing').
- return_preds (bool or list, optional) – whether to return prediction output. If True, final prediction is returned. Alternatively, pass a list of task names for which output should be returned.
- warm_start (bool, optional) – whether to re-use previous input data initialization. Useful if repeated jobs are made on the same input arrays.
- split (bool, default = True) – whether to commit a separate sub-cache to each task.
- **kwargs (optional) – optional keyword arguments to pass onto each task.
Returns: out – Prediction array(s).
Return type: array-like, list, optional
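The contrast between map() (independent tasks on the same input) and stack() (each task's output feeds the next) comes down to two loops. The sketch runs sequentially and uses plain functions as tasks; both simplifications are assumptions, since the real engine dispatches task instances through its parallel backend.

```python
def map_tasks(tasks, X):
    """Run every task on the same input and collect all outputs."""
    return [task(X) for task in tasks]


def stack_tasks(tasks, X):
    """Feed the output of each task to the next and return the last output."""
    for task in tasks:
        X = task(X)
    return X


def add_one(values):
    return [v + 1 for v in values]


def double(values):
    return [2 * v for v in values]


mapped = map_tasks([add_one, double], [1, 2])
stacked = stack_tasks([add_one, double], [1, 2])
print(mapped)   # [[2, 3], [2, 4]]
print(stacked)  # [4, 6]
```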
ParallelEvaluation¶
-
class
mlens.parallel.backend.
ParallelEvaluation
(*args, **kwargs)[source]¶ Bases:
mlens.parallel.backend.BaseProcessor
Parallel cross-validation engine.
Minimal parallel processing engine. Similar to ParallelProcessing, but offers fewer features, only fits the caller's indexer, and expects no task output.
-
process
(caller, case, X, y, path=None, **kwargs)[source]¶ Process caller.
Parameters: - caller (iterable) – Iterable for evaluation jobs. Expected caller is an Evaluator instance.
- case (str) – evaluation case to run on the evaluator. One of 'preprocess' and 'evaluate'.
- X (array-like of shape [n_samples, n_features]) – Input data
- y (array-like of shape [n_samples,], optional.) – targets. Required for fit, should not be passed to predict or transform jobs.
- path (str or dict, optional) – Custom estimation cache. Pass a string to force use of persistent cache on disk. Pass a dict for in-memory cache (requires backend != 'multiprocessing').
-
Job¶
-
class
mlens.parallel.backend.
Job
(job, stack, split, dir=None, tmp=None, predict_in=None, targets=None, predict_out=None)[source]¶ Bases:
object
Container class for holding and managing job data.
Job is intended as an on-the-fly job handler that keeps track of input data, predictions, and manages estimation caches.
Changed in version 0.2.0.
See also
Parameters: - job (str) – Type of job to run. One of 'fit', 'transform', 'predict'.
- stack (bool) – Whether to stack outputs when calls to update() are made. This will make the predict_out array become predict_in.
- split (bool) – Whether to create a new sub-cache when the args property is called.
- dir (str, dict, optional) – estimation cache. Pass dictionary for use with multiprocessing or a string pointing to the disk directory to create the cache in
- tmp (obj, optional) – a Tempfile object for temporary directories
- targets (array-like of shape [n_in_samples,], optional) – input targets
- predict_in (array-like of shape [n_in_samples, n_in_features], optional) – input data
- predict_out (array_like of shape [n_out_samples, n_out_features], optional) – prediction output array
-
args
(**kwargs)[source]¶ Produce args dict
New in version 0.2.0.
Returns the arguments dictionary passed to a task of a parallel processing manager. Output dictionary has the following form:
    out = {'auxiliary': {'X': self.predict_in, 'P': self.predict_out},
           'main': {'X': self.predict_in, 'P': self.predict_out},
           'dir': self.subdir(),
           'job': self.job}
Parameters: **kwargs (optional) – Optional keyword arguments to pass to the task.
Returns: args – Arguments dictionary
Return type: dict
-
rebase
()[source]¶ Rebase output labels to input indexing.
Some indexers that only generate predictions for subsets of the training data require the targets to be rebased. Since indexers operate in a strictly sequential manner, rebase simply drops the first n observations in the target vector until the number of remaining observations coincides with the number of predictions.
See also
BlendIndex
shuffle
(random_state)[source]¶ Shuffle inputs.
Permutes the indexing of predict_in and y arrays.
Parameters: random_state (int, obj) – Random seed number or generator to use.
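The key property of such a shuffle is that inputs and targets are permuted with the same index order, so each row keeps its target. The sketch uses the standard library random module and plain lists as an assumption; the function name is hypothetical.

```python
import random


def shuffle_together(X, y, random_state):
    """Apply one random permutation to both X and y."""
    rng = random.Random(random_state)
    index = list(range(len(X)))
    rng.shuffle(index)
    return [X[i] for i in index], [y[i] for i in index]


X = [[10], [20], [30], [40]]
y = [1, 2, 3, 4]
X_shuffled, y_shuffled = shuffle_together(X, y, random_state=0)

# Each row is still paired with its original target.
print(all(row[0] == 10 * target for row, target in zip(X_shuffled, y_shuffled)))
```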
-
subdir
()[source]¶ Return a cache subdirectory
If split is in force, a new sub-cache will be created in the main cache. Otherwise the same sub-cache as used in the previous call will be returned.
New in version 0.2.0.
Returns: cache – Either a string pointing to a cache persisted to disk, or an in-memory cache in the form of a list.
Return type: str, list
Base classes¶
Schedulers for global setups:
Order | Setup types | Function calls |
---|---|---|
0 | Independent of other features | IndexMixin._setup_0_index |
1 | Reserved for aggregating classes | BaseStacker._setup_1_global |
2 | Dependent on 0 | ProbaMixin.__setup_2_multiplier |
3 | Dependent on 0, 2 | OutputMixin.__setup_3__output_columns |
Note that base classes and setup schedulers are experimental and may change without a deprecation cycle.
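One way to picture an ordered setup schedule is to discover methods by a numbered naming pattern and call them in ascending order. The sketch below is an assumption about how such a scheduler could look, not the library's mechanism; the class, the method bodies, and the run_setups helper are invented, and single-underscore names are used to sidestep Python name mangling.

```python
import re


class ToyNode(object):
    """Toy class with numbered setup hooks, defined out of order."""

    def __init__(self):
        self.log = []

    def _setup_3_output_columns(self):
        self.log.append("output_columns")

    def _setup_0_index(self):
        self.log.append("index")

    def _setup_2_multiplier(self):
        self.log.append("multiplier")


def run_setups(obj):
    """Call every _setup_<N>_* method on obj in ascending order of N."""
    pattern = re.compile(r"_setup_(\d+)_")
    found = []
    for name in dir(obj):
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    for _, name in sorted(found):
        getattr(obj, name)()


node = ToyNode()
run_setups(node)
print(node.log)  # ['index', 'multiplier', 'output_columns']
```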
BaseBackend¶
-
class
mlens.parallel.base.
BaseBackend
(backend=None, n_jobs=-1, dtype=None, raise_on_exception=True)[source]¶ Bases:
object
Base class for parallel backend
Implements default backend settings.
-
__init__
(backend=None, n_jobs=-1, dtype=None, raise_on_exception=True)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
__weakref__
¶ list of weak references to the object (if defined)
-
BaseParallel¶
-
class
mlens.parallel.base.
BaseParallel
(name, *args, **kwargs)[source]¶ Bases:
mlens.parallel.base.BaseBackend
Base class for parallel objects
Parameters: - name (str) – name of instance. Should be unique.
- backend (str or object (default = 'threading')) – backend infrastructure to use during call to mlens.externals.joblib.Parallel. See Joblib for further documentation. To set global backend, see set_backend().
- raise_on_exception (bool (default = True)) – whether to issue warnings on soft exceptions or raise error. Examples include lack of layers, bad inputs, and failed fit of an estimator in a layer. If set to False, warnings are issued instead but estimation continues unless exception is fatal. Note that this can result in unexpected behavior unless the exception is anticipated.
- verbose (int or bool (default = False)) – level of verbosity.
- n_jobs (int (default = -1)) – Degree of concurrency in estimation. Set to -1 to maximize, 1 runs on a single process (or thread).
- dtype (obj (default = np.float32)) – data type to use, must be compatible with a numpy array dtype.
BaseEstimator¶
-
class
mlens.parallel.base.
BaseEstimator
(*args, **kwargs)[source]¶ Bases:
mlens.parallel.base.ParamMixin
,mlens.externals.sklearn.base.BaseEstimator
,mlens.parallel.base.BaseParallel
Base Parallel Estimator class
Modified Scikit-learn class to handle backend params that we want to protect from changes.
-
__fitted__
¶ Fit status
-
BaseStacker¶
-
class
mlens.parallel.base.
BaseStacker
(stack=None, verbose=False, *args, **kwargs)[source]¶ Bases:
mlens.parallel.base.BaseEstimator
Base class for instances that stack job estimators
-
__fitted__
¶ Fitted status
-
__init__
(stack=None, verbose=False, *args, **kwargs)[source]¶ Initialize self. See help(type(self)) for accurate signature.
-
__stack__
¶ Check stack
-
get_params
(deep=True)[source]¶ Get parameters for this estimator.
Parameters: deep (boolean, optional) – whether to return nested parameters.
-
verbose
¶ Verbosity
-
Mixins¶
ParamMixin¶
-
class
mlens.parallel.base.
ParamMixin
[source]¶ Bases:
mlens.externals.sklearn.base.BaseEstimator
,object
Parameter Mixin
Mixin for protecting static parameters from changes after fitting.
Note
To use this mixin the instance inheriting it must set __static__=list() and _static_fit_params_=dict() in __init__.
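A rough picture of what protecting static parameters could involve: record the values of the static parameters at fit time and detect later changes. The class below reuses the two attribute names from the note, but everything else (the fit method, changed_static_params, which parameters are static) is a hypothetical sketch rather than the mixin's behaviour.

```python
class ToyEstimator(object):
    """Toy estimator that tracks changes to static parameters after fit."""

    def __init__(self, n_folds=2, verbose=False):
        self.__static__ = ["n_folds"]    # names treated as static here
        self._static_fit_params_ = {}    # values recorded at fit time
        self.n_folds = n_folds
        self.verbose = verbose

    def fit(self):
        self._static_fit_params_ = {
            name: getattr(self, name) for name in self.__static__}
        return self

    def changed_static_params(self):
        """Names of static parameters that differ from their fitted value."""
        return [name for name, value in self._static_fit_params_.items()
                if getattr(self, name) != value]


estimator = ToyEstimator().fit()
estimator.verbose = True   # not static: change goes unnoticed
estimator.n_folds = 5      # static: change is detected
print(estimator.changed_static_params())  # ['n_folds']
```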
IndexMixin¶
-
class
mlens.parallel.base.
IndexMixin
[source]¶ Bases:
object
Indexer mixin
Mixin for handling indexers.
Note
- To use this mixin the instance inheriting it must set the indexer or indexers attribute in __init__ (not both).
-
__indexer__
¶ Flag for existence of indexer
-
__weakref__
¶ list of weak references to the object (if defined)
OutputMixin¶
-
class
mlens.parallel.base.
OutputMixin
[source]¶ Bases:
mlens.parallel.base.IndexMixin
Output Mixin
Mixin class for interfacing with ParallelProcessing when outputs are desired.
Note
To use this mixin the instance inheriting it must set the feature_span attribute and __no_output__ flag in __init__.
ProbaMixin¶
-
class
mlens.parallel.base.
ProbaMixin
[source]¶ Bases:
object
Probability Mixin
Mixin for probability features on objects interfacing with ParallelProcessing.
Note
To use this mixin the instance inheriting it must set the proba and the _classes (=None) attribute in __init__.
-
__weakref__
¶ list of weak references to the object (if defined)
-
classes_
¶ Prediction classes during proba
-