ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017-2018
license: MIT

Metric utilities and functions.

mlens.metrics

Data

class mlens.metrics.Data(data=None, padding=2, decimals=2)

Bases: collections.OrderedDict

Wrapper class around dict for pretty printing.

Data is an ordered dictionary with a dedicated pretty-print method for nested dictionaries: printing a Data instance produces a human-readable table. The input dictionary is expected to have two levels: the first level gives the columns and the second level the rows. Row names are parsed as [OUTER]/[MIDDLE].[INNER]--[IDX], where IDX must be an integer. All components are optional.
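The row-name convention can be illustrated with a small standalone parser. This is a minimal sketch, not mlens code: parse_row_name and RowName are hypothetical names, and the sketch assumes each separator ('/', '.', '--') is split at its first occurrence. The library's own parser may treat unusual names differently.

```python
from typing import NamedTuple, Optional


class RowName(NamedTuple):
    outer: str            # text before '/', or '' if absent
    middle: str           # text before '.', or '' if absent
    inner: str            # the remaining label
    idx: Optional[int]    # integer after '--', or None if absent


def parse_row_name(name: str) -> RowName:
    """Split '[OUTER]/[MIDDLE].[INNER]--[IDX]' into its four parts.

    Every component is optional; missing text parts come back as ''.
    Each separator is split at its first occurrence.
    """
    outer, sep, rest = name.partition('/')
    if not sep:                     # no '/', so there is no OUTER part
        outer, rest = '', name
    middle, sep, tail = rest.partition('.')
    if not sep:                     # no '.', so there is no MIDDLE part
        middle, tail = '', rest
    inner, sep, idx = tail.partition('--')
    # IDX must be an integer; int() raises ValueError otherwise.
    return RowName(outer, middle, inner, int(idx) if sep else None)


print(parse_row_name('layer-1/case-a.est-b--2'))
# RowName(outer='layer-1', middle='case-a', inner='est-b', idx=2)
print(parse_row_name('row-idx-1.row-idx-2'))
# RowName(outer='', middle='row-idx-1', inner='row-idx-2', idx=None)
```

The second call shows how a name like the one in the example below yields two row-label columns: the MIDDLE and INNER parts.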

Warning

Data is an internal class that expects input in a particular format. It cannot be used as a general drop-in replacement for the standard dict class.

Examples

>>> from mlens.metrics import Data
>>> d = {'column-a': {'row-idx-1.row-idx-2': 0.1},
...      'column-b': {'row-idx-1.row-idx-2': 0.2}}
>>> data = Data(d)
>>> print(data)
                        column-a  column-b
row-idx-1  row-idx-2        0.10      0.20

assemble_data

mlens.metrics.assemble_data(data_list)

Build a data dictionary out of a list of entries and data dicts

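Given a list of (name, dictionary) tuples, assemble_data() returns a nested ordered dictionary with data keys as outer keys and tuple names as inner keys. Entries that share a name are aggregated: each data key k yields two outer keys, k-m and k-s, holding the mean and standard deviation (as in the example output below). The returned dictionary can be printed in tabular format by assemble_table().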
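A minimal sketch of the aggregation step, using only the standard library. aggregate_entries is a hypothetical name, and two details are assumptions rather than documented behaviour: the last two dot-separated fields of each name are treated as integer indices that are simply dropped to form the row name (the --[IDX] part of the row-name format is not modelled), and the spread is the population standard deviation (statistics.pstdev, which agrees with numpy.std at its default ddof=0).

```python
from collections import OrderedDict, defaultdict
from statistics import mean, pstdev


def aggregate_entries(entries):
    """Aggregate (name, {key: value}) pairs into mean/std columns."""
    grouped = OrderedDict()
    for name, values in entries:
        # Assumption: the last two dot-separated fields are integer indices
        # that tell repeated entries of one row apart; drop them.
        row = name.rsplit('.', 2)[0]
        bucket = grouped.setdefault(row, defaultdict(list))
        for key, value in values.items():
            if value is not None:   # skip missing measurements
                bucket[key].append(value)

    table = OrderedDict()
    for row, bucket in grouped.items():
        for key, vals in bucket.items():
            table.setdefault(f'{key}-m', OrderedDict())[row] = mean(vals)
            # Population standard deviation (ddof=0), as numpy.std defaults to.
            table.setdefault(f'{key}-s', OrderedDict())[row] = pstdev(vals)
    return table


table = aggregate_entries([
    ('est-a.0.0', {'score': 1.0}),
    ('est-a.0.1', {'score': 3.0}),
])
print(table['score-m']['est-a'], table['score-s']['est-a'])  # 2.0 1.0
```

With a single entry per row, as in the example below, the standard deviation is 0.0, which matches the 0.00 shown in the -s columns.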

Examples

>>> from mlens.metrics import assemble_data, assemble_table
>>> d = [('row-idx-1.row-idx-2.0.0', {'column-1': 0.1, 'column-2': 0.1})]
>>> print(assemble_table(assemble_data(d)))
                        column-2-m  column-2-s  column-1-m  column-1-s
row-idx-1  row-idx-2          0.10        0.00        0.10        0.00

assemble_table

mlens.metrics.assemble_table(data, padding=2, decimals=2)

Construct data table from input dict

Given a nested dictionary formed by assemble_data(), assemble_table() returns a string that prints the contents of the input in tabular format. The input dictionary is expected to have two levels: the first level gives the columns and the second level the rows. Row names are parsed as [OUTER]/[MIDDLE].[INNER]--[IDX], where IDX must be an integer. All components are optional. padding sets the number of spaces between columns, and decimals sets the number of decimal places printed for numeric values.
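To show what padding and decimals control, here is a simplified renderer for a {column: {row: value}} dictionary. render_table is hypothetical and deliberately skips the row-name parsing described above, using each row key verbatim as its label. It is a sketch of the layout idea, not the assemble_table implementation.

```python
def render_table(data, padding=2, decimals=2):
    """Render a {column: {row: value}} dict as a plain-text table."""
    cols = list(data)
    # Row keys in first-seen order across all columns.
    rows = []
    for col in cols:
        for row in data[col]:
            if row not in rows:
                rows.append(row)

    def fmt(value):
        # Floats get `decimals` places; missing cells are left blank.
        if value is None:
            return ''
        if isinstance(value, float):
            return f'{value:.{decimals}f}'
        return str(value)

    label_width = max((len(r) for r in rows), default=0)
    widths = {c: max([len(c)] + [len(fmt(data[c].get(r))) for r in rows])
              for c in cols}
    gap = ' ' * padding  # `padding` spaces separate adjacent columns

    lines = [' ' * label_width
             + ''.join(gap + c.rjust(widths[c]) for c in cols)]
    for r in rows:
        cells = ''.join(gap + fmt(data[c].get(r)).rjust(widths[c])
                        for c in cols)
        lines.append(r.ljust(label_width) + cells)
    return '\n'.join(lines)


print(render_table({'column-a': {'row-1': 0.1}, 'column-b': {'row-1': 0.2}}))
```

Values are right-aligned under their headers, so raising decimals widens a column only when the formatted number becomes longer than the column name.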

See also

Data, assemble_data()

Examples

>>> from mlens.metrics import assemble_data, assemble_table
>>> d = [('row-idx-1.row-idx-2.0.0', {'column-1': 0.1, 'column-2': 0.1})]
>>> print(assemble_table(assemble_data(d)))
                        column-2-m  column-2-s  column-1-m  column-1-s
row-idx-1  row-idx-2          0.10        0.00        0.10        0.00

make_scorer

mlens.metrics.make_scorer(score_func, greater_is_better=True, needs_proba=False, needs_threshold=False, **kwargs)

Make a scorer from a performance metric or loss function.

This factory function wraps scoring functions for use in GridSearchCV and cross_val_score. It takes a score function, such as accuracy_score, mean_squared_error, adjusted_rand_score or average_precision_score, and returns a callable that scores an estimator’s output.

Read more in the scikit-learn User Guide.

Parameters:
  • score_func (callable) – Score function (or loss function) with signature score_func(y, y_pred, **kwargs).
  • greater_is_better (boolean, default=True) – Whether score_func is a score function (default), meaning high is good, or a loss function, meaning low is good. In the latter case, the scorer object will sign-flip the outcome of the score_func.
  • needs_proba (boolean, default=False) – Whether score_func requires predict_proba to get probability estimates out of a classifier.
  • needs_threshold (boolean, default=False) –

    Whether score_func takes a continuous decision certainty. This only works for binary classification using estimators that have either a decision_function or predict_proba method.

    For example, average precision or the area under the ROC curve cannot be computed from discrete predictions alone.

  • **kwargs (additional arguments) – Additional parameters to be passed to score_func.
Returns:

scorer – Callable object that returns a scalar score; greater is better.

Return type:

callable
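The greater_is_better sign flip can be sketched in a few lines of plain Python. sketch_make_scorer, mse_loss and ConstantModel below are hypothetical stand-ins, not mlens or scikit-learn code, and the sketch only covers the default case of scoring predict() output; it ignores needs_proba and needs_threshold, which switch to predict_proba or decision_function instead.

```python
def sketch_make_scorer(score_func, greater_is_better=True, **kwargs):
    """Wrap score_func(y, y_pred, **kwargs) as scorer(estimator, X, y)."""
    sign = 1 if greater_is_better else -1

    def scorer(estimator, X, y):
        # Score the estimator's discrete predictions. For a loss, flip the
        # sign so that a higher returned value always means a better model.
        y_pred = estimator.predict(X)
        return sign * score_func(y, y_pred, **kwargs)

    return scorer


def mse_loss(y, y_pred):
    """Mean squared error: a loss, so lower is better."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)


class ConstantModel:
    """Toy estimator that predicts the same value for every sample."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


mse_scorer = sketch_make_scorer(mse_loss, greater_is_better=False)
X, y = [[0], [1]], [1.0, 3.0]
print(mse_scorer(ConstantModel(2.0), X, y))  # -1.0
print(mse_scorer(ConstantModel(1.0), X, y))  # -2.0
```

A search that maximises the scorer therefore prefers the constant 2.0 model, which has the lower squared error.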

Examples

>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
>>> ftwo_scorer
make_scorer(fbeta_score, beta=2)
>>> from sklearn.model_selection import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]},
...                     scoring=ftwo_scorer)

rmse

mlens.metrics.rmse(y, p)

Root Mean Square Error.

\[RMSE(\mathbf{y}, \mathbf{p}) = \sqrt{MSE(\mathbf{y}, \mathbf{p})},\]

with

\[MSE(\mathbf{y}, \mathbf{p}) = \frac{1}{|S|} \sum_{i \in S} (y_i - p_i)^2,\]

where S is the set of sample indices.

Parameters:
  • y (array-like of shape [n_samples, ]) – ground truth.
  • p (array-like of shape [n_samples, ]) – predicted values.
Returns:

z – root mean squared error.

Return type:

float
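A dependency-free reference version of the RMSE formula, useful for checking results by hand. rmse_reference is a hypothetical name; it accepts any equal-length sequences and is a sketch, not the mlens implementation.

```python
import math


def rmse_reference(y, p):
    """Square root of the mean squared residual, per the formulas above."""
    if len(y) == 0 or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    mse = sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)  # 1/|S| * sum
    return math.sqrt(mse)


print(rmse_reference([3.0, -0.5, 2.0, 7.0], [2.5, 0.0, 2.0, 8.0]))
# ≈ 0.612, the square root of (0.25 + 0.25 + 0 + 1) / 4 = 0.375
```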

mape

mlens.metrics.mape(y, p)

Mean Absolute Percentage Error.

\[MAPE(\mathbf{y}, \mathbf{p}) = \frac{1}{|S|} \sum_{i \in S} \left| \frac{y_i - p_i}{y_i} \right|,\]

where S is the set of sample indices.

Parameters:
  • y (array-like of shape [n_samples, ]) – ground truth.
  • p (array-like of shape [n_samples, ]) – predicted values.
Returns:

z – mean absolute percentage error.

Return type:

float
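A reference sketch of the MAPE formula. mape_reference is hypothetical; it follows the formula literally, so it returns a fraction (0.1 means 10%) rather than a percentage, and it raises an error on a zero target, where the division by y_i is undefined. Raising is a choice made for this sketch; how the library handles zero targets is not specified here.

```python
def mape_reference(y, p):
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%)."""
    if len(y) == 0 or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    if any(yi == 0 for yi in y):
        # Each term divides by y_i, so a zero target leaves MAPE undefined.
        raise ZeroDivisionError("MAPE is undefined when any y_i is zero")
    return sum(abs((yi - pi) / yi) for yi, pi in zip(y, p)) / len(y)


print(mape_reference([100.0, 200.0], [110.0, 180.0]))  # 0.1, i.e. 10%
```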

wape

mlens.metrics.wape(y, p)

Weighted Mean Absolute Percentage Error.

\[WAPE(\mathbf{y}, \mathbf{p}) = \frac{\sum_{i \in S} |y_i - p_i|}{\sum_{i \in S} |y_i|}\]
Parameters:
  • y (array-like of shape [n_samples, ]) – ground truth.
  • p (array-like of shape [n_samples, ]) – predicted values.
Returns:

z – weighted mean absolute percentage error.

Return type:

float
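A reference sketch of WAPE. wape_reference is a hypothetical name, not the mlens implementation. Because the absolute errors are summed before dividing by the total magnitude of y, a single near-zero target cannot dominate the result, which is the usual reason to prefer WAPE over MAPE. For the inputs below, MAPE would be 1.0 (a 200% miss on one sample and none on the other), while WAPE is about 0.01.

```python
def wape_reference(y, p):
    """Sum of absolute errors divided by the sum of absolute targets."""
    if len(y) == 0 or len(y) != len(p):
        raise ValueError("y and p must be non-empty and of equal length")
    total = sum(abs(yi) for yi in y)
    if total == 0:
        # The denominator is zero only when every target is zero.
        raise ZeroDivisionError("WAPE is undefined when all y_i are zero")
    return sum(abs(yi - pi) for yi, pi in zip(y, p)) / total


# One small target with a large relative miss barely moves WAPE.
print(wape_reference([0.5, 100.0], [1.5, 100.0]))  # ≈ 0.00995
```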