ML-ENSEMBLE

author: Sebastian Flennerhag
copyright: 2017-2018
license: MIT

mlens.visualization

corrmat

mlens.visualization.corrmat(corr, figsize=(11, 9), annotate=True, inflate=True, linewidths=0.5, cbar_kws='default', show=True, ax=None, title='Correlation Matrix', title_font_size=14, **kwargs)

Function for generating color-coded correlation triangle.

Parameters:
  • corr (array-like of shape = [n_features, n_features]) – Input correlation matrix. Pass a pandas DataFrame for axis labels.
  • figsize (tuple (default = (11, 9))) – Size of printed figure.
  • annotate (bool (default = True)) – Whether to print the correlation coefficients.
  • inflate (bool (default = True)) – Whether to inflate correlation coefficients to a 0-100 scale. Avoids decimal points in the figure, which otherwise tend to make it look cluttered.
  • linewidths (float (default = 0.5)) – Width of the line separating each coordinate square.
  • cbar_kws (dict, str (default = 'default')) – Optional arguments to the color bar. The default option, 'default', passes the shrink parameter so that the color bar fits a standard figure frame.
  • show (bool (default = True)) – Whether to display the figure using matplotlib.pyplot.show.
  • title (str) – figure title if shown.
  • title_font_size (int) – title font size.
  • ax (object, optional) – axis to attach plot to.
  • **kwargs (optional) – Other optional arguments to sns heatmap.
Returns:

ax – axis object.

Return type:

object
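
The effect of inflate and of the "correlation triangle" can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names inflate and lower_triangle are hypothetical, rounding to integers is an assumption, and so is the choice of which triangle is kept. Note that negative coefficients map to negative values, so the inflated range is -100 to 100.

```python
def inflate(corr):
    """Scale coefficients by 100 and round to integers (rounding is an assumption)."""
    return [[int(round(100 * c)) for c in row] for row in corr]


def lower_triangle(corr):
    """Keep entries strictly below the diagonal and mask the rest with None."""
    return [
        [value if j < i else None for j, value in enumerate(row)]
        for i, row in enumerate(corr)
    ]


# A small symmetric correlation matrix for three features.
corr = [
    [1.00, 0.82, -0.31],
    [0.82, 1.00, 0.05],
    [-0.31, 0.05, 1.00],
]

inflated = inflate(corr)
triangle = lower_triangle(inflated)

for row in triangle:
    print(row)
```

Masking one triangle removes the redundant half of a symmetric matrix, and the integer scale keeps annotations short.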

clustered_corrmap

mlens.visualization.clustered_corrmap(corr, cls, label_attr_name='labels_', figsize=(10, 8), annotate=False, inflate=False, linewidths=0.5, cbar_kws='default', show=True, title_fontsize=14, title_name='Clustered correlation heatmap', ax=None, **kwargs)

Function for plotting a clustered correlation heatmap.

Parameters:
  • corr (array-like of shape = [n_features, n_features]) – Input correlation matrix. Pass a pandas DataFrame for axis labels.
  • cls (instance) – Cluster estimator with a fit method that stores cluster labels in the attribute named by the label_attr_name parameter.
  • label_attr_name (str (default = 'labels_')) – Name of the attribute that contains the cluster labels.
  • figsize (tuple (default = (10, 8))) – Size of figure.
  • annotate (bool (default = False)) – Whether to print the correlation coefficients.
  • inflate (bool (default = False)) – Whether to inflate correlation coefficients to a 0-100 scale. Avoids decimal points in the figure, which otherwise tend to make it look cluttered.
  • linewidths (float (default = 0.5)) – Width of the line separating each coordinate square.
  • cbar_kws (dict, str (default = 'default')) – Optional arguments to color bar.
  • title_name (str) – Figure title.
  • title_fontsize (int) – size of title.
  • show (bool (default = True)) – whether to print figure using matplotlib.pyplot.show.
  • ax (object, optional) – axis to attach plot to.
  • **kwargs (optional) – Other optional arguments to sns heatmap.
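
The contract for cls (a fit method plus a label attribute looked up by name) can be sketched as follows. Everything here is hypothetical and for illustration: SignClusterer is a toy estimator, reorder_by_cluster is not part of the library, and whether clustered_corrmap reorders features in exactly this way is an assumption.

```python
class SignClusterer:
    """Toy estimator: clusters features by the sign of their correlation
    with the first feature. Hypothetical, for illustration only."""

    def fit(self, corr):
        self.labels_ = [0 if row[0] >= 0 else 1 for row in corr]
        return self


def reorder_by_cluster(corr, cls, label_attr_name="labels_"):
    """Fit cls on corr, read its labels by attribute name, and group
    rows and columns so that features in the same cluster are adjacent."""
    cls.fit(corr)
    labels = getattr(cls, label_attr_name)
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    clustered = [[corr[i][j] for j in order] for i in order]
    return order, clustered


corr = [
    [1.0, -0.6, 0.7],
    [-0.6, 1.0, -0.4],
    [0.7, -0.4, 1.0],
]

order, clustered = reorder_by_cluster(corr, SignClusterer())
print(order)
for row in clustered:
    print(row)
```

Reading the labels through getattr is what lets any estimator be used, as long as label_attr_name names the attribute it sets during fit.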

corr_X_y

mlens.visualization.corr_X_y(X, y, top=5, figsize=(10, 8), fontsize=12, hspace=None, no_ticks=True, label_rotation=0, show=True)

Function for plotting input feature correlations with output.

The output figure shows all correlations as well as the top positive and top negative correlations.

Parameters:
  • X (pandas DataFrame of shape = [n_samples, n_features]) – Input data.
  • y (pandas Series of shape = [n_samples,]) – Training labels.
  • top (int (default = 5)) – Number of features to show in the top positive and top negative graphs.
  • figsize (tuple (default = (10, 8))) – Size of figure.
  • hspace (float, optional) – whitespace between top row of figures and bottom figure.
  • fontsize (int) – font size of subplot titles.
  • no_ticks (bool (default = True)) – Whether to remove tick labels from the full correlation plot.
  • label_rotation (float (default = 0)) – Rotation of labels.
  • show (bool (default = True)) – whether to print figure using matplotlib.pyplot.show.
Returns:

ax – axis object.

Return type:

object
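
The quantities shown in the figure can be sketched without any plotting. This is a minimal illustration under stated assumptions: Pearson correlation is assumed, X is represented as a dict of columns rather than a DataFrame, and the helpers pearson and top_correlations are hypothetical names, not library functions.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((z - mean_b) ** 2 for z in b))
    return cov / (norm_a * norm_b)


def top_correlations(X, y, top=5):
    """Return all feature/output correlations plus the `top` most
    positive and `top` most negative ones."""
    corrs = {name: pearson(column, y) for name, column in X.items()}
    ranked = sorted(corrs.items(), key=lambda item: item[1], reverse=True)
    top_pos = [item for item in ranked if item[1] > 0][:top]
    top_neg = [item for item in reversed(ranked) if item[1] < 0][:top]
    return corrs, top_pos, top_neg


# Columns of a small feature matrix and the output to correlate against.
X = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [1, 3, 2, 4]}
y = [2, 4, 6, 8]

corrs, top_pos, top_neg = top_correlations(X, y, top=2)
print([name for name, _ in top_pos])
print([name for name, _ in top_neg])
```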

pca_plot

mlens.visualization.pca_plot(X, estimator, y=None, cmap=None, figsize=(10, 8), title='Principal Components Analysis', title_font_size=14, show=True, ax=None, **kwargs)

Function to plot a principal component analysis (PCA) in 1, 2, or 3 dimensions.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – Matrix to perform PCA on.
  • estimator (instance) – PCA estimator. Assumes a Scikit-learn API.
  • y (array-like of shape = [n_samples, ] or None (default = None)) – training labels to be used for color highlighting.
  • cmap (object, optional) – cmap object to pass to matplotlib.pyplot.scatter.
  • figsize (tuple (default = (10, 8))) – Size of figure.
  • title (str) – figure title if shown.
  • title_font_size (int) – title font size.
  • show (bool (default = True)) – Whether to display the figure using matplotlib.pyplot.show.
  • ax (object, optional) – axis to attach plot to.
  • **kwargs (optional) – arguments to pass to matplotlib.pyplot.scatter.
Returns:

ax – if ax was specified, returns ax with plot attached.

Return type:

optional

pca_comp_plot

mlens.visualization.pca_comp_plot(X, y=None, figsize=(10, 8), title='Principal Components Comparison', title_font_size=14, show=True, **kwargs)

Function for comparing PCA results.

The function compares projections across 2 and 3 dimensions and across linear and rbf kernels.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
  • y (array-like of shape = [n_samples, ] or None (default = None)) – training labels to be used for color highlighting.
  • figsize (tuple (default = (10, 8))) – Size of figure.
  • title (str) – figure title if shown.
  • title_font_size (int) – title font size.
  • show (bool (default = True)) – Whether to display the figure using matplotlib.pyplot.show.
  • **kwargs (optional) – optional arguments to pass to mlens.visualization.pca_plot.
Returns:

ax – axis object.

Return type:

object
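
The comparison described above spans two dimensionalities and two kernels, which gives four configurations. A minimal sketch of that grid follows. The dictionary keys n_components and kernel follow Scikit-learn naming and are an assumption here, as is the panel order.

```python
from itertools import product

dimensions = (2, 3)
kernels = ("linear", "rbf")

# One configuration per (dimension, kernel) pair: four panels in total.
panels = [
    {"n_components": dim, "kernel": kernel}
    for dim, kernel in product(dimensions, kernels)
]

for panel in panels:
    print(panel)
```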

exp_var_plot

mlens.visualization.exp_var_plot(X, estimator, figsize=(10, 8), buffer=0.01, set_labels=True, title='Explained variance ratio', title_font_size=14, show=True, ax=None, **kwargs)

Function to plot the explained variance using PCA.

Parameters:
  • X (array-like of shape = [n_samples, n_features]) – input matrix to be used for prediction.
  • estimator (class) – PCA estimator, not instantiated. Assumes a Scikit-learn API.
  • figsize (tuple (default = (10, 8))) – Size of figure.
  • buffer (float (default = 0.01)) – Size of the buffer around the edges of the graph. The buffer added is calculated as num_components * buffer, where num_components determines the length of the x-axis.
  • set_labels (bool (default = True)) – Whether to set axis labels.
  • title (str) – figure title if shown.
  • title_font_size (int) – title font size.
  • show (bool (default = True)) – whether to print figure using matplotlib.pyplot.show.
  • ax (object, optional) – axis to attach plot to.
  • **kwargs (optional) – optional arguments passed to the matplotlib.pyplot.step function.
Returns:

ax – if ax was specified, returns ax with plot attached.

Return type:

optional
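
The quantities behind this plot can be sketched in plain Python: the explained variance ratio per component, its cumulative sum, and the edge buffer computed as num_components * buffer. The helper names are hypothetical, and the assumption that the x-axis runs from 1 to num_components is mine, not the library's.

```python
from itertools import accumulate


def explained_variance_ratio(variances):
    """Share of total variance captured by each component."""
    total = sum(variances)
    return [v / total for v in variances]


def x_limits(num_components, buffer=0.01):
    """Pad the x-axis by num_components * buffer on each side.
    The 1..num_components axis range is an assumption."""
    pad = num_components * buffer
    return (1 - pad, num_components + pad)


# Per-component variances, e.g. as obtained from a fitted PCA.
variances = [4.0, 3.0, 2.0, 1.0]

ratios = explained_variance_ratio(variances)
cumulative = list(accumulate(ratios))
limits = x_limits(len(variances), buffer=0.01)

print(ratios)
print(cumulative)
print(limits)
```

Scaling the pad by the number of components keeps the margin visually proportional whether the plot shows a handful of components or several hundred.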