Troubleshooting¶
Known potential issues. Raise an issue if your problem is not addressed here.
Bad interaction with third-party packages¶
ML-Ensemble itself is thread-safe, but third-party packages may not be. A known issue with Scikit-learn (resolved as of 0.19.2) is that cloning is not thread-safe, estimators that clones internally (e.g. decision trees) can occasionally trigger an error. If
IndexError: Pop from empty list
happens, try using multiprocessing
instead.
With multiprocessing, be mindful of the start_method
used.
Due to how Python forks the main process when running multiprocessing,
workers can receive corrupted thread states prompting them to acquiring more threads than are available,
with the resulting of a deadlock. Due to this limitation and the additional overhead of multiprocessing,
If experiencing problems, try:
- ensure all estimators has
n_jobs
ornthread
equal to1
,- try changing the
backend
to eitherthreading
ormultiprocessing
,- if using
multiprocessing
, try varying the start method viaset_start_method()
.
Changing the start_method
from the default (fork
) barrs the use of interactively defined
functions and classes (all functions and classes passed to an mlens
object must be imported, not defined
the running script). For more information on multiprocessing issues see the Scikit-learn FAQ.
Array copying during fitting¶
When the number of folds is greater than 2, it is not possible to slice the full data in such a way as to return a view of that array (i.e. without copying any data). Hence for fold numbers larger than 2, each worker copies a subset of the training data at estimation. If you experience memory-bound issues, please consider using fewer folds during fitting. For further information on avoiding copying data during estimation, see Memory consumption.