Troubleshooting
Known potential issues. Raise an issue if your problem is not addressed here.
Bad interaction with third-party packages
ML-Ensemble itself is thread-safe, but third-party packages may not be. A known issue with Scikit-learn (resolved as of 0.19.2) is that cloning is not thread-safe, so estimators that clone internally (e.g. decision trees) can occasionally trigger an error. If you see
IndexError: pop from empty list
try using multiprocessing instead.
With multiprocessing, be mindful of the start_method used.
Because of how Python forks the main process when running multiprocessing,
workers can receive corrupted thread states that prompt them to acquire more threads than are available,
resulting in a deadlock. This limitation comes on top of the additional overhead of multiprocessing.
If you experience problems, try the following:
- ensure all estimators have n_jobs or nthread equal to 1,
- try changing the backend to either threading or multiprocessing,
- if using multiprocessing, try varying the start method via set_start_method().
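As a rough illustration of the last suggestion, the sketch below uses only Python's standard multiprocessing module to see which start methods the current platform offers and to pick one other than fork. The helper name pick_context and its preference order are assumptions made for this example. It does not call ML-Ensemble's own set_start_method().

```python
import multiprocessing as mp


def pick_context(preferred=("forkserver", "spawn")):
    """Return a context for the first available start method in `preferred`.

    Hypothetical helper for illustration; falls back to the platform
    default context when none of the preferred methods is available.
    """
    available = mp.get_all_start_methods()
    for method in preferred:
        if method in available:
            # get_context returns a local context and leaves the
            # global default start method untouched.
            return mp.get_context(method)
    return mp.get_context()


if __name__ == "__main__":
    print("available:", mp.get_all_start_methods())
    print("selected:", pick_context().get_start_method())
```

A local context is used here instead of a global setting so that trying different start methods does not affect other code in the same process.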
Changing the start_method from the default (fork) bars the use of interactively defined
functions and classes (all functions and classes passed to an mlens object must be imported, not defined
in the running script). For more information on multiprocessing issues see the Scikit-learn FAQ.
Array copying during fitting
When the number of folds is greater than 2, it is not possible to slice the full data in such a way as to return a view of that array (i.e. without copying any data). Hence for fold numbers larger than 2, each worker copies a subset of the training data at estimation. If you experience memory-bound issues, please consider using fewer folds during fitting. For further information on avoiding copying data during estimation, see Memory consumption.
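The difference between two folds and more than two can be seen with plain NumPy, assuming it is installed. This is only an illustration of view versus copy semantics, not ML-Ensemble's indexing code, and the array and fold boundaries are made up for the example. With two folds the training rows form one contiguous slice, which NumPy returns as a view. With three folds the training rows for the middle fold come in two pieces that must be joined into a new array.

```python
import numpy as np

X = np.arange(12).reshape(6, 2)  # toy data: 6 samples, 2 features

# 2 folds: hold out rows 0-2, train on rows 3-5.
# A single contiguous slice is a view on X; no data is copied.
train_two_folds = X[3:]

# 3 folds: hold out the middle rows 2-3, train on rows 0-1 and 4-5.
# The two pieces are not contiguous, so joining them allocates a copy.
train_three_folds = np.concatenate([X[:2], X[4:]])

print(np.shares_memory(X, train_two_folds))    # True
print(np.shares_memory(X, train_three_folds))  # False
```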