Monday, January 3, 2022

[SOLVED] Solving conda environment stuck

January 03, 2022 conda, ubuntu, virtual-environment

Issue

I'm trying to create a conda environment using the command:

conda env create -f devenv.yaml

My .yaml file is:

name: myname
channels:
  - conda-forge
  - bioconda
dependencies:
  # Package creation and environment management
  - conda-build
  # Automation control (command line interface, workflow and multi-process management)
  - python-dotenv
  - click
  - snakemake-minimal
  - joblib
  - numba
  # Workspace
  - notebook
  # Visualization
  - plotly
  - plotly-orca
  - matplotlib
  - seaborn
  - shap
  - openpyxl
  - ipywidgets
  - tensorboard
  # Data manipulation
  - numpy
  - pandas
  - pyarrow
  # Functional style tools
  - more-itertools
  - toolz
  # Machine learning
  - scikit-learn
  - imbalanced-learn
  - scikit-image
  - statsmodels
  - catboost
  - hyperopt
  - tsfresh
  # Deep learning
  - pytorch
  # code checking and formatting
  - pylint
  - black
  - flake8
  - mypy
  # Python base
  - python
  - pip
  - pip:

I've tried updating conda, but it doesn't help. It just gets stuck on solving the environment.

conda version: 4.11.0
OS: Ubuntu 18.04.5 LTS

The exact same environment works fine on my mac, but not on that server. What could be the issue? I appreciate any suggestions. Thx.

Solution

This solves fine, but is indeed a complex solve, mainly due to:

underspecification
lack of modularization

Underspecification

This particular environment specification ends up installing well over 300 packages, and not a single one of them is constrained by the specification. That is a huge SAT problem to solve, and Conda will struggle with it. Mamba will solve it faster, but providing additional constraints can vastly reduce the solution space.

At minimum, specify a Python version (major.minor), such as python=3.9. This is the single most effective constraint.

Beyond that, putting minimum version requirements on central packages (those that are dependencies of many others) can help, such as a minimum NumPy version.
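As a rough sketch of what those constraints could look like in the .yaml file, here is a trimmed version of the specification. The version numbers are illustrative assumptions only (they were not tested against this environment), and the package list is shortened for brevity:

```yaml
# Sketch only: version numbers below are illustrative assumptions, not tested pins.
name: myname
channels:
  - conda-forge
  - bioconda
dependencies:
  # Python base: pin major.minor to shrink the solution space
  - python=3.9
  - pip
  # Central package: a hypothetical lower bound
  - numpy>=1.20
  # Other packages stay unpinned, as in the original file
  - pandas
  - scikit-learn
  - matplotlib
```

The idea is just that a fixed Python version and a lower bound on a widely shared dependency rule out large parts of the search early; which bounds make sense depends on what the project actually needs.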

Lack of Modularization

I assume the name "devenv" means this is a development environment. So, I get that one wants all these tools immediately at hand. However, Conda environment activation is so simple, and most IDE tooling these days (Spyder, VSCode, Jupyter) encourages separation of infrastructure and the execution kernel. Being more thoughtful about how environments (emphasis on the plural) are organized and work together can go a long way toward a sustainable and painless data science workflow.

The environment at hand has multiple red flags in my book:

conda-build should be in base and only in base
snakemake should be in a dedicated environment
notebook (i.e., Jupyter) should be in a dedicated environment, co-installed with nb_conda_kernels; all that kernel environments need is ipykernel

I'd probably also have the linting/formatting packages separated, but that's less of an issue. The real killer, though, is snakemake: it's just a massive piece of infrastructure, and I'd strongly encourage keeping it separate.
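One possible way to lay this out, as a sketch: three separate specification files, shown here as three YAML documents separated by `---`. Each document would go in its own file and be created with its own `conda env create -f ...` call. The environment names (`jupyter`, `snakemake`, `myname`) and the Python pin are assumptions made for illustration, and the kernel environment lists only a few of the original packages:

```yaml
# File 1 (hypothetical name: jupyter.yaml): notebook infrastructure only
name: jupyter
channels:
  - conda-forge
dependencies:
  - notebook
  - nb_conda_kernels
---
# File 2 (hypothetical name: snakemake.yaml): workflow infrastructure only
name: snakemake
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake-minimal
---
# File 3 (hypothetical name: devenv.yaml): the kernel environment for the actual work
name: myname
channels:
  - conda-forge
dependencies:
  - python=3.9      # illustrative pin
  - ipykernel       # all a kernel environment needs to be visible to Jupyter
  - numpy
  - pandas
  - scikit-learn
```

With nb_conda_kernels installed alongside notebook, Jupyter launched from the `jupyter` environment should list other environments that contain ipykernel as available kernels, so the heavy infrastructure packages never have to be solved together with the analysis stack.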

Answered By - merv

This answer, collected from Stack Overflow and tested by PythonFixing community admins, is licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0