Issue
I'm trying to create a conda environment using the command:
conda env create -f devenv.yaml
My .yaml file is:
name: myname
channels:
- conda-forge
- bioconda
dependencies:
# Package creation and environment management
- conda-build
# Automation control (command line interface, workflow and multi-process management)
- python-dotenv
- click
- snakemake-minimal
- joblib
- numba
# Workspace
- notebook
# Visualization
- plotly
- plotly-orca
- matplotlib
- seaborn
- shap
- openpyxl
- ipywidgets
- tensorboard
# Data manipulation
- numpy
- pandas
- pyarrow
# Functional style tools
- more-itertools
- toolz
# Machine learning
- scikit-learn
- imbalanced-learn
- scikit-image
- statsmodels
- catboost
- hyperopt
- tsfresh
# Deep learning
- pytorch
# code checking and formatting
- pylint
- black
- flake8
- mypy
# Python base
- python
- pip
- pip:
I've tried updating conda, but it doesn't help. It just gets stuck on solving the environment.
conda version: 4.11.0
OS: Ubuntu 18.04.5 LTS
The exact same environment works fine on my Mac, but not on that server. What could be the issue? I appreciate any suggestions. Thx.
Solution
This solves fine, but it is indeed a complex solve, mainly due to:
- underspecification
- lack of modularization
Underspecification
This particular environment specification ends up installing well over 300 packages, and not a single one of them is constrained by the specification. That is a huge SAT problem, and Conda will struggle with it. Mamba will solve it faster, but providing additional constraints can vastly reduce the solution space.
At minimum, specify a Python version (major.minor), such as python=3.9. This is the single most effective constraint.
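For example, the "Python base" section of the question's devenv.yaml could pin the interpreter as below. This is only a sketch: 3.9 is simply the example version mentioned above, and whatever major.minor the project actually targets works the same way.

```yaml
# Fragment of devenv.yaml: only the "Python base" entries change.
dependencies:
  # Python base
  - python=3.9  # pins major.minor; conda's "=" accepts any 3.9.x build
  - pip
```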
Beyond that, putting minimum version requirements on central packages (those that many other packages depend on), such as NumPy, can help.
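In the environment file, such a lower bound is written with a comparison operator in the package spec. The version numbers below are illustrative assumptions rather than recommendations; they should be replaced by whatever the project's code actually requires.

```yaml
# Fragment of devenv.yaml: lower bounds on central packages.
# The minimum versions are placeholders chosen for illustration only.
dependencies:
  - numpy>=1.21   # many of the listed packages depend on NumPy
  - pandas>=1.3
```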
Lack of Modularization
I assume the name "devenv" means this is a development environment, so I understand wanting all these tools immediately at hand. However, Conda environment activation is simple, and most IDE tooling these days (Spyder, VS Code, Jupyter) encourages separating the infrastructure from the execution kernel. Being more thoughtful about how environments (emphasis on the plural) are organized and work together can go a long way toward a sustainable and painless data science workflow.
The environment at hand has multiple red flags in my book:
- conda-build should be in base, and only in base
- snakemake should be in a dedicated environment
- notebook (i.e., Jupyter) should be in a dedicated environment, co-installed with nb_conda_kernels; all a kernel environment needs is ipykernel
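A minimal sketch of the Jupyter/kernel split, assuming two separate environment files. The file name jupyter.yaml and the environment name jupyter are hypothetical, and the trimmed package list is only an example. The `---` line merely separates the two files in this listing and should not be copied into either one, since conda env create reads one environment per file.

```yaml
# jupyter.yaml (hypothetical name): the Jupyter front end lives here, and only here.
name: jupyter
channels:
  - conda-forge
dependencies:
  - notebook
  - nb_conda_kernels  # exposes other conda envs that contain ipykernel as kernels
---
# devenv.yaml, trimmed to a kernel environment (no notebook, conda-build or snakemake).
name: myname
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.9
  - ipykernel  # what lets nb_conda_kernels list this env as a kernel
  - numpy
  - pandas
  - scikit-learn
  # ... plus the remaining analysis packages from the original list
```

After creating both (conda env create -f jupyter.yaml, then conda env create -f devenv.yaml), Jupyter is started from the jupyter environment and the kernel for myname is picked inside the notebook. Each solve is now much smaller than the original one.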
I'd probably also separate the linting/formatting packages, but that's less of an issue. The real killer, though, is snakemake: it's a massive piece of infrastructure, and I'd strongly encourage keeping it separate.
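Likewise, snakemake can get its own small environment. A sketch, assuming a file named snakemake.yaml and an environment named snakemake (both hypothetical). The channel order follows the question's file, since snakemake-minimal is distributed through bioconda, which in turn expects conda-forge to be available.

```yaml
# snakemake.yaml (hypothetical name): workflow infrastructure kept out of the dev env.
name: snakemake
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake-minimal
```

Activating this environment only when running workflows keeps snakemake's large dependency tree out of the development environment's solve.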
Answered By - merv