Friday, October 7, 2022

[SOLVED] Are there any downsides to using virtualenv for scientific python and machine learning?

Issue

I have received several recommendations to use virtualenv to clean up my Python modules. I am concerned because it seems too good to be true. Has anyone found downsides related to performance or memory when working with multicore settings, StarCluster, numpy, scikit-learn, pandas, or the IPython notebook?


Solution

Virtualenv is the best and easiest way to keep your dependencies in order. Python is really behind Ruby (bundler!) when it comes to installing and keeping track of modules, and virtualenv is the best tool you have.

So I suggest you create a virtualenv directory for each of your applications and put together a file listing all the 'pip install' commands needed to build the environment. That gives you a clean, repeatable process for creating it.
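As a rough sketch of such a repeatable build, the script below creates an environment and installs everything listed in one file. It makes two assumptions that are not part of the answer above: it uses the standard library's venv module instead of the virtualenv package, and it reads a requirements.txt file rather than a list of individual 'pip install' commands. The helper names env_python, install_command and build_env are made up for illustration.

```python
import os
import subprocess
import venv
from pathlib import Path


def env_python(env_dir):
    """Return the path of the Python interpreter inside the environment."""
    # Windows environments use Scripts\python.exe, POSIX ones use bin/python.
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return Path(env_dir) / bin_dir / exe


def install_command(env_dir, requirements):
    """Build the pip command that installs the listed packages into the env."""
    return [str(env_python(env_dir)), "-m", "pip", "install", "-r", str(requirements)]


def build_env(env_dir, requirements):
    """Recreate the environment from scratch, then install the requirements."""
    # clear=True wipes any existing directory so every build starts clean.
    venv.create(env_dir, with_pip=True, clear=True)
    subprocess.run(install_command(env_dir, requirements), check=True)
```

Calling build_env("myenv", "requirements.txt") from the project directory would rebuild the environment; both names are placeholders. Running pip through the environment's own interpreter means the packages land in that environment without it having to be activated first.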

I think that the nature of the application makes little difference. There should not be any performance issue, since all virtualenv does is make Python load libraries from the environment's own directory rather than from the default system-wide location.
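To see that only the lookup path changes, you can ask the interpreter where it is looking. This is a minimal sketch using the standard sys and sysconfig modules; the function names in_virtualenv and describe_environment are illustrative, not part of any library.

```python
import sys
import sysconfig


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base installation. Older virtualenv
    # releases set sys.real_prefix instead, so check that first.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


def describe_environment():
    """Collect the locations the interpreter uses for installed packages."""
    return {
        "in_virtualenv": in_virtualenv(),
        "prefix": sys.prefix,
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run it once with the system interpreter and once inside an environment: the prefix and site-packages paths differ, but the interpreter and the imported libraries work the same way in both cases.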

As an aside (this may be completely irrelevant), if performance really is an issue, then perhaps you ought to be looking at a compiled language. Most likely, though, any performance bottlenecks can be reduced with better coding.



Answered By - Dimitris
Answer Checked By - Candace Johnson (WPSolving Volunteer)