Issue
As the title says: how can the python3 interpreter, which executes Python programs, be made faster, or even as fast as possible, using the vanilla CPython versions available at www.python.org, on any Linux distribution?
Is there a way to make it even faster than the default "--enable-optimizations" configure option?
Which compiler should be used to achieve the goal?
Which benchmarks corroborate assertions about the compiler choice?
Solution
I think I found a way, but in my benchmarks it only makes python3 slightly faster.
I did it by doing the following things:
1. Extending the profiling for PGO to all 425 regression tests that come with the python3 source. Configuring with "--enable-optimizations" alone only runs a small subset of those tests.
2. Adding CFLAGS="-march=native -O3 -pipe", with LTO enabled via the "--with-lto" configure option.
3. Adding "-fprofile-update=prefer-atomic" to the profiling stage.
4. Adding "-fprofile-partial-training" to the final Feedback Directed Optimisation (FDO) stage.
How do you do the above, and what are the consequences?
First, the results...
A picture paints a thousand words, as they say!
- The red python has all of points 1-4 above applied.
- The green python has only point 2, with the stock "--enable-optimizations" configuration, which profiles only the limited PGO subset.
Lower is better. You can see that the majority of wins go to the red python, with several wins going to the green python.
The benchmarks were run with pyperformance, which focuses on real-world benchmarks rather than synthetic ones, using whole applications when possible.
https://pyperformance.readthedocs.io/index.html
The results were graphed using pyperfplot.
https://github.com/stefantalpalaru/pyperfplot
The endeavour whetted my appetite, so I ran several more benchmarks, which took a full day to do...
- Red and yellow pythons are the same as the red and green from the previous graph.
- Green python is Python 3.9 from Ubuntu's repository, compiled by them using GCC 9.3.
- Light Blue python is built with Clang 12, with point 2 above and the stock "--enable-optimizations" configuration, which profiles only the limited PGO subset. It is the worst performer of the lot in the benchmarks! That was surprising: I started this endeavour thinking Clang 12 would win, given all the recent publications and advertising about Linux now being fully LTO'able and Clang 12 taking first place in many Phoronix benchmark articles over the last couple of months.
- Dark Blue is the default Ubuntu Python 3.8 from the repositories. It is included to show whether there has been progress from 3.8 to 3.9, and to compare with my custom builds.
So, how do you do the above 4 points, and what are the consequences?
- Get the python3 version you want to build; I got 3.9.6...
wget https://www.python.org/ftp/python/3.9.6/Python-3.9.6.tar.xz
- Decompress...
tar -xf ./Python-3.9.6.tar.xz
- Go to the directory and configure it.
cd ./Python-3.9.6
For gcc
time CFLAGS="-march=native -O3 -pipe" ./configure --enable-optimizations --with-lto
For clang
time CC="clang" CFLAGS="-march=native -O3 -pipe -Wno-unused-value -Wno-empty-body -Qunused-arguments -Wno-parentheses-equality" ./configure --enable-optimizations --with-lto
The extra options for clang just follow the official advice from the python devs here... https://devguide.python.org/setup/#clang
- At this point you would traditionally start building/compiling. However, we want to further customise the build with the extra options for the profiling stage and for the final release build.
nano Makefile
Search for "PGO_PROF_GEN_FLAG" (ctrl+w) and append "-fprofile-update=prefer-atomic" (without the quotes) after a space at the end of that line. It should look something like...
PGO_PROF_GEN_FLAG=-fprofile-generate -fprofile-update=prefer-atomic
- The next line underneath should say "PGO_PROF_USE_FLAG"; it affects the final release build/compile. Append "-fprofile-partial-training" (without the quotes) after a space at the end of that line. It should look something like...
PGO_PROF_USE_FLAG=-fprofile-use -fprofile-correction -fprofile-partial-training
Note that this point is only compatible with gcc. "-fprofile-partial-training" is not available in clang-12 at the time of this writing. Without this flag, gcc with "-fprofile-use" optimises code paths that were not executed during profiling aggressively for size rather than speed. With it, gcc ignores the profile feedback for functions that were not executed during the training run and optimises them as if they were compiled without profile feedback. This can lead to better performance when the training run is not representative, but at the cost of significantly larger code.
See: https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
- Finally, extend the list of regression tests to run from the stock subset to the full set of tests. While still in "nano Makefile", search for "PROFILE_TASK= -m test --pgo" and replace it with:
PROFILE_TASK= -m test --pgo-extended
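The three Makefile edits above can also be scripted instead of made by hand in nano. What follows is only a minimal sketch, assuming the generated Makefile contains the "PGO_PROF_GEN_FLAG", "PGO_PROF_USE_FLAG" and "PROFILE_TASK" lines in the form shown above. The function names (`append_flag`, `use_extended_profile`, `patch_makefile_text`) are hypothetical helpers, not part of CPython's build system.

```python
import re

# Flags taken from the steps above.
GEN_EXTRA = "-fprofile-update=prefer-atomic"
USE_EXTRA = "-fprofile-partial-training"


def append_flag(text, variable, flag):
    """Append `flag` to the first `variable=...` line unless it is already there."""
    pattern = re.compile(rf"^({re.escape(variable)}[ \t]*=.*)$", re.MULTILINE)

    def add(match):
        line = match.group(1)
        _name, _sep, value = line.partition("=")
        if flag in value.split():
            return line  # already patched, leave the line alone
        return f"{line.rstrip()} {flag}"

    return pattern.sub(add, text, count=1)


def use_extended_profile(text):
    """Turn `--pgo` into `--pgo-extended` on the PROFILE_TASK line."""
    pattern = re.compile(
        r"^(PROFILE_TASK[ \t]*=.*?)--pgo(?!-extended)\b", re.MULTILINE
    )
    return pattern.sub(r"\1--pgo-extended", text, count=1)


def patch_makefile_text(text):
    """Apply all three edits described above to the Makefile contents."""
    text = append_flag(text, "PGO_PROF_GEN_FLAG", GEN_EXTRA)
    text = append_flag(text, "PGO_PROF_USE_FLAG", USE_EXTRA)
    return use_extended_profile(text)
```

To use it, read the Makefile produced by ./configure, pass its contents through `patch_makefile_text`, and write the result back. Running it a second time leaves the file unchanged, so it is safe to re-apply after re-running configure.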
- Now you can start the build. Note, however, that enabling the full suite of tests for profiling will massively increase the time needed to build Python3 to completion.
time make -j$(( $(nproc) + 1 ))
The -j formula in the command above takes the number of CPUs reported by nproc and adds 1, so that make runs that many build/compile/link jobs in parallel to speed things up.
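The same arithmetic, written out as a small sketch. It assumes that mirroring nproc means counting the CPUs available to the current process, which on Linux is what `os.sched_getaffinity` reports; `available_cpus`, `make_jobs` and `make_command` are hypothetical helper names.

```python
import os


def available_cpus():
    """CPUs usable by this process (like nproc), else the total CPU count."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def make_jobs(extra=1):
    """Number of parallel make jobs: available CPUs plus `extra`."""
    return available_cpus() + extra


def make_command(extra=1):
    """Argument list equivalent to: make -j$(( $(nproc) + 1 ))"""
    return ["make", f"-j{make_jobs(extra)}"]
```

On an 8-thread machine with no affinity restrictions, `make_command()` would give `["make", "-j9"]`.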
Unfortunately, the regression tests are executed sequentially, and there is no easy way of switching them to run concurrently. All 425 tests will be run and profiled!
On my i7-3770 it took this long...
real 49m26.882s
user 55m1.160s
sys 2m1.106s
But I did have a few other programs and applications and a VM running at the same time.
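One rough way to read the timing above is to compare CPU time (user + sys) with wall-clock time (real). A ratio close to 1 suggests that, for most of the run, only about one core was busy, which is consistent with the profiling tests running sequentially. The sketch below does that arithmetic on the figures quoted above. The helper names are hypothetical, and because other programs were running at the same time, the result is only indicative.

```python
import re

# Matches lines such as "real 49m26.882s" from the shell's `time` output.
_TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m([\d.]+)s$")

# Figures copied from the build timing quoted above.
REPORTED = """\
real 49m26.882s
user 55m1.160s
sys 2m1.106s
"""


def parse_time_output(text):
    """Convert `time` output into a dict of seconds per label."""
    seconds = {}
    for line in text.splitlines():
        match = _TIME_LINE.match(line.strip())
        if match:
            label, minutes, secs = match.groups()
            seconds[label] = int(minutes) * 60 + float(secs)
    return seconds


def cpu_to_wall_ratio(seconds):
    """CPU time divided by wall-clock time; about 1 means mostly one busy core."""
    return (seconds["user"] + seconds["sys"]) / seconds["real"]


if __name__ == "__main__":
    print(round(cpu_to_wall_ratio(parse_time_output(REPORTED)), 2))
```

For the numbers above the ratio comes out at roughly 1.15, far below what a fully parallel build on an 8-thread CPU would show.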
- Once done, use "altinstall" so that you do not overwrite the default python3 that comes with your distribution, which can cause problems.
sudo make altinstall
- If you have multiple custom-built python versions, use update-alternatives to manage them.
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.7 374 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.7-config
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.8 382 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.8-config
sudo update-alternatives --verbose --install /usr/local/bin/python3 python3 /usr/local/bin/python3.9 396 --slave /usr/local/bin/python3-config python3-config /usr/local/bin/python3.9-config
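The three commands above follow one pattern, which can be sketched as a small generator. One assumption here: the priorities 374, 382 and 396 look like the digits of the version numbers (3.7.4, 3.8.2, 3.9.6). The answer does not state that, so treat it as an inference. `alternatives_command` is a hypothetical helper that only builds the argument list and does not run anything.

```python
from typing import List


def alternatives_command(version: str, prefix: str = "/usr/local/bin") -> List[str]:
    """Build the update-alternatives call for one altinstalled version.

    `version` is a full version string such as "3.9.6". The priority is
    assumed to be the version digits joined together (an inference from
    the commands above, not a documented rule).
    """
    major, minor, micro = version.split(".")
    short = f"{major}.{minor}"
    priority = f"{major}{minor}{micro}"
    return [
        "sudo", "update-alternatives", "--verbose",
        "--install", f"{prefix}/python3", "python3",
        f"{prefix}/python{short}", priority,
        "--slave", f"{prefix}/python3-config", "python3-config",
        f"{prefix}/python{short}-config",
    ]


if __name__ == "__main__":
    for v in ("3.7.4", "3.8.2", "3.9.6"):
        print(" ".join(alternatives_command(v)))
```

The list can be passed to `subprocess.run` if you want to execute it, or printed and pasted into a shell.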
Use the following command to configure which one is the default "python3":
sudo update-alternatives --config python3
This is mine...
There are 4 choices for the alternative python3 (providing /usr/local/bin/python3).
  Selection    Path                       Priority   Status
------------------------------------------------------------
* 0            /usr/local/bin/python3.9    396       auto mode
  1            /usr/bin/pypy3              369       manual mode
  2            /usr/local/bin/python3.7    374       manual mode
  3            /usr/local/bin/python3.8    382       manual mode
  4            /usr/local/bin/python3.9    396       manual mode
Press <enter> to keep the current choice[*], or type selection number:
Lastly, note that any python3 in "/usr/bin" belongs to your Linux distribution. Try not to modify it, as that can break things later. All your altinstalls will go to "/usr/local/bin".
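To check which interpreter you are actually running and how it was built, the standard library exposes the recorded build settings. The sketch below collects them. `build_summary` is a hypothetical helper name, and on some platforms or builds the config variables may come back as None.

```python
import platform
import sys
import sysconfig


def build_summary():
    """Collect where this interpreter lives and how it was configured."""
    return {
        # Path of the running interpreter, e.g. under /usr/local/bin for an altinstall.
        "executable": sys.executable,
        "version": platform.python_version(),
        # Compiler string recorded at build time.
        "compiler": platform.python_compiler(),
        # Arguments that were passed to ./configure (may be None on some platforms).
        "configure_args": sysconfig.get_config_var("CONFIG_ARGS"),
        # Compiler flags recorded in the generated Makefile (may be None).
        "cflags": sysconfig.get_config_var("CFLAGS"),
    }


if __name__ == "__main__":
    for key, value in build_summary().items():
        print(f"{key}: {value}")
```

Run it with the interpreter you want to inspect, for example /usr/local/bin/python3.9. For a build configured as described above, you would expect "--enable-optimizations" and "--with-lto" to appear in the configure arguments.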
Some Conclusions...
- Clang, an awesome compiler and project, performs poorly for python3, at least with my setup. Perhaps if their devs are reading this, they can do something about it.
- GCC rules Python3. I haven't got Intel's compiler (ICC?), so I can't compare, but I hear it is even better when used to build python3.
- The tweaks outlined above have made my default python3 faster and snappier overall; however, it took a LOT of time to build! It is worth it in my opinion.
UPDATE: Python 3.10.0 vs ubuntu 20.04 stock python 3.8.10
Python 3.10.0 with Full PGO, Partial Training, Prefer-atomic, march=native (zen3 R7-5800X), O3 optimisation
Clear wins --> 33
Python 3.8.10 Ubuntu stock from repos
Clear wins --> 22
However, looking at the graph, you can see that where 3.10 lost out to the stock 3.8.10 from the Ubuntu repos, some of the margins are quite big.
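A win count on its own hides how large each win or loss is, which is the point made above. The following sketch tallies wins and also ranks the per-benchmark ratios. It assumes you have already extracted one timing per benchmark (lower is better) for each build into plain dicts. The `threshold` parameter is an assumption about what a "clear" win means, since the answer does not define it, and `compare_builds` is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def compare_builds(
    a: Dict[str, float],
    b: Dict[str, float],
    threshold: float = 0.0,
) -> Tuple[int, int, List[Tuple[str, float]]]:
    """Count wins for builds `a` and `b` and rank the ratios a/b.

    Timings are "lower is better". A build wins a benchmark only if it is
    faster by more than `threshold` (for example, 0.05 for 5%). Benchmarks
    missing from either build are ignored.
    """
    wins_a = wins_b = 0
    ratios = []
    for name in sorted(a.keys() & b.keys()):
        ratios.append((name, a[name] / b[name]))
        if a[name] * (1 + threshold) < b[name]:
            wins_a += 1
        elif b[name] * (1 + threshold) < a[name]:
            wins_b += 1
    # Largest ratio first: the benchmarks where `a` loses by the widest margin.
    ratios.sort(key=lambda item: item[1], reverse=True)
    return wins_a, wins_b, ratios
```

The first entries of the returned ratio list show where build `a` loses most heavily, so a build can be ahead on wins and still have a few large regressions worth looking into.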
Answered By - DanglingPointer