F41 Change Proposal - Python built with gcc -O3 (self-contained)

Python built with gcc -O3

This is a proposed Change for Fedora Linux.
This document represents a proposed Change. As part of the Changes process, proposals are publicly announced in order to receive community feedback. This proposal will only be implemented if approved by the Fedora Engineering Steering Committee.

Wiki
Announced

:link: Summary

Instead of Fedora’s default -O2 compiler flag, we will use -O3 to build CPython. This only impacts the interpreter and Python standard library, not any 3rd party extension modules built as RPM or on developer machines. This aligns with the way Python is built upstream. According to our performance measurements, it makes Python significantly faster (pyperformance geometric mean: 1.04x faster).

:link: Owner

:link: Detailed Description

We will replace the -O2 compiler flag with -O3 when building the python3.13 package. This change may be backported to older Pythons if desired. Python 3.13 should be the main Python version in Fedora 41+.

The Fedora packaging guidelines about compiler flags explicitly say:

> Overriding these flags for performance optimizations (for instance, -O3 instead of -O2) is generally discouraged. If you can present benchmarks that show a significant speedup for this particular code, this could be revisited on a case-by-case basis.

This change proposal presents such benchmarks and a case for Python to use -O3.

This change is limited to the CPython interpreter and the extension modules from the Python standard library, thanks to Changes/Python_Extension_Flags_Reduction (in effect since Fedora 39). Other Python extension modules will keep building as before: in RPM packages, for example, they will still be built with -O2 unless Fedora changes that globally. Extension modules built with -O2 still work with a Python built with -O3.

:link: Feedback

:link: Benefit to Fedora

Upstream already builds Python with -O3 by default. Fedora’s Python built with -O3 is faster (1.04x):

Benchmark with python3.12-3.12.2-3.fc41

| Benchmark | -O2 | -O3 | Change | Significance |
|---|---|---|---|---|
| 2to3 | 465 ms | 446 ms | 1.04x faster | Significant (t=21.72) |
| async_generators | 853 ms | 784 ms | 1.09x faster | Significant (t=36.61) |
| async_tree_cpu_io_mixed | 1.19 sec | 1.11 sec | 1.08x faster | Significant (t=13.38) |
| async_tree_cpu_io_mixed_tg | 1.17 sec | 1.09 sec | 1.08x faster | Significant (t=18.69) |
| async_tree_eager | 202 ms | 189 ms | 1.07x faster | Significant (t=7.99) |
| async_tree_eager_cpu_io_mixed | 727 ms | 664 ms | 1.09x faster | Significant (t=18.56) |
| async_tree_eager_cpu_io_mixed_tg | 633 ms | 558 ms | 1.13x faster | Significant (t=24.53) |
| async_tree_eager_io | 1.72 sec | 1.68 sec | 1.03x faster | Significant (t=6.13) |
| async_tree_eager_io_tg | 1.65 sec | 1.62 sec | 1.02x faster | Significant (t=4.65) |
| async_tree_eager_memoization | 437 ms | 422 ms | 1.04x faster | Significant (t=5.09) |
| async_tree_eager_memoization_tg | 330 ms | 322 ms | 1.03x faster | Significant (t=2.60) |
| async_tree_eager_tg | 137 ms | 125 ms | 1.09x faster | Significant (t=16.94) |
| async_tree_io | 1.64 sec | 1.60 sec | 1.02x faster | Significant (t=9.49) |
| async_tree_io_tg | 1.65 sec | 1.61 sec | 1.02x faster | Not significant |
| async_tree_memoization | 895 ms | 871 ms | 1.03x faster | Significant (t=3.73) |
| async_tree_memoization_tg | 848 ms | 836 ms | 1.01x faster | Not significant |
| async_tree_none | 718 ms | 700 ms | 1.03x faster | Significant (t=6.90) |
| async_tree_none_tg | 686 ms | 659 ms | 1.04x faster | Significant (t=13.11) |
| asyncio_tcp | 757 ms | 748 ms | 1.01x faster | Not significant |
| asyncio_tcp_ssl | 2.58 sec | 2.56 sec | 1.01x faster | Not significant |
| asyncio_websockets | 419 ms | 418 ms | 1.00x faster | Not significant |
| bench_mp_pool | 10.7 ms | 10.7 ms | 1.00x faster | Not significant |
| bench_thread_pool | 1.62 ms | 1.61 ms | 1.01x faster | Not significant |
| chameleon | 12.2 ms | 12.0 ms | 1.02x faster | Not significant |
| chaos | 113 ms | 105 ms | 1.07x faster | Significant (t=46.23) |
| comprehensions | 37.4 us | 35.1 us | 1.07x faster | Significant (t=49.72) |
| coroutines | 42.4 ms | 41.4 ms | 1.02x faster | Significant (t=18.68) |
| coverage | 109 ms | 104 ms | 1.05x faster | Significant (t=33.91) |
| create_gc_cycles | 1.84 ms | 1.79 ms | 1.02x faster | Significant (t=5.50) |
| crypto_pyaes | 141 ms | 127 ms | 1.11x faster | Significant (t=86.61) |
| dask | 766 ms | 769 ms | 1.00x slower | Not significant |
| deepcopy | 619 us | 614 us | 1.01x faster | Not significant |
| deepcopy_memo | 71.3 us | 68.3 us | 1.04x faster | Significant (t=26.58) |
| deepcopy_reduce | 5.62 us | 5.56 us | 1.01x faster | Not significant |
| deltablue | 5.76 ms | 5.49 ms | 1.05x faster | Significant (t=7.97) |
| django_template | 62.8 ms | 59.7 ms | 1.05x faster | Significant (t=27.05) |
| docutils | 4.38 sec | 4.29 sec | 1.02x faster | Significant (t=11.25) |
| fannkuch | 706 ms | 667 ms | 1.06x faster | Significant (t=75.80) |
| float | 144 ms | 137 ms | 1.05x faster | Significant (t=24.66) |
| gc_traversal | 5.73 ms | 5.81 ms | 1.01x slower | Not significant |
| generators | 56.0 ms | 58.2 ms | 1.04x slower | Significant (t=-16.25) |
| genshi_text | 40.8 ms | 39.5 ms | 1.03x faster | Significant (t=17.64) |
| genshi_xml | 88.2 ms | 86.3 ms | 1.02x faster | Significant (t=6.96) |
| go | 223 ms | 217 ms | 1.03x faster | Significant (t=19.92) |
| hexiom | 10.3 ms | 9.76 ms | 1.05x faster | Significant (t=42.15) |
| html5lib | 109 ms | 108 ms | 1.01x faster | Not significant |
| json_dumps | 17.4 ms | 16.3 ms | 1.06x faster | Significant (t=45.38) |
| json_loads | 44.2 us | 42.3 us | 1.04x faster | Significant (t=27.71) |
| logging_format | 12.9 us | 12.4 us | 1.04x faster | Significant (t=9.81) |
| logging_silent | 176 ns | 174 ns | 1.01x faster | Not significant |
| logging_simple | 11.4 us | 11.0 us | 1.03x faster | Significant (t=9.94) |
| mako | 19.2 ms | 18.1 ms | 1.06x faster | Significant (t=54.89) |
| mdp | 4.46 sec | 4.33 sec | 1.03x faster | Significant (t=30.14) |
| meteor_contest | 189 ms | 167 ms | 1.13x faster | Significant (t=60.31) |
| nbody | 157 ms | 153 ms | 1.03x faster | Significant (t=4.34) |
| nqueens | 153 ms | 140 ms | 1.09x faster | Significant (t=63.60) |
| pathlib | 32.9 ms | 32.6 ms | 1.01x faster | Not significant |
| pickle | 18.6 us | 16.0 us | 1.16x faster | Significant (t=23.88) |
| pickle_dict | 45.8 us | 44.6 us | 1.03x faster | Significant (t=16.51) |
| pickle_list | 6.86 us | 6.59 us | 1.04x faster | Significant (t=19.65) |
| pickle_pure_python | 515 us | 505 us | 1.02x faster | Not significant |
| pidigits | 285 ms | 284 ms | 1.00x faster | Not significant |
| pprint_pformat | 2.72 sec | 2.54 sec | 1.07x faster | Significant (t=40.28) |
| pprint_safe_repr | 1.34 sec | 1.25 sec | 1.08x faster | Significant (t=58.43) |
| pyflate | 738 ms | 724 ms | 1.02x faster | Not significant |
| python_startup | 15.5 ms | 15.3 ms | 1.01x faster | Not significant |
| python_startup_no_site | 11.2 ms | 11.0 ms | 1.01x faster | Not significant |
| raytrace | 549 ms | 514 ms | 1.07x faster | Significant (t=45.37) |
| regex_compile | 245 ms | 233 ms | 1.05x faster | Significant (t=13.30) |
| regex_dna | 269 ms | 268 ms | 1.00x faster | Not significant |
| regex_effbot | 4.83 ms | 4.95 ms | 1.03x slower | Significant (t=-12.52) |
| regex_v8 | 33.7 ms | 33.1 ms | 1.02x faster | Not significant |
| richards | 75.7 ms | 71.9 ms | 1.05x faster | Significant (t=18.30) |
| richards_super | 85.2 ms | 81.4 ms | 1.05x faster | Significant (t=31.25) |
| scimark_fft | 662 ms | 587 ms | 1.13x faster | Significant (t=71.10) |
| scimark_lu | 199 ms | 190 ms | 1.04x faster | Significant (t=26.77) |
| scimark_monte_carlo | 123 ms | 117 ms | 1.05x faster | Significant (t=37.45) |
| scimark_sor | 217 ms | 210 ms | 1.04x faster | Significant (t=10.68) |
| scimark_sparse_mat_mult | 8.51 ms | 7.42 ms | 1.15x faster | Significant (t=62.99) |
| spectral_norm | 196 ms | 183 ms | 1.07x faster | Significant (t=95.78) |
| sqlalchemy_declarative | 239 ms | 234 ms | 1.02x faster | Significant (t=4.81) |
| sqlalchemy_imperative | 33.1 ms | 33.4 ms | 1.01x slower | Not significant |
| sqlglot_normalize | 197 ms | 187 ms | 1.05x faster | Significant (t=39.81) |
| sqlglot_optimize | 97.1 ms | 91.3 ms | 1.06x faster | Significant (t=47.14) |
| sqlglot_parse | 2.29 ms | 2.18 ms | 1.05x faster | Significant (t=14.70) |
| sqlglot_transpile | 2.79 ms | 2.67 ms | 1.04x faster | Significant (t=11.76) |
| sqlite_synth | 3.97 us | 3.90 us | 1.02x faster | Not significant |
| sympy_expand | 833 ms | 802 ms | 1.04x faster | Significant (t=19.41) |
| sympy_integrate | 34.7 ms | 33.8 ms | 1.03x faster | Significant (t=9.99) |
| sympy_str | 511 ms | 489 ms | 1.04x faster | Significant (t=18.17) |
| sympy_sum | 286 ms | 278 ms | 1.03x faster | Significant (t=14.46) |
| telco | 12.6 ms | 11.7 ms | 1.08x faster | Significant (t=9.31) |
| tomli_loads | 3.91 sec | 3.56 sec | 1.10x faster | Significant (t=46.29) |
| tornado_http | 213 ms | 212 ms | 1.01x faster | Not significant |
| typing_runtime_protocols | 214 us | 196 us | 1.09x faster | Significant (t=24.74) |
| unpack_sequence | 70.5 ns | 66.0 ns | 1.07x faster | Significant (t=8.58) |
| unpickle | 24.3 us | 22.0 us | 1.10x faster | Significant (t=10.67) |
| unpickle_list | 7.44 us | 8.61 us | 1.16x slower | Significant (t=-45.10) |
| unpickle_pure_python | 390 us | 360 us | 1.08x faster | Significant (t=37.48) |
| xml_etree_generate | 160 ms | 145 ms | 1.10x faster | Significant (t=44.33) |
| xml_etree_iterparse | 189 ms | 180 ms | 1.05x faster | Significant (t=20.16) |
| xml_etree_parse | 275 ms | 257 ms | 1.07x faster | Significant (t=20.58) |
| xml_etree_process | 106 ms | 98.6 ms | 1.08x faster | Significant (t=46.73) |
| Geometric mean | | | 1.04x faster | |

Generated by pyperformance run -o Ox.json and pyperformance compare -O table O2.json O3.json on Fedora 40 x86_64 with rawhide-built Python, python3.12-3.12.2-3.fc41, on Lenovo X1 Carbon 3rd gen.
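
As a rough illustration of how the per-benchmark "Change" values and the summary figure relate, the sketch below recomputes speedups and a geometric mean for a handful of rows copied from the benchmark table. It is not pyperformance's own code: the helper name `speedup` and the choice of rows are assumptions made for this example, and since only five rows are used, the result will not match the full-table 1.04x.

```python
from statistics import geometric_mean

# (benchmark, -O2 time, -O3 time); both times in a row share the same unit.
ROWS = [
    ("2to3", 465.0, 446.0),          # ms
    ("pickle", 18.6, 16.0),          # us
    ("generators", 56.0, 58.2),      # ms
    ("unpickle_list", 7.44, 8.61),   # us
    ("scimark_fft", 662.0, 587.0),   # ms
]


def speedup(o2_time, o3_time):
    """Return how many times faster -O3 is; values below 1.0 mean slower."""
    return o2_time / o3_time


ratios = {name: speedup(o2, o3) for name, o2, o3 in ROWS}

for name, ratio in ratios.items():
    # Report slowdowns the way the table does, as "N.NNx slower".
    label = "faster" if ratio >= 1.0 else "slower"
    shown = ratio if ratio >= 1.0 else 1.0 / ratio
    print(f"{name:15} {shown:.2f}x {label}")

subset_mean = geometric_mean(list(ratios.values()))
print(f"geometric mean of this subset: {subset_mean:.3f}x")
```

A geometric mean is used rather than an arithmetic one because the inputs are ratios: a 2x speedup and a 2x slowdown should cancel out to 1.0x.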

The benchmark was performed on Python 3.12 because it uses 3rd party Python packages lacking support for Python 3.13. Once it is possible to run such a benchmark for Python 3.13, we will do so.

The benchmark was performed on x86_64. Until somebody presents a contradicting benchmark (or gives explicit reason for us to measure it), we believe the change makes sense on all architectures.

:link: Scope

  • Proposal owners:

    • Change python3.13 to build with -O3 instead of -O2
    • Backport the change to older Pythons if desired
  • Other developers: no action expected, report bugs when found

  • Release engineering: no action expected

  • Policies and guidelines: this change is following the spirit of the guidelines

  • Trademark approval: not needed for this Change

  • Alignment with Community Initiatives: faster Python, happier users, more contributors?

:link: Upgrade/compatibility impact

None expected.

:link: How To Test

To verify this change has landed, inspect the build.log of python3.13. It should be built with gcc ... -O3.

To test this change, test Fedora as you would normally do and assert there are no regressions.

Run benchmarks, and report slowdowns if found.
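
The build.log check can be automated. The following is a minimal sketch, assuming the log contains plain `gcc ...` command lines. The sample log lines and the file name `Modules/_helper.c` are hypothetical and shortened for the example. It relies on GCC's documented behaviour that the last -O option on a command line is the effective one.

```python
import shlex


def effective_opt_level(command_line):
    """Return the last -O flag in a compiler command line, or None.

    GCC uses the last -O option when several are given.
    """
    level = None
    for token in shlex.split(command_line):
        if token.startswith("-O"):
            level = token
    return level


# Hypothetical build.log excerpts, shortened for the example.
SAMPLE_LOG = """\
gcc -c -fno-strict-overflow -O2 -g -pipe -Wall -O3 -o Objects/abstract.o Objects/abstract.c
gcc -c -O2 -g -pipe -o Modules/_helper.o Modules/_helper.c
"""

for line in SAMPLE_LOG.splitlines():
    if line.startswith("gcc "):
        # Print the effective level next to the source file being compiled.
        print(effective_opt_level(line), line.split()[-1])
```

Note that the lowercase `-o` output option is not matched, because only tokens starting with an uppercase `-O` are considered.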

:link: User Experience

Faster Python, faster Fedora.

:link: Dependencies

:link: Contingency Plan

  • Contingency mechanism: revert the change, rebuild Python
  • Contingency deadline: Final Freeze
  • Blocks release? No

:link: Documentation

N/A (not a System Wide Change)

:link: Release Notes

Last edited by @amoloney 2024-04-12T23:49:56Z

How do you feel about the proposal as written?
  • Strongly in favor
  • In favor, with reservations
  • Neutral
  • Opposed, but could be convinced
  • Strongly opposed

If you are in favor but have reservations, or are opposed but something could change your mind, please explain in a reply.

We want everyone to be heard, but many posts repeating the same thing actually make that harder. If you have something new to say, please say it. If, instead, you find someone has already covered what you’d like to express, please simply give that post a :heart: instead of reiterating. You can even do this by email, by replying with the heart emoji or just “+1”. This will make long topics easier to follow.

Please note that this is an advisory “straw poll” meant to gauge sentiment. It isn’t a vote or a scientific survey. See About the Change Proposals category for more about the Change Process and moderation policy.

Love the new poll, hopefully this reduces the number of redundant comments!

The table formatting looks broken on Discourse, can it be edited?

That should look better now. My fault, I was missing a ‘space’ :slight_smile:

On the devel list, I’ve been asked:

> How much larger is Python at -O3 compared to -O2?

I consider that a very good question and will measure that.

Added to Changes/Python built with gcc O3 - Fedora Project Wiki

-O2 RPM size:

  • python3-3.12.2-3.fc41.x86_64 32638
  • python3-libs-3.12.2-3.fc41.x86_64 42888846

-O3 RPM size:

  • python3-3.12.2-3.O3.fc41.x86_64 32638
  • python3-libs-3.12.2-3.O3.fc41.x86_64 43389702

Difference of python3-libs:

500856 bytes (about 489 KiB), which is a 1.1678 % increase in the size of python3-libs itself, or 1.1669 % of the combined size of python3-libs and python3.
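
For anyone who wants to re-check the arithmetic, here is a small sketch that reproduces the figures from the sizes listed above. The sizes are assumed to be in bytes, and the helper name `percent_increase` is made up for the example.

```python
# Sizes as listed above (assumed to be bytes).
O2_SIZES = {"python3": 32638, "python3-libs": 42888846}
O3_SIZES = {"python3": 32638, "python3-libs": 43389702}


def percent_increase(before, after):
    """Return the relative growth from before to after, in percent."""
    return (after - before) / before * 100


libs_delta = O3_SIZES["python3-libs"] - O2_SIZES["python3-libs"]
libs_pct = percent_increase(O2_SIZES["python3-libs"], O3_SIZES["python3-libs"])
combined_pct = percent_increase(sum(O2_SIZES.values()), sum(O3_SIZES.values()))

print(libs_delta)               # 500856
print(libs_delta // 1024)       # 489
print(f"{libs_pct:.4f} %")      # 1.1678 %
print(f"{combined_pct:.4f} %")  # 1.1669 %
```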

I know it would be work/hassle, but would you (or anyone else interested I guess) be willing to do a scratch build that builds the -O3 version, then runs the benchmark against it and the system python?

That would give us some indication on how this changes on other arches.
I don’t know if there’s env constraints in a scratch build that would prevent this from working though.

Otherwise the increase seems kind of small to me, but I know python is heavily used so any performance gains are welcome.

The benchmark runs pip install during the setup. I can probably trick it by packaging the whole tree of dependencies as a Source, but it won’t be trivial.

A post was split to a new topic: Please make Change Proposal poll messages more descriptive

I am running the benchmark in Koji, with the changes from https://src.fedoraproject.org/fork/churchyard/rpms/python3.12/commits/benchmark, in this Koji build task: rawhide, python3.12-3.12.3-2.perf.rigorous.fc41.src.rpm.

tl;dr Geometric means

  • aarch64: 1.05x faster
  • i686: 1.08x faster
  • ppc64le: 1.05x faster
  • s390x: 1.06x slower
  • x86_64: 1.05x faster

Full results available as jsons at Fedora People

I will run the build multiple times to eliminate potential flakiness.

However, if this proves to be a trend, I am prepared to exclude s390x from this change.

Other s390x benchmarks:

aarch64

i686

ppc64le

s390x

x86_64

Three builds still running, but you get the idea.

Take this with a grain of salt, the logs are full of:

WARNING: the benchmark result may be unstable
* the standard deviation (17.4 ns) is 24% of the mean (72.1 ns)
* the maximum (113 ns) is 57% greater than the mean (72.1 ns)
Try to rerun the benchmark with more runs, values and/or loops.
Run 'python -m pyperf system tune' command to reduce the system jitter.
Use pyperf stats, pyperf dump and pyperf hist 
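
To put the warning's percentages in context, this short sketch recomputes them from the three values quoted in the log. The variable names are made up for the example, and it says nothing about the thresholds pyperf itself uses to decide when to warn.

```python
# Values quoted in the warning above, in nanoseconds.
mean_ns, stdev_ns, max_ns = 72.1, 17.4, 113.0

# Standard deviation relative to the mean.
stdev_pct = stdev_ns / mean_ns * 100
# How far the slowest sample lies above the mean.
max_over_mean_pct = (max_ns - mean_ns) / mean_ns * 100

print(f"{stdev_pct:.0f}%")          # 24%
print(f"{max_over_mean_pct:.0f}%")  # 57%
```

With a spread this wide, per-benchmark differences of a few percent on that builder are within the noise.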

This change proposal has now been submitted to FESCo with ticket #3202 for voting.

To find out more, please visit our Changes Policy documentation.

This change has been accepted by FESCo for Fedora Linux 41. A full list of approved changes to date can be found on the Change Set Page.

To find out more about how our changes policy works, please visit our docs site.

Amendment: Flag for user-built extension modules

When this change was originally proposed, the expectation was that the C flags for building Python extension modules other than modules from the Python standard library would not be changed.

  • Python extension modules built in Fedora RPM packages were built with -O2 before this change and would continue to be built that way (for packages other than Python itself).
  • Python extension modules built outside of Fedora RPM packages were built with no -O flag before this change and would continue to be built that way.

However, after implementing the change proposal, it was accidentally changed so the Python extension modules built outside of Fedora RPM packages are built with the -O3 flag. This was not originally intended, yet the change owners believe we should keep it that way because it makes Fedora’s Python closer to upstream Python and because it makes Fedora more competitive with other platforms on CIs and similar systems.

For more details, see The -O3 flag leaked into sysconfig CFLAGS, should we keep it? - python-devel - Fedora mailing-lists
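
To see which optimization flag a given Python would pass on to user-built extension modules, one can look at the CFLAGS recorded in `sysconfig`. The snippet below is a minimal sketch. The helper name `optimization_flags` is made up for the example, and the output depends entirely on how the running interpreter was built.

```python
import sysconfig


def optimization_flags(flags):
    """Return every -O... token from a space-separated flag string."""
    return [token for token in (flags or "").split() if token.startswith("-O")]


# CFLAGS may be None on platforms without a Makefile-based build.
cflags = sysconfig.get_config_var("CFLAGS")
print("CFLAGS:", cflags)
print("optimization flags:", optimization_flags(cflags))
```

On a Fedora 41 Python affected by this amendment, the list would be expected to contain -O3.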


I am asking FESCo to approve this amendment in Issue #3260: Fedora 41 change proposal amendment: Build user-built Python extension modules with -O3 - fesco - Pagure.io