The glibc 2.33 release, recently made available in Tumbleweed, includes this change:

The dynamic linker loads optimized implementations of shared objects from subdirectories under the glibc-hwcaps directory on the library search path if the system's capabilities meet the requirements for that subdirectory. Initially supported subdirectories include "power9" and "power10" for the powerpc64le-linux-gnu architecture, "z13", "z14", "z15" for s390x-linux-gnu, and "x86-64-v2", "x86-64-v3", "x86-64-v4" for x86_64-linux-gnu. In the x86_64-linux-gnu case, the subdirectory names correspond to the vendor-independent x86-64 microarchitecture levels defined in the x86-64 psABI supplement. (release note)

This means we can now build a separately optimized copy of a library for each of those microarchitecture levels and have the dynamic linker pick the best one that the host CPU supports.
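To make the lookup concrete, here is a minimal sketch of the order in which candidate paths would be tried for one library directory. It assumes the linker prefers the most specific supported level and falls back to the plain directory. The function name `hwcaps_candidates`, the `/usr/lib64` directory and `libfoo.so.1` are illustrative, not taken from glibc.

```python
from pathlib import PurePosixPath

# x86-64 levels in ascending order; the baseline has no subdirectory of its own.
LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def hwcaps_candidates(libdir, soname, highest_level=None):
    """Return the paths tried for `soname` in `libdir`, best match first.

    `highest_level` is the highest level the CPU supports, or None for a
    baseline-only CPU.
    """
    base = PurePosixPath(libdir)
    supported = []
    if highest_level is not None:
        supported = LEVELS[: LEVELS.index(highest_level) + 1]
    # Most specific supported level first, plain libdir last as the fallback.
    paths = [base / "glibc-hwcaps" / level / soname for level in reversed(supported)]
    paths.append(base / soname)
    return [str(p) for p in paths]


for path in hwcaps_candidates("/usr/lib64", "libfoo.so.1", "x86-64-v3"):
    print(path)
```

On a CPU that supports x86-64-v3, this lists the v3 copy first, then the v2 copy, then the unoptimized library.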

There are currently four microarchitecture levels, each requiring the following CPU features in addition to those of the level before it:

  • baseline (the current default)
    • CMOV
    • CX8
    • FPU
    • FXSR
    • MMX
    • OSFXSR
    • SCE
    • SSE
    • SSE2
  • x86-64-v2
    • CMPXCHG16B
    • LAHF-SAHF
    • POPCNT
    • SSE3
    • SSE4-1
    • SSE4-2
    • SSSE3
  • x86-64-v3
    • AVX
    • AVX2
    • BMI1
    • BMI2
    • F16C
    • FMA
    • LZCNT
    • MOVBE
    • OSXSAVE
  • x86-64-v4
    • AVX512F
    • AVX512BW
    • AVX512CD
    • AVX512DQ
    • AVX512VL

(doc reference)
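As a rough way to check which level a machine reaches, the feature list above can be compared with the flags in /proc/cpuinfo. This is only a sketch: the mapping from feature names to Linux flag spellings (for example `pni` for SSE3, `abm` for LZCNT, `syscall` for SCE, and `xsave` as a stand-in for OSXSAVE) is my assumption, and the dynamic linker does its own detection rather than reading this file.

```python
# Level name -> /proc/cpuinfo flag names (assumed Linux spellings of the
# features listed above; fxsr stands in for both FXSR and OSFXSR).
LEVEL_FLAGS = [
    ("baseline", {"cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_flags):
    """Return the highest level whose flags, and those of all lower levels, are present."""
    flags = set(cpu_flags)
    best = None
    for name, required in LEVEL_FLAGS:
        if not required <= flags:
            break  # levels are cumulative, so stop at the first gap
        best = name
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag set of the first CPU in /proc/cpuinfo; empty set if unavailable."""
    try:
        with open(path) as cpuinfo:
            for line in cpuinfo:
                if line.startswith("flags"):
                    return set(line.split(":", 1)[1].split())
    except OSError:
        pass
    return set()


if __name__ == "__main__":
    print(highest_level(read_cpu_flags()))
```

Because the levels are cumulative, a CPU that has the AVX-512 flags but lacks one v3 flag is still reported as x86-64-v2.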

The idea for hackweek would be to first build a couple of optimized libraries manually, put them in those directories and test that it actually works. The next step would be to provide rpm macros and some documentation that make it easy to build different flavors of libraries with -march=x86-64-v2/v3/v4, install them in the right locations and get subpackages generated.
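The manual test could look roughly like the commands generated below: one build of the same source per level, each written into its glibc-hwcaps subdirectory. This is a sketch that assumes a gcc recent enough to accept these -march values; `foo.c`, `libfoo.so.1`, the `/usr/lib64` library directory and the `/tmp/buildroot` staging directory are placeholders.

```python
import shlex

LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def build_commands(sources, soname, libdir="/usr/lib64", destdir=""):
    """Return one compiler command (as an argv list) per microarchitecture level."""
    commands = []
    for level in LEVELS:
        # Each flavor keeps the same file name but lives in its own subdirectory.
        output = f"{destdir}{libdir}/glibc-hwcaps/{level}/{soname}"
        commands.append(
            ["gcc", "-O2", "-fPIC", "-shared", f"-march={level}",
             f"-Wl,-soname,{soname}", "-o", output, *sources]
        )
    return commands


# Print the commands instead of running them, so they can be reviewed first.
for argv in build_commands(["foo.c"], "libfoo.so.1", destdir="/tmp/buildroot"):
    print(shlex.join(argv))
```

The output directories have to exist before the commands are run, and the baseline build of the library still goes into the plain library directory as usual.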

For example, the libfoo1 package would have libfoo1-x86-64-v2, libfoo1-x86-64-v3 and libfoo1-x86-64-v4 subpackages, each with only the respective optimized libraries in its file list. They would use Supplements: packageand(libfoo1:x86-64-v3) (with the matching level in each subpackage), so a user could install an x86-64-v3 package (name TBD) and automatically get the optimized flavor of every installed library that provides one.
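The subpackage layout described here can be sketched by generating the spec-file stanzas for one library. The generator and its output are illustrative only: the summary and description wording is made up, and the real macros may produce something different.

```python
LEVELS = ["x86-64-v2", "x86-64-v3", "x86-64-v4"]


def subpackage_stanzas(package, soname, libdir="%{_libdir}"):
    """Return spec-file text declaring one subpackage per microarchitecture level."""
    stanzas = []
    for level in LEVELS:
        name = f"{package}-{level}"
        stanzas.append(
            f"%package -n {name}\n"
            f"Summary:        {package} optimized for {level}\n"
            f"Supplements:    packageand({package}:{level})\n"
            f"\n"
            f"%description -n {name}\n"
            f"{package} libraries built with -march={level}.\n"
            f"\n"
            f"%files -n {name}\n"
            f"{libdir}/glibc-hwcaps/{level}/{soname}\n"
        )
    return "\n".join(stanzas)


print(subpackage_stanzas("libfoo1", "libfoo.so.1"))
```

Each generated subpackage lists only its own optimized library, so installing one never conflicts with the baseline libfoo1 package.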

This would hopefully bring a performance benefit to openSUSE Tumbleweed (and to SLE/Leap once they include the new glibc version).

After a quick talk with Florian Weimer (from glibc/Red Hat) who proposed a better and less intrusive approach, the plan (after the manual test mentioned above) is to:

  • Hack gcc to add an option to keep the GIMPLE bytecode when linking (-ffat-lto-objects might work for this but I'd need to test that).

  • Hack gcc to add an option to "relink" (or "reoptimize") an existing library/executable using its embedded GIMPLE bytecode and generate a new library/executable optimized for a given microarchitecture.

  • Check how to use objcopy to strip the embedded GIMPLE bytecode from the original library/executable after everything is finished.

  • Provide rpm macros to generate the mentioned subpackages and scripts in /usr/lib/rpm that would be run after the package is built in order to relink libraries for all microarchitectures with the new added options without having to rebuild the whole package several times.
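Only the first and third steps of this plan can be sketched with tools that exist today, and only at the level of object files. The commands below assume that -ffat-lto-objects keeps the GIMPLE data in sections whose names start with `.gnu.lto_` and that objcopy accepts a wildcard pattern for --remove-section. The relink step is left out because the gcc option for it is exactly what this project would have to add. `foo.c` and the object file names are placeholders.

```python
import shlex


def fat_lto_compile(source, obj):
    """Command that compiles `source` to an object holding machine code plus LTO sections."""
    return ["gcc", "-O2", "-fPIC", "-flto", "-ffat-lto-objects", "-c", source, "-o", obj]


def strip_lto_sections(obj, stripped):
    """Command that copies `obj` to `stripped` without sections named .gnu.lto_*."""
    return ["objcopy", "--remove-section=.gnu.lto_*", obj, stripped]


# Print the commands instead of running them; the pattern is quoted for the shell.
print(shlex.join(fat_lto_compile("foo.c", "foo.o")))
print(shlex.join(strip_lto_sections("foo.o", "foo.nolto.o")))
```

Whether the same approach carries over to linked libraries and executables is the open question the first bullet raises.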

This will be more difficult than expected since I don't have much experience with gcc's internals, but I guess that's what hackweeks are for :)

Results

There's a report with the results of this hackweek project here:

https://antlarr.io/2021/03/hackweek-20-glibc-hwcaps-in-opensuse/

and an explanation of the new rpm macros that were created for this, here:

https://antlarr.io/2021/03/microarchitecture-rpm-macros/

This project is part of:

Hack Week 20

Activity

  • 7 months ago: dfaggioli joined this project.
  • 8 months ago: dgedon liked this project.
  • 8 months ago: mkoutny liked this project.
  • 8 months ago: dfaggioli liked this project.
  • 8 months ago: aplanas liked this project.
  • 8 months ago: namtrac liked this project.
  • 8 months ago: dancermak liked this project.
  • 8 months ago: mbrugger liked this project.
  • 8 months ago: alarrosa started this project.
  • 8 months ago: alarrosa originated this project.

  • Comments

    • dancermak
      8 months ago by dancermak

      This sounds very intriguing! I have a few notes about this:

      • you might be interested in this (sadly stalled) upstream PR: https://github.com/rpm-software-management/rpm/pull/1035 which adds better detection of the currently running microarchitecture
      • once rpm gains the ability to automatically generate subpackages (https://github.com/rpm-software-management/rpm/pull/1485), this could be completely automated
      • I would suggest using actual boolean dependencies instead of packageand: Supplements: (libfoo1 and x86-64-v3)

      And please, please make some noise about this and coordinate it with the other rpm-based distros, so that we don't end up with yet another SUSE-ism but instead lead the innovation.

      • alarrosa
        8 months ago by alarrosa

        All are very good points. Thanks for the comments and the links! I'll definitely try to submit whatever I can to upstream.

    • dfaggioli
      8 months ago by dfaggioli

      Wow... This looks very interesting! I'm not really well versed in any of the technologies involved but, as soon as you have a library or two ready, I'd be happy to run benchmarks (e.g., with MMTests) to try to assess the differences (== improvements, hopefully).
