The glibc 2.33 release, recently made available in Tumbleweed, includes this change:

> The dynamic linker loads optimized implementations of shared objects from subdirectories under the glibc-hwcaps directory on the library search path if the system's capabilities meet the requirements for that subdirectory. Initially supported subdirectories include "power9" and "power10" for the powerpc64le-linux-gnu architecture, "z13", "z14", "z15" for s390x-linux-gnu, and "x86-64-v2", "x86-64-v3", "x86-64-v4" for x86_64-linux-gnu. In the x86_64-linux-gnu case, the subdirectory names correspond to the vendor-independent x86-64 microarchitecture levels defined in the x86-64 psABI supplement.

[release note](https://sourceware.org/pipermail/libc-alpha/2021-February/122207.html)

This means we can now build differently optimized libraries for each of those microarchitectures and have the dynamic linker use them if the host CPU supports the optimizations. There are currently four levels of optimization:

* baseline (the current default)
  * CMOV
  * CX8
  * FPU
  * FXSR
  * MMX
  * OSFXSR
  * SCE
  * SSE
  * SSE2
* x86-64-v2
  * CMPXCHG16B
  * LAHF-SAHF
  * POPCNT
  * SSE3
  * SSE4-1
  * SSE4-2
  * SSSE3
* x86-64-v3
  * AVX
  * AVX2
  * BMI1
  * BMI2
  * F16C
  * FMA
  * LZCNT
  * MOVBE
  * OSXSAVE
* x86-64-v4
  * AVX512F
  * AVX512BW
  * AVX512CD
  * AVX512DQ
  * AVX512VL

[doc reference](https://gitlab.com/x86-psABIs/x86-64-ABI/-/blob/master/x86-64-ABI/low-level-sys-info.tex)

The idea for hackweek would be to first build a couple of optimized libraries manually, put them in those directories and test that it actually works. Then work on providing rpm macros and some documentation to make it easy to build different flavors of libraries with `-march=x86-64-v2/v3/v4`, install them in the right locations and get subpackages generated.
For example, the libfoo1 package would have libfoo1-x86-64-v2, libfoo1-x86-64-v3 and libfoo1-x86-64-v4 subpackages, each with only the respective optimized libraries in its file list. They would use, for instance, `Supplements: packageand(libfoo1:x86-64-v3)` so that a user could install an x86-64-v3 package (name TBD) and automatically get the optimized flavor for that microarchitecture for the libraries already installed. This would hopefully bring a performance benefit to openSUSE Tumbleweed (and to SLE/Leap once they include the new glibc version).

After a quick talk with Florian Weimer (from glibc/Red Hat), who proposed a better and less intrusive approach, the plan (after the manual test mentioned above) is to:

* Hack gcc to add an option to keep the GIMPLE bytecode when linking (`-ffat-lto-objects` might work for this but I'd need to test that).
* Hack gcc to add an option to "relink" (or "reoptimize") an existing library/executable using its embedded GIMPLE bytecode and generate a new library/executable optimized for a given microarchitecture.
* Check how to use objcopy to strip the embedded GIMPLE bytecode from the original library/executable after everything is finished.
* Provide rpm macros to generate the mentioned subpackages, plus scripts in /usr/lib/rpm that would run after the package is built in order to relink libraries for all microarchitectures with the newly added options, without having to rebuild the whole package several times.

This will be more difficult than expected since I don't have much experience with gcc's internals, but I guess that's what hackweeks are for :)

## Results

There's a report with the results of this hackweek project here: [https://antlarr.io/2021/03/hackweek-20-glibc-hwcaps-in-opensuse/](https://antlarr.io/2021/03/hackweek-20-glibc-hwcaps-in-opensuse/)

and an explanation of the new rpm macros that were created for this, here: [https://antlarr.io/2021/03/microarchitecture-rpm-macros/](https://antlarr.io/2021/03/microarchitecture-rpm-macros/)
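
To check which of the microarchitecture levels listed earlier a given machine would qualify for, here is a minimal Python sketch. It is not part of the project or of glibc. The helper names (`LEVEL_FLAGS`, `highest_level`, `read_cpu_flags`) are made up for illustration, and the mapping from psABI feature names to Linux `/proc/cpuinfo` flag names (for example SSE3 as `pni`, LZCNT as `abm`, SCE as `syscall`, OSXSAVE approximated by `xsave`) is an assumption that should be verified against the kernel in use. OSFXSR has no cpuinfo flag and is not checked.

```python
# Assumed mapping from each psABI level to Linux /proc/cpuinfo flag names.
# The list is ordered: a level only counts if every lower level is also met.
LEVEL_FLAGS = [
    ("baseline", {"cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]


def highest_level(cpu_flags):
    """Return the name of the highest level fully covered by cpu_flags.

    Returns None if even the baseline flags are not all present.
    """
    best = None
    for name, required in LEVEL_FLAGS:
        if not required <= cpu_flags:
            break  # a missing flag here rules out this and all higher levels
        best = name
    return best


def read_cpu_flags(path="/proc/cpuinfo"):
    """Return the flag set of the first CPU listed in a cpuinfo-style file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()
```

On a Linux host, `highest_level(read_cpu_flags())` gives a rough answer. The dynamic linker itself remains the authority: on glibc 2.33 and later, running `ld.so --help` (for example `/lib64/ld-linux-x86-64.so.2 --help`) should list the glibc-hwcaps subdirectories it considers supported on that machine.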
This project is part of:
Hack Week 20