Stop searching for shared libraries

Aug 4th, 2022

Nix, Guix, Gentoo Prefix and Spack install every package in their own immutable prefix. These prefix directories contain a unique hash derived from the versions and flavors of the package itself and its dependencies, which ensures that multiple variants of the same package can coexist.

The non-standard directory structure makes it hard for the dynamic linker to locate libraries. But what if we don't have to search at all?

Troubles locating libraries

When running a dynamically linked executable, the dynamic linker has to find the required shared libraries in these non-standard directories. Most build systems make an effort to properly locate libraries to link to, but leave no hint for the dynamic linker about which library they actually linked to. This can be frustrating, because a library is found at link time but the same library is not found at runtime:

$ gcc -shared -o libf.so -x c - <<EOF
#include <stdio.h>
void f() { puts("hello world"); }
EOF

$ gcc -o main -x c - -L. -lf <<EOF
void f();
int main() { f(); }
EOF
$ ./main 
./main: error while loading shared libraries: libf.so: cannot open shared object file: No such file or directory

On traditional Linux distros this is typically not an issue, since libraries are installed in a default location such as /usr/lib, and during the build you can always set LD_LIBRARY_PATH to the build dir if you need to run something:

$ LD_LIBRARY_PATH=. ./main 
hello world

In fact, autotools packages print various tips on how to ensure that libraries are located at runtime:

Libraries have been installed in:
   /opt/spack/linux-ubuntu20.04-zen/gcc-7.5.0/texinfo-6.5-uffty3xizvrgyisiaklf3dfewgvqj3oy/lib/texinfo

If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the '-LLIBDIR'
flag during linking and do at least one of the following:
   - add LIBDIR to the 'LD_LIBRARY_PATH' environment variable
     during execution
   - add LIBDIR to the 'LD_RUN_PATH' environment variable
     during linking
   - use the '-Wl,-rpath -Wl,LIBDIR' linker flag
   - have your system administrator add LIBDIR to '/etc/ld.so.conf'

Requiring LD_LIBRARY_PATH is clearly a bad user experience, and relying on /etc/ld.so.conf is not an option when multiple variants of the same library should be able to coexist.

Using rpath

In Spack, the solution is to rely on rpaths: additional search paths registered in the executable or library itself, which the dynamic linker considers before it looks in the system paths:

$ gcc -o main -x c - -L. -lf -Wl,-rpath,$PWD <<EOF
void f();
int main() { f(); }
EOF

$ ./main
hello world

Registering search paths in the binary itself is clearly an improvement over global search paths, so this should solve all problems, right?

Unfortunately though, the problem is not entirely solved: for a package manager it is still unclear what rpaths to register. In Spack, the rpaths are determined heuristically: take the prefix path of each link-type dependency, as well as the install directory of the package itself, and join the path with lib or lib64, since that's where libraries are typically installed.
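The heuristic can be sketched in a few lines of shell. This is only an illustration of the idea, not Spack's actual implementation; the function name rpath_flags and the prefixes used with it are made up for this example.

```shell
# Hypothetical helper: print one -Wl,-rpath flag per guessed library directory.
# For every prefix given as an argument, both <prefix>/lib and <prefix>/lib64
# are emitted, whether or not those directories exist.
rpath_flags() {
    for prefix in "$@"; do
        for libdir in lib lib64; do
            printf '%s,%s/%s\n' '-Wl,-rpath' "$prefix" "$libdir"
        done
    done
}
```

The output could be appended to a link line, for example gcc -o main main.c $(rpath_flags "$dep_prefix" "$own_prefix"), where both variables are placeholders and the unquoted substitution assumes paths without whitespace. Note that the guess is blind: a library installed in some other subdirectory of the prefix is simply missed.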

There's another (minor) issue with rpaths: they increase startup time. glibc's dynamic linker uses a cache that maps needed libraries to their install location. Directories listed in an rpath are searched before that cache is consulted, so the loader probes each rpath directory for each needed library, which results in a large number of stat and open calls, most of which fail. This problem has been addressed in Guix by patching glibc's loader to use a per-package loader cache.
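To get a feeling for how the number of lookups grows, here is a deliberately simplified model of the search. It ignores LD_LIBRARY_PATH, the extra hardware-specific subdirectories glibc may try, and the fact that the real loader stops at the first hit. The function name candidate_paths and the example paths are hypothetical.

```shell
# Simplified model: for every needed library, list the path that would be
# tried in each directory of a colon-separated rpath, in rpath order.
# Usage: candidate_paths RPATH NEEDED...
candidate_paths() {
    rpath=$1
    shift
    for needed in "$@"; do
        old_ifs=$IFS
        IFS=:                      # split the rpath on colons
        for dir in $rpath; do
            printf '%s/%s\n' "$dir" "$needed"
        done
        IFS=$old_ifs
    done
}
```

With two rpath entries and two needed libraries this already prints four candidates, and a library that lives in the last rpath directory is only found after a failed attempt in every directory before it. On a glibc system, the actual search can be observed by running a binary with LD_DEBUG=libs set in the environment.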

Killing two birds with one stone

To solve both the discrepancy between the linker & dynamic loader and the “stat storm” issue, a much simpler solution is to change the soname to the absolute path of the library after it's installed.

The soname is an identifier that (by convention) consists of libname.so.abi-version. The linker copies the soname as a needed library into the dynamic section of the dependent binary:

$ gcc -shared -o libf.so.4.2.1 -x c -Wl,-soname,libf.so.1 - <<EOF
#include <stdio.h>
void f() { puts("hello world"); }
EOF

$ ln -s libf.so.4.2.1 libf.so
$ ln -s libf.so.4.2.1 libf.so.1
$ ls
libf.so  libf.so.1  libf.so.4.2.1

$ gcc -o main -x c - -L. -lf <<EOF
void f();
int main() { f(); }
EOF

$ readelf -d main | grep libf
 0x0000000000000001 (NEEDED)             Shared library: [libf.so.1]

$ LD_LIBRARY_PATH=. ./main
hello world

In this typical example, the linker locates the library as libf.so, which is a symlink to libf.so.4.2.1. It copies the soname (which is libf.so.1) into the executable, and at runtime the dynamic loader locates the library as libf.so.1. Note that these version suffixes and symlinks are only a convention; nothing enforces them.

Now, nothing prevents us from setting a soname that contains a / directory separator: the linker happily copies the soname verbatim as a string. And if a needed library contains a forward slash, the dynamic loader does not search for it; instead it interprets the name as a path and loads it directly.
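The rule is simple enough to check by hand. As a sketch, the following filter reads readelf -d output on standard input and reports for each NEEDED entry whether the loader would search for it or open it directly. The function name classify_needed is made up, and the parsing assumes the readelf output format shown earlier.

```shell
# Read `readelf -d` output on stdin; for every NEEDED entry print
# "direct <name>" if the name contains a slash, "search <name>" otherwise.
classify_needed() {
    # Keep only NEEDED lines and extract the text between the brackets.
    sed -n 's/.*(NEEDED).*\[\(.*\)].*/\1/p' | while IFS= read -r name; do
        case $name in
            */*) printf 'direct %s\n' "$name" ;;
            *)   printf 'search %s\n' "$name" ;;
        esac
    done
}
```

It would be used as readelf -d main | classify_needed. Any slash counts, so a relative name such as ./libf.so is also treated as a path rather than searched for.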

And this is our way out of rpath heuristics and “stat storms”: simply set the soname of a library to its own absolute path upon install, and use the linker and dynamic loader in their natural way:

$ gcc -shared -o libf.so -x c -Wl,-soname,$PWD/libf.so - <<EOF
#include <stdio.h>
void f() { puts("hello world"); }
EOF

$ gcc -o main -x c - -L. -lf <<EOF
void f();
int main() { f(); }
EOF

$ ./main
hello world

Much better.
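The example sets the soname at link time, which works when the final location is already known. For a library that was built with a conventional soname, one option is to rewrite it after installation. The sketch below assumes the patchelf tool is installed and uses its --set-soname option; the helper name set_absolute_soname is hypothetical, and this is not necessarily how any package manager does it.

```shell
# Hypothetical post-install step: set a library's soname to its own
# absolute path. Requires patchelf; refuses relative paths.
set_absolute_soname() {
    case $1 in
        /*) patchelf --set-soname "$1" "$1" ;;
        *)  echo "error: expected an absolute path, got: $1" >&2
            return 1 ;;
    esac
}
```

Binaries that were linked against the library before the rewrite keep their old NEEDED entry, so this step has to run before any dependents are linked.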

Note that rpaths can still be useful when the executable or library loads libraries at runtime with dlopen. Fortunately, non-standard sonames are not an issue for dlopen(filename, ...): it simply locates the library by filename.