Hello,
I'm not completely sure which component is at fault, but Poppler and SoftHSM 2 interact badly when used together via the Mozilla NSS backend for PDF signature operations on my Trixie system. This has caused crashes when GNOME Papers is closing, as well as crashes when invoking pdfsig, even when I don't pass it any PDF file at all (see step 7 below).
I haven't witnessed this same weirdness with Okular, but I haven't used it as much for this sort of thing just yet.
Here's how to reproduce in a shell. You will need the softhsm2 and libnss3-tools binary packages installed.
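A quick way to confirm the prerequisites before starting is sketched below. The helper name missing_tools is made up for illustration. The tool names are the ones used in the steps that follow; my understanding is that softhsm2-util comes from softhsm2, and certutil and modutil from libnss3-tools.

```shell
# Print each named tool that is not found on PATH.
# Returns 0 only if every tool was found.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example call for this reproducer:
#   missing_tools softhsm2-util certutil modutil pdfsig
```

If it prints nothing, all of the tools used below are available.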
1. Change into a clean and empty directory of your choice; perhaps a subdirectory of /tmp/.
2. Craft a SoftHSM configuration file
printf 'directories.tokendir=%s/tokens\nobjectstore.backend=file\n' "$(pwd)" > softhsm.conf
3. Set SOFTHSM2_CONF environment variable to reference that just-made configuration file
export SOFTHSM2_CONF="$(pwd)/softhsm.conf"
4. Initialize SoftHSM; we need not load any actual keys into it.
mkdir ./tokens/
softhsm2-util --init-token --free --label "SoftHSM2 test" --pin 123456 --so-pin 12345678
Now SoftHSM is set up. We need to make a Mozilla NSS database that "knows of" SoftHSM via its PKCS#11 module.
5. Create Mozilla database directory (analogous to ~/.pki/nssdb/ that GNOME applications frequently use)
mkdir ./nssdb/
certutil -N -d ./nssdb/ --empty-password
Note that the database itself has no password, but SoftHSM does, and this is probably an important distinction for further detective work.
6. Add SoftHSM to NSS's known PKCS#11 modules to use
modutil -add "SoftHSM 2" -libfile /usr/lib/softhsm/libsofthsm2.so -dbdir nssdb -force
7. Now you are ready to reproduce. To do that, you can run pdfsig without any arguments except for the one telling it to use our just-made NSS database:
$ pdfsig -nssdir ./nssdb/
Segmentation fault (core dumped)
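If pdfsig never crashes for you, it may be worth ruling out a setup slip in steps 2 to 4 first. The sketch below only checks that the configuration file names a token directory that exists. check_softhsm_conf is a hypothetical helper, and it assumes the simple key=value layout written in step 2.

```shell
# Check that a SoftHSM config file names a token directory that exists.
# Usage: check_softhsm_conf [path]   (defaults to $SOFTHSM2_CONF)
# (check_softhsm_conf is a hypothetical helper for this reproducer.)
check_softhsm_conf() {
    conf=${1:-${SOFTHSM2_CONF:-}}
    if [ ! -r "$conf" ]; then
        echo "cannot read config: $conf"
        return 1
    fi
    # Take the value of the first directories.tokendir line.
    tokendir=$(sed -n 's/^directories\.tokendir[[:space:]]*=[[:space:]]*//p' "$conf" | head -n 1)
    if [ -z "$tokendir" ]; then
        echo "no directories.tokendir entry in $conf"
        return 1
    fi
    if [ ! -d "$tokendir" ]; then
        echo "token directory does not exist: $tokendir"
        return 1
    fi
    echo "token directory ok: $tokendir"
}
```

Beyond that, "softhsm2-util --show-slots" should list the initialized token, and "modutil -list -dbdir ./nssdb/" should list the SoftHSM 2 module.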
Reproducing it may require a few tries; it seems like a use-after-free-type issue in an atexit()-registered handler from any one of the several libraries involved. If you're down on your luck, sprinkle in your choice of toppings:
• set LD_PRELOAD=libc_malloc_debug.so GLIBC_TUNABLES=glibc.malloc.check=3
see https://sourceware.org/glibc/manual/latest/html_node/Memory-Allocation-Tunables.html
• disable address space layout randomization using 'setarch' from util-linux as a wrapper, like this:
setarch -vR -- pdfsig -nssdir ./nssdb/
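Since the crash is intermittent, a small retry loop can save some typing. This is a sketch only. run_until_signal is a made-up helper, and it relies on the usual shell convention that an exit status above 128 means the command was killed by a signal (139 for SIGSEGV). A program that deliberately exits with such a status would be misreported.

```shell
# Run a command up to MAX times; stop at the first run that dies from a signal.
# Usage: run_until_signal MAX command [args...]
# (run_until_signal is a hypothetical helper, not an existing tool.)
run_until_signal() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        # The command's own output is discarded; only its exit status matters here.
        "$@" >/dev/null 2>&1
        status=$?
        # By convention the shell reports death by signal N as 128 + N.
        if [ "$status" -gt 128 ]; then
            echo "attempt $attempt: killed by signal $((status - 128))"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no crash in $max attempts"
    return 1
}

# Example, combining both suggestions above:
#   run_until_signal 20 setarch -R -- env LD_PRELOAD=libc_malloc_debug.so \
#       GLIBC_TUNABLES=glibc.malloc.check=3 pdfsig -nssdir ./nssdb/
```

A run that does not crash exits with an ordinary status (the backtrace below shows exit status 99), so the loop simply tries again.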
Like many GLib programs, pdfsig uses several threads on every invocation, even just to print a summary of its command-line options. It seems exit() is called while multiple threads are still around, possibly waiting on condition variables, but only one is left at the time of the crash:
Thread 1 (Thread 0x7ffff5a3f9c0 (LWP 470687) "pdfsig"):
#0 SlotManager::getSlot (this=0x0, slotID=slotID@entry=957537852) at ./src/lib/slot_mgr/SlotManager.cpp:174
#1 0x00007ffff56aaa7d in SoftHSM::C_CloseAllSessions (this=0x55555563a290, slotID=slotID@entry=957537852) at ./src/lib/SoftHSM.cpp:1386
#2 0x00007ffff568ab68 in C_CloseAllSessions (slotID=957537852) at ./src/lib/main.cpp:347
#3 0x00007ffff716a19c in PK11_DestroySlot (slot=0x5555556d7230) at ./nss/lib/pk11wrap/pk11slot.c:454
#4 0x00007ffff716a245 in PK11_FreeSlot (slot=<optimized out>) at ./nss/lib/pk11wrap/pk11slot.c:491
#5 0x00007ffff716ee72 in SECMOD_DestroyModule (module=0x55555563fd60) at ./nss/lib/pk11wrap/pk11util.c:904
#6 SECMOD_DestroyModule (module=0x55555563fd60) at ./nss/lib/pk11wrap/pk11util.c:866
#7 0x00007ffff716f1be in SECMOD_DestroyModuleListElement (element=0x5555556d7210) at ./nss/lib/pk11wrap/pk11util.c:950
#8 0x00007ffff716f6c5 in SECMOD_DestroyModuleList (list=<optimized out>) at ./nss/lib/pk11wrap/pk11util.c:965
#9 0x00007ffff716f74d in SECMOD_Shutdown () at ./nss/lib/pk11wrap/pk11util.c:68
#10 0x00007ffff7121ee9 in nss_Shutdown () at ./nss/lib/nss/nssinit.c:1163
#11 0x00007ffff7121fd8 in NSS_Shutdown () at ./nss/lib/nss/nssinit.c:1221
#12 0x00007ffff7c788fd in shutdownNss () at ./poppler/NSSCryptoSignBackend.cc:215
#13 0x00007ffff744e2b1 in __run_exit_handlers (status=99, listp=0x7ffff75f1680 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at ./stdlib/exit.c:118
#14 0x00007ffff744e37a in __GI_exit (status=<optimized out>) at ./stdlib/exit.c:148
#15 0x00007ffff7435caf in __libc_start_call_main (main=main@entry=0x555555557640 <main(int, char**)>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdc58) at ../sysdeps/nptl/libc_start_call_main.h:74
#16 0x00007ffff7435d65 in __libc_start_main_impl (main=0x555555557640 <main(int, char**)>, argc=3, argv=0x7fffffffdc58, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffdc48) at ../csu/libc-start.c:360
#17 0x0000555555559b81 in _start ()
The function in frame #0, which the backtrace shows being entered with this=0x0, has this body:
/* SlotManager.cpp */
170 // Get one slot
171 Slot* SlotManager::getSlot(CK_SLOT_ID slotID)
172 {
173     try {
174         return slots.at(slotID);
175     } catch( const std::out_of_range &oor) {
176         DEBUG_MSG("slotID is out of range: %s", oor.what());
177         return NULL_PTR;
178     }
179 }
This is about as far as I got. There were some other breadcrumb trails I was following, but they dried up when I switched to this stripped-down reproducer. A common mistake with atexit() handlers occurs in multithreaded programs: when exit handlers are being called, POSIX leaves the state of the other threads unspecified (they may have already been made to terminate, or could still be running autonomously, or something in between). In particular, if those other threads are waiting on synchronization objects like condition variables, and an atexit() handler is intended to destroy those synchronization objects, that's big trouble.