- Package: pidgin-otr
- Source: pidgin-otr
- Description: Off-the-Record Messaging plugin for Pidgin
- Submitter: Michael Berg
- Date: 2011-12-27 11:57:20 UTC
- Severity: important
After installing gaim-otr, starting gaim pops up a dialog box titled "GStreamer Failure" with the contents "GStreamer failed to initialize".

In the console I started gaim from, several lines that look like
=====
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
=====
are printed, and then one new line is printed about every 18 seconds.

In the buddy list window, each messaging service is off-line and has an error message to the effect of "disconnected: ... unable to send request to resolver process" or "disconnected: Couldn't connect to host".

When I run gaim in debug mode (gaim -d), the following is in the output:
=====
....
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
dns: Created new DNS child 21220, there are now 1 children.
dns: DNS child 21220 no longer exists
dnsquery: Unable to send request to resolver process
proxy: Connection attempt failed: Unable to send request to resolver process
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
dns: Created new DNS child 21221, there are now 1 children.
dns: DNS child 21221 no longer exists
dnsquery: Unable to send request to resolver process
proxy: Connection attempt failed: Unable to send request to resolver process
....
=====

When I remove gaim-otr, gaim works properly. Without gaim-otr installed, the same section in debug mode looks like:
=====
....
dns: Created new DNS child 21274, there are now 1 children.
dns: Successfully sent DNS request to child 21274
dns: Created new DNS child 21275, there are now 2 children.
dns: Successfully sent DNS request to child 21275
....
=====
tags 411301 upstream help
thanks

This bug looks a lot like #404590. I think upstream is working on a fix.

Michael: do you happen to have 'combined' contacts, as in #404590?

Ian and OTR people, FYI, if this bug isn't fixed ASAP, gaim-otr will unfortunately likely be removed from Debian 4.0 (etch) because of the severity of this bug...

HTH

T-Bone
I don't think this looks like #404590 at all. That bug has to do with multiple conversations being assigned to the same window (something new in gaim 2 beta, and somewhat of a security problem in and of itself). Here, Michael is reporting that gaim doesn't start up at all!

This can't be a widespread problem, though, since we'd definitely have heard about it by now. Is anyone else running Debian amd64 (x86_64) that can test this?

Michael, what other gaim plugins do you have installed? Can you send me the entire output of gaim -d?

What version of gaim is etch going to have? gaim-otr still works great with the last release (1.5), but is apparently having some issues with the rapidly changing gaim 2 betas.

- Ian
I'm not using the "combined" contacts mentioned in bug #404590.

My original bug post was from an amd64 Debian install (64-bit Linux). The glibc message is always for the same address on the 64-bit system:
=====
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
=====

I just installed gaim and gaim-otr in my 32-bit chroot environment to test there and have the same problems. The glibc error is always for the following address in the 32-bit chroot:
=====
*** glibc detected *** free(): invalid pointer: 0x08118838 ***
=====
(maybe the 32-bit address will help track down the problem if you don't have a 64-bit system to debug on)

I just checked /proc/<gaim_pid>/maps on my 64-bit install while I was running gaim+otr, and the offending address of 0x00000000005f9fd8 is in the heap space of the gaim process:
=====
005aa000-009f6000 rw-p 005aa000 00:00 0 [heap]
=====
(what you'd expect for a call to free(), but having the base address of the heap may also help with debugging)

I've also checked with several friends who are running Debian unstable and are using gaim + gaim-otr, but who aren't having this problem. The only difference I can identify between our systems so far is that I'm using ldap for authenticating users on my home network, while my friends' machines are "/etc/passwd and /etc/shadow" only for user authentication. I can't test this easily right now, as my affected user won't be able to log in to test if I turn ldap off.

I've got a laptop I'm going to put Debian back on tonight. It won't be using ldap for user auth, and I can test then to see if that makes a difference.

Anyway, I hope some of this helps track down why gaim-otr is causing gaim to fail on my system. If you would like the output of /proc/<gaim_pid>/maps, any output from gdb, or from gaim-dbg, just let me know.

- Michael
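The heap check described above can be sketched in a few lines of C. This is only an illustration of the comparison, assuming the usual "start-end perms ..." layout of a /proc/<pid>/maps line; the function name addr_in_maps_line is made up for this example and is not part of gaim or any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if addr lies inside the [start, end)
 * range described by one line of /proc/<pid>/maps, 0 if it does not,
 * and -1 if the line does not begin with a "start-end" hex pair. */
int addr_in_maps_line(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    assert(line != NULL);
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return addr >= start && addr < end;
}
```

Fed the [heap] line quoted above and the address from the 64-bit glibc message, the helper reports that the pointer falls inside the heap mapping.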
gzip'd output of "gaim -d > gaim_debug_log.txt" is attached. - Michael
what I have installed (plus the gaim-otr package when I put it back in for testing).

As for what's actually enabled out of the optional plugins: in the gaim plugin manager, the "Message Timestamp Formats" plugin is the only one I have marked as enabled when OTR is not installed.

- Michael
Good point. Unfortunately, if gaim doesn't start at all because of the plugin, there's no way I can lower the severity level of this bug... AFAICT for now, 2.0beta5 (see http://packages.qa.debian.org/g/gaim.html) HTH
What I'm saying is can we find someone else with a Debian amd64 to try this (apt-get install gaim-otr, see if gaim still works) in order to see if it's really a gaim-otr problem, or some weird side-effect of something else. - Ian
Michael reproduced the bug on a 32-bit (x86) chroot... If need be, I can set up an x86_64 chroot on one of my machines and test it there. HTH
I can't imagine how ldap could cause a problem like this in gaim. Can you check the versions of all the libraries gaim and gaim-otr depend on, on both systems? Can you run gaim under valgrind, and see what that says? Thanks, - Ian
Looks like ldap and/or gnutls actually might be involved somehow
(see valgrind discussion below).
All versions of libraries are what is currently in Debian unstable.
I did an "apt-get dist-upgrade" Feb 17 before filing the bug report.
(the valgrind comments below make this a long email, so I'm going to omit
the very lengthy list of gaim dependencies and their versions).
(uninitialised values, memory leaks, and other bug fodder... sigh...),
so I ran a baseline valgrind of gaim without gaim-otr installed.
Baseline was generated with
$ valgrind --log-file='gaim_valgrind' gaim
(gzip'd output is attached as gaim_valgrind.21818.gz)
And here's the valgrind run with gaim-otr installed
$ valgrind --log-file='gaim+otr_valgrind' gaim
(gzip'd output is attached as gaim+otr_valgrind.21678.gz)
Ok, here's what I see at first glance:
Baseline run of valgrind:
-------------------------
PID of gaim is 21818.
At line 116 of the valgrind capture, it shows PID 21819, which should be
the DNS child (which works ok without OTR installed, but you can see the
reports about memory leaks that valgrind complains about) before it goes
back to main gaim PID 21818 at line 130.
Gaim+OTR run of valgrind:
-------------------------
PID of gaim is 21678.
At line 116 of the valgrind capture, it shows PID 21680, which should be
the DNS child again. However, with OTR installed, there is now a sequence:
=====
Invalid read of size 8
at 0xEC17AF9: (within /usr/lib/libotr.so.2.0.0)
by 0x93C4358: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93B939F: gnutls_deinit (in /usr/lib/libgnutls.so.13.0.9)
by 0x8DF8045: gnutls_SSL_free (in /usr/lib/libldap_r.so.2.0.130)
by 0x8DF62D9: (within /usr/lib/libldap_r.so.2.0.130)
by 0x8F0FA38: ber_sockbuf_remove_io (in /usr/lib/liblber.so.2.0.130)
by 0x8F0FAC6: ber_int_sb_destroy (in /usr/lib/liblber.so.2.0.130)
by 0x8F0FB3B: ber_sockbuf_free (in /usr/lib/liblber.so.2.0.130)
by 0x8DE25CF: ldap_ld_free (in /usr/lib/libldap_r.so.2.0.130)
by 0x8CB0C7C: (within /lib/libnss_ldap-2.3.6.so)
by 0x8CB425E: (within /lib/libnss_ldap-2.3.6.so)
by 0x64FC03E: fork (in /usr/lib/debug/libc-2.3.6.so)
Address 0xAA0EB10 is 8 bytes before a block of size 1,072 alloc'd
at 0x4A1BA55: malloc (vg_replace_malloc.c:149)
by 0x9AC4B12: (within /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC4CE8: gcry_malloc (in /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC4EEE: gcry_calloc (in /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC9DEA: gcry_cipher_open (in /usr/lib/libgcrypt.so.11.2.2)
by 0x93C44E2: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93A8936: _gnutls_cipher_init (in /usr/lib/libgnutls.so.13.0.9)
by 0x93B2E6C: _gnutls_read_connection_state_init (in
/usr/lib/libgnutls.so.13.0.9)
by 0x93A3FEC: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93A4105: _gnutls_handshake_common (in /usr/lib/libgnutls.so.13.0.9)
by 0x93A4CBF: gnutls_handshake (in /usr/lib/libgnutls.so.13.0.9)
by 0x8DF7E0D: (within /usr/lib/libldap_r.so.2.0.130)
Invalid free() / delete / delete[]
at 0x4A1B66A: free (vg_replace_malloc.c:233)
by 0x93C4358: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93B939F: gnutls_deinit (in /usr/lib/libgnutls.so.13.0.9)
by 0x8DF8045: gnutls_SSL_free (in /usr/lib/libldap_r.so.2.0.130)
by 0x8DF62D9: (within /usr/lib/libldap_r.so.2.0.130)
by 0x8F0FA38: ber_sockbuf_remove_io (in /usr/lib/liblber.so.2.0.130)
by 0x8F0FAC6: ber_int_sb_destroy (in /usr/lib/liblber.so.2.0.130)
by 0x8F0FB3B: ber_sockbuf_free (in /usr/lib/liblber.so.2.0.130)
by 0x8DE25CF: ldap_ld_free (in /usr/lib/libldap_r.so.2.0.130)
by 0x8CB0C7C: (within /lib/libnss_ldap-2.3.6.so)
by 0x8CB425E: (within /lib/libnss_ldap-2.3.6.so)
by 0x64FC03E: fork (in /usr/lib/debug/libc-2.3.6.so)
Address 0xAA0EB10 is 8 bytes before a block of size 1,072 alloc'd
at 0x4A1BA55: malloc (vg_replace_malloc.c:149)
by 0x9AC4B12: (within /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC4CE8: gcry_malloc (in /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC4EEE: gcry_calloc (in /usr/lib/libgcrypt.so.11.2.2)
by 0x9AC9DEA: gcry_cipher_open (in /usr/lib/libgcrypt.so.11.2.2)
by 0x93C44E2: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93A8936: _gnutls_cipher_init (in /usr/lib/libgnutls.so.13.0.9)
by 0x93B2E6C: _gnutls_read_connection_state_init (in
/usr/lib/libgnutls.so.13.0.9)
by 0x93A3FEC: (within /usr/lib/libgnutls.so.13.0.9)
by 0x93A4105: _gnutls_handshake_common (in /usr/lib/libgnutls.so.13.0.9)
by 0x93A4CBF: gnutls_handshake (in /usr/lib/libgnutls.so.13.0.9)
by 0x8DF7E0D: (within /usr/lib/libldap_r.so.2.0.130)
=====
that happens twice before the error and leak summary and it goes back to
the main gaim PID of 21678 at line 238.
This same basic pattern repeats itself for PID 21682 on lines 259-366, PID
21683 on lines 367-474, PID 21707 on lines 496-603, PID 21714 on lines
604-711, and so on.
- Michael
severity 411301 important
thanks

Well, this problem indeed doesn't seem to be reproducible on i386 or amd64 when not using nss_ldap. Given that users of other gnutls- or gcrypt-using packages aren't reporting similar problems, it seems likely that this is a bug in gaim-otr or libotr, but I don't think it's one that should block the package from being released; it is generally usable, just not in certain system configurations.

Thanks,
Is it reproducible on other systems that *do* use nss_ldap? Can you enable nss_ldap on one of those other systems you tested, and try again?

The output of valgrind should definitely help find the problem, in any event.

Thanks,

- Ian
I'll do this tonight. I did a clean install of Debian unstable onto a laptop this weekend, but I got busy and wasn't able to test gaim on it yet (still have to copy over my user account and such).

When I get home tonight, I'll test without nss_ldap first, and then I'll change it to use nss_ldap for the home network and try again. It's an i386 system, and I'll provide valgrind output for the following test cases:

1) gaim, no OTR, no nss_ldap
2) gaim+OTR, no nss_ldap
3) gaim, no OTR, with nss_ldap
4) gaim+OTR, with nss_ldap

That covers all combinations and should provide some useful comparison points. If there is anything else you'd like for debugging purposes, let me know (specific options to valgrind, etc) and I'll put them in the queue for when I get home.

- Michael
Steve Langasek wrote:

Yeah, I do have a system configuration that you don't run across every day ;-)

Ian Goldberg wrote:

gaim with and without gaim-otr installed. gaim and gaim+otr both worked properly.

Then I installed libnss-ldap and libpam-ldap and configured them for my network setup. gaim (without otr) worked, but gaim+otr had the same errors as I reported for my amd64 system (and the 32-bit chroot I also tested there). So I can duplicate the bug when nss_ldap is in use.

Valgrind output is attached for the following 4 test cases:

1) gaim (without otr or nss_ldap): gaim.9110.gz
2) gaim+otr (without nss_ldap): gaim+otr.9180.gz
3) gaim with nss_ldap in use: gaim+ldap.10134.gz
4) gaim+otr with nss_ldap in use: gaim+otr+ldap.10038.gz

Unfortunately, all of these valgrind runs on that x86 laptop have a TON of
=====
Conditional jump or move depends on uninitialised value(s)
at 0x5C55DC7: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C5617D: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C56243: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C56471: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C566D8: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C445C4: (within /usr/lib/libsoftokn3.so.0d)
by 0x5C44721: (within /usr/lib/libsoftokn3.so.0d)
by 0x5B9B7C2: (within /usr/lib/libnss3.so.0d)
by 0x5B9B8C2: (within /usr/lib/libnss3.so.0d)
by 0x5BA4133: SECMOD_LoadModule (in /usr/lib/libnss3.so.0d)
by 0x5BA428A: SECMOD_LoadModule (in /usr/lib/libnss3.so.0d)
by 0x5B8342D: (within /usr/lib/libnss3.so.0d)
=====
that occur in each capture. I didn't see these on the amd64 system or the x86 chroot I set up on that system.

Do you guys get pages and pages of that output when you run valgrind on gaim on a 32-bit x86 system? If not, I'll try to figure out what's causing it on that freshly installed laptop.

- Michael
Sorry, small correction. I *thought* I'd run valgrind in my chroot, but I hadn't. When I run "valgrind gaim" in my Debian i386 chroot, I get the same pages upon pages of
=====
by 0x5C5617D: (within /usr/lib/libsoftokn3.so.0d)
=====
related output as I did on the laptop.

Oh, and just to avoid any possible confusion for people not familiar with these, libnss3.so.0d and libnss_ldap-2.3.6.so are unrelated:

libnss_ldap-2.3.6.so is in the libnss-ldap package, which is the "NSS module for using LDAP as a naming service".

libnss3.so.0d is in the libnss3-0d package, which is the "Network Security Service libraries" that came out of Netscape/Mozilla.

- Michael
OK, I think I see the problem. libldap_r.so.2.0.130 and gaim-otr both use libgcrypt. libgcrypt has a well-known problem that it can't be used as a shared library by more than one client in the same program. This is because it uses global variables that the two clients (gaim-otr and libldap) both need to use in different ways. We also see this problem, for example, if you try to use gaim-otr and another encrypting plugin that uses libgcrypt. One solution would be to statically link libgcrypt into gaim-otr (so it gets its own copies of global data). Can one of the Debian people try to build a package like that? A better solution, of course, would be to make libgcrypt sharing-friendly. This had been discussed on libgcrypt's mailing list a while back, but I don't know what, if anything, became of it. Thanks, - Ian
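To make the failure mode concrete, here is a toy model in C of two clients sharing one library whose settings live in a single process-wide variable. It is not libgcrypt code: every toy_* name is hypothetical, and the "owner" number simply stands in for whichever client last installed its own handlers. The valgrind trace earlier in the thread (a free() reached through libotr on a block that was allocated via gcry_malloc on behalf of gnutls) looks consistent with this kind of mismatch, though that reading is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy library: one global slot records which client's
 * memory handlers are currently installed (0 = library defaults). */
static int installed_owner = 0;

/* Any client may overwrite the global slot at any time. */
void toy_set_handlers(int owner)
{
    assert(owner >= 0);
    installed_owner = owner;
}

/* A block remembers which handlers were active when it was allocated. */
struct toy_block { int alloc_owner; };

struct toy_block toy_alloc(void)
{
    struct toy_block b = { installed_owner };
    return b;
}

/* Returns 1 if the block would be released by the same handlers that
 * created it, 0 if another client has replaced them in the meantime
 * (the real-world analogue would be an invalid free()). */
int toy_free_is_safe(struct toy_block b)
{
    return b.alloc_owner == installed_owner;
}
```

In this model, a block allocated by the first client before the second client calls toy_set_handlers() can no longer be released safely afterwards, even though neither client did anything wrong on its own.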
I figured out how to do this on Linux. I have no idea how to do it on
other systems, or how to get libtool to do this on its own (possibly not
possible).
The problem, again: other gaim plugins, such as Jabber or ldap, use
libgcrypt (often in the guise of TLS). libgcrypt uses global variables,
and assumes that it will only have one caller per address space. The
various plugins initialize libgcrypt's global variables, and stomp all
over each other's (and gaim-otr's) initializations. Badness ensues.
The "right" solution is for libgcrypt to stop using global variables,
and to pass handles around. Back-compatibility can be easily arranged
by having the current routines do something like:
    void gcry_foo(int bar, char *baz)
    {
        /* Legacy entry point: forward to the handle-based variant */
        gcry_foo_r(&global_handle, bar, baz);
    }
but callers "in the know" could call gcry_foo_r directly with a private
handle.
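A slightly fuller, compilable sketch of that pattern follows. The names foo, foo_r and struct foo_handle are placeholders invented for this example (libgcrypt has no such functions), and the state kept in the handle is arbitrary; the point is only that the legacy call and a private-handle caller no longer share state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handle holding what used to be global state. */
struct foo_handle {
    int calls;     /* number of calls made through this handle */
    int last_bar;  /* most recent 'bar' argument seen */
};

/* Shared only by callers of the legacy entry point. */
static struct foo_handle global_handle;

/* Reentrant variant: all state is reached through the handle.
 * Returns the number of calls made through that handle so far. */
int foo_r(struct foo_handle *h, int bar, const char *baz)
{
    assert(h != NULL && baz != NULL);
    h->calls++;
    h->last_bar = bar;
    return h->calls;
}

/* Legacy entry point: same signature as before, forwards to the
 * global handle so existing callers keep working unchanged. */
int foo(int bar, const char *baz)
{
    return foo_r(&global_handle, bar, baz);
}
```

A plugin that owns its own struct foo_handle and calls foo_r() directly is then unaffected by whatever other code in the same address space does through foo().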
But until that happens, here's a workaround for gaim-otr to link
libgcrypt statically. It's actually pretty tricky, since it seems calls
from one .o to another in a .so file are always looked up dynamically,
so if another copy of libgcrypt exists in the address space, you'll
still get that one. So you have to put everything in a single .o, make
the libgcrypt symbols local, and turn the result into a .so.
Here's Makefile.static (for gaim-otr):
.libs/gaim-otr.so: FORCE
# Build everything from the standard Makefile
	make
# Link everything, including libotr and libgcrypt, together into
# a single .o file
	ld -r .libs/otr-plugin.o .libs/ui.o .libs/dialogs.o \
	    .libs/gtk-ui.o .libs/gtk-dialog.o \
	    /usr/lib/libotr.a /usr/lib/libgcrypt.a /usr/lib/libgpg-error.a \
	    -o .libs/gaim-otr-shared.o
# Make all the libgcrypt references local to that .o file
	objcopy -w -L '*gcry*' .libs/gaim-otr-shared.o .libs/gaim-otr-static.o
# Turn the .o into a .so
	gcc -shared .libs/gaim-otr-static.o -Wl,-soname -Wl,gaim-otr.so \
	    -o .libs/gaim-otr.so

FORCE:
- Ian
Hi Steve, Nobody reacted to this email. What's your opinion on this? Is this an acceptable fix for Debian? Thanks T-Bone
Salut Thibaut,
I just wanted to notify you that the fix proposed by Ian does not work
on my box. The DNS children still die; the configuration of my system is
as described previously by other people - x86_64, ldap, problem only
appears when using pidgin-otr in addition to pidgin.
All the best,
Hubert