#425317 pidgin crashes when pidgin-otr is enabled in nss_ldap setups

Package:
pidgin-otr
Source:
pidgin-otr
Description:
Off-the-Record Messaging plugin for Pidgin
Submitter:
Michael Berg
Date:
2011-12-27 11:57:20 UTC
Severity:
important
#425317#5
Date:
2007-02-17 23:13:43 UTC
From:
To:
After installing gaim-otr, when gaim is started it pops up a dialog box
titled "GStreamer Failure" and with contents "GStreamer failed to initialize"

In the console I started gaim from, several lines that look like
=====
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
=====
print out, and then one new line is printed about every 18 seconds.

In the buddy list window, each messaging service is off-line and has
an error message to the effect of
"disconnected: ... unable to send request to resolver process" or
"disconnected: Couldn't connect to host"


When I run gaim in debug mode (gaim -d), the following is in the output:
=====
....
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
dns: Created new DNS child 21220, there are now 1 children.
dns: DNS child 21220 no longer exists
dnsquery: Unable to send request to resolver process

proxy: Connection attempt failed: Unable to send request to resolver process

*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
dns: Created new DNS child 21221, there are now 1 children.
dns: DNS child 21221 no longer exists
dnsquery: Unable to send request to resolver process

proxy: Connection attempt failed: Unable to send request to resolver process
....
=====


When I remove gaim-otr, gaim works properly.
Without gaim-otr installed, the same section in debug mode looks like:
=====
....
dns: Created new DNS child 21274, there are now 1 children.
dns: Successfully sent DNS request to child 21274
dns: Created new DNS child 21275, there are now 2 children.
dns: Successfully sent DNS request to child 21275
....
=====

#425317#10
Date:
2007-02-17 23:48:26 UTC
From:
To:
tags 411301 upstream help
thanks

This bug looks a lot like #404590. I think upstream is working on a fix.

Michael: do you happen to have 'combined' contacts, as in #404590?

Ian and OTR people, FYI, if this bug isn't fixed ASAP, gaim-otr will
unfortunately likely be removed from Debian 4.0 (etch) because of the
severity of this bug...

HTH

T-Bone

#425317#13
Date:
2007-02-18 01:38:22 UTC
From:
To:
I don't think this looks like #404590 at all.  That bug has to do
with multiple conversations being assigned to the same window (something
new in gaim 2 beta, and somewhat of a security problem in and of
itself).

Here, Michael is reporting that gaim doesn't start up at all!

This can't be a widespread problem, though, since we'd definitely have
heard about it by now.  Is anyone else running Debian amd64 (x86_64)
that can test this?

Michael, what other gaim plugins do you have installed?  Can you send
me the entire output of gaim -d?

What version of gaim is etch going to have?  gaim-otr still works great
with the last release (1.5), but is apparently having some issues with
the rapidly changing gaim 2 betas.

   - Ian

#425317#14
Date:
2007-02-18 02:00:36 UTC
From:
To:
I'm not using the "combined" contacts mentioned in bug #404590.

My original bug post was from an amd64 Debian install (64-bit Linux).
The glibc message is always for the same address on the 64-bit system
=====
*** glibc detected *** free(): invalid pointer: 0x00000000005f9fd8 ***
=====

I just installed gaim and gaim-otr in my 32-bit chroot environment to test
there and have the same problems.
The glibc error is always the following address in the 32-bit chroot
=====
*** glibc detected *** free(): invalid pointer: 0x08118838 ***
=====
(maybe the 32-bit address will help track down the problem if you don't
have a 64-bit system to debug on)

I just checked /proc/<gaim_pid>/maps on my 64-bit install when I was
running gaim+otr, and the offending address of 0x00000000005f9fd8 is in the
heap space of the gaim process.
=====
005aa000-009f6000 rw-p 005aa000 00:00 0    [heap]
=====
(what you'd expect for a call to free(), but having the base address of the
heap may also help with debugging)
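
The containment check described above can be done mechanically. What follows is a minimal sketch, assuming a maps line in the format quoted above; `addr_in_maps_line` is a hypothetical helper name, not part of gaim, valgrind, or any tool mentioned in this report.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if addr lies inside the [start, end)
 * range at the beginning of a /proc/<pid>/maps line, 0 if it does not,
 * and -1 if the line cannot be parsed. */
int addr_in_maps_line(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    /* A maps line begins with "start-end" in hexadecimal. */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;
    return (addr >= start && addr < end) ? 1 : 0;
}
```

Called with the heap line quoted above and the address 0x5f9fd8, the helper reports that the address falls inside the mapping, which matches the manual check.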


I've also checked with several friends who are running Debian unstable and
are using gaim + gaim-otr -- but who aren't having this problem.
The only difference I can identify between our systems so far is that I'm
using ldap for authenticating users on my home network, while my friends'
machines are "/etc/passwd and /etc/shadow" only for user authentication.
I can't test this easily right now as my affected user won't be able to log
in to test if I turn ldap off.  I've got a laptop I'm going to put Debian
back on tonight.  It won't be using ldap for user auth on the laptop, and I
can test then to see if that makes a difference.

Anyway, I hope some of this helps track down why gaim-otr is causing gaim to
fail on my system.

If you would like the output of /proc/<gaim_pid>/maps, any output from gdb,
or from gaim-dbg, just let me know.

- Michael




#425317#15
Date:
2007-02-18 02:10:00 UTC
From:
To:
gzip'd output of "gaim -d > gaim_debug_log.txt" is attached.

- Michael

#425317#16
Date:
2007-02-18 02:18:57 UTC
From:
To:
what I have installed (plus the gaim-otr package when I put it back in for
testing).

As for what's actually enabled out of the optional plugins:
In the gaim plugin manager, the "Message Timestamp Formats" plugin is the
only one I have marked as enabled when OTR is not installed.

- Michael

#425317#21
Date:
2007-02-18 09:49:45 UTC
From:
To:
Good point.

Unfortunately, if gaim doesn't start at all because of the plugin,
there's no way I can lower the severity level of this bug...

AFAICT for now, 2.0beta5 (see http://packages.qa.debian.org/g/gaim.html)

HTH

#425317#28
Date:
2007-02-18 15:13:11 UTC
From:
To:
What I'm saying is can we find someone else with a Debian amd64 to try
this (apt-get install gaim-otr, see if gaim still works) in order to see
if it's really a gaim-otr problem, or some weird side-effect of
something else.

   - Ian

#425317#33
Date:
2007-02-18 15:16:14 UTC
From:
To:
Michael reproduced the bug in a 32-bit (x86) chroot... If need be I can
set up an x86_64 chroot on one of my machines and test it there.

HTH

#425317#34
Date:
2007-02-18 15:30:11 UTC
From:
To:
I can't imagine how ldap could cause a problem like this in gaim.  Can
you check the versions of all the libraries gaim and gaim-otr depend on,
on both systems?

Can you run gaim under valgrind, and see what that says?

Thanks,

   - Ian

#425317#35
Date:
2007-02-18 21:54:54 UTC
From:
To:
Looks like ldap and/or gnutls actually might be involved somehow
(see valgrind discussion below).

All versions of libraries are what is currently in Debian unstable.
I did an "apt-get dist-upgrade" Feb 17 before filing the bug report.

(the valgrind comments below make this a long email, so I'm going to omit
the very lengthy list of gaim dependencies and their versions).
(uninitialised values, memory leaks, and other bug fodder... sigh...),
so I ran a baseline valgrind of gaim without gaim-otr installed.

Baseline was generated with
    $ valgrind --log-file='gaim_valgrind' gaim
(gzip'd output is attached as gaim_valgrind.21818.gz)


And here's the valgrind run with gaim-otr installed
    $ valgrind --log-file='gaim+otr_valgrind' gaim
(gzip'd output is attached as gaim+otr_valgrind.21678.gz)


Ok, here's what I see at first glance:


Baseline run of valgrind:
-------------------------
PID of gaim is 21818.
At line 116 of the valgrind capture, it shows PID 21819, which should be
the DNS child (which works ok without OTR installed, but you can see the
reports about memory leaks that valgrind complains about) before it goes
back to main gaim PID 21818 at line 130.


Gaim+OTR run of valgrind:
-------------------------
PID of gaim is 21678.
At line 116 of the valgrind capture, it shows PID 21680, which should be
the DNS child again.  However, with OTR installed, there is now a sequence:
=====
Invalid read of size 8
  at 0xEC17AF9: (within /usr/lib/libotr.so.2.0.0)
  by 0x93C4358: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93B939F: gnutls_deinit (in /usr/lib/libgnutls.so.13.0.9)
  by 0x8DF8045: gnutls_SSL_free (in /usr/lib/libldap_r.so.2.0.130)
  by 0x8DF62D9: (within /usr/lib/libldap_r.so.2.0.130)
  by 0x8F0FA38: ber_sockbuf_remove_io (in /usr/lib/liblber.so.2.0.130)
  by 0x8F0FAC6: ber_int_sb_destroy (in /usr/lib/liblber.so.2.0.130)
  by 0x8F0FB3B: ber_sockbuf_free (in /usr/lib/liblber.so.2.0.130)
  by 0x8DE25CF: ldap_ld_free (in /usr/lib/libldap_r.so.2.0.130)
  by 0x8CB0C7C: (within /lib/libnss_ldap-2.3.6.so)
  by 0x8CB425E: (within /lib/libnss_ldap-2.3.6.so)
  by 0x64FC03E: fork (in /usr/lib/debug/libc-2.3.6.so)
Address 0xAA0EB10 is 8 bytes before a block of size 1,072 alloc'd
  at 0x4A1BA55: malloc (vg_replace_malloc.c:149)
  by 0x9AC4B12: (within /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC4CE8: gcry_malloc (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC4EEE: gcry_calloc (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC9DEA: gcry_cipher_open (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x93C44E2: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93A8936: _gnutls_cipher_init (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93B2E6C: _gnutls_read_connection_state_init (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93A3FEC: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93A4105: _gnutls_handshake_common (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93A4CBF: gnutls_handshake (in /usr/lib/libgnutls.so.13.0.9)
  by 0x8DF7E0D: (within /usr/lib/libldap_r.so.2.0.130)

Invalid free() / delete / delete[]
  at 0x4A1B66A: free (vg_replace_malloc.c:233)
  by 0x93C4358: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93B939F: gnutls_deinit (in /usr/lib/libgnutls.so.13.0.9)
  by 0x8DF8045: gnutls_SSL_free (in /usr/lib/libldap_r.so.2.0.130)
  by 0x8DF62D9: (within /usr/lib/libldap_r.so.2.0.130)
  by 0x8F0FA38: ber_sockbuf_remove_io (in /usr/lib/liblber.so.2.0.130)
  by 0x8F0FAC6: ber_int_sb_destroy (in /usr/lib/liblber.so.2.0.130)
  by 0x8F0FB3B: ber_sockbuf_free (in /usr/lib/liblber.so.2.0.130)
  by 0x8DE25CF: ldap_ld_free (in /usr/lib/libldap_r.so.2.0.130)
  by 0x8CB0C7C: (within /lib/libnss_ldap-2.3.6.so)
  by 0x8CB425E: (within /lib/libnss_ldap-2.3.6.so)
  by 0x64FC03E: fork (in /usr/lib/debug/libc-2.3.6.so)
Address 0xAA0EB10 is 8 bytes before a block of size 1,072 alloc'd
  at 0x4A1BA55: malloc (vg_replace_malloc.c:149)
  by 0x9AC4B12: (within /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC4CE8: gcry_malloc (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC4EEE: gcry_calloc (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x9AC9DEA: gcry_cipher_open (in /usr/lib/libgcrypt.so.11.2.2)
  by 0x93C44E2: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93A8936: _gnutls_cipher_init (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93B2E6C: _gnutls_read_connection_state_init (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93A3FEC: (within /usr/lib/libgnutls.so.13.0.9)
  by 0x93A4105: _gnutls_handshake_common (in /usr/lib/libgnutls.so.13.0.9)
  by 0x93A4CBF: gnutls_handshake (in /usr/lib/libgnutls.so.13.0.9)
  by 0x8DF7E0D: (within /usr/lib/libldap_r.so.2.0.130)
=====
that happens twice before the error and leak summary and it goes back to
the main gaim PID of 21678 at line 238.

This same basic pattern repeats itself for PID 21682 on lines 259-366, PID
21683 on lines 367-474, PID 21707 on lines 496-603, PID 21714 on lines
604-711, and so on.


- Michael

#425317#40
Date:
2007-02-19 09:43:23 UTC
From:
To:
severity 411301 important
thanks

Well, this problem indeed doesn't seem to be reproducible on i386 or amd64
when not using nss_ldap.  Given that users of other gnutls- or gcrypt-using
packages aren't reporting similar problems, it seems likely that this is a
bug in gaim-otr or libotr, but I don't think it's one that should block the
package from being released; it is generally usable, just not in certain
system configurations.

Thanks,

#425317#47
Date:
2007-02-19 12:42:25 UTC
From:
To:
Is it reproducible on other systems that *do* use nss_ldap?  Can you
turn nss_ldap on on one of those other systems you tested, and try
again?

The output of valgrind should definitely help find the problem, in any
event.

Thanks,

   - Ian

#425317#52
Date:
2007-02-19 15:26:03 UTC
From:
To:
I'll do this tonight.  I did a clean install of Debian unstable onto a
laptop this weekend - but I got busy and wasn't able to test gaim on it yet
(still have to copy over my user account and such).

When I get home tonight, I'll test without nss_ldap first, and then I'll
change it to use nss_ldap for the home network and try again.

It's an i386 system, and I'll provide valgrind output for the following
test cases:
1) gaim, no OTR, no nss_ldap
2) gaim+OTR, no nss_ldap
3) gaim, no OTR, with nss_ldap
4) gaim+OTR, with nss_ldap

That covers all combinations and should provide some useful comparison
points.  If there is anything else you'd like for debugging purposes, let
me know (specific options to valgrind, etc) and I'll put them in the queue
for when I get home.

- Michael

#425317#57
Date:
2007-02-20 03:34:21 UTC
From:
To:
Steve Langasek wrote:

Yeah, I do have a system configuration that you don't run across every day ;-)


Ian Goldberg wrote:
gaim with and without gaim-otr installed.  gaim and gaim+otr both worked
properly.

Then I installed libnss-ldap and libpam-ldap and configured them for my
network setup.  gaim (without otr) worked, but gaim+otr had the same errors
as I reported for my amd64 system (and the 32-bit chroot I also tested
there).  So I can duplicate the bug when nss_ldap is in use.

Valgrind output is attached for the following 4 test cases:
1) gaim (without otr or nss_ldap): gaim.9110.gz
2) gaim+otr (without nss_ldap):    gaim+otr.9180.gz
3) gaim with nss_ldap in use:      gaim+ldap.10134.gz
4) gaim+otr with nss_ldap in use:  gaim+otr+ldap.10038.gz

Unfortunately, all of these valgrind runs on that x86 laptop have a TON of
=====
Conditional jump or move depends on uninitialised value(s)
  at 0x5C55DC7: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C5617D: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C56243: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C56471: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C566D8: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C445C4: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5C44721: (within /usr/lib/libsoftokn3.so.0d)
  by 0x5B9B7C2: (within /usr/lib/libnss3.so.0d)
  by 0x5B9B8C2: (within /usr/lib/libnss3.so.0d)
  by 0x5BA4133: SECMOD_LoadModule (in /usr/lib/libnss3.so.0d)
  by 0x5BA428A: SECMOD_LoadModule (in /usr/lib/libnss3.so.0d)
  by 0x5B8342D: (within /usr/lib/libnss3.so.0d)
=====
that occur in each capture.
I didn't see these on the amd64 system or the x86 chroot I set up on that
system.  Do you guys get pages and pages of that output when you run
valgrind on gaim on a 32-bit x86 system?  If not, I'll try to figure out
what's causing it on that freshly installed laptop.

- Michael

#425317#62
Date:
2007-02-20 04:20:54 UTC
From:
To:
Sorry, small correction.  I *thought* I'd run valgrind in my chroot, but I
hadn't.  When I run "valgrind gaim" in my Debian i386 chroot, I get the
same pages upon pages of
=====
  by 0x5C5617D: (within /usr/lib/libsoftokn3.so.0d)
=====
related output as I did on the laptop.

Oh, and just to avoid any possible confusion for people not familiar with
these, libnss3.so.0d and libnss_ldap-2.3.6.so are unrelated.

libnss_ldap-2.3.6.so is in the libnss-ldap package, which is the
"NSS module for using LDAP as a naming service"

libnss3.so.0d is in the libnss3-0d package, which provides the
"Network Security Service libraries" that came out of Netscape/Mozilla

- Michael

#425317#67
Date:
2007-02-20 11:10:18 UTC
From:
To:
OK, I think I see the problem.  libldap_r.so.2.0.130 and gaim-otr both
use libgcrypt.  libgcrypt has a well-known problem that it can't be
used as a shared library by more than one client in the same program.
This is because it uses global variables that the two clients (gaim-otr
and libldap) both need to use in different ways.  We also see this
problem, for example, if you try to use gaim-otr and another encrypting
plugin that uses libgcrypt.
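
To make this failure mode concrete, here is a minimal sketch of two clients sharing one copy of a library that keeps process-wide state. It is an illustration only: every name in it (`toylib_*`, `plain_*`, `header_*`, `mismatch_offset`) is hypothetical and none of them is a libgcrypt, libldap, or libotr API. The 8-byte header is an assumption chosen to mirror the "8 bytes before a block" wording in the valgrind output; nothing in this report confirms that this is the exact mechanism.

```c
#include <stdint.h>
#include <stdlib.h>

#define HEADER_SIZE 8u  /* assumed header size, for illustration only */

typedef void *(*alloc_fn)(size_t);
typedef uintptr_t (*base_fn)(void *);

/* Default hooks: the pointer handed out is the start of the block. */
static void *plain_alloc(size_t n) { return malloc(n); }
static uintptr_t plain_base(void *p) { return (uintptr_t)p; }

/* Replacement hooks: a header sits in front of the pointer handed out. */
static void *header_alloc(size_t n)
{
    unsigned char *raw = malloc(n + HEADER_SIZE);
    return raw ? raw + HEADER_SIZE : NULL;
}
static uintptr_t header_base(void *p) { return (uintptr_t)p - HEADER_SIZE; }

/* Process-wide state of the toy library, shared by every client. */
static alloc_fn g_alloc = plain_alloc;
static base_fn g_base = plain_base;

void toylib_set_hooks(alloc_fn a, base_fn b) { g_alloc = a; g_base = b; }
void *toylib_alloc(size_t n) { return g_alloc(n); }

/* Address the library would hand to free() for p under the current hooks. */
uintptr_t toylib_free_target(void *p) { return g_base(p); }

/* Returns how many bytes before the real block start the library would
 * try to free once a second client has replaced the hooks. */
long mismatch_offset(void)
{
    void *block = toylib_alloc(1072);              /* client A, default hooks */
    if (block == NULL)
        return -1;

    toylib_set_hooks(header_alloc, header_base);   /* client B initializes */
    uintptr_t target = toylib_free_target(block);  /* what A's cleanup would free */
    long offset = (long)((uintptr_t)block - target);

    free(block);                                   /* release correctly here */
    toylib_set_hooks(plain_alloc, plain_base);     /* restore the defaults */
    return offset;
}
```

The sketch never performs the bad free itself. It only computes the address that would be passed to free(), which lies `HEADER_SIZE` bytes before the block that the first client actually allocated.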

One solution would be to statically link libgcrypt into gaim-otr (so it
gets its own copies of global data).  Can one of the Debian people try
to build a package like that?

A better solution, of course, would be to make libgcrypt
sharing-friendly.  This had been discussed on libgcrypt's mailing list
a while back, but I don't know what, if anything, became of it.

Thanks,

   - Ian

#425317#72
Date:
2007-03-07 22:07:07 UTC
From:
To:
I figured out how to do this on Linux.  I have no idea how to do it on
other systems, or how to get libtool to do this on its own (possibly not
possible).

The problem, again: other gaim plugins, such as Jabber or ldap, use
libgcrypt (often in the guise of TLS).  libgcrypt uses global variables,
and assumes that it will only have one caller per address space.  The
various plugins initialize libgcrypt's global variables, and stomp all
over each other's (and gaim-otr's) initializations.  Badness ensues.

The "right" solution is for libgcrypt to stop using global variables,
and to pass handles around.  Back-compatibility can be easily arranged
by having the current routines do something like:

void gcry_foo(int bar, char *baz)
{
    /* Legacy entry point: forward to the handle-based variant. */
    gcry_foo_r(&global_handle, bar, baz);
}

but callers "in the know" could call gcry_foo_r directly with a private
handle.
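
The wrapper pattern described above can be sketched a little further. This is only an illustration of the idea under assumed names: `toy_ctx`, `toy_set_mode_r`, `toy_get_mode_r`, and the non-`_r` wrappers are hypothetical and are not functions that libgcrypt provides.

```c
/* Hypothetical per-caller state that would otherwise live in globals. */
typedef struct toy_ctx {
    int mode;
} toy_ctx;

/* The single process-wide handle used by legacy callers. */
static toy_ctx global_handle = { 0 };

/* Reentrant variants: all state is reached through the handle. */
void toy_set_mode_r(toy_ctx *ctx, int mode) { ctx->mode = mode; }
int toy_get_mode_r(const toy_ctx *ctx) { return ctx->mode; }

/* Back-compatible entry points forward to the global handle. */
void toy_set_mode(int mode) { toy_set_mode_r(&global_handle, mode); }
int toy_get_mode(void) { return toy_get_mode_r(&global_handle); }
```

A caller "in the know" would declare its own `toy_ctx` and use only the `_r` functions, so later calls to `toy_set_mode` by other code in the same address space would leave its settings untouched.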

But until that happens, here's a workaround for gaim-otr to link
libgcrypt statically.  It's actually pretty tricky, since it seems calls
from one .o to another in a .so file are always looked up dynamically,
so if another copy of libgcrypt exists in the address space, you'll
still get that one.  So you have to put everything in a single .o, make
the libgcrypt symbols local, and turn the result into a .so.

Here's Makefile.static (for gaim-otr):

.libs/gaim-otr.so: FORCE
	# Build everything from the standard Makefile
	$(MAKE)
	# Link everything, including libotr and libgcrypt, together into
	# a single .o file
	ld -r  .libs/otr-plugin.o .libs/ui.o .libs/dialogs.o .libs/gtk-ui.o .libs/gtk-dialog.o /usr/lib/libotr.a /usr/lib/libgcrypt.a /usr/lib/libgpg-error.a -o .libs/gaim-otr-shared.o
	# Make all the libgcrypt references local to that .o file
	objcopy -w -L '*gcry*' .libs/gaim-otr-shared.o .libs/gaim-otr-static.o
	# Turn the .o into a .so
	gcc -shared .libs/gaim-otr-static.o -Wl,-soname -Wl,gaim-otr.so -o .libs/gaim-otr.so

FORCE:

   - Ian

#425317#77
Date:
2007-05-20 21:18:34 UTC
From:
To:
Hi Steve,

Nobody reacted to this email. What's your opinion on this? Is this an
acceptable fix for Debian?

Thanks

T-Bone

#425317#88
Date:
2008-08-01 16:27:57 UTC
From:
To:
Salut Thibaut,

I just wanted to notify you that the fix proposed by Ian does not work
on my box. The DNS children still die; the configuration of my system is
as described previously by other people - x86_64, ldap, problem only
appears when using pidgin-otr in addition to pidgin.

All the best,
     Hubert