#645592 libc6: 2.11 (maybe?) breaks backwards (binary) compatibility

Package:
libc6
Source:
glibc
Description:
GNU C Library: Shared libraries
Submitter:
Lionel Elie Mamane
Date:
2011-10-17 16:12:05 UTC
Severity:
important
#645592#5
Date:
2011-10-17 08:03:44 UTC
From:
To:
On amd64, one specific program behaves differently when run with libc6
2.10.2-9 (or earlier) than when run with libc6 2.11.1-1 (or later). So
this smells like it could be either a libc6 backwards binary
compatibility bug to me, or plainly a bug somewhere in eglibc. I
suppose it is also possible that the program makes an invalid (not
guaranteed) assumption and (e)glibc 2.11 changed something that breaks
that assumption. The incorrect behaviour does not occur with the i386
version of the program, even with newer libc6: neither on a full i386
Debian GNU/Linux, nor with the i386 version of the program installed
on an amd64 Debian GNU/Linux with "--force-architecture" and
ia32-libs.

The program is Symantec's Backup Exec Remote Agent for Linux and UNIX
Servers, versions 2010R2 (13.0.4164) and 2010R3 (13.0.5204); hereafter
called "beremote". Alas, it is proprietary and all that, so precise
debugging and testing are rather complicated. In particular, I don't
know if a recompile and/or relink against newer headers / libraries
would "fix" the program's behaviour or not.

The incorrect behaviour is that in the backup, many directory names
are mangled and/or duplicated. For example, /boot is split between
/boot and /bott, and /boot/grub becomes /boot/guub. Filenames seem to
be OK.

I've run beremote under various variants of gdb, strace, ltrace and
latrace, but I haven't been able to pinpoint a specific call to a libc
function that returns a wrong result. All I see is that beremote does
readdir(), and then the directory name (e.g. "boot") reappears mangled
(e.g. "bott") in a call to wmemcpy. I would appreciate detailed
instructions on how to better pinpoint it: as l(a)trace does not
support wchar_t* strings, but gdb (sid version) does, I've been mainly
using gdb breakpoints to look at what happens (after installing
libc6-dbg), but that is rather onerous and it shows me only the libc
functions I thought to put a breakpoint on, not all calls like
l(a)trace would.

I can send you the strace, ltrace and latrace outputs if it is
useful.

I've started from a lenny system: problem does not appear. Upgraded
kernel to squeeze version, problem does not appear. Upgraded libc6 to
squeeze, problem appears. I started taking versions from
snapshot.debian.org until I found that problem appears with 2.11.1-1,
but not 2.10.2-9. So it looks like a 2.10 vs 2.11 thing. I skimmed
the NEWS file and the source diff between those versions; the only
thing that caught my eye is:

* New optimized string functions for x86-64: strstr, strcasestr, memcmp,
  strcspn, strpbrk, strspn, strcpy, stpcpy, strncpy, strcmp (SSE2, SSE4.2),
  strncmp (SSE2, SSE4.2), strchr (SSE4.2), strrchr (SSE4.2).
  Contributed by H.J. Lu.

  strlen, rawmemchr, strcmp (SSSE3), strncmp (SSSE3).
  Implemented by Ulrich Drepper.

But I think that if these functions were buggy, we would have noticed!
So, unlikely.

#645592#10
Date:
2011-10-17 08:43:57 UTC
From:
To:
This is likely to be these functions actually, and more precisely
memcpy(). The C standard specifies that the source and destination
areas must not overlap; otherwise memmove() should be used instead.
Recent versions of the libc optimize this function in ways that rely on
that guarantee, but some binaries use memcpy() where they should use
memmove().

Given that your code is closed source and thus can't be fixed, the best
option is to use the provided wrapper; see
/usr/share/doc/libc6/NEWS.Debian.gz for more information about how to
use it.
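
To illustrate the memcpy()/memmove() distinction described above, here
is a minimal sketch in C. The helper name drop_prefix is hypothetical
and is not taken from beremote or from libc; it only shows an in-place
copy where source and destination overlap, which is the case where
memmove() is required and memcpy() has undefined behaviour.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: delete the first `skip` bytes of a NUL-terminated
 * string in place. The source (s + skip) and destination (s) overlap, so
 * memmove() is used; memcpy() would be undefined behaviour here. */
void drop_prefix(char *s, size_t skip)
{
    size_t len = strlen(s);

    if (skip > len)
        skip = len;

    /* Move the tail, including the terminating NUL, to the front. */
    memmove(s, s + skip, len - skip + 1);
}
```

Calling drop_prefix() on a buffer holding "/boot/grub" with skip = 5
leaves "/grub" in the buffer.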

#645592#17
Date:
2011-10-17 09:25:14 UTC
From:
To:
Interesting theory, but according to the NEWS.Debian.gz of *sid* and the
BTS entries linked to, the issue you refer to seems to be new to
(e)glibc 2.13, not (e)glibc 2.11... And the problem was originally
observed in *squeeze*. And I've just checked for overlap by running
these on the latrace output:

grep ' memcpy(' | sed 's/[(,)]/ /g' | gawk '{if ( strtonum($8) <= strtonum($5) && strtonum($8) + strtonum($11) >= strtonum($5)) print "overlap: " $5 " " $8 " " $11 }'
grep ' memcpy(' | sed 's/[(,)]/ /g' | gawk '{if ( strtonum($8) >= strtonum($5) && strtonum($5) + strtonum($11) >= strtonum($8)) print "overlap: " $5 " " $8 " " $11 }'


Do you happen to have another theory, that would apply to a change
between 2.10 and 2.11?
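
For readers who want the overlap condition spelled out, the test that
the two pipelines above apply can be sketched in C. This assumes that
fields $5, $8 and $11 of the latrace output are the destination, the
source and the length; that mapping is a guess from the commands, not
something the latrace documentation is quoted for here. The name
ranges_overlap is hypothetical. Note that the gawk lines compare with
">=", so they also report buffers that merely touch; the sketch uses
the strict form and reports only ranges that share at least one byte.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when the n-byte ranges starting at dest and
 * at src have at least one byte in common. */
bool ranges_overlap(const void *dest, const void *src, size_t n)
{
    uintptr_t d = (uintptr_t)dest;
    uintptr_t s = (uintptr_t)src;

    if (n == 0)
        return false;   /* empty ranges cannot overlap */

    /* The ranges overlap when the distance between the two start
     * addresses is smaller than the length. */
    return (s <= d) ? (d - s < n) : (s - d < n);
}
```

Two 8-byte ranges starting 4 bytes apart overlap; two 8-byte ranges
starting exactly 8 bytes apart do not.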

#645592#22
Date:
2011-10-17 10:14:23 UTC
From:
To:
Hi Lionel,

Lionel Elie Mamane wrote:

Could you try the analogous checks with strcpy and stpcpy?

#645592#27
Date:
2011-10-17 13:44:25 UTC
From:
To:
Jonathan Nieder wrote:
[...]

Ah, I have another idea.  Could you try the libc6 package from wheezy
or sid?

If my hunch is right, it will work fine, and you ran into something
like PR12159 (bug#635885).

#645592#32
Date:
2011-10-17 14:44:26 UTC
From:
To:
No, the problem appears on a mix of *lenny* and sid with libc6
2.13-21.

#645592#37
Date:
2011-10-17 16:08:40 UTC
From:
To:
I found those with overlap:
 6734     wcscpy(dest = 0x7f6c5d40a570, src = (0x7f6c5d40a574, 33) "\\honey.domain.gestman.lu\[ROOT]"") [/lib/x86_64-linux-gnu/libc.so.6] {
 6734     } wcscpy = (0x7f6c5d40a570, 33) "\\honey.domain.gestman.lu\[ROOT]""

 6734     wcscpy(dest = 0x7f6c5d42b9e0, src = (0x7f6c5d42b9e4, 33) "\\honey.domain.gestman.lu\[ROOT]"") [/lib/x86_64-linux-gnu/libc.so.6] {
 6734     } wcscpy = (0x7f6c5d42b9e0, 33) "\\honey.domain.gestman.lu\[ROOT]""

However, in both cases the result looks like it is what the programmer
(erroneously) expects?

Neither stpcpy, nor wcpcpy appears in the latrace.
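
In both traces above, src is dest plus 4 bytes, which is one wchar_t on
amd64 Linux. That is consistent with (though it does not prove) code
that drops the first wide character of a string by copying it onto
itself. wcscpy() has undefined behaviour when the objects overlap; a
well-defined way to do the same thing is wmemmove(). The sketch below
is only an illustration of that alternative: the helper name
drop_first_wchar is hypothetical and nothing here is taken from
beremote.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: remove the first wide character of s in place.
 * Source (s + 1) and destination (s) overlap, so wmemmove() is used
 * instead of wcscpy(). */
void drop_first_wchar(wchar_t *s)
{
    size_t len = wcslen(s);

    if (len == 0)
        return;         /* nothing to drop */

    /* Moves len - 1 characters plus the terminating L'\0'. */
    wmemmove(s, s + 1, len);
}
```

Applied to a buffer holding two backslashes followed by "honey", this
leaves one backslash followed by "honey".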