
#721617 liblo7: broken on sparc: bus error while sending 64 bit integers

Package:: liblo7

Source:: liblo

Description:: Lightweight OSC library

Submitter:: Sebastian Ramacher

Date:: 2025-08-17 17:47:51 UTC

Severity:: normal

Tags:

#721617#5

Date:: 2013-09-02 12:51:18 UTC

From:

To:

Source: pyliblo
Version: 0.9.1-3
Severity: serious
Justification: FTBFS but built successfully in the past

pyliblo FTBFS on sparc:
| PYTHONPATH=$(ls -d /home/sramacher/pyliblo-0.9.1/build/lib.*-2.7) \
|                 python2.7 -m unittest discover -s test/ -p '*.py' -v
| testHostPort (unit.AddressTestCase) ... ok
| testHostPortProto (unit.AddressTestCase) ... ok
| testPort (unit.AddressTestCase) ... ok
| testUrl (unit.AddressTestCase) ... ok
| testSendReceive (unit.DecoratorTestCase) ... ok
| testNoPermission (unit.ServerCreationTestCase) ... ok
| testPort (unit.ServerCreationTestCase) ... ok
| testPortProto (unit.ServerCreationTestCase) ... ok
| testRandomPort (unit.ServerCreationTestCase) ... ok
| testNotReachable (unit.ServerTCPTestCase) ... ok
| testSendReceive (unit.ServerTCPTestCase) ... ok
| testPort (unit.ServerTestCase) ... ok
| testRecvImmediate (unit.ServerTestCase) ... ok
| testRecvTimeout (unit.ServerTestCase) ... ok
| testSendBlob (unit.ServerTestCase) ... ok
| testSendBundle (unit.ServerTestCase) ... ok
| testSendInt (unit.ServerTestCase) ... ok
| testSendInvalid (unit.ServerTestCase) ... ok
| testSendMessage (unit.ServerTestCase) ... ok
| testSendOthers (unit.ServerTestCase) ... Bus error (core dumped)
| make[1]: *** [test-python2.7] Error 138

I was able to reproduce this issue on smetana.d.o. liblo rebuilt with
debugging symbols gives me the following back trace:

Core was generated by `python2.7 -m unittest discover -s test/ -p *.py -v'.
Program terminated with signal 10, Bus error.
#0  0x7056634c in lo_arg_network_endian (type=<error reading variable: Cannot access memory at address 0x47>,
    data=<error reading variable: Cannot access memory at address 0x4b>) at message.c:687
687             *(int64_t *)data = lo_htoo64(*(int64_t *)data);
(gdb) bt
#0  0x7056634c in lo_arg_network_endian (type=<error reading variable: Cannot access memory at address 0x47>,
    data=<error reading variable: Cannot access memory at address 0x4b>) at message.c:687
#1  0x7036c008 in ?? () from /lib/sparc-linux-gnu/libc.so.6
#2  0x7036c008 in ?? () from /lib/sparc-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Regards

#721617#10

Date:: 2013-09-02 17:56:31 UTC

From:

To:

Control: reassign -1 liblo7 0.26~repack-7
Control: retitle -1 liblo7: broken on sparc: bus error while sending 64 bit integers
Control: affects -1 src:pyliblo

I've debugged this a bit further and it looks like liblo is broken on
sparc. Using the two binaries shipped in liblo-tools, one can easily
produce bus errors:

In one shell run 'oscdump 54321', in the other 'oscsend localhost 54321
/test hTfs 12345 3.14 hello' (this is the example from the oscsend
documentation, with 64-bit instead of 32-bit integers), and oscdump
immediately crashes with a bus error:

Core was generated by `oscdump 54321'.
Program terminated with signal 10, Bus error.
#0  0xf7eed3ec in lo_arg_pp_internal () from /usr/lib/liblo.so.7
(gdb) bt
#0  0xf7eed3ec in lo_arg_pp_internal () from /usr/lib/liblo.so.7
#1  0x0000000d in ?? ()
#2  0x0000000d in ?? ()

So this makes me believe that liblo is broken on sparc and hence I'm
reassigning this bug to liblo.

Regards

#721617#27

Date:: 2013-09-02 23:33:04 UTC

From:

To:

Hi liblo devs,

The python liblo wrapper has a test suite that has uncovered a bug in
liblo. More details below:

#721617#32

Date:: 2013-09-03 00:16:50 UTC

From:

To:

I just tried current git (5a7a54b4a0a) and the test provided by
Sebastian continues to fail:

Program received signal SIGBUS, Bus error.
0xf7fb0e5c in lo_arg_pp_internal (type=LO_STRING, data=0x22e94, bigendian=0)
    at message.c:1021
1021            val64.nl = *(int64_t *) data;
(gdb) bt
#0  0xf7fb0e5c in lo_arg_pp_internal (type=LO_STRING, data=0x22e94,
    bigendian=0) at message.c:1021
#1  0xf7fb0d44 in lo_arg_pp (type=LO_STRING, data=0x22e94) at message.c:993
#2  0x0001082c in messageHandler (path=0x22728 "/test", types=0x22781 "hTfs",
    argv=0x22ea0, argc=4, msg=0x22750, user_data=0x0) at oscdump.c:61
#3  0xf7fb5aec in dispatch_method (s=0x22008, path=0x22728 "/test",
    msg=0x22750, sock=-1) at server.c:1746
#4  0xf7fb56c0 in dispatch_data (s=0x22008, data=0x22728, size=36, sock=-1)
    at server.c:1670
#5  0xf7fb4cf8 in lo_server_recv (s=0x22008) at server.c:1498
#6  0x00010a1c in main (argc=2, argv=0xffffdcb4) at oscdump.c:105


Moreover, the testsuite also fails, but in a different place:

Program received signal SIGBUS, Bus error.
lo_message_add_int64 (m=0x2c730, a=81985529216486895) at message.c:368
368    *nptr = b.nl;
(gdb) bt
#0  lo_message_add_int64 (m=0x2c730, a=81985529216486895) at message.c:368
#1  0x00012bdc in test_deserialise () at testlo.c:1040
#2  0x000154d0 in main () at testlo.c:149


I tried several gcc versions; all of them fail.

#721617#37

Date:: 2013-09-03 00:42:01 UTC

From:

To:

It appears the problem is that on sparc you can't just write
*(datatype *)data. Depending on the data type, 'data' has to be aligned
to a certain number of bytes relative to the start of the original
(suitably aligned) block: 4 for int, 8 for int64:

char *src = something();     // any suitably aligned block, e.g. from malloc()
int *tmp = (int *)(src + 1);  // If 1 is replaced by 4, no bus error.
*tmp = 1;                     // Bus error here.
int a = *tmp;                 // This yields a bus error too.

So, at least lo_message_add_data (plus all its users),
lo_arg_pp_internal and lo_arg_host_endian need to change to support
sparc.
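A pointer is suitably aligned for a type when its address is a multiple of that type's alignment, so the rule above can be checked without provoking a bus error. A minimal sketch, assuming SPARC's natural alignment (4 bytes for int, 8 for int64_t); `is_aligned` is a hypothetical helper, not part of liblo:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not liblo API): true if p is a multiple of align.
 * align must be a power of two, e.g. 4 for int or 8 for int64_t on sparc. */
static bool is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}
```

For a buffer declared with `_Alignas(8)`, `is_aligned(buf + 4, 4)` holds while `is_aligned(buf + 4, 8)` does not: that is the case where a direct 32-bit access works on sparc but a direct 64-bit access faults.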

#721617#42

Date:: 2013-09-05 09:10:19 UTC

From:

To:

Any idea how to get a sparc test environment running?  Is there an
emulator I can use for example?

In the provided stack traces the "data" variable does seem to be
4-byte aligned, but the error is on a 64-bit data type.  I am curious:
does this problem _only_ occur for 64-bit types?

Type-casting is somewhat fundamental to how liblo uses the lo_arg data
structure and interprets raw memory blocks of OSC data.  OSC is 32-bit
aligned by design, so in general this shouldn't be an issue, but if
there are cases where things need to be 64-bit aligned I can see how
this problem might creep in.

Suggestions on how to debug would be useful as I am completely
unfamiliar with sparc.

thanks,
Steve

#721617#47

Date:: 2013-09-05 16:30:21 UTC

From:

To:

QEMU should support sparc targets. I believe qemu-system-sparc is the
package (in Debian) you can use to create a sparc system. At [1] there
appear to be instructions for setting up a debian qemu sparc system.
Alternatively, we could request access for you on a Debian porter
machine. This could take a while, though, and requires some steps to
be taken [2].


[1] http://tyom.blogspot.com/2013/03/debiansparc64-wheezy-under-qemu-how-to.html
[2] http://dsa.debian.org/doc/guest-account/

I'm definitely no expert, but my googling leaves me with the
impression that each native type size (1, 2, 4 and 8 bytes) has a
dedicated load/store instruction that requires the data to be aligned
to that size. So if OSC data is 32-bit aligned, errors should only
happen with 64-bit types.
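To illustrate the claim: OSC pads every argument to a multiple of 4 bytes, so a 64-bit argument that follows an odd number of 32-bit arguments lands at an offset that is 4 mod 8. The sketch below uses hypothetical helpers (`osc_arg_size`, `count_misaligned_64`) that cover only the fixed-size OSC 1.0 type tags (strings and blobs are omitted for brevity) and assumes the argument data itself starts on an 8-byte boundary. If the base is only 4-byte aligned, even a leading `h` is misaligned, which is consistent with data=0x22e94 (4 mod 8) for the "hTfs" message in the oscdump backtrace above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: payload size in bytes of one fixed-size OSC argument. */
static size_t osc_arg_size(char type)
{
    switch (type) {
    case 'i': case 'f': case 'c': case 'r': case 'm':
        return 4;                          /* 32-bit types */
    case 'h': case 'd': case 't':
        return 8;                          /* 64-bit types */
    case 'T': case 'F': case 'N': case 'I':
        return 0;                          /* tags without payload */
    default:
        assert(!"type not handled in this sketch");
        return 0;
    }
}

/* Hypothetical helper: counts 64-bit arguments whose offset is not a
 * multiple of 8, assuming the argument block starts 8-byte aligned. */
static int count_misaligned_64(const char *types)
{
    size_t off = 0;
    int bad = 0;
    for (const char *t = types; *t != '\0'; t++) {
        size_t size = osc_arg_size(*t);
        if (size == 8 && off % 8 != 0)
            bad++;
        off += size;                       /* every size is a multiple of 4 */
    }
    return bad;
}
```

For example, `count_misaligned_64("ih")` is 1 (the int64 sits at offset 4), while `"iih"` places it at offset 8 and gives 0.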

#721617#52

Date:: 2014-01-31 16:46:48 UTC

From:

To:

Closing because liblo is no longer in sparc, and will not be built again there.

#721617#61

Date:: 2016-06-01 06:48:20 UTC

From:

To:

Control: reopen -1
Control: severity -1 normal

Re-opening because in Debian we don't "fix bugs" by sweeping them under the
carpet. Also, we're currently making sparc64 fit for release and there is
a very active upstream.

This is absolutely not specific to SPARC. The moment you recast a
pointer that way, you leave the territory of the C99 specification,
which explicitly states that all declarations which refer to the same object
must have compatible types, otherwise the behavior is undefined (C99, 6.2.7/2) [1].
Converting a pointer to a type whose alignment it does not satisfy is
likewise undefined (C99, 6.3.2.3/7).

The code in question is therefore buggy and has to be fixed anyway as there
is otherwise no guarantee it will work on future architectures or compilers.

I'll have a look at this issue; it's a common programming mistake.

Cheers,
Adrian

#721617#72

Date:: 2016-06-01 22:01:17 UTC

From:

To:

That's a good attitude. Currently I don't know anyone using sparc64
with liblo, nor do I have access to such a machine, so reproducing,
testing and therefore fixing this bug is not possible for me. So, I
look forward to your contribution.

It is a good point.  If you have some examples where this fails, they
would be a good contribution to our unit tests (testlo.c).

At the moment no one has actually complained about this bug, and
therefore I can only assume it has not actually been encountered and
remains an entirely theoretical bug, but I do welcome ideas for how to
fix it nonetheless, because compatibility with future architectures is
certainly a desirable goal.

Unfortunately, that is something that seems to be baked into the liblo
API, but perhaps there is a way to fix it without sacrificing
efficiency, at least on unaffected architectures.

If not, perhaps it can be fixed in a future API-breaking version of
liblo.  Proposals welcome.

Steve

#721617#77

Date:: 2016-06-01 22:19:03 UTC

From:

To:

We have a large machine available on which I can create an account for
you if you want to look at the problem yourself. It runs Debian unstable
with a current kernel and toolchain.

If you're interested, send me a private email with your public SSH key
and sign the mail with your GPG key if possible.

The problem is that this violates strict aliasing and can lead to
unpredictable results. I don't know enough about liblo though
to understand the ramifications without looking at the code.

Well, you can do these casts, but when you copy data, you must actually
use memcpy instead of a direct assignment. The compiler then generates
access code that is safe for whatever alignment the pointer has.
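A minimal sketch of that advice, assuming the goal is to replace direct assignments such as `*(int64_t *) data = ...` with memcpy; `load_i64` and `store_i64` are hypothetical names, not liblo API:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads an int64_t from a possibly misaligned address. memcpy has no
 * alignment requirement, so the compiler must emit an access that works
 * for any address; where the CPU allows unaligned loads this typically
 * becomes a single load instruction. */
static int64_t load_i64(const void *p)
{
    int64_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}

/* Stores an int64_t to a possibly misaligned address, replacing a
 * direct assignment like *(int64_t *)p = v. */
static void store_i64(void *p, int64_t v)
{
    assert(p != NULL);
    memcpy(p, &v, sizeof v);
}
```

With helpers like these, a store such as `*nptr = b.nl;` in lo_message_add_int64 (from the testlo backtrace above) could become something like `store_i64(nptr, b.nl);`.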

To fix those issues, it's normally enough to build with debug symbols
enabled, run gdb over it and the corresponding assignment will show
in the backtrace. Replacing the assignment with memcpy then fixes
the issue. I just recently did that for systemd [1].

Adrian

#721617#82

Date:: 2016-08-01 00:11:22 UTC

From:

To:

Hello!

So, here's what I suggest:

1. Create a common header file called "unaligned.h" which contains all
   macros required for unaligned access, for little-endian,
   big-endian and native-endian access. This can be handily sourced
   from the systemd sources [1].

2. Go through the sources and fix all of the following pointer acrobatics
   using the macros from "unaligned.h":

   glaubitz@ikarus:/tmp/liblo-0.28/src$ grep -R  \*\( *
   address.c:            *((unsigned long*)&a.addr) = inet_addr(ip);
   message.c:                        *(types - 1), file, line);
   message.c:    dsize = lo_otoh32(*(uint32_t *) data);
   message.c:        elem_len = lo_otoh32(*((uint32_t *) pos));
   message.c:        *(int32_t *) data = lo_otoh32(*(int32_t *) data);
   message.c:        *(int32_t *) data = lo_otoh32(*(int32_t *) data);
   message.c:        *(int32_t *) data = lo_otoh32(*(int32_t *) data);
   message.c:        *(int64_t *) data = lo_otoh64(*(int64_t *) data);
   message.c:        *(int32_t *) data = lo_htoo32(*(int32_t *) data);
   message.c:        *(uint32_t *) data = lo_htoo32(*(uint32_t *) data);
   message.c:        *(uint32_t *) data = lo_htoo32(*(uint32_t *) data);
   message.c:        *(int64_t *) data = lo_htoo64(*(int64_t *) data);
   message.c:            val32.nl = lo_otoh32(*(int32_t *) data);
   message.c:            val32.nl = *(int32_t *) data;
   message.c:            bigendian ? lo_otoh32(*(uint32_t *) data) : *(uint32_t *) data;
   message.c:            bigendian ? lo_otoh32(*(uint32_t *) data) : *(uint32_t *) data;
   message.c:            val64.nl = lo_otoh64(*(int64_t *) data);
   message.c:            val64.nl = *(int64_t *) data;
   message.c:                printf("%#02x", (unsigned int)*((unsigned char *) (data) + 4 + i));
   message.c:            printf("0x%02x", *((uint8_t *) (data) + i));
   server.c:    s->sockets = calloc(2, sizeof(*(s->sockets)));
   server.c:    s->contexts = calloc(2, sizeof(*(s->contexts)));
   server.c:                *(*buffer)++ = *from++;
   server.c:                *(*buffer)++ = SLIP_END;
   server.c:                *(*buffer)++ = SLIP_ESC;
   server.c:        uint32_t msg_len = ntohl(*(uint32_t*)sc->buffer);
   server.c:            *(uint32_t*)(sc->buffer + sc->buffer_read_offset) = 0;
   server.c:            *(uint32_t*)(sc->buffer + sc->buffer_msg_offset) = htonl(msg_len);
   server.c:            *(uint32_t*)(sc->buffer + sc->buffer_msg_offset) = 0;
   server.c:                           sizeof(*(s->sockets)) * (s->sockets_alloc * 2));
   server.c:                     sizeof(*(s->contexts))
   server.c:        ts.sec = lo_otoh32(*((uint32_t *) pos));
   server.c:        ts.frac = lo_otoh32(*((uint32_t *) pos));
   server.c:            elem_len = lo_otoh32(*((uint32_t *) pos));
   server.c:        if (pos && *(pos + 1) == '\0') {
   server_thread.c:            pthread_create(&(st->thread), NULL, (void *(*)(void *)) &thread_func, st);
   testlo.c:    *(uint32_t *) (data + 8) = lo_htoo32((uint32_t) 99999);
   testlo.c:        *(uint32_t*)msg = htonl(24);
   testlo.c:        *(uint32_t*)(msg+28) = htonl(8);
   glaubitz@ikarus:/tmp/liblo-0.28/src$

   Since the unaligned.h header already caters for BE/LE systems, we can use these macros
   instead of lo_otoh32, for example.
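A minimal sketch of such accessors, assuming byte-wise access is acceptable for liblo's purposes. The function names are modeled on those in systemd's unaligned.h, but this implementation is an independent illustration, not code copied from systemd. Each byte is accessed through `unsigned char`, which has no alignment requirement, and the shifts fix the byte order regardless of the host:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reads a big-endian (OSC wire order) 32-bit value from any address. */
static inline uint32_t unaligned_read_be32(const void *p)
{
    const unsigned char *b = p;
    assert(b != NULL);
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Writes v to any address in big-endian byte order. */
static inline void unaligned_write_be32(void *p, uint32_t v)
{
    unsigned char *b = p;
    assert(b != NULL);
    b[0] = (unsigned char)(v >> 24);
    b[1] = (unsigned char)(v >> 16);
    b[2] = (unsigned char)(v >> 8);
    b[3] = (unsigned char)v;
}

/* 64-bit variants built from two 32-bit halves, high half first. */
static inline uint64_t unaligned_read_be64(const void *p)
{
    const unsigned char *b = p;
    return ((uint64_t)unaligned_read_be32(b) << 32) | unaligned_read_be32(b + 4);
}

static inline void unaligned_write_be64(void *p, uint64_t v)
{
    unsigned char *b = p;
    unaligned_write_be32(b, (uint32_t)(v >> 32));
    unaligned_write_be32(b + 4, (uint32_t)v);
}
```

For instance, `elem_len = lo_otoh32(*((uint32_t *) pos));` from the grep output would become `elem_len = unaligned_read_be32(pos);`, which decodes the same big-endian value without ever forming a misaligned `uint32_t *`.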

Thanks,
Adrian
