Source: pyliblo
Version: 0.9.1-3
Severity: serious
Justification: FTBFS but built successfully in the past
pyliblo FTBFS on sparc:
| PYTHONPATH=$(ls -d /home/sramacher/pyliblo-0.9.1/build/lib.*-2.7) \
| python2.7 -m unittest discover -s test/ -p '*.py' -v
| testHostPort (unit.AddressTestCase) ... ok
| testHostPortProto (unit.AddressTestCase) ... ok
| testPort (unit.AddressTestCase) ... ok
| testUrl (unit.AddressTestCase) ... ok
| testSendReceive (unit.DecoratorTestCase) ... ok
| testNoPermission (unit.ServerCreationTestCase) ... ok
| testPort (unit.ServerCreationTestCase) ... ok
| testPortProto (unit.ServerCreationTestCase) ... ok
| testRandomPort (unit.ServerCreationTestCase) ... ok
| testNotReachable (unit.ServerTCPTestCase) ... ok
| testSendReceive (unit.ServerTCPTestCase) ... ok
| testPort (unit.ServerTestCase) ... ok
| testRecvImmediate (unit.ServerTestCase) ... ok
| testRecvTimeout (unit.ServerTestCase) ... ok
| testSendBlob (unit.ServerTestCase) ... ok
| testSendBundle (unit.ServerTestCase) ... ok
| testSendInt (unit.ServerTestCase) ... ok
| testSendInvalid (unit.ServerTestCase) ... ok
| testSendMessage (unit.ServerTestCase) ... ok
| testSendOthers (unit.ServerTestCase) ... Bus error (core dumped)
| make[1]: *** [test-python2.7] Error 138
I was able to reproduce this issue on smetana.d.o. liblo rebuilt with
debugging symbols gives me the following back trace:
Core was generated by `python2.7 -m unittest discover -s test/ -p *.py -v'.
Program terminated with signal 10, Bus error.
#0 0x7056634c in lo_arg_network_endian (type=<error reading variable: Cannot access memory at address 0x47>,
data=<error reading variable: Cannot access memory at address 0x4b>) at message.c:687
687 *(int64_t *)data = lo_htoo64(*(int64_t *)data);
(gdb) bt
#0 0x7056634c in lo_arg_network_endian (type=<error reading variable: Cannot access memory at address 0x47>,
data=<error reading variable: Cannot access memory at address 0x4b>) at message.c:687
#1 0x7036c008 in ?? () from /lib/sparc-linux-gnu/libc.so.6
#2 0x7036c008 in ?? () from /lib/sparc-linux-gnu/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Regards
Control: reassign -1 liblo7 0.26~repack-7
Control: retitle -1 liblo7: broken on sparc: bus error while sending 64 bit integers
Control: affects -1 src:pyliblo

I've debugged this a bit further and it looks like liblo is broken on sparc. Using the two binaries shipped in liblo-tools, one can easily produce bus errors: in one shell run 'oscdump 54321', in the other 'oscsend localhost 54321 /test hTfs 12345 3.14 hello' (this is the oscsend example, with 64-bit instead of 32-bit integers), and oscdump immediately crashes with a bus error:

Core was generated by `oscdump 54321'.
Program terminated with signal 10, Bus error.
#0 0xf7eed3ec in lo_arg_pp_internal () from /usr/lib/liblo.so.7
(gdb) bt
#0 0xf7eed3ec in lo_arg_pp_internal () from /usr/lib/liblo.so.7
#1 0x0000000d in ?? ()
#2 0x0000000d in ?? ()

So this makes me believe that liblo is broken on sparc, and hence I'm reassigning this bug to liblo.

Regards
Hi liblo devs,

The python liblo wrapper has a test suite that has uncovered a bug in liblo. More details below:
I just tried current git (5a7a54b4a0a) and the test provided by
Sebastian continues to fail:
Program received signal SIGBUS, Bus error.
0xf7fb0e5c in lo_arg_pp_internal (type=LO_STRING, data=0x22e94, bigendian=0)
at message.c:1021
1021 val64.nl = *(int64_t *) data;
(gdb) bt
#0 0xf7fb0e5c in lo_arg_pp_internal (type=LO_STRING, data=0x22e94,
bigendian=0) at message.c:1021
#1 0xf7fb0d44 in lo_arg_pp (type=LO_STRING, data=0x22e94) at message.c:993
#2 0x0001082c in messageHandler (path=0x22728 "/test", types=0x22781 "hTfs",
argv=0x22ea0, argc=4, msg=0x22750, user_data=0x0) at oscdump.c:61
#3 0xf7fb5aec in dispatch_method (s=0x22008, path=0x22728 "/test",
msg=0x22750, sock=-1) at server.c:1746
#4 0xf7fb56c0 in dispatch_data (s=0x22008, data=0x22728, size=36, sock=-1)
at server.c:1670
#5 0xf7fb4cf8 in lo_server_recv (s=0x22008) at server.c:1498
#6 0x00010a1c in main (argc=2, argv=0xffffdcb4) at oscdump.c:105
Moreover, the testsuite also fails, but in a different place:
Program received signal SIGBUS, Bus error.
lo_message_add_int64 (m=0x2c730, a=81985529216486895) at message.c:368
368 *nptr = b.nl;
(gdb) bt
#0 lo_message_add_int64 (m=0x2c730, a=81985529216486895) at message.c:368
#1 0x00012bdc in test_deserialise () at testlo.c:1040
#2 0x000154d0 in main () at testlo.c:149
Tried several gcc versions, all fail.
It appears the problem is that on sparc you can't just dereference *(datatype *)data. Depending on the datatype, 'data' has to be aligned to a multiple of the type's size (4 bytes for int, 8 for int64):

    char* src = something();
    int* tmp = (int*)(src + 1); // If 1 is replaced by 4, no bus error.
    *tmp = 1;                   // Bus error here.
    int a = *tmp;               // This yields a bus error too.

So, at least lo_message_add_data (plus all its users), lo_arg_pp_internal and lo_arg_host_endian need to change to support sparc.
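A minimal sketch of the alignment-safe alternative to the snippet above: the store and load go through memcpy, which has no alignment requirement. The helper names (store_int_unaligned, load_int_unaligned, roundtrip_at_offset_one) are hypothetical and not part of liblo.

```c
#include <assert.h>
#include <string.h>

/* Store an int at a possibly misaligned address. memcpy has no alignment
 * requirement, so on strict-alignment targets such as sparc the compiler
 * must emit an access sequence that is safe for any address. */
static void store_int_unaligned(char *dst, int value)
{
    assert(dst != NULL);
    memcpy(dst, &value, sizeof value);
}

/* Load an int from a possibly misaligned address. */
static int load_int_unaligned(const char *src)
{
    int value;

    assert(src != NULL);
    memcpy(&value, src, sizeof value);
    return value;
}

/* The snippet above without the cast: buf + 1 is in general not suitably
 * aligned for int, but with memcpy that no longer matters. */
static int roundtrip_at_offset_one(void)
{
    char buf[2 * sizeof(int)];

    store_int_unaligned(buf + 1, 1);     /* instead of: *tmp = 1;     */
    return load_int_unaligned(buf + 1);  /* instead of: int a = *tmp; */
}
```

On targets that tolerate unaligned access, compilers typically turn such small fixed-size memcpy calls into a single load or store, so the cost there is usually negligible.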
Any idea how to get a sparc test environment running? Is there an emulator I can use, for example?

In the provided stack traces the "data" variable does seem to be 4-byte aligned, but the error is on a 64-bit data type. I am curious to know if this problem _only_ occurs for 64-bit types?

Type-casting is somewhat fundamental to how liblo uses the lo_arg data structure and to interpreting raw memory blocks of OSC data. In general OSC is 32-bit-aligned by design, so this shouldn't usually be an issue, but if there are cases where things need to be 64-bit-aligned I can see how this problem might creep in.

Suggestions on how to debug would be useful, as I am completely unfamiliar with sparc.

thanks,
Steve
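The data pointer from the oscdump backtrace (data=0x22e94 at message.c:1021) can be checked with a couple of lines of C. It is a multiple of 4 but not of 8, which is consistent with a 64-bit load faulting where 32-bit loads would not. The helper is_aligned is a hypothetical illustration, not a liblo function.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if addr is a multiple of alignment (which must be a power of two). */
static int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* The data pointer reported in the oscdump backtrace. */
static const uintptr_t reported_data = 0x22e94;
```

Here is_aligned(reported_data, 4) is true and is_aligned(reported_data, 8) is false, since 0x22e94 leaves a remainder of 4 when divided by 8.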
QEMU should support sparc targets. I believe qemu-system-sparc is the package (in Debian) you can use to create a sparc system. At [1] there appear to be instructions for setting up a Debian QEMU sparc system. Alternatively, we could request access for you on a Debian porter machine. This could take a while, though, and requires some steps to be taken [2].

[1] http://tyom.blogspot.com/2013/03/debiansparc64-wheezy-under-qemu-how-to.html
[2] http://dsa.debian.org/doc/guest-account/

I'm definitely no expert, but my googling leaves me with the impression that every native type (sizes 1, 2, 4 and 8 bytes) has a dedicated load/store instruction that requires the data to be 1-, 2-, 4- or 8-byte aligned, respectively. So if OSC is 32-bit aligned, errors should only happen with 64-bit types.
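To illustrate why "OSC is 32-bit aligned" does not imply 64-bit alignment, here is a small sketch of the OSC 1.0 string padding rule (string plus 1 to 4 NUL bytes, rounded up to a multiple of 4). With address "/test" and type tags ",h", the int64 argument starts 12 bytes into the message, i.e. 4 mod 8, even if the buffer itself is 8-byte aligned. The function names are hypothetical, not liblo API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of an OSC string on the wire: the characters plus 1 to 4 NUL
 * bytes, so that the total is a multiple of 4 (OSC 1.0 padding rule). */
static size_t osc_string_size(const char *s)
{
    assert(s != NULL);
    return (strlen(s) + 4) & ~(size_t)3;
}

/* Offset of the first argument in an OSC message with the given address
 * pattern and type tag string (e.g. ",h"). */
static size_t osc_first_arg_offset(const char *path, const char *typetags)
{
    return osc_string_size(path) + osc_string_size(typetags);
}
```

Every argument occupies a multiple of 4 bytes, so 32-bit values always land on 4-byte boundaries, while a 64-bit value lands on an 8-byte boundary only by chance.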
Closing because liblo is no longer in sparc, and will not be built again there.
Control: reopen -1
Control: severity -1 normal

Re-opening because in Debian we don't "fix bugs" by sweeping them under the carpet. Also, we're currently making sparc64 fit for release and there is a very active upstream.

Which is absolutely not specific to SPARC. The moment you are recasting a pointer that way, you are leaving the territory of the C99 specification, which explicitly states that declarations which refer to the same object must have compatible types, otherwise the behavior is undefined (C99, 6.2.7/2) [1]. The code in question is therefore buggy and has to be fixed anyway, as there is otherwise no guarantee it will work on future architectures or compilers.

I'll have a look at this issue, it's a common programming mistake.

Cheers,
Adrian
That's a good attitude. Currently I don't know anyone using sparc64 with liblo, nor do I have access to such a machine, so reproducing and testing, and therefore fixing, this bug is not possible for me. So, I look forward to your contribution.

It is a good point. If you have some examples where this fails, they would be a good contribution to our unit tests (testlo.c).

At the moment no one has actually complained about this bug, so I can only assume it has not actually been encountered and remains an entirely theoretical bug. I do welcome ideas for how to fix it nonetheless, because compatibility with future architectures is certainly a desirable goal. Unfortunately, the problem seems to be baked into the liblo API, but perhaps there is a way to fix it without sacrificing efficiency, at least on unaffected architectures. If not, perhaps it can be fixed in a future API-breaking version of liblo. Proposals welcome.

Steve
We have a large machine available on which I can create an account for you if you want to look at the problem yourself. It runs Debian unstable with a current kernel and toolchain. If you're interested, send me a private email with your public SSH key, signed with your GPG key if possible.

The problem is that this violates strict aliasing and can lead to unpredictable results. I don't know enough about liblo, though, to understand the ramifications without looking at the code.

Well, you can do these casts, but when you copy data, you must actually use memcpy instead of a direct assignment. memcpy has no alignment requirement, so the compiler will then take care of accessing the data safely.

To fix those issues, it's normally enough to build with debug symbols enabled and run the program under gdb; the offending assignment will show up in the backtrace. Replacing the assignment with memcpy then fixes the issue. I just recently did that for systemd [1].

Adrian
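As a sketch of that advice applied to the line that crashed first (message.c:687, `*(int64_t *)data = lo_htoo64(*(int64_t *)data);`), the value can be read with memcpy and written back in big-endian (OSC) byte order one byte at a time. The function name is hypothetical and it does not use liblo's own lo_htoo64 macro; incidentally, the value in the testlo.c backtrace, 81985529216486895, is 0x0123456789abcdef.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a host-order 64-bit value stored at `data` to big-endian
 * (network/OSC) byte order in place. `data` may have any alignment:
 * the value is read with memcpy and written back byte by byte. */
static void int64_to_network_in_place(void *data)
{
    uint64_t v;
    unsigned char *p = data;
    int i;

    assert(data != NULL);
    memcpy(&v, data, sizeof v);            /* alignment-safe load */
    for (i = 7; i >= 0; --i)               /* most significant byte first */
        p[7 - i] = (unsigned char)(v >> (8 * i));
}
```

Because OSC data is big-endian, on a big-endian host such as sparc the stored bytes do not change; on a little-endian host they are reversed.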
Hello!
So, here's what I suggest:
1. Create a common header file called "unaligned.h" which contains all
macros required for unaligned access, both for little-endian,
big-endian and native-endian access. This can be handily sourced
from the systemd sources [1].
2. Go through the sources and fix all of the following pointer acrobatics
   using the macros from "unaligned.h":
glaubitz@ikarus:/tmp/liblo-0.28/src$ grep -R \*\( *
address.c: *((unsigned long*)&a.addr) = inet_addr(ip);
message.c: *(types - 1), file, line);
message.c: dsize = lo_otoh32(*(uint32_t *) data);
message.c: elem_len = lo_otoh32(*((uint32_t *) pos));
message.c: *(int32_t *) data = lo_otoh32(*(int32_t *) data);
message.c: *(int32_t *) data = lo_otoh32(*(int32_t *) data);
message.c: *(int32_t *) data = lo_otoh32(*(int32_t *) data);
message.c: *(int64_t *) data = lo_otoh64(*(int64_t *) data);
message.c: *(int32_t *) data = lo_htoo32(*(int32_t *) data);
message.c: *(uint32_t *) data = lo_htoo32(*(uint32_t *) data);
message.c: *(uint32_t *) data = lo_htoo32(*(uint32_t *) data);
message.c: *(int64_t *) data = lo_htoo64(*(int64_t *) data);
message.c: val32.nl = lo_otoh32(*(int32_t *) data);
message.c: val32.nl = *(int32_t *) data;
message.c: bigendian ? lo_otoh32(*(uint32_t *) data) : *(uint32_t *) data;
message.c: bigendian ? lo_otoh32(*(uint32_t *) data) : *(uint32_t *) data;
message.c: val64.nl = lo_otoh64(*(int64_t *) data);
message.c: val64.nl = *(int64_t *) data;
message.c: printf("%#02x", (unsigned int)*((unsigned char *) (data) + 4 + i));
message.c: printf("0x%02x", *((uint8_t *) (data) + i));
server.c: s->sockets = calloc(2, sizeof(*(s->sockets)));
server.c: s->contexts = calloc(2, sizeof(*(s->contexts)));
server.c: *(*buffer)++ = *from++;
server.c: *(*buffer)++ = SLIP_END;
server.c: *(*buffer)++ = SLIP_ESC;
server.c: uint32_t msg_len = ntohl(*(uint32_t*)sc->buffer);
server.c: *(uint32_t*)(sc->buffer + sc->buffer_read_offset) = 0;
server.c: *(uint32_t*)(sc->buffer + sc->buffer_msg_offset) = htonl(msg_len);
server.c: *(uint32_t*)(sc->buffer + sc->buffer_msg_offset) = 0;
server.c: sizeof(*(s->sockets)) * (s->sockets_alloc * 2));
server.c: sizeof(*(s->contexts))
server.c: ts.sec = lo_otoh32(*((uint32_t *) pos));
server.c: ts.frac = lo_otoh32(*((uint32_t *) pos));
server.c: elem_len = lo_otoh32(*((uint32_t *) pos));
server.c: if (pos && *(pos + 1) == '\0') {
server_thread.c: pthread_create(&(st->thread), NULL, (void *(*)(void *)) &thread_func, st);
testlo.c: *(uint32_t *) (data + 8) = lo_htoo32((uint32_t) 99999);
testlo.c: *(uint32_t*)msg = htonl(24);
testlo.c: *(uint32_t*)(msg+28) = htonl(8);
glaubitz@ikarus:/tmp/liblo-0.28/src$
Since the unaligned.h header already caters for BE/LE systems, we can use these macros
instead of lo_otoh32, for example.
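A minimal sketch of what such helpers could look like for the 32-bit big-endian case. The names are modeled on the helpers in systemd's unaligned.h, but these bodies are an independent illustration (byte shifts, so they work for any alignment and either host byte order), not code copied from systemd; read_size_field is a hypothetical stand-in for a pattern like `dsize = lo_otoh32(*(uint32_t *) data);` from the grep output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian (OSC/network order) 32-bit value from any address. */
static inline uint32_t unaligned_read_be32(const void *src)
{
    const uint8_t *p = src;

    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit value in big-endian order to any address. */
static inline void unaligned_write_be32(void *dst, uint32_t v)
{
    uint8_t *p = dst;

    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Hypothetical replacement for: dsize = lo_otoh32(*(uint32_t *) data);
 * no pointer cast and no separate byte-order macro are needed. */
static uint32_t read_size_field(const void *data)
{
    assert(data != NULL);
    return unaligned_read_be32(data);
}
```

The 16- and 64-bit variants follow the same pattern with 2 and 8 bytes respectively.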
Thanks,
Adrian