#754729 [claws-mail] Random segfaults

#754729#5
Date: 2014-07-13 17:47:03 UTC
After upgrading to version 3.10.1, I encounter random crashes with
this message in syslog:

Jul 13 19:37:25 bonifac kernel: [31018.896768] claws-mail[4105]: segfault at b50 ip 00007f0f4279d2f0 sp 00007f0f28dc7898 error 4 in libetpan.so.17.1.0[7f0f4276b000+bf000]

I am trying to identify when this crash happens, but without success.
The last one happened on a right click in the compose window to see
spellcheck suggestions: the context menu appeared and then Claws Mail
died. But sometimes it dies without any reason (I mean, without any
interaction from me); I only notice that the icon in the systray
disappears. I found these lines in an older syslog (I hope they can help):

Jul 10 17:38:44 bonifac kernel: [ 6695.272757] claws-mail[5197]: segfault at 5fb0 ip 00007f84e45592f0 sp 00007f84cbc79898 error 4 in libetpan.so.17.1.0[7f84e4527000+bf000]
Jul 10 21:39:00 bonifac kernel: [21119.727123] claws-mail[18836]: segfault at 5fb0 ip 00007fbd093752f0 sp 00007fbcdec9c898 error 4 in libetpan.so.17.1.0[7fbd09343000+bf000]
Debian Release: jessie/sid
  500 testing         security.debian.org
  500 testing         ftp.sk.debian.org

regards

#754729#10
Date: 2014-07-15 08:37:37 UTC
Control: tags -1 moreinfo

  You can try to run it under gdb for some days until it crashes again.
There will be some performance impact, but it will allow you to get a
backtrace when it crashes again.

  Before doing so, please install the claws-mail-dbg and libetpan-dbg
packages so the backtrace contains symbols. Once you get a backtrace,
reply to this with it :)

  If you need it, you may find more help on debugging in the upstream FAQ:
http://www.claws-mail.org/faq/index.php/Debugging_Claws
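  A minimal sketch of the gdb session suggested above, written as a small
Python wrapper. It assumes gdb and the claws-mail-dbg/libetpan-dbg packages
are installed; the helper names gdb_command and run_under_gdb and the log
file name are hypothetical, and the upstream FAQ may recommend different
steps. The gdb options (-batch, -ex, --args) and commands (handle, run,
thread apply all bt full) are standard gdb; ignoring SIGPIPE is an
assumption so that a dropped network connection does not end the session
before a real crash.

```python
import shlex
import subprocess


def gdb_command(program, args=()):
    """Hypothetical helper: build a gdb command line that starts `program`
    and, once it stops (for example on SIGSEGV), prints a full backtrace
    of every thread before gdb exits."""
    return [
        "gdb",
        "-batch",                                  # run the -ex commands, then exit
        "-ex", "set pagination off",               # never pause for <Return> on long output
        "-ex", "handle SIGPIPE nostop noprint pass",  # assumption: broken pipes are harmless here
        "-ex", "run",                              # start the program, wait until it stops
        "-ex", "thread apply all bt full",         # backtraces of all threads, with locals
        "--args", program, *args,                  # everything after --args is the program's argv
    ]


def run_under_gdb(program, log_path):
    """Hypothetical helper: run `program` under gdb and save gdb's output
    (including the backtrace) to `log_path`. Blocks until the program
    exits or crashes."""
    cmd = gdb_command(program)
    print("Running:", shlex.join(cmd))
    with open(log_path, "w") as log:
        subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=False)
```

For example, calling run_under_gdb("claws-mail", "claws-mail-gdb.log") and
then suspending and resuming as usual would leave the backtrace in the log
file after the next crash.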

Thanks in advance,

#754729#17
Date: 2014-07-15 09:03:38 UTC
Hi	,

On Tue, 15 Jul 2014 10:37:37 +0200 Ricardo Mones <mones@debian.org>
wrote:

I am willing to help find the root of the problem, but since I filed
the bug, the crash hasn't come back. Perhaps (I am no expert) an update
of some other library fixed it. I use CM daily, for several hours a day,
so the gaps between crashes (as in my original post) are not due to
inactivity; they indicate that the crash does not happen often.

But if it comes back, I will try to debug it.

regards

#754729#22
Date: 2014-07-16 15:57:48 UTC
Control: tags -1 unreproducible

  Mmm, this looks like a bug in your reply template ;)

  OK, adjusting the bug tags then. If you don't manage to reproduce it soon,
  I think your theory of a third-party library interfering is more likely
  to be true, and this could be closed.

  best regards,

#754729#29
Date: 2014-07-16 16:42:50 UTC
Hi,

On Wed, 16 Jul 2014 17:57:48 +0200 Ricardo Mones <mones@debian.org>
wrote:

No, it was a translation typo (an unwanted tab) ;)

CM is not able to auto-translate templates yet :P

OK

regards

#754729#34
Date: 2014-07-28 12:03:39 UTC
Hi,

On Wed, 16 Jul 2014 17:57:48 +0200 Ricardo Mones <mones@debian.org>
wrote:

It happened again for me, but only once, and I could not debug it.

But today I saw it in real time: it happened after resuming from
suspend and manually triggering a receive. CM first seemed to freeze
while accessing the NNTP server at gmane.org (judging by the status
line message) and then crashed.

regards

#754729#39
Date: 2014-07-28 12:39:02 UTC
  Hi,

  Well, the period after resuming from suspend is always a messy one and
depends on how the hardware behaves. I know it's no consolation, but I've
even had a few kernel crashes just after resuming (running stable on my
laptop), though 99.9% of the time it goes fine. I doubt that can be
attributed only to software either.

  You may want to run claws-mail under gdb for a while and catch the backtrace
when it happens. I'm not sure the captured backtrace is going to be useful,
but at least it is a starting point.

  regards,

#754729#44
Date: 2014-08-31 07:49:58 UTC
Hi,

On Mon, 28 Jul 2014 14:39:02 +0200 Ricardo Mones <mones@debian.org>
wrote:

OK, I spent some time checking how it relates to suspend. Claws
crashes only after resuming from suspend, but not always.

Here is the gdb backtrace. I hope it will be useful, but I don't
understand it:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffda09e700 (LWP 13241)]
0x00007ffff2d28820 in ?? () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
(gdb) backtrace
#0  0x00007ffff2d28820 in ?? () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#1  0x00007ffff2cf53f4 in mailstream_low_read () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#2  0x00007ffff2cf6734 in mailstream_feed_read_buffer () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#3  0x00007ffff2cf4810 in mailstream_read_line_append () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#4  0x00007ffff2cf4889 in mailstream_read_line_remove_eol () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#5  0x00007ffff2d2acc4 in newsnntp_date () from /usr/lib/x86_64-linux-gnu/libetpan.so.17
#6  0x00000000005ca23f in date_run (op=<optimized out>) at nntp-thread.c:556
#7  0x00000000005e9639 in thread_run (data=0x1442a70) at etpan-thread-manager.c:351
#8  0x00007ffff43300a4 in start_thread (arg=0x7fffda09e700) at pthread_create.c:309
#9  0x00007ffff1cecfbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) quit


regards

#754729#49
Date: 2014-09-01 07:13:05 UTC
Hi,

That's probably some condition which doesn't always happen. From the
backtrace, it seems that suspending while it's reading from the network
could trigger it, but that's just a wild guess.
[…]

Well, the backtrace points the problem at libetpan, but it's not very
useful because it lacks debugging symbols, so the values which may
cause the crash cannot be seen. Can you install the libetpan-dbg package
and try to reproduce the crash?

Thanks in advance,

#754729#54
Date: 2014-09-01 08:38:08 UTC
Hi,

On Mon, 1 Sep 2014 09:13:05 +0200 Ricardo Mones <mones@debian.org>
wrote:

IMO, no. I have 30 minutes between checks for new mail, and I am sure
that while I was investigating it, no suspend happened while CM was
reading mail (I have only POP3 and NNTP accounts), because I just
started CM and suspended immediately.

No problem. It will take some time again, sorry: I need to remember
to run it via gdb before suspending, and I need to work with the
computer, which is impossible while it is suspended :-)

regards

#754729#59
Date: 2014-09-01 14:34:18 UTC
Hi,

On Mon, 1 Sep 2014 09:13:05 +0200 Ricardo Mones <mones@debian.org>
wrote:

It happened sooner than I expected:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd8875700 (LWP 22398)]
low_logger (s=0x7fffcc001520, log_type=3, str=0x7fffcc00dc80 "111 20140901124005\r\nane.mail.sylpheed.claws.general\r\n ready (posting ok)\r\n", size=0,
    context=0x160) at mailstream.c:287
287	mailstream.c: No such file or directory.
(gdb) backtrace
#0  low_logger (s=0x7fffcc001520, log_type=3, str=0x7fffcc00dc80 "111 20140901124005\r\nane.mail.sylpheed.claws.general\r\n ready (posting ok)\r\n",
    size=0, context=0x160) at mailstream.c:287
#1  0x00007ffff2cf53f4 in mailstream_logger_internal (size=0,
    buffer=0x7fffcc00dc80 "111 20140901124005\r\nane.mail.sylpheed.claws.general\r\n ready (posting ok)\r\n", direction=4, is_stream_data=2,
    s=0x7fffcc001520) at mailstream_low.c:408
#2  mailstream_low_read (s=0x7fffcc001520, buf=0x7fffcc00dc80, count=<optimized out>) at mailstream_low.c:240
#3  0x00007ffff2cf6734 in mailstream_feed_read_buffer (s=s@entry=0x7fffcc001560) at mailstream.c:323
#4  0x00007ffff2cf4810 in mailstream_read_line_append (stream=0x7fffcc001560, line=0x15fdfb0) at mailstream_helper.c:101
#5  0x00007ffff2cf4889 in mailstream_read_line_remove_eol (stream=<optimized out>, line=0x15fdfb0) at mailstream_helper.c:116
#6  0x00007ffff2d2acc4 in read_line (f=0x1382380, f=0x1382380) at newsnntp.c:1835
#7  newsnntp_date (f=0x1382380, tm=0x7fffffffdc50) at newsnntp.c:1358
#8  0x00000000005ca23f in date_run (op=<optimized out>) at nntp-thread.c:556
#9  0x00000000005e9639 in thread_run (data=0x12e12c0) at etpan-thread-manager.c:351
#10 0x00007ffff43300a4 in start_thread (arg=0x7fffd8875700) at pthread_create.c:309
#11 0x00007ffff1cecfbd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

regards

#754729#64
Date: 2014-09-02 15:01:37 UTC
Control: reassign 754729 libetpan/1.5-1
[…]

Reassigning to the correct package, which is also under my umbrella,
so I'll forward it upstream later ;-)

Thanks for the feedback!

#754729#73
Date: 2014-09-02 16:59:22 UTC
Hi,

On Tue, 2 Sep 2014 17:01:37 +0200 Ricardo Mones <mones@debian.org>
wrote:

I don't know how to say it in English, so just:

thanks! ;-)

#754729#78
Date: 2014-09-07 14:44:53 UTC
Hi,

On Mon, 1 Sep 2014 09:13:05 +0200 Ricardo Mones <mones@debian.org>
wrote:

I want to let you know that in the last few days, suspend happened two
or three times while CM was reading mail (because immediately after
resuming, CM continued reading mail) without problems. So the problem
did not happen in those cases.

regards

#754729#83
Date: 2014-09-09 10:46:35 UTC
Hi!

Uh, that makes it weirder :( Have you upgraded something on your system?

If not, it's probably some race condition, which probably won't be easy
to spot. Anyway, since all of this is just guesswork, I've forwarded it
upstream as promised; sorry for the delay on this.

Thanks for the feedback,

#754729#90
Date: 2014-09-09 17:01:36 UTC
Hi,

On Tue, 9 Sep 2014 12:46:35 +0200 Ricardo Mones <mones@debian.org>
wrote:

Sure :-) It is Debian testing, which I upgrade daily. I have multiarch
(amd64 and i386), with a lot of *-dev packages (and their dependencies)
installed. I get a lot of updated packages daily (e.g. 62 packages
today). I check all updates (what is updated and why), but I skip the
libraries in this check, because most libraries are simply beyond my
knowledge. So yes, I am sure that I upgraded something, but it is very
hard to tell what was upgraded, especially since the crash doesn't
happen after every suspend. But if you ask about a particular package,
I can find it in the logs.

I understand (from this and your other responses) that the solution
will not necessarily be easy. The crashes are annoying, but they are
not a big problem, so I will be patient ;-)

Don't worry about the delay. We are all working in our free time (I
contribute non-programming work to several projects, e.g. the CM
translation), so from my point of view it is OK ;-)

regards