Fabre

#1037042 graphicsmagick: GetImageDepth has a thread arena and memory leak #1037042

Package:: graphicsmagick

Source:: graphicsmagick

Description:: collection of image processing tools

Submitter:: Beauregard,Christophe (ECCC)

Date:: 2023-06-07 17:21:03 UTC

Severity:: normal

Tags:

#1037042#5

Date:: 2023-06-02 14:13:31 UTC

From:

To:

The GraphicsMagick API has a thread arena and more general memory leak
somewhere in the GetImageDepth() API call, presumably a side-effect of
the OpenMP usage.

First, you need a server class system... a lot of beefy cores (i.e. I
can't reproduce it on my quad-core laptop), and possibly a lot of RAM
(128G on this one):

root@host:~# grep 'model name' /proc/cpuinfo |tail -1
model name  : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
root@host:~# grep processor /proc/cpuinfo |tail -1
processor   : 39

Then you need this test program:

root@host:~# cat threadarena.c
/*
 * gcc -o threadarena threadarena.c \
 *    `GraphicsMagick-config --cppflags --ldflags --libs`
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <pthread.h>
#include <magick/api.h>

static void* readimage(void* arg) {
      ImageInfo *imageInfo;
      ExceptionInfo exception;
      const char* filename = (const char*) arg;
      ImageCharacteristics chars;

      imageInfo=CloneImageInfo(0);
      GetExceptionInfo(&exception);

      (void) strncpy(imageInfo->filename, arg, MaxTextExtent-1 );

      Image *image = ReadImage(imageInfo, &exception);
      if (image == (Image *) NULL)
      {
            CatchException(&exception);
            perror("ReadImage");
      }

      GetImageDepth(image,&image->exception);

      DestroyImage(image);
      DestroyImageInfo(imageInfo);
      DestroyExceptionInfo( &exception );
      return NULL;
}

int main ( int argc, char **argv )
{
      const char* file = argv[1];
      if( file == NULL ) {
            fprintf(stderr,"usage: %s <imagefile>\n", argv[0]);
            return 1;
      }

      InitializeMagick(NULL);

      while(1) {
            pid_t myp = getpid();
            char s[PATH_MAX];
#if 1
            /* this code path has a thread arena leak */
            pthread_t thread_id;
            pthread_create(&thread_id, NULL, readimage, argv[1]);

            /* pthread_detach(thread_id); also shows the problem */
            pthread_join(thread_id, NULL);
#else
            /* this code path doesn't have a thread arena leak */
            readimage(argv[1]);
#endif

            sleep(1);

            fprintf(stderr,"%lld\n", (long long)time(0));
            snprintf(s,sizeof(s),
                  "ps -o vsz -o rss -o user -o command -p %lld", (long long)myp);
            system(s);
            snprintf(s,sizeof(s),
                  "pmap -x %lld | grep 65404 | wc -l", (long long)myp);
            system(s);
      }

      DestroyMagick();

      return 0;
}

Compile and run the test program:

root@host:~# gcc -o threadarena threadarena.c `GraphicsMagick-config --cppflags --ldflags --libs`

root@host:~# ./threadarena  Zoom140.png
1685713732
   VSZ   RSS USER     COMMAND
2443096 8248 root     ./threadarena Zoom140.png
35
1685713734
   VSZ   RSS USER     COMMAND
4802392 9816 root     ./threadarena Zoom140.png
70
1685713735
   VSZ   RSS USER     COMMAND
7161688 11352 root    ./threadarena Zoom140.png
105
1685713736
   VSZ   RSS USER     COMMAND
9520984 12920 root    ./threadarena Zoom140.png
140
1685713737
   VSZ   RSS USER     COMMAND
11880280 14456 root   ./threadarena Zoom140.png
175
1685713738
   VSZ   RSS USER     COMMAND
14239576 16040 root   ./threadarena Zoom140.png
210
1685713739
   VSZ   RSS USER     COMMAND
16598872 17608 root   ./threadarena Zoom140.png
245
1685713740
   VSZ   RSS USER     COMMAND
18958168 19128 root   ./threadarena Zoom140.png
280
1685713741
   VSZ   RSS USER     COMMAND
20989784 20660 root   ./threadarena Zoom140.png
310
1685713742
   VSZ   RSS USER     COMMAND
20989784 22100 root   ./threadarena Zoom140.png
309
1685713743
   VSZ   RSS USER     COMMAND
20989784 23492 root   ./threadarena Zoom140.png
308
1685713744
   VSZ   RSS USER     COMMAND
20989784 24900 root   ./threadarena Zoom140.png
307
1685713745
   VSZ   RSS USER     COMMAND
20989784 26292 root   ./threadarena Zoom140.png
306

As you can see, the number of thread arenas (65404 k mapped sections)
according to pmap grows dramatically with each invocation of
GetImageDepth(), and the virtual memory usage blows up to ~20G before
stabilizing. RSS also increases, albeit less dramatically, but eventually
the system does start to hit an OOM condition.

My understanding of malloc thread arenas is that new threads are supposed
to reuse the arenas, and only attempt to create new ones when an existing
arena is locked, so I assume that's what's happening here. And I also assume
there's some missing housekeeping that's also leaking RSS memory.

Some fun facts about the bug:

1. so far, only GetImageDepth() triggers the error. Reading pixel data and
other transformations seems fine.

2. it only happens when the API is called from within a thread (hence the
pthread_start()). If you call it from the main process, no problem.

3. the problem can be mitigated by mallopt(M_ARENA_MAX, n); for some fixed
n (currently running with 8), although I haven't really quantified what that does to
performance. It seems to stop the RSS leak.

4. I found suggestions that a periodic malloc_trim(0) might mitigate this,
but it doesn't seem to have any impact.

5. obviously, this will only be an issue for long-running GraphicsMagick
processes, such as FastCGI type of applications.

#1037042#10

Date:: 2023-06-02 18:38:54 UTC

From:

To:

Christophe,

It would be good to test a modern GM version which supports "resource
limited memory".  Probably Debian's 1.4 version supports that.

For a version which supports the resource limited memory allocator,
you can set 'MAGICK_DEBUG=resource' in the environment prior to
running the program.  This will cause very verbose output with a tally
similar to:

13:27:40 0:0.007952  0.050u 173194 resource.c/unknown/236/Resource:
   memory +65.2MiB/81.0MiB/39.0GiB
13:27:40 0:0.027664  0.190u 173194 resource.c/unknown/818/Resource:
   memory -3.3KiB/81.0MiB/39.0GiB
13:27:40 0:0.027692  0.190u 173194 resource.c/unknown/818/Resource:
.
.
.
13:27:40 0:0.033629  0.220u 173194 resource.c/unknown/818/Resource:
   memory -65.2MiB/0B/39.0GiB

The "0B" at the end means that at least as related to memory allocated
by the resource limited memory allocator (used only for non-public
allocations), there are no leaks.

The type of leak you seem to be finding is a thread-specific memory
allocation leak where a destructor did not fire to free the memory
when the thread quit.  I don't believe that GetImageDepth() uses
thread-specific memory but the "thread arena"s that you are talking
about likely do.

For allocations which specifically use a memory allocator, I have had
some good luck using Glibc's mtrace.

Bob

#1037042#15

Date:: 2023-06-02 19:32:08 UTC

From:

To:

I'm kind of stuck with this version of GM; my organization is currently tracking Debian LTS.

I suppose I should have added some more points... while digging into this I did run diagnostics under gdb on my "real" application.

The threads themselves are terminating correctly; there aren't any lingering/unjoined threads, and the internal pthreads cleanup routines are all being hit. No sign of any red flags.

Running with MAGICK_DEBUG=resource:

root@host:~# MAGICK_DEBUG=resource ./threadarena  Zoom140.png
18:46:16 0:0.000223  0.000u 2666849 resource.c/InitializeMagickResources/447/Resource:
  Total physical memory 128776 MB (32966668 pages and 4096 bytes per page)
18:46:16 0:0.000279  0.000u 2666849 resource.c/InitializeMagickResources/615/Resource:
  40 CPU cores are available
18:46:16 0:0.000294  0.000u 2666849 resource.c/InitializeMagickResources/644/Resource:
  System file open limits are 1024 soft, 1048576 hard
18:46:16 0:0.000306  0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource:
  Set files resource limit to 256
18:46:16 0:0.000321  0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource:
  Set map resource limit to 251.5GiB
18:46:16 0:0.000332  0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource:
  Set memory resource limit to 125.8GiB
18:46:16 0:0.000343  0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource:
  Set threads resource limit to 40
18:46:16 0:0.001345  0.000u 2666849 resource.c/AcquireMagickResource/235/Resource:
  width +414P/----/256.0MiP
18:46:16 0:0.001372  0.000u 2666849 resource.c/AcquireMagickResource/235/Resource:
  height +377P/----/256.0MiP
18:46:16 0:0.001383  0.000u 2666849 resource.c/AcquireMagickResource/235/Resource:
  pixels +152.4KiP/----/Unlimited
18:46:16 0:0.001394  0.000u 2666849 resource.c/AcquireMagickResource/235/Resource:
  memory +1.5MiB/1.5MiB/125.8GiB
18:46:16 0:0.019786  0.010u 2666849 resource.c/LiberateMagickResource/811/Resource:
  memory -1.5MiB/0B/125.8GiB
1685731577
   VSZ   RSS USER     COMMAND
2443096 8280 root     ./threadarena Zoom140.png
35
18:46:17 0:1.054500  0.100u 2666849 resource.c/AcquireMagickResource/235/Resource:
  width +414P/----/256.0MiP
18:46:17 0:1.054537  0.100u 2666849 resource.c/AcquireMagickResource/235/Resource:
  height +377P/----/256.0MiP
18:46:17 0:1.054554  0.100u 2666849 resource.c/AcquireMagickResource/235/Resource:
  pixels +152.4KiP/----/Unlimited
18:46:17 0:1.054574  0.100u 2666849 resource.c/AcquireMagickResource/235/Resource:
  memory +1.5MiB/1.5MiB/125.8GiB
18:46:17 0:1.073180  0.170u 2666849 resource.c/LiberateMagickResource/811/Resource:
  memory -1.5MiB/0B/125.8GiB
1685731578
   VSZ   RSS USER     COMMAND
4802392 9848 root     ./threadarena Zoom140.png
70

...

20989784 31940 root   ./threadarena Zoom140.png
302
18:46:34 0:17.922374 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource:
  width +414P/----/256.0MiP
18:46:34 0:17.922409 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource:
  height +377P/----/256.0MiP
18:46:34 0:17.922426 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource:
  pixels +152.4KiP/----/Unlimited
18:46:34 0:17.922448 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource:
  memory +1.5MiB/1.5MiB/125.8GiB
18:46:34 0:17.943440 2.180u 2666849 resource.c/LiberateMagickResource/811/Resource:
  memory -1.5MiB/0B/125.8GiB

0B, which backs up what I saw with valgrind/memcheck. It leaks, but it's not a resource leak that GM seems to be able to track.

Unfortunately/interestingly, this test application doesn't quite show the same behaviour under valgrind. That is, if I modify it to only run 10 cycles:

root@host:~# valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all  ./threadarena  Zoom140.png
==2674090== Memcheck, a memory error detector
==2674090== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2674090== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2674090== Command: ./threadarena Zoom140.png
==2674090==
1685732666
   VSZ   RSS USER     COMMAND
224116 115612 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732677
   VSZ   RSS USER     COMMAND
224196 116692 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732689
   VSZ   RSS USER     COMMAND
228388 118252 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732701
   VSZ   RSS USER     COMMAND
228452 119784 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732711
   VSZ   RSS USER     COMMAND
232644 121336 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732721
   VSZ   RSS USER     COMMAND
232724 122880 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732732
   VSZ   RSS USER     COMMAND
236916 124440 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732743
   VSZ   RSS USER     COMMAND
236980 125972 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732755
   VSZ   RSS USER     COMMAND
241172 127532 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
1685732765
   VSZ   RSS USER     COMMAND
241252 129068 root    /usr/bin/valgrind.bin --tool=memcheck --leak-check=full --
0
==2674090==
==2674090== HEAP SUMMARY:
==2674090==     in use at exit: 216 bytes in 2 blocks
==2674090==   total heap usage: 1,703 allocs, 1,701 frees, 17,954,690 bytes allocated
==2674090==
==2674090== 8 bytes in 1 blocks are still reachable in loss record 1 of 2
==2674090==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2674090==    by 0x57B10F8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x57C1076: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x57AF599: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x400FFE1: call_init.part.0 (dl-init.c:72)
==2674090==    by 0x40100E8: call_init (dl-init.c:30)
==2674090==    by 0x40100E8: _dl_init (dl-init.c:119)
==2674090==    by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so)
==2674090==    by 0x1: ???
==2674090==    by 0x1FFF000786: ???
==2674090==    by 0x1FFF000794: ???
==2674090==
==2674090== 208 bytes in 1 blocks are still reachable in loss record 2 of 2
==2674090==    at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==2674090==    by 0x57B10F8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x57C063A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x57B1584: omp_set_num_threads (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==2674090==    by 0x49437BB: InitializeMagickResources (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0)
==2674090==    by 0x491117A: InitializeMagickEx (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0)
==2674090==    by 0x4911451: InitializeMagick (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0)
==2674090==    by 0x1093A6: main (in /root/threadarena)
==2674090==
==2674090== LEAK SUMMARY:
==2674090==    definitely lost: 0 bytes in 0 blocks
==2674090==    indirectly lost: 0 bytes in 0 blocks
==2674090==      possibly lost: 0 bytes in 0 blocks
==2674090==    still reachable: 216 bytes in 2 blocks
==2674090==         suppressed: 0 bytes in 0 blocks
==2674090==
==2674090== For lists of detected and suppressed errors, rerun with: -s
==2674090== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

So, that's... great. There are trivial increases in process heap use, but you have to expect that with valgrind. There are no thread arena leaks at all, and only a smattering of memory lost in initialization code.

Granted, the leaking thread arenas are memory mapped areas, so memcheck might not trigger on them. I might have a go at the valgrind-mmt fork unless someone has a better idea.

I was hoping another set of eyes could spot something trivial and obvious in how GetImageDepth() invokes/manages OpenMP, because the more I look at this bug the less I trust my sanity.

#1037042#20

Date:: 2023-06-02 19:54:24 UTC

From:

To:

And to answer the question about mtrace.

root@host:~# MALLOC_TRACE=mtrace.log  ./threadarena Zoom140.png
1685735077
   VSZ   RSS USER     COMMAND
2312024 8124 root     ./threadarena Zoom140.png
33
1685735078
   VSZ   RSS USER     COMMAND
4605784 9684 root     ./threadarena Zoom140.png
67
1685735079
   VSZ   RSS USER     COMMAND
6899544 11232 root    ./threadarena Zoom140.png
101
1685735080
   VSZ   RSS USER     COMMAND
9193304 12796 root    ./threadarena Zoom140.png
135
1685735081
   VSZ   RSS USER     COMMAND
11487064 14328 root   ./threadarena Zoom140.png
169
1685735082
   VSZ   RSS USER     COMMAND
13780824 15924 root   ./threadarena Zoom140.png
203
1685735084
   VSZ   RSS USER     COMMAND
16074584 17424 root   ./threadarena Zoom140.png
237
1685735085
   VSZ   RSS USER     COMMAND
18368344 19004 root   ./threadarena Zoom140.png
271
1685735086
   VSZ   RSS USER     COMMAND
20662104 20536 root   ./threadarena Zoom140.png
305
1685735087
   VSZ   RSS USER     COMMAND
20989784 21948 root   ./threadarena Zoom140.png
309
root@host:~# mtrace ./threadarena mtrace.log
- 0x00007f31380008d0 Free 365 was never alloc'd 0x7f315afcc4d4
- 0x00007f3150008a70 Free 367 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150006ed0 Free 369 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150008300 Free 370 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150008a90 Free 371 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b350 Free 372 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b150 Free 373 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b050 Free 374 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000af50 Free 375 was never alloc'd 0x7f315afcc4bb
...
- 0x00007f2c5000a610 Free 3839 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50002d90 Free 3840 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50007df0 Free 3842 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50011940 Free 3843 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50011ad0 Free 3844 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c5000a740 Free 3846 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c500008d0 Free 3847 was never alloc'd 0x7f315afcc4d4

Memory not freed:
-----------------
           Address     Size     Caller
0x0000555b1d8652d0     0xd0  at 0x7f315a4a90f9

There's a total of 514 "Free xxx was never alloc'd" messages, which doesn't match the thread arena leaks. It might match the total number of threads OpenMP might use, but it seems a bit off.

As an extra data point, restricting the number of OpenMP threads doesn't fix the problem:

root@host:~# OMP_NUM_THREADS=8 ./threadarena  Zoom140.png
1685735486
   VSZ   RSS USER     COMMAND
345944  8324 root     ./threadarena Zoom140.png
4
1685735487
   VSZ   RSS USER     COMMAND
608088  9724 root     ./threadarena Zoom140.png
7
1685735488
   VSZ   RSS USER     COMMAND
870232 11156 root     ./threadarena Zoom140.png
10
1685735489
   VSZ   RSS USER     COMMAND
1132376 12556 root    ./threadarena Zoom140.png
13
1685735490
   VSZ   RSS USER     COMMAND
1394520 13988 root    ./threadarena Zoom140.png
16
1685735491
   VSZ   RSS USER     COMMAND
1656664 15388 root    ./threadarena Zoom140.png
19
1685735492
   VSZ   RSS USER     COMMAND
1918808 16820 root    ./threadarena Zoom140.png
22
1685735493
   VSZ   RSS USER     COMMAND
2180952 18220 root    ./threadarena Zoom140.png
25
1685735494
   VSZ   RSS USER     COMMAND
2443096 19652 root    ./threadarena Zoom140.png
28
1685735495
   VSZ   RSS USER     COMMAND
2705240 21052 root    ./threadarena Zoom140.png
31

#1037042#25

Date:: 2023-06-03 13:55:14 UTC

From:

To:

I did manage to get the test program compiled on my PC (a laptop).
It was challenging since the source code (extracted from email) seemed
to have hidden characters in it that the C compiler did not like.

I do see the reported RSS very gradually creeping up.  It seems to go
up and then go back down a little bit, but the overall average trends
up.

Looking at /proc/PID/smaps, the size attributed to 'heap' seems
stable.

Bob

#1037042#30

Date:: 2023-06-03 18:27:53 UTC

From:

To:

I am definitely able to confirm that memory consumption builds due to
invoking GetImageDepth() via a POSIX thread.  The rate that it builds
is image sensitive since some images cause GetImageDepth() to perform
more OpenMP loops.

In /proc/PID/smaps I see multiple memory-mapped regions (including
"heap") building at once.

Running under valgrind reports no leaks.

I changed the sole memory allocation used in GetImageDepth() to use
the resource-limited memory allocator and it did not report any leaks.

My own testing is under Ubuntu 20.04 using GCC 10.

Bob

#1037042#35

Date:: 2023-06-04 10:25:32 UTC

From:

To:

Hi,
 Unfortunately I can not reproduce. My processor is an Intel K variant
CPU, six cores and twelve threads, 64 Gb of RAM.
GM is 1.3.40 with two security fixes backported, compiled with GCC
v12.2.0. Tried with three PNG images, all memory consumption is static
from the beginning. Do I need some special case of PNG files to
experience this issue?
 Do you think it might be a problem with another system component, a
GCC optimization or this is fixed meanwhile? At least I do wonder why
this issue is CPU / machine dependent.

Regards,
Laszlo/GCS

#1037042#40

Date:: 2023-06-04 12:58:18 UTC

From:

To:

Using PNG is not important.  The nature of the input image is
important.  Prepare a test image like

   gm convert infile.png -depth 4 testfile.pnm

and then use testfile.pnm as input.

The PNM reader is now also also threaded (in very recent versions).
PNG should work as well to reproduce the bug.

That of course is the question.  It seems like some small bit of state
is being added to (presumably) many thread stacks, I see the number of
'anon' mappings increasing, and the "heap" similarly grows and, but is
released before the program exits.

It is possible that doing some OpenMP thing prior to using POSIX
threads (so OpenMP teams are already created) might change the
outcome.  I have not tried that yet.

I have attached a version of threadarena.c which loops 1024 times and
then quits.

Bob

#1037042#45

Date:: 2023-06-04 14:19:03 UTC

From:

To:

As a further data point, I compiled the test program under Illumos
(Solaris derivative) and it was able to run tens of thousands of loops
without any memory leak or other anomaly.

In this case GCC 7.5.0 was used.

Bob

#1037042#50

Date:: 2023-06-05 12:58:47 UTC

From:

To:

GCC optimization or this is fixed meanwhile?

I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target.

I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug).

My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs.

The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)?

I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc.

c.

Unfortunately I can not reproduce. My processor is an Intel K variant
CPU, six cores and twelve threads, 64 Gb of RAM.
GM is 1.3.40 with two security fixes backported, compiled with GCC
v12.2.0. Tried with three PNG images, all memory consumption is static
from the beginning. Do I need some special case of PNG files to
experience this issue?
Do you think it might be a problem with another system component, a
GCC optimization or this is fixed meanwhile? At least I do wonder why
this issue is CPU / machine dependent.

Regards,
Laszlo/GCS

#1037042#55

Date:: 2023-06-05 13:20:26 UTC

From:

To:

<sigh>

Never mind, you can close this. I've managed to reproduce the leak without GM:

/*
 * gcc -o gomparena gomparena.c -fopenmp
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>

static void* go_omp(void* arg) {
      const int m = 1024;
      double* a = alloca(m*sizeof(double));

      #pragma omp parallel for
      for( int n = 0; n < m; n ++ ) {
            a[n] = n*n;
      }
      return NULL;
}

int main ( int argc, char **argv )
{
      int cycles = 10;

      while(cycles--) {
            pid_t myp = getpid();
            char s[1024];
#if 1
            /* this code path has a thread arena leak */
            pthread_t thread_id;
            pthread_create(&thread_id, NULL, go_omp, NULL);

            /* pthread_detach(thread_id); also shows the problem */
            pthread_join(thread_id, NULL);
#else
            /* this code path doesn't have a thread arena leak */
            go_omp(NULL);
#endif

            sleep(1);

            fprintf(stderr,"%lld\n", (long long)time(0));
            snprintf(s,sizeof(s),
                  "ps -o vsz -o rss -o user -o command -p %lld", (long long)myp);
            system(s);
            snprintf(s,sizeof(s),
                  "pmap -x %lld | grep 65404 | wc -l", (long long)myp);
            system(s);
      }

      return 0;
}

GCC optimization or this is fixed meanwhile?

I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target.

I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug).

My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs.

The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)?

I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc.

c.

 Unfortunately I can not reproduce. My processor is an Intel K variant
CPU, six cores and twelve threads, 64 Gb of RAM.
GM is 1.3.40 with two security fixes backported, compiled with GCC
v12.2.0. Tried with three PNG images, all memory consumption is static
from the beginning. Do I need some special case of PNG files to
experience this issue?
 Do you think it might be a problem with another system component, a
GCC optimization or this is fixed meanwhile? At least I do wonder why
this issue is CPU / machine dependent.

Regards,
Laszlo/GCS

#1037042#60

Date:: 2023-06-05 14:13:41 UTC

From:

To:

FWIW, I think I may have figured out what's happening. It's probably something to note in the http://www.graphicsmagick.org/OpenMP.html documentation.

In a nutshell, my understanding is that libgomp keeps track of its thread pool in thread local storage. When that thread local storage happens to be in the main process, all is well, everything works fine, etc. When that thread local storage is in a transient pthread (and, apparently, you're running on a certain class memory/CPU configuration) and your transient thread terminates, that thread local storage doesn't (always) get cleaned up.

What this means in practice is making GM API calls from within a pthread can result in undefined behaviour when an API triggers OpenMP calls.

I suspect this isn't a particularly desirable situation, as GraphicsMagick appears to have put at least some effort into making the API thread-safe.

It likely means that even my mallopt(M_ARENA_MAX,n) fix is still not right, and the only "safe" solution is to disable OpenMP completely in that particular application.

I'm going to keep looking for a better workaround/mitigation. It would be ideal if there was a mechanism to keep the thread pool in the main process and still have it used from the subthread.

GCC optimization or this is fixed meanwhile?

I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target.

I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug).

My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs.

The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)?

I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc.

c.

Unfortunately I can not reproduce. My processor is an Intel K variant
CPU, six cores and twelve threads, 64 Gb of RAM.
GM is 1.3.40 with two security fixes backported, compiled with GCC
v12.2.0. Tried with three PNG images, all memory consumption is static
from the beginning. Do I need some special case of PNG files to
experience this issue?
Do you think it might be a problem with another system component, a
GCC optimization or this is fixed meanwhile? At least I do wonder why
this issue is CPU / machine dependent.

Regards,
Laszlo/GCS

#1037042#65

Date:: 2023-06-05 15:32:37 UTC

From:

To:

This is very unfortunate and seems like a serious bug.

Considerable effort has been put into making the API as thread-safe as
possible, but invoking OpenMP code from a POSIX thread (or other sort
of thread) has not been significantly tested.

It would be good to write a bug report against gomp (it seems to still
be a GCC product) so that there is hope to get this fixed, or be
informed of a work-around.

Bob

#1037042#70

Date:: 2023-06-05 16:44:47 UTC

From:

To:

I've managed to find a workaround. Not a great workaround, but it seems to let me take advantage of all cores for OMP without penalty. I'll be running some more tests on that.

The thread needs to call:

    #include <omp.h>
    omp_pause_resource_all(omp_pause_soft);

at thread termination (a hard pause seens fine, too). The program needs to compiled with -fopenmp.

The call releases the TLS variables, and I believe that's what triggers the release of the memory mapped thread arena.

The downsides of that fix are (a) OMP doesn't get to share the thread pool across the program but instead blows it away for each thread the finishes, and (b) calling programs need to know stuff about the use of OMP at a level they really shouldn't need to know about.

It doesn't feel like something that can be shimmed into the GM API in a nice place, as IMHO there really isn't much other than Initialize/DestroyMagick() that should even care about crossing thread boundaries.

It's not entirely a new class of problem. For example, the issue that led me to trying the omp_pause_resource_all() call: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

I still need to dig deeper as it's not clear to me why my particular machine is seeing the problem, but I certainly intend to follow up with gomp. I really hope it's a bug; a threading API that itself isn't thread-safe would just be... weird.

#1037042#75

Date:: 2023-06-06 13:14:03 UTC

From:

To:

I've managed to reproduce the bug in virtual machines (even on my laptop), in case anyone is curious to see it in action.

The magic configuration is to force the CPU topology to multiple-sockets-multiple-cores (the default KVM config is typically multiple-socket-single-core). The CPU model doesn't appear to matter. The leak obviously grows much more slowly than on my 2-socket-20-core server, but it's clearly visible.

#1037042#80

Date:: 2023-06-07 17:16:49 UTC

From:

To:

Reported to the GCC maintainers: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110156

#1037042 graphicsmagick: GetImageDepth has a thread arena and memory leak #1037042

Just Reply to ...

Reply to submitter ...

Send control command (Silently)

Set Architecture Tags (Silently)