- Package:
- graphicsmagick
- Source:
- graphicsmagick
- Description:
- collection of image processing tools
- Submitter:
- Beauregard,Christophe (ECCC)
- Date:
- 2023-06-07 17:21:03 UTC
- Severity:
- normal
- Tags:
The GraphicsMagick API has a thread arena and more general memory leak
somewhere in the GetImageDepth() API call, presumably a side-effect of
the OpenMP usage.
First, you need a server class system... a lot of beefy cores (i.e. I
can't reproduce it on my quad-core laptop), and possibly a lot of RAM
(128G on this one):
root@host:~# grep 'model name' /proc/cpuinfo |tail -1
model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz
root@host:~# grep processor /proc/cpuinfo |tail -1
processor : 39
Then you need this test program:
root@host:~# cat threadarena.c
/*
* gcc -o threadarena threadarena.c \
* `GraphicsMagick-config --cppflags --ldflags --libs`
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>
#include <string.h>
#include <time.h>
#include <sys/types.h>
#include <pthread.h>
#include <magick/api.h>
static void* readimage(void* arg) {
ImageInfo *imageInfo;
ExceptionInfo exception;
const char* filename = (const char*) arg;
ImageCharacteristics chars;
imageInfo=CloneImageInfo(0);
GetExceptionInfo(&exception);
(void) strncpy(imageInfo->filename, arg, MaxTextExtent-1 );
Image *image = ReadImage(imageInfo, &exception);
if (image == (Image *) NULL)
{
CatchException(&exception);
perror("ReadImage");
}
GetImageDepth(image,&image->exception);
DestroyImage(image);
DestroyImageInfo(imageInfo);
DestroyExceptionInfo( &exception );
return NULL;
}
int main ( int argc, char **argv )
{
const char* file = argv[1];
if( file == NULL ) {
fprintf(stderr,"usage: %s <imagefile>\n", argv[0]);
return 1;
}
InitializeMagick(NULL);
while(1) {
pid_t myp = getpid();
char s[PATH_MAX];
#if 1
/* this code path has a thread arena leak */
pthread_t thread_id;
pthread_create(&thread_id, NULL, readimage, argv[1]);
/* pthread_detach(thread_id); also shows the problem */
pthread_join(thread_id, NULL);
#else
/* this code path doesn't have a thread arena leak */
readimage(argv[1]);
#endif
sleep(1);
fprintf(stderr,"%lld\n", (long long)time(0));
snprintf(s,sizeof(s),
"ps -o vsz -o rss -o user -o command -p %lld", (long long)myp);
system(s);
snprintf(s,sizeof(s),
"pmap -x %lld | grep 65404 | wc -l", (long long)myp);
system(s);
}
DestroyMagick();
return 0;
}
Compile and run the test program:
root@host:~# gcc -o threadarena threadarena.c `GraphicsMagick-config --cppflags --ldflags --libs`
root@host:~# ./threadarena Zoom140.png
1685713732
VSZ RSS USER COMMAND
2443096 8248 root ./threadarena Zoom140.png
35
1685713734
VSZ RSS USER COMMAND
4802392 9816 root ./threadarena Zoom140.png
70
1685713735
VSZ RSS USER COMMAND
7161688 11352 root ./threadarena Zoom140.png
105
1685713736
VSZ RSS USER COMMAND
9520984 12920 root ./threadarena Zoom140.png
140
1685713737
VSZ RSS USER COMMAND
11880280 14456 root ./threadarena Zoom140.png
175
1685713738
VSZ RSS USER COMMAND
14239576 16040 root ./threadarena Zoom140.png
210
1685713739
VSZ RSS USER COMMAND
16598872 17608 root ./threadarena Zoom140.png
245
1685713740
VSZ RSS USER COMMAND
18958168 19128 root ./threadarena Zoom140.png
280
1685713741
VSZ RSS USER COMMAND
20989784 20660 root ./threadarena Zoom140.png
310
1685713742
VSZ RSS USER COMMAND
20989784 22100 root ./threadarena Zoom140.png
309
1685713743
VSZ RSS USER COMMAND
20989784 23492 root ./threadarena Zoom140.png
308
1685713744
VSZ RSS USER COMMAND
20989784 24900 root ./threadarena Zoom140.png
307
1685713745
VSZ RSS USER COMMAND
20989784 26292 root ./threadarena Zoom140.png
306
As you can see, the number of thread arenas (65404 k mapped sections)
according to pmap grows dramatically with each invocation of
GetImageDepth(), and the virtual memory usage blows up to ~20G before
stabilizing. RSS also increases, albeit less dramatically, but eventually
the system does start to hit an OOM condition.
My understanding of malloc thread arenas is that new threads are supposed
to reuse the arenas, and only attempt to create new ones when an existing
arena is locked, so I assume that's what's happening here. And I also assume
there's some missing housekeeping that's also leaking RSS memory.
Some fun facts about the bug:
1. so far, only GetImageDepth() triggers the error. Reading pixel data and
other transformations seems fine.
2. it only happens when the API is called from within a thread (hence the
pthread_start()). If you call it from the main process, no problem.
3. the problem can be mitigated by mallopt(M_ARENA_MAX, n); for some fixed
n (currently running with 8), although I haven't really quantified what that does to
performance. It seems to stop the RSS leak.
4. I found suggestions that a periodic malloc_trim(0) might mitigate this,
but it doesn't seem to have any impact.
5. obviously, this will only be an issue for long-running GraphicsMagick
processes, such as FastCGI type of applications.
Christophe, It would be good to test a modern GM version which supports "resource limited memory". Probably Debian's 1.4 version supports that. For a version which supports the resource limited memory allocator, you can set 'MAGICK_DEBUG=resource' in the environment prior to running the program. This will cause very verbose output with a tally similar to: 13:27:40 0:0.007952 0.050u 173194 resource.c/unknown/236/Resource: memory +65.2MiB/81.0MiB/39.0GiB 13:27:40 0:0.027664 0.190u 173194 resource.c/unknown/818/Resource: memory -3.3KiB/81.0MiB/39.0GiB 13:27:40 0:0.027692 0.190u 173194 resource.c/unknown/818/Resource: . . . 13:27:40 0:0.033629 0.220u 173194 resource.c/unknown/818/Resource: memory -65.2MiB/0B/39.0GiB The "0B" at the end means that at least as related to memory allocated by the resource limited memory allocator (used only for non-public allocations), there are no leaks. The type of leak you seem to be finding is a thread-specific memory allocation leak where a destructor did not fire to free the memory when the thread quit. I don't believe that GetImageDepth() uses thread-specific memory but the "thread arena"s that you are talking about likely do. For allocations which specifically use a memory allocator, I have had some good luck using Glibc's mtrace. Bob
I'm kind of stuck with this version of GM; my organization is currently tracking Debian LTS. I suppose I should have added some more points... while digging into this I did run diagnostics under gdb on my "real" application. The threads themselves are terminating correctly; there aren't any lingering/unjoined threads, and the internal pthreads cleanup routines are all being hit. No sign of any red flags. Running with MAGICK_DEBUG=resource: root@host:~# MAGICK_DEBUG=resource ./threadarena Zoom140.png 18:46:16 0:0.000223 0.000u 2666849 resource.c/InitializeMagickResources/447/Resource: Total physical memory 128776 MB (32966668 pages and 4096 bytes per page) 18:46:16 0:0.000279 0.000u 2666849 resource.c/InitializeMagickResources/615/Resource: 40 CPU cores are available 18:46:16 0:0.000294 0.000u 2666849 resource.c/InitializeMagickResources/644/Resource: System file open limits are 1024 soft, 1048576 hard 18:46:16 0:0.000306 0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource: Set files resource limit to 256 18:46:16 0:0.000321 0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource: Set map resource limit to 251.5GiB 18:46:16 0:0.000332 0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource: Set memory resource limit to 125.8GiB 18:46:16 0:0.000343 0.000u 2666849 resource.c/SetMagickResourceLimit/964/Resource: Set threads resource limit to 40 18:46:16 0:0.001345 0.000u 2666849 resource.c/AcquireMagickResource/235/Resource: width +414P/----/256.0MiP 18:46:16 0:0.001372 0.000u 2666849 resource.c/AcquireMagickResource/235/Resource: height +377P/----/256.0MiP 18:46:16 0:0.001383 0.000u 2666849 resource.c/AcquireMagickResource/235/Resource: pixels +152.4KiP/----/Unlimited 18:46:16 0:0.001394 0.000u 2666849 resource.c/AcquireMagickResource/235/Resource: memory +1.5MiB/1.5MiB/125.8GiB 18:46:16 0:0.019786 0.010u 2666849 resource.c/LiberateMagickResource/811/Resource: memory -1.5MiB/0B/125.8GiB 1685731577 VSZ RSS USER COMMAND 2443096 8280 root ./threadarena Zoom140.png 35 18:46:17 0:1.054500 0.100u 2666849 resource.c/AcquireMagickResource/235/Resource: width +414P/----/256.0MiP 18:46:17 0:1.054537 0.100u 2666849 resource.c/AcquireMagickResource/235/Resource: height +377P/----/256.0MiP 18:46:17 0:1.054554 0.100u 2666849 resource.c/AcquireMagickResource/235/Resource: pixels +152.4KiP/----/Unlimited 18:46:17 0:1.054574 0.100u 2666849 resource.c/AcquireMagickResource/235/Resource: memory +1.5MiB/1.5MiB/125.8GiB 18:46:17 0:1.073180 0.170u 2666849 resource.c/LiberateMagickResource/811/Resource: memory -1.5MiB/0B/125.8GiB 1685731578 VSZ RSS USER COMMAND 4802392 9848 root ./threadarena Zoom140.png 70 ... 20989784 31940 root ./threadarena Zoom140.png 302 18:46:34 0:17.922374 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource: width +414P/----/256.0MiP 18:46:34 0:17.922409 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource: height +377P/----/256.0MiP 18:46:34 0:17.922426 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource: pixels +152.4KiP/----/Unlimited 18:46:34 0:17.922448 2.080u 2666849 resource.c/AcquireMagickResource/235/Resource: memory +1.5MiB/1.5MiB/125.8GiB 18:46:34 0:17.943440 2.180u 2666849 resource.c/LiberateMagickResource/811/Resource: memory -1.5MiB/0B/125.8GiB 0B, which backs up what I saw with valgrind/memcheck. It leaks, but it's not a resource leak that GM seems to be able to track. Unfortunately/interestingly, this test application doesn't quite show the same behaviour under valgrind. That is, if I modify it to only run 10 cycles: root@host:~# valgrind --tool=memcheck --leak-check=full --show-leak-kinds=all ./threadarena Zoom140.png ==2674090== Memcheck, a memory error detector ==2674090== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. ==2674090== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info ==2674090== Command: ./threadarena Zoom140.png ==2674090== 1685732666 VSZ RSS USER COMMAND 224116 115612 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732677 VSZ RSS USER COMMAND 224196 116692 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732689 VSZ RSS USER COMMAND 228388 118252 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732701 VSZ RSS USER COMMAND 228452 119784 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732711 VSZ RSS USER COMMAND 232644 121336 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732721 VSZ RSS USER COMMAND 232724 122880 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732732 VSZ RSS USER COMMAND 236916 124440 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732743 VSZ RSS USER COMMAND 236980 125972 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732755 VSZ RSS USER COMMAND 241172 127532 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 1685732765 VSZ RSS USER COMMAND 241252 129068 root /usr/bin/valgrind.bin --tool=memcheck --leak-check=full -- 0 ==2674090== ==2674090== HEAP SUMMARY: ==2674090== in use at exit: 216 bytes in 2 blocks ==2674090== total heap usage: 1,703 allocs, 1,701 frees, 17,954,690 bytes allocated ==2674090== ==2674090== 8 bytes in 1 blocks are still reachable in loss record 1 of 2 ==2674090== at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==2674090== by 0x57B10F8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x57C1076: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x57AF599: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x400FFE1: call_init.part.0 (dl-init.c:72) ==2674090== by 0x40100E8: call_init (dl-init.c:30) ==2674090== by 0x40100E8: _dl_init (dl-init.c:119) ==2674090== by 0x40010C9: ??? (in /usr/lib/x86_64-linux-gnu/ld-2.31.so) ==2674090== by 0x1: ??? ==2674090== by 0x1FFF000786: ??? ==2674090== by 0x1FFF000794: ??? ==2674090== ==2674090== 208 bytes in 1 blocks are still reachable in loss record 2 of 2 ==2674090== at 0x483877F: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so) ==2674090== by 0x57B10F8: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x57C063A: ??? (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x57B1584: omp_set_num_threads (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0) ==2674090== by 0x49437BB: InitializeMagickResources (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0) ==2674090== by 0x491117A: InitializeMagickEx (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0) ==2674090== by 0x4911451: InitializeMagick (in /usr/lib/libGraphicsMagick-Q16.so.3.22.0) ==2674090== by 0x1093A6: main (in /root/threadarena) ==2674090== ==2674090== LEAK SUMMARY: ==2674090== definitely lost: 0 bytes in 0 blocks ==2674090== indirectly lost: 0 bytes in 0 blocks ==2674090== possibly lost: 0 bytes in 0 blocks ==2674090== still reachable: 216 bytes in 2 blocks ==2674090== suppressed: 0 bytes in 0 blocks ==2674090== ==2674090== For lists of detected and suppressed errors, rerun with: -s ==2674090== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0) So, that's... great. There are trivial increases in process heap use, but you have to expect that with valgrind. There are no thread arena leaks at all, and only a smattering of memory lost in initialization code. Granted, the leaking thread arenas are memory mapped areas, so memcheck might not trigger on them. I might have a go at the valgrind-mmt fork unless someone has a better idea. I was hoping another set of eyes could spot something trivial and obvious in how GetImageDepth() invokes/manages OpenMP, because the more I look at this bug the less I trust my sanity.
And to answer the question about mtrace.
root@host:~# MALLOC_TRACE=mtrace.log ./threadarena Zoom140.png
1685735077
VSZ RSS USER COMMAND
2312024 8124 root ./threadarena Zoom140.png
33
1685735078
VSZ RSS USER COMMAND
4605784 9684 root ./threadarena Zoom140.png
67
1685735079
VSZ RSS USER COMMAND
6899544 11232 root ./threadarena Zoom140.png
101
1685735080
VSZ RSS USER COMMAND
9193304 12796 root ./threadarena Zoom140.png
135
1685735081
VSZ RSS USER COMMAND
11487064 14328 root ./threadarena Zoom140.png
169
1685735082
VSZ RSS USER COMMAND
13780824 15924 root ./threadarena Zoom140.png
203
1685735084
VSZ RSS USER COMMAND
16074584 17424 root ./threadarena Zoom140.png
237
1685735085
VSZ RSS USER COMMAND
18368344 19004 root ./threadarena Zoom140.png
271
1685735086
VSZ RSS USER COMMAND
20662104 20536 root ./threadarena Zoom140.png
305
1685735087
VSZ RSS USER COMMAND
20989784 21948 root ./threadarena Zoom140.png
309
root@host:~# mtrace ./threadarena mtrace.log
- 0x00007f31380008d0 Free 365 was never alloc'd 0x7f315afcc4d4
- 0x00007f3150008a70 Free 367 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150006ed0 Free 369 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150008300 Free 370 was never alloc'd 0x7f315afcc4bb
- 0x00007f3150008a90 Free 371 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b350 Free 372 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b150 Free 373 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000b050 Free 374 was never alloc'd 0x7f315afcc4bb
- 0x00007f315000af50 Free 375 was never alloc'd 0x7f315afcc4bb
...
- 0x00007f2c5000a610 Free 3839 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50002d90 Free 3840 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50007df0 Free 3842 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50011940 Free 3843 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c50011ad0 Free 3844 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c5000a740 Free 3846 was never alloc'd 0x7f315afcc4bb
- 0x00007f2c500008d0 Free 3847 was never alloc'd 0x7f315afcc4d4
Memory not freed:
-----------------
Address Size Caller
0x0000555b1d8652d0 0xd0 at 0x7f315a4a90f9
There's a total of 514 "Free xxx was never alloc'd" messages, which doesn't match the thread arena leaks. It might match the total number of threads OpenMP might use, but it seems a bit off.
As an extra data point, restricting the number of OpenMP threads doesn't fix the problem:
root@host:~# OMP_NUM_THREADS=8 ./threadarena Zoom140.png
1685735486
VSZ RSS USER COMMAND
345944 8324 root ./threadarena Zoom140.png
4
1685735487
VSZ RSS USER COMMAND
608088 9724 root ./threadarena Zoom140.png
7
1685735488
VSZ RSS USER COMMAND
870232 11156 root ./threadarena Zoom140.png
10
1685735489
VSZ RSS USER COMMAND
1132376 12556 root ./threadarena Zoom140.png
13
1685735490
VSZ RSS USER COMMAND
1394520 13988 root ./threadarena Zoom140.png
16
1685735491
VSZ RSS USER COMMAND
1656664 15388 root ./threadarena Zoom140.png
19
1685735492
VSZ RSS USER COMMAND
1918808 16820 root ./threadarena Zoom140.png
22
1685735493
VSZ RSS USER COMMAND
2180952 18220 root ./threadarena Zoom140.png
25
1685735494
VSZ RSS USER COMMAND
2443096 19652 root ./threadarena Zoom140.png
28
1685735495
VSZ RSS USER COMMAND
2705240 21052 root ./threadarena Zoom140.png
31
I did manage to get the test program compiled on my PC (a laptop). It was challenging since the source code (extracted from email) seemed to have hidden characters in it that the C compiler did not like. I do see the reported RSS very gradually creeping up. It seems to go up and then go back down a little bit, but the overall average trends up. Looking at /proc/PID/smaps, the size attributed to 'heap' seems stable. Bob
I am definitely able to confirm that memory consumption builds due to invoking GetImageDepth() via a POSIX thread. The rate that it builds is image sensitive since some images cause GetImageDepth() to perform more OpenMP loops. In /proc/PID/smaps I see multiple memory-mapped regions (including "heap") building at once. Running under valgrind reports no leaks. I changed the sole memory allocation used in GetImageDepth() to use the resource-limited memory allocator and it did not report any leaks. My own testing is under Ubuntu 20.04 using GCC 10. Bob
Hi, Unfortunately I can not reproduce. My processor is an Intel K variant CPU, six cores and twelve threads, 64 Gb of RAM. GM is 1.3.40 with two security fixes backported, compiled with GCC v12.2.0. Tried with three PNG images, all memory consumption is static from the beginning. Do I need some special case of PNG files to experience this issue? Do you think it might be a problem with another system component, a GCC optimization or this is fixed meanwhile? At least I do wonder why this issue is CPU / machine dependent. Regards, Laszlo/GCS
Using PNG is not important. The nature of the input image is important. Prepare a test image like gm convert infile.png -depth 4 testfile.pnm and then use testfile.pnm as input. The PNM reader is now also also threaded (in very recent versions). PNG should work as well to reproduce the bug. That of course is the question. It seems like some small bit of state is being added to (presumably) many thread stacks, I see the number of 'anon' mappings increasing, and the "heap" similarly grows and, but is released before the program exits. It is possible that doing some OpenMP thing prior to using POSIX threads (so OpenMP teams are already created) might change the outcome. I have not tried that yet. I have attached a version of threadarena.c which loops 1024 times and then quits. Bob
As a further data point, I compiled the test program under Illumos (Solaris derivative) and it was able to run tens of thousands of loops without any memory leak or other anomaly. In this case GCC 7.5.0 was used. Bob
GCC optimization or this is fixed meanwhile? I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target. I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug). My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs. The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)? I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc. c. Unfortunately I can not reproduce. My processor is an Intel K variant CPU, six cores and twelve threads, 64 Gb of RAM. GM is 1.3.40 with two security fixes backported, compiled with GCC v12.2.0. Tried with three PNG images, all memory consumption is static from the beginning. Do I need some special case of PNG files to experience this issue? Do you think it might be a problem with another system component, a GCC optimization or this is fixed meanwhile? At least I do wonder why this issue is CPU / machine dependent. Regards, Laszlo/GCS
<sigh>
Never mind, you can close this. I've managed to reproduce the leak without GM:
/*
* gcc -o gomparena gomparena.c -fopenmp
*/
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <pthread.h>
static void* go_omp(void* arg) {
const int m = 1024;
double* a = alloca(m*sizeof(double));
#pragma omp parallel for
for( int n = 0; n < m; n ++ ) {
a[n] = n*n;
}
return NULL;
}
int main ( int argc, char **argv )
{
int cycles = 10;
while(cycles--) {
pid_t myp = getpid();
char s[1024];
#if 1
/* this code path has a thread arena leak */
pthread_t thread_id;
pthread_create(&thread_id, NULL, go_omp, NULL);
/* pthread_detach(thread_id); also shows the problem */
pthread_join(thread_id, NULL);
#else
/* this code path doesn't have a thread arena leak */
go_omp(NULL);
#endif
sleep(1);
fprintf(stderr,"%lld\n", (long long)time(0));
snprintf(s,sizeof(s),
"ps -o vsz -o rss -o user -o command -p %lld", (long long)myp);
system(s);
snprintf(s,sizeof(s),
"pmap -x %lld | grep 65404 | wc -l", (long long)myp);
system(s);
}
return 0;
}
GCC optimization or this is fixed meanwhile?
I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target.
I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug).
My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs.
The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)?
I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc.
c.
Unfortunately I can not reproduce. My processor is an Intel K variant
CPU, six cores and twelve threads, 64 Gb of RAM.
GM is 1.3.40 with two security fixes backported, compiled with GCC
v12.2.0. Tried with three PNG images, all memory consumption is static
from the beginning. Do I need some special case of PNG files to
experience this issue?
Do you think it might be a problem with another system component, a
GCC optimization or this is fixed meanwhile? At least I do wonder why
this issue is CPU / machine dependent.
Regards,
Laszlo/GCS
FWIW, I think I may have figured out what's happening. It's probably something to note in the http://www.graphicsmagick.org/OpenMP.html documentation. In a nutshell, my understanding is that libgomp keeps track of its thread pool in thread local storage. When that thread local storage happens to be in the main process, all is well, everything works fine, etc. When that thread local storage is in a transient pthread (and, apparently, you're running on a certain class memory/CPU configuration) and your transient thread terminates, that thread local storage doesn't (always) get cleaned up. What this means in practice is making GM API calls from within a pthread can result in undefined behaviour when an API triggers OpenMP calls. I suspect this isn't a particularly desirable situation, as GraphicsMagick appears to have put at least some effort into making the API thread-safe. It likely means that even my mallopt(M_ARENA_MAX,n) fix is still not right, and the only "safe" solution is to disable OpenMP completely in that particular application. I'm going to keep looking for a better workaround/mitigation. It would be ideal if there was a mechanism to keep the thread pool in the main process and still have it used from the subthread. GCC optimization or this is fixed meanwhile? I'm not sure. It took me most of a week to even properly isolate the problem and find a repeatable test case on that one machine, and even then I feel like it's still a moving target. I've done more poking and it's not 100% exclusive to GetImageDepth(). In fact, I'm seeing the problem now if I comment out the call to GetImageDepth() (but in my original application, ReadImage() without GetImageDepth() in another code path isn't enough to trigger the bug). My working theory is something about how GM uses the OpenMP/libgomp API is tickling a bug in a code path that's only seen with certain CPU/memory configs. The questions I can't answer are (a) is there possibly something incomplete/incorrect in how GM uses OpenMP which could lead to this sort of bug? and (b) just how deep does this rabbit hole go (glibc? kernel?)? I'm starting to think that I might need to peel off a machine from the development cluster and investigate whether I can reproduce the error in a container, at which point I can play around with different configurations of libraries, compilers, etc. c. Unfortunately I can not reproduce. My processor is an Intel K variant CPU, six cores and twelve threads, 64 Gb of RAM. GM is 1.3.40 with two security fixes backported, compiled with GCC v12.2.0. Tried with three PNG images, all memory consumption is static from the beginning. Do I need some special case of PNG files to experience this issue? Do you think it might be a problem with another system component, a GCC optimization or this is fixed meanwhile? At least I do wonder why this issue is CPU / machine dependent. Regards, Laszlo/GCS
This is very unfortunate and seems like a serious bug. Considerable effort has been put into making the API as thread-safe as possible, but invoking OpenMP code from a POSIX thread (or other sort of thread) has not been significantly tested. It would be good to write a bug report against gomp (it seems to still be a GCC product) so that there is hope to get this fixed, or be informed of a work-around. Bob
I've managed to find a workaround. Not a great workaround, but it seems to let me take advantage of all cores for OMP without penalty. I'll be running some more tests on that.
The thread needs to call:
#include <omp.h>
omp_pause_resource_all(omp_pause_soft);
at thread termination (a hard pause seens fine, too). The program needs to compiled with -fopenmp.
The call releases the TLS variables, and I believe that's what triggers the release of the memory mapped thread arena.
The downsides of that fix are (a) OMP doesn't get to share the thread pool across the program but instead blows it away for each thread the finishes, and (b) calling programs need to know stuff about the use of OMP at a level they really shouldn't need to know about.
It doesn't feel like something that can be shimmed into the GM API in a nice place, as IMHO there really isn't much other than Initialize/DestroyMagick() that should even care about crossing thread boundaries.
It's not entirely a new class of problem. For example, the issue that led me to trying the omp_pause_resource_all() call: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035
I still need to dig deeper as it's not clear to me why my particular machine is seeing the problem, but I certainly intend to follow up with gomp. I really hope it's a bug; a threading API that itself isn't thread-safe would just be... weird.
I've managed to reproduce the bug in virtual machines (even on my laptop), in case anyone is curious to see it in action. The magic configuration is to force the CPU topology to multiple-sockets-multiple-cores (the default KVM config is typically multiple-socket-single-core). The CPU model doesn't appear to matter. The leak obviously grows much more slowly than on my 2-socket-20-core server, but it's clearly visible.
Reported to the GCC maintainers: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110156