bug #5048

cdm-server causes unusual high native memory consumption for jvm

Added by Andreas Kohlbecker over 4 years ago. Updated 7 days ago.

Status:
Closed
Priority:
Priority08
Category:
cdmserver
Target version:
Start date:
06/30/2015
Due date:
% Done:

100%

Severity:
major
Found in Version:

Description

Our cdm-servers are suffering from memory problems! Evidence for this can be seen at least since the middle of last week.

  • edit-int was consuming about 12GB of RAM with only 4 instances running. This became obvious because the server was swapping and became more or less unresponsive. After a restart the RAM consumption was at 5GB.
  • edit-test was today also consuming about 12GB of RAM, but with over 20 instances. It also was stalled due to swapping memory to disk.
  • edit-production spit out a lot of monit warnings that the server reached the warning limit of more than 80% RAM being consumed.

see also #5375


General background information

The top rated post on stackoverflow - Java using much more memory than heap size - provides an excellent explanation of the overall jvm memory consumption. The factors that sum up to the total memory footprint of a java application are:

Total memory = Heap + Code Cache + Metaspace + Symbol tables +
               Other JVM structures + Thread stacks +
               Direct buffers + Mapped files +
               Native Libraries + Malloc overhead + ...

native memory fragmentation

Native memory tracking
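
The jvm's built-in Native Memory Tracking only covers allocations made by the jvm itself (heap, metaspace, code cache, threads, GC, internal structures), not memory that native libraries allocate directly via malloc, but it is a useful first check. A minimal sketch of enabling and querying it (the pid is a placeholder):

# enable NMT when starting the jvm ("detail" instead of "summary" gives per-callsite data)
java -XX:NativeMemoryTracking=summary -jar cdm-server.jar
# query a running process, here assuming pid 12345
jcmd 12345 VM.native_memory summary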

jemalloc

installation and usage

download jemalloc-5.2.1.tar.bz2 from https://github.com/jemalloc/jemalloc/releases and extract the archive. In the extracted folder ($JMALLOC_FOLDER):

./configure --enable-prof
make

the executable and the library file are now found here:

  • $JMALLOC_FOLDER/bin/jeprof
  • $JMALLOC_FOLDER/lib/libjemalloc.so

to profile the cdmlib-remote-webapp instance or the cdmserver, add the following env variables to the eclipse jetty launcher:

LD_PRELOAD=$JMALLOC_FOLDER/lib/libjemalloc.so
MALLOC_CONF=prof:true,lg_prof_interval:24,lg_prof_sample:17

Compared to the above linked posts I am using the quite low lg_prof_interval of 24 (i.e. an interval of 2^24 bytes, ~16MB) to record the allocations at a fine granularity. With this setting I could pin the massive allocations down to the cause Java_java_util_zip_Inflater_init (see below in results).
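
For a cdm-server started directly from the shell instead of from the eclipse launcher, the same variables can simply be exported before launching the jar; a minimal sketch (paths and options as used elsewhere in this ticket):

# preload jemalloc and enable its profiling for the cdm-server process
export LD_PRELOAD=$JMALLOC_FOLDER/lib/libjemalloc.so
export MALLOC_CONF=prof:true,lg_prof_interval:24,lg_prof_sample:17
java -jar /opt/cdmserver/cdm-server.jar -httpPort=8080 -logdir=/var/log/cdmserver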

now start the launcher to run the application. jemalloc will write several files named like jeprof.*.heap in the working directory

the jeprof executable can create diagrams from these files:

bin/jeprof --show_bytes --gif /opt/java-oracle/jdk1.8/bin/java jeprof.*.heap > app-profiling.gif

see picture743-1.png

more details on the results and conclusions can be found below in the chapter Diagnoses and results

perf

(was not valuable to pin down the root cause)

installation

apt install linux-tools-$(uname -r) linux-tools-generic

running

sudo sh -c "echo -1 > /proc/sys/kernel/perf_event_paranoid"  
sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"
perf record -g mvn jetty:run-forked 

analyze

perf report

and search for stack traces containing "Inflater"
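
To jump straight to the relevant entries, the report can also be dumped as text and filtered, for example (a sketch):

perf report --stdio | grep -i -B 5 -A 20 inflater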

Similar issues

MALLOC_ARENA Issue

The MALLOC_ARENA issue is a known problem with glibc >= 2.10; your servers running jessie have glibc 2.19 installed!

Setting the MALLOC_ARENA_MAX env variable to a low value (0-4) could help:

export MALLOC_ARENA_MAX=4

but in our case this only led to a decrease in performance and did not reduce memory consumption at all


Diagnoses and results

Diagnosis of edit-test as of 30.06.2015

settings for the cdmserver:

-Xmx4500M -XX:PermSize=512m -XX:MaxPermSize=1800m

system memory usage of the cdm-server process:

KiB Mem:  10266200 total,  9720060 used,   546140 free,    95836 buffers
KiB Swap:  2097148 total,  1870592 used,   226556 free.  1442588 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                   
31761 cdm       20   0 18.997g 6.710g   3084 S   0.0 68.5  42:35.69 jsvc 

Almost all of the Swap space is being used.

$ jmap -heap 31761

Attaching to process ID 31761, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 24.71-b01

using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)

Heap Configuration:
   MinHeapFreeRatio = 40
   MaxHeapFreeRatio = 70
   MaxHeapSize      = 4718592000 (4500.0MB)
   NewSize          = 1363144 (1.2999954223632812MB)
   MaxNewSize       = 17592186044415 MB
   OldSize          = 5452592 (5.1999969482421875MB)
   NewRatio         = 2
   SurvivorRatio    = 8
   PermSize         = 536870912 (512.0MB)
   MaxPermSize      = 1887436800 (1800.0MB)
   G1HeapRegionSize = 1048576 (1.0MB)

Heap Usage:
G1 Heap:
   regions  = 4380
   capacity = 4592762880 (4380.0MB)
   used     = 3908545928 (3727.479866027832MB)
   free     = 684216952 (652.520133972168MB)
   85.10228004629754% used
G1 Young Generation:
Eden Space:
   regions  = 602
   capacity = 874512384 (834.0MB)
   used     = 631242752 (602.0MB)
   free     = 243269632 (232.0MB)
   72.18225419664269% used
Survivor Space:
   regions  = 10
   capacity = 10485760 (10.0MB)
   used     = 10485760 (10.0MB)
   free     = 0 (0.0MB)
   100.0% used
G1 Old Generation:
   regions  = 3116
   capacity = 3707764736 (3536.0MB)
   used     = 3266817416 (3115.479866027832MB)
   free     = 440947320 (420.52013397216797MB)
   88.10746227454275% used
Perm Generation:
   capacity = 1504706560 (1435.0MB)
   used     = 1504138704 (1434.4584503173828MB)
   free     = 567856 (0.5415496826171875MB)
   99.96226134615908% used

59361 interned Strings occupying 5910712 bytes.

so this is in total ~10GB of heap capacity for this process, whereas G1 Old Generation and Perm Generation together are already using ~5GB alone!

Diagnosing with jemalloc

Result:

cutout of attachment:app-profiling.gif

bin/jeprof --show_bytes /opt/java-oracle/jdk1.8/bin/java jeprof.*.heap

(jeprof) top
Total: 43983701067 B
40533531248  92.2%  92.2% 40533531248  92.2% je_prof_backtrace
3221225472   7.3%  99.5% 3221225472   7.3% init
138076277   0.3%  99.8% 138076277   0.3% readCEN
88637123   0.2% 100.0% 6607433823  15.0% Java_java_util_zip_Inflater_init
 2230945   0.0% 100.0%  2230945   0.0% allocate_dtv
       0   0.0% 100.0%  5177469   0.0% 0x00007f8b667661b8
       0   0.0% 100.0% 481181154   1.1% 0x00007f8b66778a33
       0   0.0% 100.0%  8653482   0.0% 0x00007f8b6677a43a
       0   0.0% 100.0%  8653482   0.0% 0x00007f8b6677a448
       0   0.0% 100.0% 10687701   0.0% 0x00007f8b66783835

This analysis clearly shows that the problem is related to Java_java_util_zip_Inflater_init which uses zlib.

Searching the web with this information reveals a bug in a range of oracle jdk versions:

The bug was reported for jdk8u40 (and was not reproducible in jdk8u60) but persists in jdk8u131 and for sure in the latest jdk8, since oracle decided not to fix this in the jdk8 line. In jdk9 the ServiceLoader has been replaced by a new implementation which does not have this bug.

Switching to jdk9 as JRE is not a quick option. In the jetty bug report 575 a workaround is suggested which uses a custom cache configuration to disable url caching and thereby effectively works around the buggy ServiceLoader behaviour:

Try creating an xml file (eg called "caches.xml") with the following contents:

<Configure id="Server" class="org.eclipse.jetty.server.Server">

 <Set class="org.eclipse.jetty.util.resource.Resource" name="defaultUseCaches">false</Set> 
 <New class="java.io.File">
   <Arg>
    <SystemProperty name="java.io.tmpdir"/>
   </Arg>
   <Call name="toURI">
     <Call id="url" name="toURL">
       <Call name="openConnection">
         <Set name="defaultUseCaches">false</Set>
       </Call>
     </Call>
   </Call>
 </New>
</Configure>

Then put that on the command line eg java -jar ../start.jar caches.xml

Applying this locally effectively reduced the RSS from 1297.05078125 MB to 495.04296875 MB (compared with the measurement from comment 17) and thus freed 802.0078125 MB.

pmap

analysis of cdmserver running 10 cdm instances with -Xmx3000M -XX:+UseG1GC with pmap:

sudo pmap -p 18803 -x > pmap-10-instances-q.txt
# summing up by category using awk; the categories are listed below in the table,
# here with .jar as an example (a loop over all categories is sketched below the table)
egrep "\.jar" pmap-10-instances-q.txt | awk  'BEGIN{sum=0}{gsub("K","",$2); sum+=$2;}END{print sum;}'
                            KB          MB
total as reported by pmap   12058940    11776.3085937
anon                        11979648    11698.875
*.so                        59888       58.484375
*.jar                       15336       14.9765625
stack                       136         0.1328125
java                        8           0.0078125
*.cfs                       148         0.14453125
*.js *.css *.ico            180         0.17578125
./locale/locale-archive     3564        3.48046875
/tmp/hsperfdata_usr/18803   32          0.03125
sum                                     11776.30
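
The category sums in the table above can be reproduced by repeating the awk one-liner for each pattern; a small loop over the patterns looks like this (a sketch, patterns taken from the table):

for pattern in "\.so" "\.jar" "\.cfs" "anon" "stack" "locale-archive" "hsperfdata"; do
  printf "%-20s " "$pattern"
  egrep "$pattern" pmap-10-instances-q.txt | awk 'BEGIN{sum=0}{gsub("K","",$2); sum+=$2;}END{print sum " KB";}'
done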

Conclusion

  1. The actual native memory consumption has a significant variance. For the test server running 18 cdm-webapp instances the RSS measure has a variance of ~1GB.
  2. The java8 memory leak was causing some of the RSS overhead, but there is still a significant amount of native memory for which clear evidence could not be found, so the cause for this remaining overhead is still unclear. The jdk8 memleak fix saved about 2GB of memory on the test server running 18 instances.
  3. As far as the current state of the analysis shows, it is very likely that the huge RSS overhead is caused by native memory fragmentation. Using jemalloc instead of malloc can help to limit the native memory consumption a bit. --> SUCCESS
  4. Tuning memory allocation by setting MALLOC_* environment variables may help, but the settings as posted in stackoverflow - Java using much more memory than heap size (linked above) did not decrease the RSS level. More analysis is needed to find appropriate parameter values (see the sketch below).
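
For reference, the kind of MALLOC_* tuning referred to in point 4 looks like the following; the variable names are the real glibc knobs, but the values shown are only the commonly circulated suggestions, not settings that have been verified for the cdm-server:

export MALLOC_ARENA_MAX=2
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536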

memory-analysis.tar.gz (1.83 KB) Andreas Kohlbecker, 03/22/2016 04:38 PM

memory-footprint-comparison.ods (47 KB) Andreas Kohlbecker, 03/22/2016 04:39 PM

picture464-1.png View (63.3 KB) Andreas Kohlbecker, 11/04/2019 06:07 PM

picture323-1.png View (62.8 KB) Andreas Kohlbecker, 11/05/2019 11:24 AM

picture743-1.png View (77.1 KB) Andreas Kohlbecker, 11/05/2019 09:58 PM

WSTE_KevinGrigorenko_LinuxNative.pdf (391 KB) Andreas Kohlbecker, 11/11/2019 07:42 AM

testserver-18-instances-jemalloc.png View (59.1 KB) Andreas Kohlbecker, 11/11/2019 12:49 PM

test-server-18instances-malloc.png View (48.7 KB) Andreas Kohlbecker, 11/11/2019 12:50 PM


Related issues

Related to Edit - task #5375: cdm-server without jsvc In Progress 11/12/2015
Copied to Edit - task #8674: configure cdm-server docker image to run with jemalloc Closed 06/30/2015
Copied to Edit - bug #8682: cdm-server debian packages configured for jemalloc New 06/30/2015

Associated revisions

Revision d81181d8 (diff)
Added by Andreas Kohlbecker 15 days ago

fix #5048 disabling url caching for jdk1.8 to avoid memleak

Revision bbe6c98f (diff)
Added by Andreas Kohlbecker 14 days ago

ref #5048 disabling url caching per instance for jdk1.8 to avoid memleak

History

#1 Updated by Andreas Müller over 4 years ago

  • Target version changed from Release 3.7 to Release 3.8

#2 Updated by Andreas Kohlbecker about 4 years ago

  • Target version deleted (Release 3.8)

moving unresolved tickets to next milestone after release of 3.8.0

#3 Updated by Andreas Kohlbecker about 4 years ago

  • Severity changed from normal to critical

Just an Idea: Could this be related to the jsvc process which is starting the JVM as a linux service?

The edit-test server by the way has shutdown problems and in the past we had the impression that this could be due to the jsvc process. This was purely hypothetical and has never been confirmed.

#4 Updated by Andreas Kohlbecker almost 4 years ago

Replying to a.kohlbecker:

Just an Idea: Could this be related to the jsvc process which is starting the JVM as a linux service?

The edit-test server by the way has shutdown problems and in the past we had the impression that this could be due to the jsvc process. This was purely hypothetical and has never been confirmed.

I ran a test to falsify this idea by starting cdm-server.jar directly without making use of jsvc:

su -l cdm
/opt/java-oracle/jdk1.7/bin/java -Xmx4500M -XX:PermSize=512m -XX:MaxPermSize=1800m -XX:+UseG1GC -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4 -Duser.home=/opt/cdmserver -Dspring.profiles.active=remoting -Dlog4j.configuration=file:///opt/cdmserver/.cdmLibrary/log4j.properties -Dspring.profiles.active=remoting -jar /opt/cdmserver/cdm-server.jar -httpPort=8080 -logdir=/var/log/cdmserver -contextPathPrefix=cdmserver 

Result:

the virtual memory (VIRT) and swap space usage was not influenced by keeping jsvc out of the process chain, so jsvc is for sure not the cause of the excessive memory consumption.

#5 Updated by Andreas Kohlbecker almost 4 years ago

  • Keywords changed from memory to memory, performance,

#6 Updated by Andreas Müller almost 4 years ago

  • Target version deleted (Release 3.12)

#7 Updated by Andreas Kohlbecker almost 4 years ago

latest examination shows that a significant amount of memory which is allocated for the cdm-server process seems to be used for the malloc arenas. I will add some more information on this topic to the ticket description.

#8 Updated by Andreas Kohlbecker over 3 years ago

  • Target version changed from Release 4.0 to Release 4.1
  • Priority changed from Highest to Priority11
  • Severity changed from critical to major

recently another improvement of the memory consumption has been applied to cdmlib-remote-webapp:

Swagger was identified to consume a lot of memory after the upgrade to v4.2.3 (#5397), therefore the dynamic swagger documentation has been replaced by a static one:

* 7a3e2b3 - Tue, 22 Mar 2016 12:57:43 +0100 (3 hours ago) (HEAD, origin/develop, develop)
    #5657 futher fixes for the static swagger doc - Andreas Kohlbecker
* 07d9161 - Mon, 21 Mar 2016 17:45:42 +0100 (23 hours ago)
    #5657 incomplete static swagger documentation fixed - Andreas Kohlbecker
* f7540ea - Wed, 24 Feb 2016 16:22:34 +0100 (4 weeks ago)
    updating data in swagger doc - Andreas Kohlbecker
* 6e13f75 - Mon, 22 Feb 2016 17:56:42 +0100 (4 weeks ago)
|   index on develop: 4ccc0ea serving swagger api doc in production mode (default) from static files:   
   - default spring profile with StaticSwaggerApiDoc controller
   - swagger spring profiles with generically generated documentaiton only for integration test environments
   - creating the resulting war file after running the integration tests
   - during integration tests the SwaggerGroupsIT class fetches the dynamically provided documentation - Andreas Kohlbecker
* 4ccc0ea - Mon, 22 Feb 2016 17:34:19 +0100 (4 weeks ago)
    serving swagger api doc in production mode (default) from static files:
   - default spring profile with StaticSwaggerApiDoc controller
   - swagger spring profiles with generically generated documentaiton only for integration test environments
   - creating the resulting war file after running the integration tests
   - during integration tests the SwaggerGroupsIT class fetches the dynamically provided documentation - Andreas Kohlbecker
* a89dffc - Mon, 22 Feb 2016 17:34:19 +0100 (4 weeks ago) (static-swagger)
    serving swagger api doc in production mode (default) from static files:
   - default spring profile with StaticSwaggerApiDoc controller
   - swagger spring profiles with generically generated documentaiton only for integration test environments   - creating the resulting war file after running the integration tests
   - during integration tests the SwaggerGroupsIT class fetches the dynamically provided documentation - Andreas Kohlbecker

The spreadsheet attachment:"memory-footprint-comparison.ods" shows the analysis and the final results.

The final JVM-internal memory consumption per instance is now roughly:

  • heap per instance: 153 MB

  • heap cdmserver: 15 MB

  • perm gen space per instance (or native Metaspace memory on java 8): 55 MB

  • perm gen space cdmserver (or native Metaspace memory on java 8): 20 MB

The constants in eu.etaxonomy.cdm.server.AssumedMemoryRequirements have been updated to these values.

IMPORTANT: The memory consumption is not increasing with more instances!

The last issue to tackle is to find out why the jvm is using a very high amount of native memory.

A new measurement as in the ticket description above or with the jvm-memory-details.sh (see memory-analysis.tar.gz) script running the server with java 8 could be interesting.

lowering priority

#9 Updated by Andreas Kohlbecker over 3 years ago

  • Target version changed from Release 4.1 to Reviewed Next Major Release

#10 Updated by Andreas Kohlbecker about 3 years ago

  • Tags changed from memory, performance to performance
  • Description updated (diff)

#11 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)
  • % Done changed from 0 to 20

#12 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#13 Updated by Andreas Kohlbecker 10 months ago

  • Related to task #5375: cdm-server without jsvc added

#14 Updated by Andreas Müller 9 months ago

  • Priority changed from Priority11 to Priority08

#15 Updated by Andreas Kohlbecker 18 days ago

  • Description updated (diff)

The diagnosis method described in this article may help analyzing the problem: https://technology.blog.gov.uk/2015/12/11/using-jemalloc-to-get-to-the-bottom-of-a-memory-leak/

#16 Updated by Andreas Kohlbecker 17 days ago

timeseries monitoring of the cdm-server running at edit-test with 6 instances:

This diagram shows:

  • blue: total memory allocated to the cdmserver process (RSS) measured with ps (a sampling sketch is given below this list)
  • green: (nonheap-plus-heap) sum of jvm heap + jvm nonheap memory usage as measured via JMX bean (the green area above the heap area thus shows the non-heap portion in the diagram)
  • dark greyish green: (cdmserver.heap-used) jvm heap memory usage as measured via JMX bean
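
The RSS (blue) curve of such a diagram can be sampled with a simple loop around ps, for example (a sketch; the heap and non-heap values were collected separately via JMX):

PID=$(pgrep -f cdm-server.jar | head -n 1)
while true; do
  # ps reports the resident set size in KiB
  echo "$(date +%s) $(ps -o rss= -p "$PID")" >> cdmserver-rss.log
  sleep 60
done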

#17 Updated by Andreas Kohlbecker 17 days ago

a single cdmlib-remote-webapp instance started from within the eclipse ide:

  • RSS: 1297.05078125 MB
  • heap used: +/- 238 MB
  • non heap: 93.5556640625 MB

so the allocated RSS is roughly 5 x the used heap in this case and 2.6 x the used heap in the situation measured in the comment above

#18 Updated by Andreas Kohlbecker 16 days ago

#19 Updated by Andreas Kohlbecker 16 days ago

  • Description updated (diff)

#20 Updated by Andreas Kohlbecker 16 days ago

#21 Updated by Andreas Kohlbecker 15 days ago

  • Description updated (diff)

#22 Updated by Andreas Kohlbecker 15 days ago

  • Description updated (diff)

#23 Updated by Andreas Kohlbecker 15 days ago

  • Description updated (diff)

#24 Updated by Andreas Kohlbecker 15 days ago

  • Description updated (diff)
  • Private changed from Yes to No

#25 Updated by Andreas Kohlbecker 15 days ago

  • Description updated (diff)

I will now equip edit-test and edit-integration with the caches.xml workaround to disable url caching...

#26 Updated by Andreas Kohlbecker 15 days ago

  • Status changed from New to In Progress
  • Target version changed from Reviewed Next Major Release to Release 5.11
  • % Done changed from 20 to 50

Andreas Kohlbecker wrote:

I will now equip edit-test and edit-integration with the caches.xml workaround to disable url caching...

this has no effect since the cdmserver does not pass the caches.xml on to the jetty server.
The simplest solution thus is to apply the disablement of the url caching directly in the cdmserver Bootloader class.
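
Programmatically this boils down to flipping the jvm-wide URLConnection caching default early during startup; a minimal sketch of what such a call in the Bootloader could look like (class and method names here are only illustrative, not the actual cdm-server code):

import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class UrlCachingWorkaround {

    /** Disables URL connection caching jvm-wide to work around the jdk8 ServiceLoader/Inflater memleak. */
    public static void disableUrlCaching() {
        try {
            // any URLConnection instance can be used; setDefaultUseCaches(false) changes the static default
            URLConnection connection = new URL("file:///").openConnection();
            connection.setDefaultUseCaches(false);
        } catch (IOException e) {
            throw new IllegalStateException("could not disable URL connection caching", e);
        }
    }
}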

#27 Updated by Andreas Kohlbecker 15 days ago

  • Status changed from In Progress to Resolved

#28 Updated by Andreas Kohlbecker 14 days ago

after applying the jdk8 memleak fix the effect on the cdm-server is not as good as for the cdmlib-remote-webapp alone:

application              heap usage   non heap usage   RSS
cdmserver (1 instance)   360 MB       100 MB           1639 MB
cdmlib-remote-webapp     238 MB       94 MB            495 MB

it seems as if there is still another problem

#29 Updated by Andreas Müller 14 days ago

should we put back to "In Progress" or/and move to next milestone?

#30 Updated by Andreas Kohlbecker 13 days ago

  • Description updated (diff)

#31 Updated by Andreas Kohlbecker 13 days ago

  • Description updated (diff)

#32 Updated by Andreas Kohlbecker 13 days ago

  • Description updated (diff)

#33 Updated by Andreas Kohlbecker 13 days ago

  • Description updated (diff)

#34 Updated by Andreas Kohlbecker 12 days ago

  • Description updated (diff)

#35 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

#37 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

#38 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

#39 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

#40 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

#41 Updated by Andreas Kohlbecker 10 days ago

  • Tags changed from performance to performance, memory

#44 Updated by Andreas Kohlbecker 10 days ago

  • Description updated (diff)

configuring the test server to use jemalloc instead of malloc drastically reduced the native memory overhead:

test server with 18 instances running with malloc:

test server with 18 instances running with jemalloc:

The startup time of the cdm-server seems to be the same, so there is so far no sign of performance degradation due to using jemalloc.

#45 Updated by Andreas Kohlbecker 10 days ago

  • Copied to task #8674: configure cdm-server docker image to run with jemalloc added

#46 Updated by Andreas Kohlbecker 10 days ago

  • Subject changed from cdm-server causes unusual high memory consumption to cdm-server causes unusual high native memory consumption for jvm

#47 Updated by Andreas Kohlbecker 7 days ago

  • Copied to bug #8682: cdm-server debian packages configured for jemalloc added

#48 Updated by Andreas Kohlbecker 7 days ago

  • Status changed from Resolved to Closed
  • % Done changed from 50 to 100

last remaining subtask copied to #8682, the ticket can now be closed.
