Project

General

Profile

bug #7938

Updated by Andreas Kohlbecker over 5 years ago

This performance problem was observed at the edit production server but might also affect other installations. 

 Users where complaining about the server being very slow since a couple of days. 

 The reason for the server being "slow" is that two sub processes are consuming 100% CPU time of one cpu core each. 
 This time excessive garbage collection could be excluded as cause. 

 ![](Screenshot_20181205_112519.png) 
 ![](Screenshot_20181205_112425.png) 

 The cpu load is produced by the to subprocesses with the PIDs **9527** and **9528** (main process id is 9505) 

 A stake trace with native frames (`/opt/java-oracle/jdk1.8.0_91/bin/jstack -m    -F    9505`) included give first insight into what these processes are doing (attachment:jstack-with-native-frames.tdump): 


 ~~~ 
 ----------------- 9527 ----------------- 
 0x00007fe4a30c8f10    _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 
 0x00007fe4a30c9c74    _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 
 0x00007fe4a31ac87a    _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 
 0x00007fe4a31a9fcc    _ZN12PhaseIterGVN8optimizeEv + 0x1bc 
 0x00007fe4a2ce51a9    _ZN7Compile8OptimizeEv + 0x159 
 0x00007fe4a2ce6fea    _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 
 0x00007fe4a2c34818    _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 
 0x00007fe4a2cf14da    _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 
 0x00007fe4a2cf2486    _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 
 0x00007fe4a32ae77f    _ZN10JavaThread17thread_main_innerEv + 0xdf 
 0x00007fe4a32ae8ac    _ZN10JavaThread3runEv + 0x11c 
 0x00007fe4a3161f78    _ZL10java_startP6Thread + 0x108 
 ----------------- 9528 ----------------- 
 0x00007fe4a30c8ed6    _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x286 
 0x00007fe4a30c9c74    _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 
 0x00007fe4a31ac87a    _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 
 0x00007fe4a31a9fcc    _ZN12PhaseIterGVN8optimizeEv + 0x1bc 
 0x00007fe4a2ce51a9    _ZN7Compile8OptimizeEv + 0x159 
 0x00007fe4a2ce6fea    _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 
 0x00007fe4a2c34818    _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 
 0x00007fe4a2cf14da    _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 
 0x00007fe4a2cf2486    _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 
 0x00007fe4a32ae77f    _ZN10JavaThread17thread_main_innerEv + 0xdf 
 0x00007fe4a32ae8ac    _ZN10JavaThread3runEv + 0x11c 
 0x00007fe4a3161f78    _ZL10java_startP6Thread + 0x108 
 ~~~ 

 about 30 minutes later (attachment:jstack-verbose-with-native-frames-2.tdump) these native threads are still at the same point in the stack and seem to be stock in the    ['MemNode::can_see_stored_value'](https://hg.openjdk.java.net/jdk8/jdk8/hotspot/log/87ee5ee27509/src/share/vm/opto/memnode.cpp#l978)` c++ function: 

 ~~~ 
 ----------------- 9527 ----------------- 
 0x00007fe4a30c8f10 	 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 
 0x00007fe4a30c9c74 	 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 
 0x00007fe4a31ac87a 	 _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 
 0x00007fe4a31a9fcc 	 _ZN12PhaseIterGVN8optimizeEv + 0x1bc 
 0x00007fe4a2ce51a9 	 _ZN7Compile8OptimizeEv + 0x159 
 0x00007fe4a2ce6fea 	 _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 
 0x00007fe4a2c34818 	 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 
 0x00007fe4a2cf14da 	 _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 
 0x00007fe4a2cf2486 	 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 
 0x00007fe4a32ae77f 	 _ZN10JavaThread17thread_main_innerEv + 0xdf 
 0x00007fe4a32ae8ac 	 _ZN10JavaThread3runEv + 0x11c 
 0x00007fe4a3161f78 	 _ZL10java_startP6Thread + 0x108 
 ----------------- 9528 ----------------- 
 0x00007fe4a30c8eaa 	 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x25a 
 0x00007fe4a30c9c74 	 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 
 0x00007fe4a31ac87a 	 _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 
 0x00007fe4a31a9fcc 	 _ZN12PhaseIterGVN8optimizeEv + 0x1bc 
 0x00007fe4a2ce51a9 	 _ZN7Compile8OptimizeEv + 0x159 
 0x00007fe4a2ce6fea 	 _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 
 0x00007fe4a2c34818 	 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 
 0x00007fe4a2cf14da 	 _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 
 0x00007fe4a2cf2486 	 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 
 0x00007fe4a32ae77f 	 _ZN10JavaThread17thread_main_innerEv + 0xdf 
 0x00007fe4a32ae8ac 	 _ZN10JavaThread3runEv + 0x11c 
 0x00007fe4a3161f78 	 _ZL10java_startP6Thread + 0x108 
 ~~~ 

 These processes seem to be stuck in a loop while attempting to find a value in memory.  

 --- 

 Maybe interesting in in this context is that the `jstack` command threw some UnmappedAddressException: 

 ~~~ 
 sun.jvm.hotspot.debugger.UnmappedAddressException: ddf4471a42deeae8 
         at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208) 
         at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63) 
         at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225) 
         at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498) 
         at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462) 
         at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433) 
         at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74) 
         at sun.jvm.hotspot.debugger.linux.amd64.LinuxAMD64CFrame.sender(LinuxAMD64CFrame.java:68) 
         at sun.jvm.hotspot.tools.PStack.run(PStack.java:161) 
         at sun.jvm.hotspot.tools.PStack.run(PStack.java:58) 
         at sun.jvm.hotspot.tools.PStack.run(PStack.java:53) 
         at sun.jvm.hotspot.tools.JStack.run(JStack.java:66) 
         at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260) 
         at sun.jvm.hotspot.tools.Tool.start(Tool.java:223) 
         at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) 
         at sun.jvm.hotspot.tools.JStack.main(JStack.java:92) 
         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) 
         at java.lang.reflect.Method.invoke(Method.java:498) 
         at sun.tools.jstack.JStack.runJStackTool(JStack.java:140) 
         at sun.tools.jstack.JStack.main(JStack.java:106) 
 ~~~  

 From reading https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8208091 these exceptions seem to be unrelated to `MemNode::can_see_stored_value` being stuck. 

 Watching both sub processes with `strace` generates no output, which is another strong indication for that these processes are stuck in a loop of the native method named above. 

 here is another stack trace taken with gdb: 

 ~~~ 
 gdb -batch -ex bt -p 9527 
          
 [Thread debugging using libthread_db enabled] 
 Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 
 0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #0    0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #1    0x00007fe4a30c9c74 in LoadNode::Value(PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #2    0x00007fe4a31ac87a in PhaseIterGVN::transform_old(Node*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #3    0x00007fe4a31a9fcc in PhaseIterGVN::optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #4    0x00007fe4a2ce51a9 in Compile::Optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #5    0x00007fe4a2ce6fea in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #6    0x00007fe4a2c34818 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #7    0x00007fe4a2cf14da in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #8    0x00007fe4a2cf2486 in CompileBroker::compiler_thread_loop() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #9    0x00007fe4a32ae77f in JavaThread::thread_main_inner() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #10 0x00007fe4a32ae8ac in JavaThread::run() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #11 0x00007fe4a3161f78 in java_start(Thread*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so 
 #12 0x00007fe4a481a064 in start_thread (arg=0x7fe43f921700) at pthread_create.c:309 
 #13 0x00007fe4a454f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 
 ~~~ 


 The two blocked processes where spawned some time after the main java process. At current (20018-12-05 14:15) htop reports the folowing `TIME+` values for these processes: 

 * 9505 (main):    1291h = 53d 19h ==> ~ Freitag, 12. Oktober 2018 19:00 
 * 9527 : 509h = 21d 5h => Mittwoch, 14. November 2018 9:00 
 * 9528 : 494h = 20d 14h    => Donnerstag, 15. November 2018 00:00 

 so it seems as if the server has been started on Okt. 12th but systemctl reports that the server has been started on Nov 9th: 

 ~~~ 
 systemctl status cdmserver.service  
 ● cdmserver.service - LSB: Start CDM Server daemon 
    Loaded: loaded (/etc/init.d/cdmserver) 
    Active: active (running) since Fri 2018-11-09 10:47:24 CET; 3 weeks 5 days ago 
    CGroup: /system.slice/cdmserver.service 
            ├─9504 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem... 
            └─9505 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem... 
 ~~~ 







Back