bug #7938
Updated by Andreas Kohlbecker over 5 years ago
This performance problem was observed at the edit production server but might also affect other installations. Users where complaining about the server being very slow since a couple of days. The reason for the server being "slow" is that two sub processes are consuming 100% CPU time of one cpu core each. This time excessive garbage collection could be excluded as cause. ![](Screenshot_20181205_112519.png) ![](Screenshot_20181205_112425.png) The cpu load is produced by the to subprocesses with the PIDs **9527** and **9528** (main process id is 9505) A stake trace with native frames (`/opt/java-oracle/jdk1.8.0_91/bin/jstack -m -F 9505`) included give first insight into what these processes are doing (attachment:jstack-with-native-frames.tdump): ~~~ ----------------- 9527 ----------------- 0x00007fe4a30c8f10 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ----------------- 9528 ----------------- 0x00007fe4a30c8ed6 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x286 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ~~~ about 30 minutes later (attachment:jstack-verbose-with-native-frames-2.tdump) these native threads are still at the same point in the stack and seem to be stock in the ['MemNode::can_see_stored_value'](https://hg.openjdk.java.net/jdk8/jdk8/hotspot/log/87ee5ee27509/src/share/vm/opto/memnode.cpp#l978)` c++ function: ~~~ ----------------- 9527 ----------------- 0x00007fe4a30c8f10 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ----------------- 9528 ----------------- 0x00007fe4a30c8eaa _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x25a 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ~~~ These processes are related to the C2 byte code compiler and seem to be stuck in a loop while attempting to find a value in memory. --- Maybe interesting in in this context is that the `jstack` command threw some UnmappedAddressException: ~~~ sun.jvm.hotspot.debugger.UnmappedAddressException: ddf4471a42deeae8 at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208) at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63) at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498) at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433) at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74) at sun.jvm.hotspot.debugger.linux.amd64.LinuxAMD64CFrame.sender(LinuxAMD64CFrame.java:68) at sun.jvm.hotspot.tools.PStack.run(PStack.java:161) at sun.jvm.hotspot.tools.PStack.run(PStack.java:58) at sun.jvm.hotspot.tools.PStack.run(PStack.java:53) at sun.jvm.hotspot.tools.JStack.run(JStack.java:66) at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260) at sun.jvm.hotspot.tools.Tool.start(Tool.java:223) at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at sun.jvm.hotspot.tools.JStack.main(JStack.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.tools.jstack.JStack.runJStackTool(JStack.java:140) at sun.tools.jstack.JStack.main(JStack.java:106) ~~~ From reading https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8208091 these exceptions seem to be unrelated to `MemNode::can_see_stored_value` being stuck. Watching both sub processes with `strace` generates no output, which is another strong indication for that these byte code compiler processes are stuck in a loop of the native method named above. here is another stack trace taken with gdb: ~~~ gdb -batch -ex bt -p 9527 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #0 0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #1 0x00007fe4a30c9c74 in LoadNode::Value(PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #2 0x00007fe4a31ac87a in PhaseIterGVN::transform_old(Node*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #3 0x00007fe4a31a9fcc in PhaseIterGVN::optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #4 0x00007fe4a2ce51a9 in Compile::Optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #5 0x00007fe4a2ce6fea in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #6 0x00007fe4a2c34818 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #7 0x00007fe4a2cf14da in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #8 0x00007fe4a2cf2486 in CompileBroker::compiler_thread_loop() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #9 0x00007fe4a32ae77f in JavaThread::thread_main_inner() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #10 0x00007fe4a32ae8ac in JavaThread::run() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #11 0x00007fe4a3161f78 in java_start(Thread*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #12 0x00007fe4a481a064 in start_thread (arg=0x7fe43f921700) at pthread_create.c:309 #13 0x00007fe4a454f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 ~~~ The two blocked processes where spawned some time after the main java process. At current (20018-12-05 14:15) htop reports the folowing `TIME+` values for these processes: * 9505 (main): 1291h = 53d 19h ==> ~ Freitag, 12. Oktober 2018 19:00 * 9527 : 509h = 21d 5h => Mittwoch, 14. November 2018 9:00 * 9528 : 494h = 20d 14h => Donnerstag, 15. November 2018 00:00 so it seems as if the server has been started on Okt. 12th but systemctl reports that the server has been started on Nov 9th: ~~~ systemctl status cdmserver.service ● cdmserver.service - LSB: Start CDM Server daemon Loaded: loaded (/etc/init.d/cdmserver) Active: active (running) since Fri 2018-11-09 10:47:24 CET; 3 weeks 5 days ago CGroup: /system.slice/cdmserver.service ├─9504 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem... └─9505 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem... ~~~ This matches the information reported by the jvm itself. According to the jvm the server is running since ~627h ~627 and thus was started on Nov 9th. The TIME+ value reported by htop is for sure the sum of all times the 4 cores have been utilized. ==> **The incident has happened ~5 days after the restart of the cdm-server.**