bug #7938
opencdmserver sub-processes (threads) with constant high cpu
20%
Description
This performance problem was observed at the edit production server but might also affect other installations.
Users where complaining about the server being very slow since a couple of days.
The reason for the server being "slow" is that two sub processes are consuming 100% CPU time of one cpu core each.
This time excessive garbage collection could be excluded as cause.
The cpu load is produced by the to subprocesses with the PIDs 9527 and 9528 (main process id is 9505)
A stake trace with native frames (/opt/java-oracle/jdk1.8.0_91/bin/jstack -m -F 9505
) included give first insight into what these processes are doing (jstack-with-native-frames.tdump):
----------------- 9527 ----------------- 0x00007fe4a30c8f10 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ----------------- 9528 ----------------- 0x00007fe4a30c8ed6 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x286 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108
about 30 minutes later (jstack-verbose-with-native-frames-2.tdump) these native threads are still at the same point in the stack and seem to be stock in the 'MemNode::can_see_stored_value'` c++ function:
----------------- 9527 ----------------- 0x00007fe4a30c8f10 _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108 ----------------- 9528 ----------------- 0x00007fe4a30c8eaa _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x25a 0x00007fe4a30c9c74 _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4 0x00007fe4a31ac87a _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea 0x00007fe4a31a9fcc _ZN12PhaseIterGVN8optimizeEv + 0x1bc 0x00007fe4a2ce51a9 _ZN7Compile8OptimizeEv + 0x159 0x00007fe4a2ce6fea _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca 0x00007fe4a2c34818 _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198 0x00007fe4a2cf14da _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a 0x00007fe4a2cf2486 _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6 0x00007fe4a32ae77f _ZN10JavaThread17thread_main_innerEv + 0xdf 0x00007fe4a32ae8ac _ZN10JavaThread3runEv + 0x11c 0x00007fe4a3161f78 _ZL10java_startP6Thread + 0x108
These processes are related to the C2 byte code compiler and seem to be stuck in a loop while attempting to find a value in memory.
Maybe interesting in in this context is that the jstack
command threw some UnmappedAddressException:
sun.jvm.hotspot.debugger.UnmappedAddressException: ddf4471a42deeae8 at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208) at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63) at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498) at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462) at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433) at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74) at sun.jvm.hotspot.debugger.linux.amd64.LinuxAMD64CFrame.sender(LinuxAMD64CFrame.java:68) at sun.jvm.hotspot.tools.PStack.run(PStack.java:161) at sun.jvm.hotspot.tools.PStack.run(PStack.java:58) at sun.jvm.hotspot.tools.PStack.run(PStack.java:53) at sun.jvm.hotspot.tools.JStack.run(JStack.java:66) at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260) at sun.jvm.hotspot.tools.Tool.start(Tool.java:223) at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118) at sun.jvm.hotspot.tools.JStack.main(JStack.java:92) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at sun.tools.jstack.JStack.runJStackTool(JStack.java:140) at sun.tools.jstack.JStack.main(JStack.java:106)
From reading https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8208091 these exceptions seem to be unrelated to MemNode::can_see_stored_value
being stuck.
Watching both sub processes with strace
generates no output, which is another strong indication for that these byte code compiler processes are stuck in a loop of the native method named above.
here is another stack trace taken with gdb:
gdb -batch -ex bt -p 9527 [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". 0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #0 0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #1 0x00007fe4a30c9c74 in LoadNode::Value(PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #2 0x00007fe4a31ac87a in PhaseIterGVN::transform_old(Node*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #3 0x00007fe4a31a9fcc in PhaseIterGVN::optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #4 0x00007fe4a2ce51a9 in Compile::Optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #5 0x00007fe4a2ce6fea in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #6 0x00007fe4a2c34818 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #7 0x00007fe4a2cf14da in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #8 0x00007fe4a2cf2486 in CompileBroker::compiler_thread_loop() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #9 0x00007fe4a32ae77f in JavaThread::thread_main_inner() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #10 0x00007fe4a32ae8ac in JavaThread::run() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #11 0x00007fe4a3161f78 in java_start(Thread*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so #12 0x00007fe4a481a064 in start_thread (arg=0x7fe43f921700) at pthread_create.c:309 #13 0x00007fe4a454f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
The two blocked processes where spawned some time after the main java process. At current (20018-12-05 14:15) htop reports the folowing TIME+
values for these processes:
- 9505 (main): 1291h = 53d 19h ==> ~ Freitag, 12. Oktober 2018 19:00
- 9527 : 509h = 21d 5h => Mittwoch, 14. November 2018 9:00
- 9528 : 494h = 20d 14h => Donnerstag, 15. November 2018 00:00
so it seems as if the server has been started on Okt. 12th but systemctl reports that the server has been started on Nov 9th:
systemctl status cdmserver.service ● cdmserver.service - LSB: Start CDM Server daemon Loaded: loaded (/etc/init.d/cdmserver) Active: active (running) since Fri 2018-11-09 10:47:24 CET; 3 weeks 5 days ago CGroup: /system.slice/cdmserver.service ├─9504 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem... └─9505 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem...
This matches the information reported by the jvm itself. According to the jvm the server is running since ~627h and thus was started on Nov 9th.
The TIME+ value reported by htop is for sure the sum of all times the 4 cores have been utilized.
==> The incident has happened ~5 days after the restart of the cdm-server.
Files
Updated by Andreas Kohlbecker about 4 years ago
- File jstack-with-native-frames.tdump jstack-with-native-frames.tdump added
- Description updated (diff)
Updated by Andreas Kohlbecker about 4 years ago
- File jstack-verbose-with-native-frames-2.tdump jstack-verbose-with-native-frames-2.tdump added
- Description updated (diff)
Updated by Andreas Kohlbecker about 4 years ago
- Description updated (diff)
- % Done changed from 0 to 10
Updated by Andreas Kohlbecker about 4 years ago
- Description updated (diff)
- % Done changed from 10 to 20
The cdm-server is running as subprocess of jsvc
can this be a reason for the observed problem?
Updated by Andreas Kohlbecker about 4 years ago
I filed a bug report to oracle:
"We will review your report and have assigned it an internal review ID : 9058426. Depending upon the completeness of the report and our ability to reproduce the problem, either a new bug will be posted, or we will contact you for further information."
Updated by Andreas Kohlbecker about 4 years ago
- % Done changed from 20 to 0
[Answer vom Oracle:]
Dear Java Developer,
We have finished evaluating your report and have assigned it a Bug ID: JDK-8214939. The issue is visible on bugs.java.com at the following url JDK-8214939.
To provide more information about this issue, click here.
We work to resolve the issues that are submitted to us according to their impact to the community as a whole, and make no promises as to the time or release in which a bug might be fixed. If this issue has a significant impact on your project you may want to consider using one of the technical support offerings available at Oracle Support.
Regards,
Java Developer Support
Updated by Andreas Kohlbecker about 4 years ago
[Comment on bug report]:
Hi Andreas Kohlbecker,
Thanks for reporting this incident to Oracle.
This is with reference to issue JI-9058426, you have reported against us.
Without reproducer, it will be difficult for us to reproduce this issue at our end.
There is only one change happened in "memnode.cpp" in the method "MemNode::can_see_stored_value(Node* st, PhaseTransform* phase)" comparing with latest JDK code v/s JDK 8
JDK-8076188: Optimize arraycopy out for non escaping destination
Could you please confirm, if you still able to reproduce this issue on JDK11 [1] build?
Thanks,
Fairoz
[1] https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html
Updated by Andreas Kohlbecker about 4 years ago
- Target version changed from Release 5.5 to Release 5.6
Updated by Andreas Kohlbecker almost 4 years ago
- Priority changed from Highest to Priority12
- Target version changed from Release 5.6 to Reviewed Next Major Release
this never occurred since 3 months, lowering priority