Project

General

Profile

bug #7938

cdmserver sub-processes (threads) with constant high cpu

Added by Andreas Kohlbecker 11 months ago. Updated 9 months ago.

Status:
New
Priority:
Priority12
Category:
cdmserver
Start date:
12/05/2018
Due date:
% Done:

20%

Severity:
critical
Found in Version:
Tags:

Description

This performance problem was observed at the edit production server but might also affect other installations.

Users where complaining about the server being very slow since a couple of days.

The reason for the server being "slow" is that two sub processes are consuming 100% CPU time of one cpu core each.
This time excessive garbage collection could be excluded as cause.


The cpu load is produced by the to subprocesses with the PIDs 9527 and 9528 (main process id is 9505)

A stake trace with native frames (/opt/java-oracle/jdk1.8.0_91/bin/jstack -m -F 9505) included give first insight into what these processes are doing (jstack-with-native-frames.tdump):

----------------- 9527 -----------------
0x00007fe4a30c8f10  _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0
0x00007fe4a30c9c74  _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4
0x00007fe4a31ac87a  _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea
0x00007fe4a31a9fcc  _ZN12PhaseIterGVN8optimizeEv + 0x1bc
0x00007fe4a2ce51a9  _ZN7Compile8OptimizeEv + 0x159
0x00007fe4a2ce6fea  _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca
0x00007fe4a2c34818  _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198
0x00007fe4a2cf14da  _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a
0x00007fe4a2cf2486  _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6
0x00007fe4a32ae77f  _ZN10JavaThread17thread_main_innerEv + 0xdf
0x00007fe4a32ae8ac  _ZN10JavaThread3runEv + 0x11c
0x00007fe4a3161f78  _ZL10java_startP6Thread + 0x108
----------------- 9528 -----------------
0x00007fe4a30c8ed6  _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x286
0x00007fe4a30c9c74  _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4
0x00007fe4a31ac87a  _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea
0x00007fe4a31a9fcc  _ZN12PhaseIterGVN8optimizeEv + 0x1bc
0x00007fe4a2ce51a9  _ZN7Compile8OptimizeEv + 0x159
0x00007fe4a2ce6fea  _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca
0x00007fe4a2c34818  _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198
0x00007fe4a2cf14da  _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a
0x00007fe4a2cf2486  _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6
0x00007fe4a32ae77f  _ZN10JavaThread17thread_main_innerEv + 0xdf
0x00007fe4a32ae8ac  _ZN10JavaThread3runEv + 0x11c
0x00007fe4a3161f78  _ZL10java_startP6Thread + 0x108

about 30 minutes later (jstack-verbose-with-native-frames-2.tdump) these native threads are still at the same point in the stack and seem to be stock in the 'MemNode::can_see_stored_value'` c++ function:

----------------- 9527 -----------------
0x00007fe4a30c8f10  _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x2c0
0x00007fe4a30c9c74  _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4
0x00007fe4a31ac87a  _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea
0x00007fe4a31a9fcc  _ZN12PhaseIterGVN8optimizeEv + 0x1bc
0x00007fe4a2ce51a9  _ZN7Compile8OptimizeEv + 0x159
0x00007fe4a2ce6fea  _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca
0x00007fe4a2c34818  _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198
0x00007fe4a2cf14da  _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a
0x00007fe4a2cf2486  _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6
0x00007fe4a32ae77f  _ZN10JavaThread17thread_main_innerEv + 0xdf
0x00007fe4a32ae8ac  _ZN10JavaThread3runEv + 0x11c
0x00007fe4a3161f78  _ZL10java_startP6Thread + 0x108
----------------- 9528 -----------------
0x00007fe4a30c8eaa  _ZNK7MemNode20can_see_stored_valueEP4NodeP14PhaseTransform + 0x25a
0x00007fe4a30c9c74  _ZNK8LoadNode5ValueEP14PhaseTransform + 0xf4
0x00007fe4a31ac87a  _ZN12PhaseIterGVN13transform_oldEP4Node + 0xea
0x00007fe4a31a9fcc  _ZN12PhaseIterGVN8optimizeEv + 0x1bc
0x00007fe4a2ce51a9  _ZN7Compile8OptimizeEv + 0x159
0x00007fe4a2ce6fea  _ZN7CompileC1EP5ciEnvP10C2CompilerP8ciMethodibbb + 0x13ca
0x00007fe4a2c34818  _ZN10C2Compiler14compile_methodEP5ciEnvP8ciMethodi + 0x198
0x00007fe4a2cf14da  _ZN13CompileBroker25invoke_compiler_on_methodEP11CompileTask + 0xc9a
0x00007fe4a2cf2486  _ZN13CompileBroker20compiler_thread_loopEv + 0x5d6
0x00007fe4a32ae77f  _ZN10JavaThread17thread_main_innerEv + 0xdf
0x00007fe4a32ae8ac  _ZN10JavaThread3runEv + 0x11c
0x00007fe4a3161f78  _ZL10java_startP6Thread + 0x108

These processes are related to the C2 byte code compiler and seem to be stuck in a loop while attempting to find a value in memory.


Maybe interesting in in this context is that the jstack command threw some UnmappedAddressException:

sun.jvm.hotspot.debugger.UnmappedAddressException: ddf4471a42deeae8
        at sun.jvm.hotspot.debugger.PageCache.checkPage(PageCache.java:208)
        at sun.jvm.hotspot.debugger.PageCache.getData(PageCache.java:63)
        at sun.jvm.hotspot.debugger.DebuggerBase.readBytes(DebuggerBase.java:225)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readCInteger(LinuxDebuggerLocal.java:498)
        at sun.jvm.hotspot.debugger.DebuggerBase.readAddressValue(DebuggerBase.java:462)
        at sun.jvm.hotspot.debugger.linux.LinuxDebuggerLocal.readAddress(LinuxDebuggerLocal.java:433)
        at sun.jvm.hotspot.debugger.linux.LinuxAddress.getAddressAt(LinuxAddress.java:74)
        at sun.jvm.hotspot.debugger.linux.amd64.LinuxAMD64CFrame.sender(LinuxAMD64CFrame.java:68)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:161)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:58)
        at sun.jvm.hotspot.tools.PStack.run(PStack.java:53)
        at sun.jvm.hotspot.tools.JStack.run(JStack.java:66)
        at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.JStack.main(JStack.java:92)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at sun.tools.jstack.JStack.runJStackTool(JStack.java:140)
        at sun.tools.jstack.JStack.main(JStack.java:106)

From reading https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8208091 these exceptions seem to be unrelated to MemNode::can_see_stored_value being stuck.

Watching both sub processes with strace generates no output, which is another strong indication for that these byte code compiler processes are stuck in a loop of the native method named above.

here is another stack trace taken with gdb:

gdb -batch -ex bt -p 9527

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#0  0x00007fe4a30c8eaf in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#1  0x00007fe4a30c9c74 in LoadNode::Value(PhaseTransform*) const () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#2  0x00007fe4a31ac87a in PhaseIterGVN::transform_old(Node*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#3  0x00007fe4a31a9fcc in PhaseIterGVN::optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#4  0x00007fe4a2ce51a9 in Compile::Optimize() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#5  0x00007fe4a2ce6fea in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#6  0x00007fe4a2c34818 in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#7  0x00007fe4a2cf14da in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#8  0x00007fe4a2cf2486 in CompileBroker::compiler_thread_loop() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#9  0x00007fe4a32ae77f in JavaThread::thread_main_inner() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#10 0x00007fe4a32ae8ac in JavaThread::run() () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#11 0x00007fe4a3161f78 in java_start(Thread*) () from /opt/java-oracle/jdk1.8/jre/lib/amd64/server/libjvm.so
#12 0x00007fe4a481a064 in start_thread (arg=0x7fe43f921700) at pthread_create.c:309
#13 0x00007fe4a454f62d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

The two blocked processes where spawned some time after the main java process. At current (20018-12-05 14:15) htop reports the folowing TIME+ values for these processes:

  • 9505 (main): 1291h = 53d 19h ==> ~ Freitag, 12. Oktober 2018 19:00
  • 9527 : 509h = 21d 5h => Mittwoch, 14. November 2018 9:00
  • 9528 : 494h = 20d 14h => Donnerstag, 15. November 2018 00:00

so it seems as if the server has been started on Okt. 12th but systemctl reports that the server has been started on Nov 9th:

systemctl status cdmserver.service 
● cdmserver.service - LSB: Start CDM Server daemon
   Loaded: loaded (/etc/init.d/cdmserver)
   Active: active (running) since Fri 2018-11-09 10:47:24 CET; 3 weeks 5 days ago
   CGroup: /system.slice/cdmserver.service
           ├─9504 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem...
           └─9505 jsvc.exec -home /opt/java-oracle/jdk1.8 -user cdm -jvm server -cp /usr/share/java/commons-daem...

This matches the information reported by the jvm itself. According to the jvm the server is running since ~627h and thus was started on Nov 9th.
The TIME+ value reported by htop is for sure the sum of all times the 4 cores have been utilized.

==> The incident has happened ~5 days after the restart of the cdm-server.

Screenshot_20181205_112519.png View (50.5 KB) Andreas Kohlbecker, 12/05/2018 11:57 AM

Screenshot_20181205_112425.png View (150 KB) Andreas Kohlbecker, 12/05/2018 11:57 AM

jstack-with-native-frames.tdump (326 KB) Andreas Kohlbecker, 12/05/2018 11:59 AM

jstack-verbose-with-native-frames-2.tdump (327 KB) Andreas Kohlbecker, 12/05/2018 12:22 PM

History

#1 Updated by Andreas Kohlbecker 11 months ago

#2 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#3 Updated by Andreas Kohlbecker 11 months ago

#4 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#5 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)
  • % Done changed from 0 to 10

#6 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#7 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)
  • % Done changed from 10 to 20

The cdm-server is running as subprocess of jsvc can this be a reason for the observed problem?

#8 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#9 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#10 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#11 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#12 Updated by Andreas Kohlbecker 11 months ago

I filed a bug report to oracle:

"We will review your report and have assigned it an internal review ID : 9058426. Depending upon the completeness of the report and our ability to reproduce the problem, either a new bug will be posted, or we will contact you for further information."

#13 Updated by Andreas Kohlbecker 11 months ago

  • % Done changed from 20 to 0

[Answer vom Oracle:]

Dear Java Developer,

We have finished evaluating your report and have assigned it a Bug ID: JDK-8214939. The issue is visible on bugs.java.com at the following url JDK-8214939.
To provide more information about this issue, click here.
We work to resolve the issues that are submitted to us according to their impact to the community as a whole, and make no promises as to the time or release in which a bug might be fixed. If this issue has a significant impact on your project you may want to consider using one of the technical support offerings available at Oracle Support.

Regards,
Java Developer Support

#14 Updated by Andreas Kohlbecker 11 months ago

[Comment on bug report]:

Hi Andreas Kohlbecker,

Thanks for reporting this incident to Oracle.

This is with reference to issue JI-9058426, you have reported against us.

Without reproducer, it will be difficult for us to reproduce this issue at our end.

There is only one change happened in "memnode.cpp" in the method "MemNode::can_see_stored_value(Node* st, PhaseTransform* phase)" comparing with latest JDK code v/s JDK 8

JDK-8076188: Optimize arraycopy out for non escaping destination

Could you please confirm, if you still able to reproduce this issue on JDK11 [1] build?

Thanks,
Fairoz

[1] https://www.oracle.com/technetwork/java/javase/downloads/jdk11-downloads-5066655.html

#15 Updated by Andreas Kohlbecker 11 months ago

  • % Done changed from 0 to 20

#16 Updated by Andreas Kohlbecker 11 months ago

  • Description updated (diff)

#17 Updated by Andreas Kohlbecker 10 months ago

  • Target version changed from Release 5.5 to Release 5.6

#18 Updated by Andreas Kohlbecker 9 months ago

  • Priority changed from Highest to Priority12
  • Target version changed from Release 5.6 to Reviewed Next Major Release

this never occurred since 3 months, lowering priority

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 40 MB)