Monday, November 29, 2010

Java: Alert on Java crashes

Over the last couple of days I have noticed a number of unexpected core dumps in my system's core file dump location, /var/core:

 -rw-------   1 root    root  2529790203 Nov 10 11:55 core_host1_java_1094_300_1289350401_28578
 -rw-------   1 root    root  2564932547 Nov 15 13:06 core_host1_java_1094_300_1289786684_1664
 -rw-------   1 root    root  2498732827 Nov 17 17:29 core_host1_java_9092_300_1289975232_5664
 -rw-------   1 root    root  2525420387 Nov 19 12:08 core_host3_java_1094_300_1290128885_16234


Depending on how you've set up your core file pattern, you can determine which process and user account a dump is coming from just by reading the core file name. For example:

 # coreadm|grep pattern
    global core file pattern: /var/core/core_%n_%f_%u_%g_%t_%p
    init core file pattern: /var/core/core_%n_%f_%u_%g_%t_%p

 %n ; system node name (uname -n)
 %f ; executable filename
 %u ; uid
 %g ; gid
 %t ; time in seconds since the epoch (1970-01-01 00:00:00 UTC)
 %p ; PID
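
To make the mapping concrete, here is a minimal sketch that splits a core file name back into those fields. The function name parse_core_name is hypothetical, and the sketch assumes that neither the node name nor the executable name contains an underscore, since the pattern above uses underscores as separators.

```shell
#!/bin/sh
# Split a core file name produced by the pattern
#   core_%n_%f_%u_%g_%t_%p
# into its fields. Assumption: node name and executable name
# contain no underscores.
parse_core_name() {
    # Feed the base name to read, splitting on "_"; the first
    # field is the literal "core" prefix and is discarded.
    IFS=_ read -r _prefix node exe uid gid epoch pid <<EOF
$(basename "$1")
EOF
    echo "node=$node exe=$exe uid=$uid gid=$gid time=$epoch pid=$pid"
}

parse_core_name /var/core/core_host1_java_1094_300_1289350401_28578
# prints: node=host1 exe=java uid=1094 gid=300 time=1289350401 pid=28578
```

The uid and gid can then be looked up in the passwd and group databases to find the application account that owned the crashed process.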
 
My core dumps are coming from a Java process. Bugs can occur in a Java runtime environment, and most administrators will want to be notified when they do.
If you need to take corrective action and diagnose further, you need to be alerted at the time of the incident.
The Java runtime has a number of useful options for this purpose. The first is "-XX:OnOutOfMemoryError", which runs a command when the runtime environment hits an out-of-memory condition. When this option is combined with the logger command line utility:

 java -XX:OnOutOfMemoryError="logger Java process %p encountered an OOM condition" …

a syslog entry is generated each time an out-of-memory (OOM) event occurs. The JVM replaces %p with the PID of the affected process.

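Another useful option is "-XX:OnError", which runs a command when the runtime environment hits a fatal error (i.e., a hard crash). When this option is combined with the logger utility, here using -p to set a syslog priority (user.crit is one reasonable choice; -p must be followed by a priority, otherwise logger takes the first word of the message as one):

 java -XX:OnError="logger -p user.crit Java process %p encountered a fatal condition" …

a syslog entry is generated whenever a fatal event occurs.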

Both options accept one or more commands, separated by semicolons, so you can chain together a utility (logger or mail) to generate alerts and, if you like, a restarter script to start a new Java process.
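
Once the command list grows, it may be easier to point the option at a small wrapper script and pass it %p. The following is only a sketch under stated assumptions: the script name java_crash_alert.sh, the ALERT_TO variable, the user.crit priority, and the message wording are all hypothetical, and mailx is assumed to be installed and able to deliver mail.

```shell
#!/bin/sh
# Hypothetical alert handler, intended to be invoked as
#   java -XX:OnError="java_crash_alert.sh %p" ...
# ALERT_TO and the message wording are assumptions, not JVM defaults.

ALERT_TO=${ALERT_TO:-root}

build_alert() {
    # $1 is the PID that the JVM substitutes for %p
    echo "Java process $1 on $(uname -n) encountered a fatal condition"
}

send_alert() {
    msg=$(build_alert "$1")
    # Record the event in syslog, then mail the same message
    logger -p user.crit "$msg"
    echo "$msg" | mailx -s "Java crash on $(uname -n)" "$ALERT_TO"
}

# Act only when the first argument looks like a PID
case ${1:-} in
    ''|*[!0-9]*) ;;
    *) send_alert "$1" ;;
esac
```

A restart step could be added after the mail command, but whether an automatic restart is safe depends on the application, so it is left out here.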


