[Linux-cluster] Strange Behavior

Robert Gil Robert.Gil at americanhm.com
Tue May 22 15:48:51 UTC 2007


I am getting some strange behavior on a 4-node cluster. When node dbs2
tries to connect to the cluster, node app3 either kernel panics or its
ccsd and rgmanager daemons crash. Node dbs2 reports that the heartbeats
drop off and then removes itself from the cluster. I am curious why node
app3 would crash, and what these SM messages mean. I would also like to
know why node dbs2 would connect to the cluster, become quorate, and then
drop off and crash node 1. Has anyone seen this before?
 
 
/var/log/messages
 
May 22 11:34:36 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com rejoining
May 22 11:35:11 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com has been removed from the cluster : Missed too many heartbeats
May 22 11:35:25 melqsjssapp03 kernel: CMAN: node melqsjssapp03.americanhm.com has been removed from the cluster : No response to messages
May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message
May 22 11:35:25 melqsjssapp03 kernel: CMAN: we are leaving the cluster. No response to messages
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: WARNING: dlm_emergency_shutdown
May 22 11:35:25 melqsjssapp03 kernel: SM: 00000011 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 01000014 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 0200001a sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 kernel: SM: 03000002 sm_stop: SG still joined
May 22 11:35:25 melqsjssapp03 clurgmgrd[5179]: <warning> #67: Shutting down uncleanly
May 22 11:35:25 melqsjssapp03 ccsd[4630]: Cluster manager shutdown. Attemping to reconnect...
May 22 11:35:51 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 30 seconds.
May 22 11:36:21 melqsjssapp03 ccsd[4630]: Unable to connect to cluster infrastructure after 60 seconds.
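To line up what each node saw, it can help to pull only the cluster-stack lines (CMAN, SM, DLM, rgmanager, ccsd) out of every node's /var/log/messages and compare them side by side. Below is a minimal Python sketch, assuming classic syslog lines with one entry per line as in the raw file. The function names, the keyword list, and the sshd sample line are hypothetical illustrations, not part of any Red Hat cluster tool.

```python
import re
from typing import Iterable, List, NamedTuple

# Classic BSD-style syslog line, e.g.
# "May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message"
SYSLOG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "  # timestamp (no year)
    r"(?P<host>\S+) "                                # reporting host
    r"(?P<proc>[^:\[\s]+)(?:\[(?P<pid>\d+)\])?: "     # process name, optional [pid]
    r"(?P<msg>.*)$"                                  # free-form message
)

# Substrings that mark cluster-stack messages in this excerpt (assumed list).
CLUSTER_KEYWORDS = ("CMAN:", "SM:", "dlm_", "clurgmgrd", "ccsd")


class Event(NamedTuple):
    ts: str
    host: str
    proc: str
    msg: str


def parse(lines: Iterable[str]) -> List[Event]:
    """Return one Event per line that looks like a syslog entry; skip the rest."""
    events = []
    for line in lines:
        m = SYSLOG_RE.match(line.rstrip())
        if m:
            events.append(Event(m["ts"], m["host"], m["proc"], m["msg"]))
    return events


def cluster_events(events: Iterable[Event]) -> List[Event]:
    """Keep only events whose process name or message mentions a cluster component."""
    return [
        e for e in events
        if any(k in e.proc or k in e.msg for k in CLUSTER_KEYWORDS)
    ]


if __name__ == "__main__":
    sample = [
        "May 22 11:35:11 melqsjssapp03 kernel: CMAN: node melqsjssdbs02.americanhm.com "
        "has been removed from the cluster : Missed too many heartbeats",
        "May 22 11:35:25 melqsjssapp03 kernel: CMAN: killed by NODEDOWN message",
        "May 22 11:35:25 melqsjssapp03 kernel: SM: 00000011 sm_stop: SG still joined",
        "May 22 11:35:25 melqsjssapp03 clurgmgrd[5179]: <warning> #67: Shutting down uncleanly",
        # Hypothetical non-cluster line; it should be filtered out.
        "May 22 11:35:25 melqsjssapp03 sshd[999]: unrelated line",
    ]
    for e in cluster_events(parse(sample)):
        print(e.ts, e.host, e.proc, "|", e.msg)
```

Running the same filter over each node's log and merging by timestamp gives one combined timeline of joins, heartbeat removals, and shutdowns.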
 
Thanks,
 
Robert Gil
Linux Systems Administrator
American Home Mortgage
 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://listman.redhat.com/archives/linux-cluster/attachments/20070522/48d94a8f/attachment.htm>


More information about the Linux-cluster mailing list