[Linux-cluster] VM Resource Failover

Chris Edwards cedwards at smartechcorp.net
Thu Aug 14 14:49:30 UTC 2008


This is my first round of playing with clustering, and I don't have a fencing agent.  So I need to run fence_ack_manual after I shut one of the nodes down?  This is where I get really confused about how it works.  When I reboot one of the nodes it seems to hang on fencing for a few minutes, and then it's about 50-50: either the node comes back up, rejoins the cluster, and everything is fine, or it never rejoins, I try to rejoin it manually, and I usually end up rebooting both machines at the same time.

Also, should I add my iSCSI GFS shared space as a resource?  Will it automount at boot?
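
From what I've read, it looks like a GFS mount would go into the <rm> section as a <clusterfs> resource, something roughly like the sketch below (the device path and mount point are just placeholders, not my real values):

        <resources>
                <clusterfs name="gfs-shared" fstype="gfs" mountpoint="/mnt/gfs"
                           device="/dev/iscsi/gfs-lun" force_unmount="0" options=""/>
        </resources>

Is that the right direction, or is it better to put the GFS mount in /etc/fstab and leave it out of rgmanager entirely?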

Thanks for the help!

---

Chris Edwards
Smartech Corp.
Div. of AirNet Group
http://www.airnetgroup.com
http://www.smartechcorp.net
cedwards at smartechcorp.net
P:  423-664-7678 x114
C:  423-593-6964
F:  423-664-7680


-----Original Message-----
From: linux-cluster-bounces at redhat.com [mailto:linux-cluster-bounces at redhat.com] On Behalf Of Kevin Anderson
Sent: Thursday, August 14, 2008 10:36 AM
To: linux clustering
Subject: Re: [Linux-cluster] VM Resource Failover

Since you are using manual fencing, did you run fence_ack_manual after
killing the machine?  When the machine comes back up, the ack is
implied, but rgmanager will not be able to perform recovery operations
until fencing is complete.  I suggest you use a real fencing agent if
you want this to work seamlessly.
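
For example, if your servers have IPMI-capable management controllers, a power-based setup could look something like the sketch below (fence_ipmilan is just one possible agent; the addresses and credentials are placeholders for whatever your hardware actually uses):

        <clusternode name="xen1.smartechcorp.net" nodeid="1" votes="1">
                <fence>
                        <method name="1">
                                <device name="ipmi-xen1"/>
                        </method>
                </fence>
        </clusternode>
        ...
        <fencedevices>
                <fencedevice agent="fence_ipmilan" name="ipmi-xen1" ipaddr="10.0.0.11" login="admin" passwd="secret"/>
                <fencedevice agent="fence_ipmilan" name="ipmi-xen2" ipaddr="10.0.0.12" login="admin" passwd="secret"/>
        </fencedevices>

With a power fencing agent the surviving node can fence the failed node on its own, so rgmanager can recover the VM right away instead of blocking until someone runs fence_ack_manual.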

Kevin

On Thu, 2008-08-14 at 10:25 -0400, Chris Edwards wrote:
> I have been trying to simulate a Xen VM failover.  I have a two-machine
> cluster and two VMs running.  If I issue an “xm destroy ID”, the VM
> automatically restarts on the other node.  But if I reboot one of the
> cluster nodes to simulate a machine failure, the VM never boots back up
> until the other machine comes online.  So here are my questions…
> 
> 1.      How do I get the cluster to start the failed VM when one of the
> clustered machines is down?
> 
> 2.      When I do an “xm destroy ID”, the cluster always restarts the VM
> on the other cluster machine.  Is there any way to have it boot back on
> the machine it’s supposed to be running on without doing a manual
> migrate?  Can it auto-migrate back to its original machine over time?
> 
> Here is the output of clustat during a reboot of one of the cluster
> nodes…
> 
> Cluster Status for Xen @ Thu Aug 14 10:11:21 2008
> Member Status: Quorate
> 
>  Member Name                             ID   Status
>  ------ ----                             ---- ------
>  xen1.smartechcorp.net                       1 Online, Local, rgmanager
>  xen2.smartechcorp.net                       2 Offline
> 
>  Service Name                   Owner (Last)                   State
>  ------- ----                   ----- ------                   -----
>  vm:Linux1                      xen2.smartechcorp.net          stopping
>  vm:Windows1                    xen1.smartechcorp.net          started
> 
> Here is my cluster.conf…
> 
> <?xml version="1.0"?>
> <cluster alias="Xen" config_version="29" name="Xen">
>         <fence_daemon clean_start="0" post_fail_delay="0" post_join_delay="-1"/>
>         <clusternodes>
>                 <clusternode name="xen1.smartechcorp.net" nodeid="1" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="manual" nodename="xen1.smartechcorp.net"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>                 <clusternode name="xen2.smartechcorp.net" nodeid="2" votes="1">
>                         <fence>
>                                 <method name="1">
>                                         <device name="manual" nodename="xen2.smartechcorp.net"/>
>                                 </method>
>                         </fence>
>                 </clusternode>
>         </clusternodes>
>         <cman expected_votes="1" two_node="1"/>
>         <fencedevices>
>                 <fencedevice agent="fence_manual" name="manual"/>
>         </fencedevices>
>         <rm>
>                 <failoverdomains>
>                         <failoverdomain name="bias-xen1" nofailback="0" ordered="1" restricted="0">
>                                 <failoverdomainnode name="xen1.smartechcorp.net" priority="1"/>
>                                 <failoverdomainnode name="xen2.smartechcorp.net" priority="2"/>
>                         </failoverdomain>
>                         <failoverdomain name="bias-xen2" nofailback="0" ordered="1" restricted="0">
>                                 <failoverdomainnode name="xen1.smartechcorp.net" priority="2"/>
>                                 <failoverdomainnode name="xen2.smartechcorp.net" priority="1"/>
>                         </failoverdomain>
>                 </failoverdomains>
>                 <resources/>
>                 <vm autostart="1" domain="bias-xen1" exclusive="0" migrate="live" name="Windows1" path="/var/lib/xen/images" recovery="relocate"/>
>                 <vm autostart="1" domain="bias-xen2" exclusive="0" migrate="live" name="Linux1" path="/var/lib/xen/images" recovery="relocate"/>
>         </rm>
> </cluster>
> 
> Thanks for any help, this is driving me crazy!
> 
>  
> 
> ---
> 
>  
> 
> Chris Edwards
> Smartech Corp.
> Div. of AirNet Group
> 
> http://www.airnetgroup.com
> 
> http://www.smartechcorp.net
> 
> cedwards at smartechcorp.net
> P:  423-664-7678 x114
> 
> C:  423-593-6964
> 
> F:  423-664-7680
> 
> 
> --
> Linux-cluster mailing list
> Linux-cluster at redhat.com
> https://www.redhat.com/mailman/listinfo/linux-cluster

--
Linux-cluster mailing list
Linux-cluster at redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster





