[Linux-cluster] Slowness above 500 RRDs

Ferenc Wagner wferi at niif.hu
Fri May 4 15:43:10 UTC 2007


David Teigland <teigland at redhat.com> writes:

> On Thu, Apr 26, 2007 at 06:36:02PM +0200, Ferenc Wagner wrote:
> 
>> I'm working with three nodes: 1, 2 and 3.  Looks like the mount by 3
>> makes a big difference.  When the filesystem is mounted by 1 and 2
>> only, my test runs much faster.  Filesystem mounted by 3 alone is also
>> fast.  But 3 doesn't seem to cooperate with anyone else with
>> reasonable performance.
>> 
>> If I mount the filesystem on all three nodes and run the test on 1,
>> the network traffic of 2 and 3 is rather unbalanced: tcpdump receives
>> 19566 packets on 2 and 29181 on 3.  It's all 21064/tcp traffic, I can
>> provide detailed data if that seems useful.
>
> It sounds like your tests are mixing the effects of the flocks/plocks with
> the effects of gfs's own internal file locking.  If you want to test and
> compare flock/plock performance you need to make sure that gfs's internal
> dlm locks are always mastered on the same node (either locally, in which
> case it'll be fast, or remotely in which case it'll be slow).  The first
> node to use a lock will be the master of it.

I do the following:
 1. reboot all three nodes
 2. mount GFS on node 1, 2 and 3
 3. run the test on node 1 -> it's slow
 4. umount GFS on node 3
 5. run the test on node 1 -> it's fast

 6. reboot all three nodes
 7. mount GFS on node 1, 2 and 3
 8. run the test on node 1 -> it's slow
 9. umount GFS on node 2
10. run the test on node 1 -> it's slow again

I hope the above ensures that node 1 is always the master of all
locks.  So where could this discrepancy stem from?  I'll check whether
the boot order influences this, but I'm really running out of ideas...
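For the record, the two scenarios above can be sketched as a script.  The
node names (node1..node3), the GFS device and mountpoint, and the
rrd-test command are placeholders for my setup; by default it only
echoes the commands (set RUN=ssh to actually drive the nodes):

```shell
#!/bin/sh
# Dry-run sketch of the test sequence.  All hostnames, paths and the
# rrd-test command are placeholders; adjust for your cluster.
RUN=${RUN:-echo}            # default: print each command instead of running it
DEV=/dev/vg0/gfs            # placeholder GFS device
MNT=/mnt/gfs                # placeholder GFS mountpoint

run() { $RUN "$@"; }        # with RUN=ssh this becomes: ssh <node> <command...>

# Scenario A: umounting on node 3 makes node 1 fast
for n in node1 node2 node3; do run "$n" reboot; done
for n in node1 node2 node3; do run "$n" mount -t gfs "$DEV" "$MNT"; done
run node1 time ./rrd-test "$MNT"    # slow
run node3 umount "$MNT"
run node1 time ./rrd-test "$MNT"    # fast

# Scenario B: umounting on node 2 instead leaves node 1 slow
for n in node1 node2 node3; do run "$n" reboot; done
for n in node1 node2 node3; do run "$n" mount -t gfs "$DEV" "$MNT"; done
run node1 time ./rrd-test "$MNT"    # slow
run node2 umount "$MNT"
run node1 time ./rrd-test "$MNT"    # still slow
```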

Btw, is there no way for a node to take the lock master role from
another (short of unmounting the GFS volume on the original master)?
-- 
Thanks,
Feri.
