[linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
Heinz J . Mauelshagen
mauelshagen at sistina.com
Fri Oct 4 03:54:40 UTC 2002
Gregory,
running "lvcreate --size 8G --snapshot --name db1_snap vg01" should give
a syntax error rather than "... doesn't exist".
Did you perhaps run
"lvcreate --size 8G --snapshot --name db1_snap /dev/vg01/db1"
instead?
I guess the problem has disappeared after your reboot, right?
If so, are you able to repeat the problem?
Regards,
Heinz -- The LVM Guy --
On Wed, Oct 02, 2002 at 10:22:30AM -0700, Gregory K. Ade wrote:
> I'm not sure what I found, or why it's happening, but I managed to
> exercise some bug or another in LVM 1.0.5...
>
> We use home-rolled scripts for doing our system backups, and one of the
> steps creates snapshots of our database filesystems, so that we can dump
> the snapshots to tape and get a consistent backup image. These scripts
> were misconfigured, and attempted to create a snapshot of a volume on a
> volume group that did not exist.
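
[Editor's note: a minimal sketch of the kind of guard the backup step could use, checking that the volume group is actually known before attempting the snapshot. The helper names are hypothetical, and the check relies on the LVM1-style /dev/<vg>/group control node shown in the listing below; it is a sketch, not the poster's actual script.]

```shell
#!/bin/sh
# vg_exists: a VG that vgscan has registered appears as /dev/<name>
# containing the character-special 'group' control node.
vg_exists() {
    [ -c "/dev/$1/group" ]
}

# snapshot_lv: create a snapshot of /dev/<vg>/<lv>, but only after
# verifying the VG exists. Note the full LV path passed to lvcreate,
# not just the VG name (the syntax Heinz points out above).
snapshot_lv() {
    vg="$1"; lv="$2"; size="$3"
    if ! vg_exists "$vg"; then
        echo "backup: volume group $vg not found, skipping snapshot" >&2
        return 1
    fi
    lvcreate --size "$size" --snapshot --name "${lv}_snap" "/dev/$vg/$lv"
}
```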
>
> This machine is running Linux 2.4.19, patched with the Broadcom Gigabit
> drivers and LVM 1.0.5 (linux-2.4.19-VFS-lock.patch and
> lvm-1.0.5-2.4.19-1.burpr.patch, generated by running make in
> /usr/src/LVM/1.0.5/PATCHES). I then compiled and installed the LVM
> userland tools from the sources.
>
> This machine has one volume group, vg00, consisting of a single physical
> volume, /dev/sda4, which is itself a partition of ~100GB on a hardware
> RAID-10 array.
>
> --->8--[ Cut Here ]--->8--
> root at burpr(pts/1):~ 34 # ls -al /dev/vg00
> total 47
> dr-xr-xr-x 2 root root 232 Oct 2 02:55 ./
> drwxr-xr-x 15 root root 46926 Oct 2 02:55 ../
> brw-rw---- 1 root disk 58, 5 Oct 2 02:55 dat
> brw-rw---- 1 root disk 58, 6 Oct 2 02:55 db1
> brw-rw---- 1 root disk 58, 7 Oct 2 02:55 db2
> crw-r----- 1 root disk 109, 0 Oct 2 02:55 group
> brw-rw---- 1 root disk 58, 3 Oct 2 02:55 home
> brw-rw---- 1 root disk 58, 0 Oct 2 02:55 root
> brw-rw---- 1 root disk 58, 1 Oct 2 02:55 tmp
> brw-rw---- 1 root disk 58, 4 Oct 2 02:55 u
> brw-rw---- 1 root disk 58, 8 Oct 2 02:55 unifytmp
> brw-rw---- 1 root disk 58, 2 Oct 2 02:55 var
> --->8--[ Cut Here ]--->8--
>
> The command which was errantly run was:
>
> --->8--[ Cut Here ]--->8--
> lvcreate --size 8G --snapshot --name db1_snap vg01
> --->8--[ Cut Here ]--->8--
>
> I got this output:
>
> --->8--[ Cut Here ]--->8--
> lvcreate -- "/etc/lvmtab.d/vg01" doesn't exist
> lvcreate -- can't create logical volume: volume group "vg01" doesn't
> exist
> --->8--[ Cut Here ]--->8--
>
> That's all well and good, and expected. Well, I saw the backup scripts
> trying to do this, so I killed them off as cleanly as possible, fixed
> the configuration, and restarted them. Only now, they got stuck on the
> first vgscan they tried to run.
>
> Running vgdisplay by hand now, I seem to have "lost" 8GB from my VG:
> vgdisplay shows 8GB less free space than it should, given the sum of
> the allocations to all the existing LVs. lvscan segfaults, and vgscan
> hangs while trying to open /dev/lvm. lvcreate hangs as well. Running
> strace:
>
> --->8--[ Cut Here ]--->8--
> root at burpr(pts/1):~ 51 # strace lvcreate --size 256M --snapshot --name
> unifytmp_snap /dev/vg00/unifytmp vg00
> --->8--[ Cut Here ]--->8--
>
> ends up with a hang, and this is the last few lines of the trace:
>
> --->8--[ Cut Here ]--->8--
> open("/dev/vg00/group", O_RDONLY) = 3
> ioctl(3, 0xc004fe05, 0x80a40b8) = 0
> close(3) = 0
> stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
> = 0
> open("/dev/lvm", O_RDONLY) = 3
> ioctl(3, 0x8004fe98, 0xbfffec22) = 0
> close(3) = 0
> stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
> = 0
> open("/dev/lvm", O_RDONLY) = 3
> ioctl(3, 0xff00 <unfinished ...>
> --->8--[ Cut Here ]--->8--
>
> The <unfinished ...> is when I gave up after 5 minutes and hit
> <control>-c.
>
> I have complete straces available of vgscan, lvscan, and lvcreate, as
> well as the output of lvdisplay for each of the lv's I've got. I also
> have a core file for lvscan, if that would help, too.
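
[Editor's note: one way to quantify the "lost 8GB" discrepancy described above is to compute the VG's free space from vgdisplay's colon-separated output (`vgdisplay -c`) and compare it against the sum of the LV allocations. The field positions used here (13 = PE size in KB, 16 = free PE count) are assumptions; verify them against vgdisplay(8) for the LVM version in use.]

```shell
#!/bin/sh
# free_mb_from_line: given one colon-separated vgdisplay record,
# multiply the free-extent count by the PE size (KB) and report MB.
free_mb_from_line() {
    echo "$1" | awk -F: '{ printf "%d\n", $13 * $16 / 1024 }'
}

# Typical use (not run here):
#   free_mb_from_line "$(vgdisplay -c vg00)"
```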
>
> We are going to reboot the server over lunch today; hopefully that will
> clear out whatever kernel structures are gorked. But I'm really not
> happy that this happened in the first place, and I hope someone here can
> point me to an answer.
>
> The hardware is a Dell PowerEdge 6600 with PERC3/DC RAID controller (LSI
> MegaRAID), 6 15krpm 36GB disks in a RAID-10, 8GB memory, four 1.6GHz
> Xeon CPUs. Running SuSE Linux Enterprise Server 7 (essentially a
> stripped-down SuSE 7.2), kernel.org's 2.4.19 + Broadcom and LVM patches,
> and LVM 1.0.5.
>
> I haven't had any problems yet on another server (PowerEdge 2450, 2x
> P-III 1GHz, 2GB ram, same kernel & lvm, different raid controller).
>
> I've tried to be thorough in my data collection; let me know if there's
> something more needed to debug this.
>
>
> TIA
>
> --
> Gregory K. Ade <gkade at bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B keyserver: pgpkeys.mit.edu
>
>
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Sistina Software Inc.
Senior Consultant/Developer Am Sonnenhang 11
56242 Marienrachdorf
Germany
Mauelshagen at Sistina.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-