[linux-lvm] strange behavior with 1.0.5 on Linux 2.4.19?
Heinz J . Mauelshagen
mauelshagen at sistina.com
Fri Oct 4 03:54:40 UTC 2002
Gregory,
running "lvcreate --size 8G --snapshot --name db1_snap vg01" should give
a syntax error rather than "... doesn't exist".
Did you perhaps run
"lvcreate --size 8G --snapshot --name db1_snap /dev/vg01/db1"
instead?
I guess the problem has disappeared after your reboot, right?
If so, are you able to repeat the problem?
Regards,
Heinz -- The LVM Guy --
On Wed, Oct 02, 2002 at 10:22:30AM -0700, Gregory K. Ade wrote:
> I'm not sure what I found, or why it's happening, but I managed to
> exercise some bug or another in LVM 1.0.5...
>
> We use home-rolled scripts for doing our system backups, and one of the
> steps creates snapshots of our database filesystems, so that we can dump
> the snapshots to tape and get a consistent backup image. These scripts
> were misconfigured, and attempted to create a snapshot of a volume on a
> volume group that did not exist.
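
[Editor's note: a minimal sketch of the kind of guard the backup step could use, checking that the volume group is actually known before attempting the snapshot. The helper names are hypothetical, and the check relies on the LVM1-style /dev/<vg>/group control node shown in the listing below; it is a sketch, not the poster's actual script.]

```shell
#!/bin/sh
# vg_exists: a VG that vgscan has registered appears as /dev/<name>
# containing the character-special 'group' control node.
vg_exists() {
    [ -c "/dev/$1/group" ]
}

# snapshot_lv: create a snapshot of /dev/<vg>/<lv>, but only after
# verifying the VG exists. Note the full LV path passed to lvcreate,
# not just the VG name (the syntax Heinz points out above).
snapshot_lv() {
    vg="$1"; lv="$2"; size="$3"
    if ! vg_exists "$vg"; then
        echo "backup: volume group $vg not found, skipping snapshot" >&2
        return 1
    fi
    lvcreate --size "$size" --snapshot --name "${lv}_snap" "/dev/$vg/$lv"
}
```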
>
> This machine is running Linux 2.4.19, patched with the Broadcom Gigabit
> drivers and LVM 1.0.5 (linux-2.4.19-VFS-lock.patch and
> lvm-1.0.5-2.4.19-1.burpr.patch, generated by running make in
> /usr/src/LVM/1.0.5/PATCHES). I then compiled and installed the LVM
> userland tools from the sources.
>
> This machine has one volume group, vg00, consisting of a single physical
> volume, /dev/sda4, which is itself a partition of ~100GB on a hardware
> RAID-10 array.
>
> --->8--[ Cut Here ]--->8--
> root at burpr(pts/1):~ 34 # ls -al /dev/vg00
> total 47
> dr-xr-xr-x 2 root root 232 Oct 2 02:55 ./
> drwxr-xr-x 15 root root 46926 Oct 2 02:55 ../
> brw-rw---- 1 root disk 58, 5 Oct 2 02:55 dat
> brw-rw---- 1 root disk 58, 6 Oct 2 02:55 db1
> brw-rw---- 1 root disk 58, 7 Oct 2 02:55 db2
> crw-r----- 1 root disk 109, 0 Oct 2 02:55 group
> brw-rw---- 1 root disk 58, 3 Oct 2 02:55 home
> brw-rw---- 1 root disk 58, 0 Oct 2 02:55 root
> brw-rw---- 1 root disk 58, 1 Oct 2 02:55 tmp
> brw-rw---- 1 root disk 58, 4 Oct 2 02:55 u
> brw-rw---- 1 root disk 58, 8 Oct 2 02:55 unifytmp
> brw-rw---- 1 root disk 58, 2 Oct 2 02:55 var
> --->8--[ Cut Here ]--->8--
>
> The command which was errantly run was:
>
> --->8--[ Cut Here ]--->8--
> lvcreate --size 8G --snapshot --name db1_snap vg01
> --->8--[ Cut Here ]--->8--
>
> I got this output:
>
> --->8--[ Cut Here ]--->8--
> lvcreate -- "/etc/lvmtab.d/vg01" doesn't exist
> lvcreate -- can't create logical volume: volume group "vg01" doesn't
> exist
> --->8--[ Cut Here ]--->8--
>
> That's all well and good, and expected. Well, I saw the backup scripts
> trying to do this, so I killed them off as cleanly as possible, fixed
> the configuration, and restarted them. Only now, they got stuck on the
> first vgscan they tried to run.
>
> Running vgdisplay by hand now, I seem to have "lost" 8GB from my VG:
> vgdisplay shows 8GB less free space than it should, given the sum of
> the allocations to all the existing LVs. lvscan segfaults, and vgscan
> hangs while trying to open /dev/lvm. lvcreate hangs as well. Running
> strace:
>
> --->8--[ Cut Here ]--->8--
> root at burpr(pts/1):~ 51 # strace lvcreate --size 256M --snapshot --name
> unifytmp_snap /dev/vg00/unifytmp vg00
> --->8--[ Cut Here ]--->8--
>
> ends up with a hang, and this is the last few lines of the trace:
>
> --->8--[ Cut Here ]--->8--
> open("/dev/vg00/group", O_RDONLY) = 3
> ioctl(3, 0xc004fe05, 0x80a40b8) = 0
> close(3) = 0
> stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
> = 0
> open("/dev/lvm", O_RDONLY) = 3
> ioctl(3, 0x8004fe98, 0xbfffec22) = 0
> close(3) = 0
> stat64("/dev/lvm", {st_mode=S_IFCHR|0640, st_rdev=makedev(109, 0), ...})
> = 0
> open("/dev/lvm", O_RDONLY) = 3
> ioctl(3, 0xff00 <unfinished ...>
> --->8--[ Cut Here ]--->8--
>
> The <unfinished ...> is when I gave up after 5 minutes and hit
> <control>-c.
>
> I have complete straces available of vgscan, lvscan, and lvcreate, as
> well as the output of lvdisplay for each of the lv's I've got. I also
> have a core file for lvscan, if that would help, too.
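
[Editor's note: one way to quantify the "lost 8GB" discrepancy described above is to compute the VG's free space from vgdisplay's colon-separated output (`vgdisplay -c`) and compare it against the sum of the LV allocations. The field positions used here (13 = PE size in KB, 16 = free PE count) are assumptions; verify them against vgdisplay(8) for the LVM version in use.]

```shell
#!/bin/sh
# free_mb_from_line: given one colon-separated vgdisplay record,
# multiply the free-extent count by the PE size (KB) and report MB.
free_mb_from_line() {
    echo "$1" | awk -F: '{ printf "%d\n", $13 * $16 / 1024 }'
}

# Typical use (not run here):
#   free_mb_from_line "$(vgdisplay -c vg00)"
```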
>
> We are going to reboot the server over lunch today; hopefully that will
> clear out whatever kernel structures are gorked. But I'm really not
> happy that this happened in the first place, and I hope someone here can
> point me to an answer.
>
> The hardware is a Dell PowerEdge 6600 with PERC3/DC RAID controller (LSI
> MegaRAID), 6 15krpm 36GB disks in a RAID-10, 8GB memory, four 1.6GHz
> Xeon CPUs. Running SuSE Linux Enterprise Server 7 (essentially a
> stripped-down SuSE 7.2), kernel.org's 2.4.19 + Broadcom and LVM patches,
> and LVM 1.0.5.
>
> I haven't had any problems yet on another server (PowerEdge 2450, 2x
> P-III 1GHz, 2GB ram, same kernel & lvm, different raid controller).
>
> I've tried to be thorough in my data collection; let me know if there's
> something more needed to debug this.
>
>
> TIA
>
> --
> Gregory K. Ade <gkade at bigbrother.net>
> http://bigbrother.net/~gkade
> OpenPGP Key ID: EAF4844B keyserver: pgpkeys.mit.edu
>
>
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Sistina Software Inc.
Senior Consultant/Developer Am Sonnenhang 11
56242 Marienrachdorf
Germany
Mauelshagen at Sistina.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-