One of the questions most often put to people working on upstream container technologies is how to run Podman within a container. Historically, most of these questions concerned Docker in Docker (DIND), but now people also want to run Podman in Podman (PINP) or Podman in Docker (PIND).
Podman can also be run in two modes, rootful and rootless, so we end up with people wanting to run various combinations of rootful and rootless Podman:
- Rootful Podman in rootful Podman
- Rootless Podman in rootful Podman
- Rootful Podman in rootless Podman
- Rootless Podman in rootless Podman
You get the picture.
This blog will attempt to cover each combination, starting with a discussion of privileges. We'll start with the PINP scenario here in part one. In part two of the series, we'll cover similar ground but do so within the context of Kubernetes. Be sure to read both articles for a complete picture.
Container engines require privileges
To run a container engine like Podman within a container, the first thing to understand is that the engine needs a fair amount of privilege.
- Containers require multiple UIDs. Most container images need more than one UID to work. For example, you might have an image with most of the files owned by root, but some owned by the apache user (UID=60).
- Container engines mount file systems and use the system call clone to create user namespaces.
Note: You might need a newer version of Podman. Examples in this blog were run with Podman 3.2.
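As a rough way to see how many UIDs a process has to work with, you can sum the ranges listed in /proc/self/uid_map. The sketch below is only an illustration. The helper name count_mapped_ids is made up for this example and is not part of Podman.

```shell
# Hypothetical helper (not part of Podman): add up the length column of a
# uid_map-style file. Each line is "<inside-id> <outside-id> <length>".
count_mapped_ids() {
    total=0
    while read -r _inside _outside length; do
        total=$((total + ${length:-0}))
    done < "$1"
    echo "$total"
}

# On Linux, report how many UIDs the current process can use.
if [ -r /proc/self/uid_map ]; then
    echo "UIDs available here: $(count_mapped_ids /proc/self/uid_map)"
fi
```

A container engine that sees only a single UID here has nothing to hand out to nested containers, which is why the subuid and subgid setup described later matters.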
Our test image
For the examples in this blog, we'll use the quay.io/podman/stable image, which was built with the idea of finding the best way to run Podman within a container. You can examine how we build this image from the Dockerfile and containers.conf files in the github.com repo.
# stable/Dockerfile
#
# Build a Podman container image from the latest
# stable version of Podman on the Fedora Updates System.
# https://bodhi.fedoraproject.org/updates/?search=podman
# This image can be used to create a secured container
# that runs safely with privileges within the container.
#
FROM registry.fedoraproject.org/fedora:latest
# Don't include container-selinux and remove
# directories used by yum that are just taking
# up space.
RUN dnf -y update; yum -y reinstall shadow-utils; \
yum -y install podman fuse-overlayfs --exclude container-selinux; \
rm -rf /var/cache /var/log/dnf* /var/log/yum.*
RUN useradd podman; \
echo podman:10000:5000 > /etc/subuid; \
echo podman:10000:5000 > /etc/subgid;
VOLUME /var/lib/containers
VOLUME /home/podman/.local/share/containers
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/containers.conf /etc/containers/containers.conf
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/podman-containers.conf /home/podman/.config/containers/containers.conf
RUN chown podman:podman -R /home/podman
# chmod containers.conf and adjust storage.conf to enable Fuse storage.
RUN chmod 644 /etc/containers/containers.conf; sed -i -e 's|^#mount_program|mount_program|g' -e '/additionalimage.*/a "/var/lib/shared",' -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g' /etc/containers/storage.conf
RUN mkdir -p /var/lib/shared/overlay-images /var/lib/shared/overlay-layers /var/lib/shared/vfs-images /var/lib/shared/vfs-layers; touch /var/lib/shared/overlay-images/images.lock; touch /var/lib/shared/overlay-layers/layers.lock; touch /var/lib/shared/vfs-images/images.lock; touch /var/lib/shared/vfs-layers/layers.lock
ENV _CONTAINERS_USERNS_CONFIGURED=""
Let’s examine the Dockerfile.
FROM registry.fedoraproject.org/fedora:latest
# Don't include container-selinux and remove
# directories used by yum that are just taking
# up space.
RUN dnf -y update; yum -y reinstall shadow-utils; \
yum -y install podman fuse-overlayfs --exclude container-selinux; \
rm -rf /var/cache /var/log/dnf* /var/log/yum.*
First, pull the latest Fedora image and update it to the latest packages. Note that the build reinstalls shadow-utils, since there is a known issue in the shadow-utils install on the Fedora image where the file capabilities (filecaps) on newuidmap and newgidmap are not set. Reinstalling shadow-utils fixes the problem. Next, install Podman as well as fuse-overlayfs. We don't install container-selinux because it is not needed within the container.
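If you want to confirm that the reinstall had the intended effect, one option is to look at the file capabilities with getcap. This is a sketch under assumptions: it expects getcap (from libcap) to be installed, the helper line_has_cap is hypothetical, and the exact getcap output format differs between libcap versions, so the helper only looks for the capability name.

```shell
# Hypothetical helper: succeed if a line of getcap output mentions the
# given capability name (for example cap_setuid or cap_setgid).
line_has_cap() {
    case "$1" in
        *"$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Check both ID-mapping helpers shipped by shadow-utils.
# Skipped entirely when getcap is not on the PATH.
if command -v getcap >/dev/null 2>&1; then
    for pair in newuidmap:cap_setuid newgidmap:cap_setgid; do
        bin=/usr/bin/${pair%%:*}
        cap=${pair##*:}
        if line_has_cap "$(getcap "$bin" 2>/dev/null)" "$cap"; then
            echo "$bin: $cap is set"
        else
            echo "$bin: $cap not found (some systems use setuid-root instead)"
        fi
    done
fi
```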
RUN useradd podman; \
echo podman:10000:5000 > /etc/subuid; \
echo podman:10000:5000 > /etc/subgid;
Next, I create a user named podman and set up the /etc/subuid and /etc/subgid files to give it 5000 UIDs and GIDs. These ranges are used to set up the user namespace within the container. 5000 is an arbitrary number and potentially too small. We picked this number because it is smaller than the 65k allocated to rootless users. If you were only running the container as root, 65k would have been a better number.
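To make the arithmetic concrete, the entry podman:10000:5000 means "starting at ID 10000, allocate 5000 IDs". The sketch below works that out from a scratch copy of the entry. The helper name subid_range is hypothetical and exists only for this illustration.

```shell
# Hypothetical helper: print "first-last" for a user's entry in a
# subuid/subgid-style file, where each line is "name:start:count".
subid_range() {
    user=$1
    file=$2
    while IFS=: read -r name start count; do
        if [ "$name" = "$user" ]; then
            echo "${start}-$(($start + $count - 1))"
            return 0
        fi
    done < "$file"
    return 1
}

# Recreate the entry from the Dockerfile in a scratch file.
scratch=$(mktemp)
echo "podman:10000:5000" > "$scratch"
subid_range podman "$scratch"    # prints 10000-14999
rm -f "$scratch"
```

Any container image that refers to an ID outside that range cannot be mapped for the podman user, which is the sense in which 5000 may be too small.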
VOLUME /var/lib/containers
VOLUME /home/podman/.local/share/containers
Since we can run rootful and rootless containers with this image, we create two volumes. Rootful Podman uses /var/lib/containers for its container storage, and rootless Podman uses /home/podman/.local/share/containers. Overlay over overlay is often denied by the kernel, so this creates non-overlay volumes to be used within the container.
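If you are unsure what backs those directories in a running container, you can ask for the file system type. This is a minimal sketch that assumes GNU stat is available. The helpers fs_type and is_overlay are hypothetical names for this example.

```shell
# Hypothetical helper: print the file system type backing a path.
# Relies on GNU stat, where %T is the type in human-readable form.
fs_type() {
    stat -f -c %T "$1"
}

# Hypothetical helper: succeed if the type name looks like overlay.
is_overlay() {
    case "$1" in
        overlay*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report on the two storage locations used by the image, if present.
for dir in /var/lib/containers /home/podman/.local/share/containers; do
    [ -d "$dir" ] || continue
    t=$(fs_type "$dir")
    if is_overlay "$t"; then
        echo "$dir is on $t: nested overlay mounts may be refused"
    else
        echo "$dir is on $t"
    fi
done
```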
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/containers.conf /etc/containers/containers.conf
ADD https://raw.githubusercontent.com/containers/libpod/master/contrib/podmanimage/stable/podman-containers.conf /home/podman/.config/containers/containers.conf
I have pre-configured two containers.conf files to make sure containers run more easily in each mode.
The image is set up to run with fuse-overlayfs by default. In certain cases, you could run the kernel's overlay file system for rootful mode, and you'll soon be able to do this in rootless mode. However, for now, we use fuse-overlayfs as our container storage within the container. Other people have used the VFS storage driver, but it is not as efficient.
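The Dockerfile enables fuse-overlayfs with a single sed command against /etc/containers/storage.conf. To see what its three expressions do, the sketch below runs them against a small, made-up storage.conf fragment and prints the result instead of editing in place. The fragment and the function name enable_fuse_storage are assumptions for illustration, and the one-line form of the sed "a" command relies on GNU sed.

```shell
# A made-up storage.conf fragment for illustration only; the real file
# shipped in the image has many more settings.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#mount_program = "/usr/bin/fuse-overlayfs"
additionalimagestores = [
]
mountopt = "nodev,metacopy=on"
EOF

# Same three sed expressions as the Dockerfile, minus -i so the result
# goes to stdout instead of being written back to the file.
enable_fuse_storage() {
    sed -e 's|^#mount_program|mount_program|g' \
        -e '/additionalimage.*/a "/var/lib/shared",' \
        -e 's|^mountopt[[:space:]]*=.*$|mountopt = "nodev,fsync=0"|g' "$1"
}

enable_fuse_storage "$sample"
rm -f "$sample"
```

The first expression uncomments mount_program, the second appends "/var/lib/shared" as an additional image store, and the third replaces the mount options with nodev,fsync=0.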
The --privileged flag
The easiest way to run Podman inside of a container is to use the --privileged flag.
Rootful Podman in rootful Podman with --privileged
# podman run --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
Trying to pull registry.access.redhat.com/ubi8:latest...
Getting image source signatures
Copying blob sha256:a591faa84ab05242a17131e396a336da172b0e1ec66d921c9f130b7c4c24586d
Copying blob sha256:76b9354adec626b01ffb0faae4a217cebd616661fd90c4b54ba4415f53392fb8
Copying config sha256:dc080723f596f2407300cca2c19a17accad89edcf39f7b8b33e6472dd41e30f1
Writing manifest to image destination
Storing signatures
hello
To save time, since I will be doing a lot of experiments, I created a directory on my host, ./mycontainers, which I will volume mount into the container so that I don't have to pull the image each time.
# podman run --privileged -v ./mycontainers:/var/lib/containers quay.io/podman/stable podman run ubi8 echo hello
hello
Rootless Podman in rootful Podman with --privileged
The quay.io/podman/stable image is set up with a podman user that you can use to run rootless containers.
# podman run --user podman --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
...
hello
Note that in this case, the Podman running inside the container is running as the user podman. This is because the containerized Podman uses the user namespace to create a confined container within the privileged container.
Rootful Podman in Docker with --privileged
Similar to running it within Podman, you can also run rootful Podman within Docker with the --privileged option.
# docker run --privileged quay.io/podman/stable podman run ubi8 echo hello
Rootless Podman in Docker with --privileged
# docker run --user podman --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
...
hello
Can we do this more securely?
Notice that even though we ran the outer containers with --privileged above, the inner containers are running in locked-down mode. The rootless Podman running within the container is really locked down and would have a very difficult time escaping. Even so, I am not a fan of using the --privileged flag. I believe we can do better from a security perspective.
Running without the --privileged flag
Let's look at how we can remove the --privileged flag for better security.
Rootful Podman in rootful Podman without --privileged
# podman run --cap-add=sys_admin,mknod --device=/dev/fuse --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
We can eliminate the --privileged flag from rootful Podman but still have to disable some security features to make rootful Podman within the container work.
- Capabilities: --cap-add=sys_admin,mknod. We need to add two Linux capabilities.
  - CAP_SYS_ADMIN is required for the Podman running as root inside of the container to mount the required file systems.
  - CAP_MKNOD is required for Podman running as root inside of the container to create the devices in /dev. (Note that Docker allows this by default.)
- Devices: The --device /dev/fuse flag is required to use fuse-overlayfs inside the container. This option tells Podman on the host to add /dev/fuse to the container so that containerized Podman can use it.
- Disable SELinux: The --security-opt label=disable option tells the host's Podman to disable SELinux separation for the container. SELinux does not allow containerized processes to mount all of the file systems required to run inside a container.
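To check from inside a container whether the two capabilities above are actually present, you can decode the CapEff mask in /proc/self/status. The sketch below is an illustration only. The helper has_cap is hypothetical, and the bit numbers (21 for CAP_SYS_ADMIN, 27 for CAP_MKNOD) are taken from the kernel's linux/capability.h header.

```shell
# Hypothetical helper: succeed if bit number $2 is set in hex mask $1.
# Bit numbers come from linux/capability.h:
# CAP_SYS_ADMIN is 21 and CAP_MKNOD is 27.
has_cap() {
    [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Read the effective capability mask of the current process (Linux only).
if [ -r /proc/self/status ]; then
    mask=$(awk '/^CapEff:/ { print $2 }' /proc/self/status)
    if [ -n "$mask" ]; then
        for pair in sys_admin:21 mknod:27; do
            if has_cap "$mask" "${pair##*:}"; then
                echo "cap_${pair%%:*}: present"
            else
                echo "cap_${pair%%:*}: missing"
            fi
        done
    fi
fi
```

Running this inside the outer container, with and without the --cap-add option, is one way to see the difference the flag makes.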
Rootful Podman in Docker without --privileged
# docker run --cap-add=sys_admin --cap-add mknod --device=/dev/fuse --security-opt seccomp=unconfined --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
- Note that Docker does not support the comma-separated --cap-add syntax, so I had to add sys_admin and mknod separately.
- Still needed --device /dev/fuse, since storage in the container defaults to fuse-overlayfs.
- Docker always creates built-in volumes as owned by root:root, so for Podman in the container to be able to use one for storage, we need to create a volume and mount it.
- As always, I need to disable SELinux separation.
- Also need to disable seccomp, since Docker has a slightly stricter seccomp policy than Podman. You could instead use Podman's seccomp policy by passing --security-opt seccomp=/usr/share/containers/seccomp.json:
# docker run --cap-add=sys_admin --cap-add mknod --device=/dev/fuse --security-opt seccomp=/usr/share/containers/seccomp.json --security-opt label=disable quay.io/podman/stable podman run ubi8-minimal echo hello
hello
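The Podman seccomp profile is only present on machines that have the relevant containers packages installed, so a script may want to fall back to seccomp=unconfined when the file is missing. The sketch below shows one way to do that. The helper seccomp_opt is hypothetical, and the script prints the docker command for review rather than running it.

```shell
# Hypothetical helper: print a value for --security-opt. Use the given
# seccomp profile if it is readable, otherwise fall back to unconfined.
seccomp_opt() {
    if [ -r "$1" ]; then
        echo "seccomp=$1"
    else
        echo "seccomp=unconfined"
    fi
}

opt=$(seccomp_opt /usr/share/containers/seccomp.json)
# Print the command instead of running it, so it can be reviewed first.
echo docker run --cap-add=sys_admin --cap-add mknod --device=/dev/fuse \
    --security-opt "$opt" --security-opt label=disable \
    quay.io/podman/stable podman run ubi8-minimal echo hello
```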
Rootless Podman in rootful Podman without --privileged
Run a non-privileged container with Podman inside, using a non-root user and the user namespace.
# podman run --user podman --security-opt label=disable --security-opt unmask=ALL --device /dev/fuse -ti quay.io/podman/stable podman run -ti docker.io/busybox echo hello
hello
- Note that unlike the rootful within rootful case before, we don't have to add the dangerous security capabilities sys_admin and mknod.
- In this case, I am running with --user podman, which automatically causes the Podman within the container to run within the user namespace.
- Still disabling SELinux, since it blocks the mounting.
- Still need --device /dev/fuse to use fuse-overlayfs within the container.
Podman-remote in rootful Podman with a leaked Podman socket from the host
# podman run -v /run:/run --security-opt label=disable quay.io/podman/stable podman --remote run busybox echo hi
hi
In this case, we are leaking the /run directory from the host into the container. This allows podman --remote to communicate with the Podman socket on the host and start the container on the host OS. This is often how people execute Docker in Docker, especially Docker builds. You could also execute Podman builds this way and take advantage of images previously pulled to the system.
Note, however, this is extremely insecure. The processes within the container can totally take over the host machine.
- You still need to disable SELinux separation because SELinux would block the container processes from using sockets leaked in /run.
- The podman --remote flag is added to tell Podman to work in remote mode. Note you could also just install the podman-remote executable into a container and use this.
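The remote examples only work if the Podman socket is actually listening on the host. As a small sanity check before mounting /run into a container, you can test for the socket first. This is a sketch under assumptions: /run/podman/podman.sock is the usual rootful location but may differ on your system, and the helper is_socket is a hypothetical name.

```shell
# Hypothetical helper: succeed only if the path is a Unix socket.
is_socket() {
    [ -S "$1" ]
}

# /run/podman/podman.sock is the usual rootful location; treat it as an
# assumption and adjust for your system.
sock=/run/podman/podman.sock
if is_socket "$sock"; then
    echo "Podman socket found at $sock"
else
    echo "No socket at $sock; on systemd hosts it is typically enabled with:"
    echo "  systemctl enable --now podman.socket"
fi
```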
Podman-remote in Docker with a leaked Podman socket from the host
# docker run -v /run:/run --security-opt label=disable quay.io/podman/stable podman --remote run busybox echo hi
hi
The same example works for a Docker container.
The following example shows a fully locked-down container (other than SELinux being disabled) with the Podman socket leaked into the container. SELinux would block this access, as it should.
# /bin/podman run --security-opt=label=disable -v /run/podman:/run/podman quay.io/podman/stable podman --remote run alpine echo hi
hi
Rootless Podman with containerized rootful Podman
$ podman run --privileged quay.io/podman/stable podman run ubi8 echo hello
Resolved "ubi8" as an alias (/etc/containers/registries.conf.d/shortnames.conf)
..
hello
Rootless Podman running rootless Podman
$ podman run --security-opt label=disable --user podman --device /dev/fuse quay.io/podman/stable podman run alpine echo hello
Final thoughts
Now you have some context for Podman in Podman options, using both rootful and rootless modes in various combinations. You also have a better sense of the necessary privileges and the considerations surrounding the --privileged flag.
Part two in this series looks at the use of Podman and Kubernetes. The article covers similar territory but within the context of Kubernetes.
About the author
Daniel Walsh has worked in the computer security field for over 30 years. Dan is a Senior Distinguished Engineer at Red Hat. He joined Red Hat in August 2001. Dan has led the Red Hat Container Engineering team since August 2013, but has been working on container technology for several years.
Dan helped develop sVirt (Secure Virtualization) as well as the SELinux Sandbox back in RHEL6, an early desktop container tool. Previously, Dan worked at Netect/Bindview on vulnerability assessment products and at Digital Equipment Corporation on the Athena Project and the AltaVista Firewall/Tunnel (VPN) products. Dan has a BA in Mathematics from the College of the Holy Cross and an MS in Computer Science from Worcester Polytechnic Institute.