Running containers with Podman and shareable systemd services

New features coming in Podman v1.7 make using systemd in conjunction with Podman even easier.

Posted: December 16, 2019 by Valentin Rothberg (Sudoer alumni, Red Hat)

Editor's note: The original version of this tutorial used Fedora/Alpine as the example container operating system. The examples now use BusyBox, which should provide all the functionality they need to work, given the fast pace of the Podman project.

The ability to use systemd services to run and manage containers has been requested by users for many years. There were several attempts in Docker’s early days to allow running Docker containers with systemd, but that functionality turned out to be harder than expected. Why? Systemd must be aware of and have control over the processes running inside the systemd service to properly manage it. That’s especially important so systemd can know if the main process is running, and if it’s in a healthy state.

The problem is that Docker’s client-server architecture complicates things. All Docker commands are sent to the Docker daemon, which makes it almost impossible for systemd to control container processes. Moreover, successful execution of the Docker client does not necessarily imply that the container is up and running. Multiple attempts to improve the situation have been rejected, leaving a lot of room for improvement.

The good news is that Podman is an excellent choice for running containers, and especially so for running them in systemd services. Let’s take a look at how this works.

`systemd` service file generation

Podman’s fork and exec architecture allows systemd to properly control and manage container processes. In fact, Podman makes putting a container into a systemd service as simple as calling podman generate systemd $container. Let’s generate a service for a container:

$ podman create -d --name foo busybox:latest top
54502f309f3092d32b4c496ef3d099b270b2af7b5464e7cb4887bc16a4d38597
$ podman generate systemd --name foo
# container-foo.service
# autogenerated by Podman 1.6.2
# Tue Nov 19 15:49:15 CET 2019

[Unit]
Description=Podman container-foo.service
Documentation=man:podman-generate-systemd(1)

[Service]
Restart=on-failure
ExecStart=/usr/bin/podman start foo
ExecStop=/usr/bin/podman stop -t 10 foo
KillMode=none
Type=forking
PIDFile=/run/user/1000/overlay-containers/54502f309f3092d32b4c496ef3d099b270b2af7b5464e7cb4887bc16a4d38597/userdata/conmon.pid

[Install]
WantedBy=multi-user.target

The generated systemd service file can now be used to manage the foo container via systemd. We can copy the file to ~/.config/systemd/user/container-foo.service and start a rootless container via systemctl --user start container-foo.service.
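
The copy-and-start steps described above could be scripted roughly as follows. This is a minimal sketch, not part of Podman: install_unit is a hypothetical helper, UNIT_DIR is an assumed override variable for the target directory, and the Podman and systemctl calls only run if podman is installed and a container named foo exists.

```shell
# Hypothetical helper: store a unit file read from stdin in the per-user
# systemd unit directory. UNIT_DIR is an assumed override, handy for dry runs.
install_unit() {
    dir="${UNIT_DIR:-${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user}"
    mkdir -p "$dir" && cat > "$dir/$1"
}

# Only touch Podman and systemd when the foo container from above exists.
if command -v podman >/dev/null 2>&1 && podman container exists foo; then
    podman generate systemd --name foo | install_unit container-foo.service
    systemctl --user daemon-reload
    systemctl --user start container-foo.service
fi
```

Reading the unit from stdin keeps the helper independent of where the text comes from, so the same function works for a hand-written service file.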

Specific versus generic container services

The ability to generate systemd service files offers a lot of flexibility to users, and intentionally blurs the difference between a container and any other program or service on the host. Since Podman v1.6, we can also generate service files for pods that can conveniently be written to files via the --files flag. However, all of these generated files are specific to containers and pods that already exist. As shown in the example above, we first have to create a container or pod and can then generate specific service files. But what if we want to run a new container directly via the service? What if we want to share a service file with other users?

After collecting more experience in this domain and receiving feedback from the community, we looked for a generic service skeleton that is backwards compatible with the versions of Podman already released in the wild. The good news is that we found one, so let’s take a closer look at it:

[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d busybox:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target

The service file above sets the restart policy to on-failure, which instructs systemd to restart the service when, among other things, the service cannot be started or stopped cleanly, or when the process exits non-zero. The ExecStart line describes how we start the container, and the ExecStop line describes how we stop and remove it. In this example, we want to run a simple busybox:latest container in the background that runs top. But there are two more flags we should look at: --conmon-pidfile and --cidfile.

The --conmon-pidfile flag points to a path to store the process ID for the container’s conmon process. Conmon is a small monitoring tool that Podman uses to perform operations such as keeping ports and file descriptors open, streaming the container logs, and cleaning up once the container has finished. Conmon also exits with the container’s exit code, which is essential for the systemd service use case, as we can use the conmon-pidfile as the PIDFile for the same service. If the container exits non-zero, conmon will as well, and systemd can report the correct service status and restart it if needed:

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d busybox:latest top
...
PIDFile=/%t/%n-pid
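
The PID-file handshake itself can be illustrated without Podman or systemd. The following is a minimal sketch under stated assumptions: sleep stands in for conmon, a temporary file stands in for the conmon-pidfile, and the liveness probe only approximates what a supervisor does with PIDFile=.

```shell
pidfile="$(mktemp)"

# Stand-in for conmon: a long-running background process.
sleep 30 &
# Stand-in for --conmon-pidfile: record that process's ID in a file.
echo "$!" > "$pidfile"

# Roughly what a supervisor does with PIDFile=: read the ID, then check
# that the process is alive (kill -0 sends no signal, it only probes).
main_pid="$(cat "$pidfile")"
if kill -0 "$main_pid" 2>/dev/null; then
    state=running
else
    state=gone
fi
echo "main PID $main_pid is $state"

# Clean up the stand-in process and the file.
kill "$main_pid"
rm -f "$pidfile"
```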

The --cidfile flag points to the path that stores the container ID. When running or creating a container, Podman writes the corresponding container ID to the specified path. Doing so allows us to write elegant and generic service files, because we can use the file for stopping or removing the container as well. In the previous example, the ExecStop line uses a shell trick (i.e., -c followed by a set of commands for shell interpretation) for stopping the container. Starting with the upcoming release of Podman v1.7, podman stop and podman rm support the --cidfile flag as well, so we don’t need the shell trickery shown above anymore:

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d busybox:latest top
ExecStop=/usr/bin/podman rm -f --cidfile /%t/%n-cid
...
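
The difference between the two ExecStop styles can be sketched in plain shell. This is an illustration only: the container ID is a placeholder written by hand rather than by Podman, the temporary file stands in for the cidfile, and the commands are assembled as strings and printed, not executed.

```shell
cidfile="$(mktemp)"

# Stand-in for `podman run --cidfile`: store a placeholder container ID.
echo "54502f309f30" > "$cidfile"

# Shell trick: command substitution pastes the stored ID into the command.
with_trick="podman rm -f $(cat "$cidfile")"
# Podman v1.7 style, as described above: pass the file path itself.
with_flag="podman rm -f --cidfile $cidfile"

echo "$with_trick"
echo "$with_flag"
rm -f "$cidfile"
```

The first form needs a shell to perform the substitution, which is why the older service file wraps it in sh -c. The second form is a plain command line that systemd can execute directly.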

Now, let’s look at the specified paths to the conmon-pidfile and the cidfile, /%t/%n-pid and /%t/%n-cid, which deserve some explanation as well. In these statements, %t is the path to the runtime directory root (/run/user/$UserID for a user service). This is where Podman also stores most of its runtime data. The %n portion is the full name of the service. Systemd guarantees uniqueness for service names, so we don’t need to worry about potential file name conflicts.

Assuming our service is named foo and runs under user ID 1000, the corresponding conmon-pidfile is placed in /run/user/1000/foo.service-pid, while the cidfile is placed in /run/user/1000/foo.service-cid.
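
To make the substitution concrete, here is a small sketch that mimics how the two specifiers are filled in. The expand_specifiers function is a hypothetical helper written for this illustration; systemd performs the real expansion internally, and the unit name and runtime directory are passed in by hand.

```shell
# Hypothetical helper that mimics how systemd fills in %t and %n.
# $1: path template, $2: full unit name, $3: runtime directory root
expand_specifiers() {
    printf '%s\n' "$1" | sed -e "s|%t|$3|g" -e "s|%n|$2|g"
}

expand_specifiers '/%t/%n-pid' foo.service /run/user/1000
expand_specifiers '/%t/%n-cid' foo.service /run/user/1000
```

The result starts with a double slash because the template puts a literal / in front of %t. On Linux this resolves to the same path, and the same double slash is visible in the systemctl status output in the walk-through.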

Note: It’s important to set the kill mode to none. Otherwise, systemd will start competing with Podman to stop and kill the container processes, which can lead to various undesired side effects and invalid states.

A walk-through example

So much for theory; let’s have a look at it in practice. First, make sure that the service file is accessible to our non-root user.

$ cat ~/.config/systemd/user/container.service
[Unit]
Description=Podman in Systemd

[Service]
Restart=on-failure
ExecStartPre=/usr/bin/rm -f /%t/%n-pid /%t/%n-cid
ExecStart=/usr/bin/podman run --conmon-pidfile /%t/%n-pid --cidfile /%t/%n-cid -d busybox:latest top
ExecStop=/usr/bin/sh -c "/usr/bin/podman rm -f `cat /%t/%n-cid`"
KillMode=none
Type=forking
PIDFile=/%t/%n-pid

[Install]
WantedBy=multi-user.target
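
A quick pre-flight check can catch an unreadable or incomplete file before systemd loads it. The sketch below is an assumption-laden illustration: check_unit is a hypothetical helper, and the list of keys it looks for simply reflects the settings discussed in this article, not a rule enforced by systemd or Podman.

```shell
# Hypothetical pre-flight check: the file must be readable and carry the
# settings discussed above. $1: path to the unit file.
check_unit() {
    [ -r "$1" ] || { echo "not readable: $1" >&2; return 1; }
    for key in 'KillMode=none' 'Type=forking' 'PIDFile='; do
        grep -q "^$key" "$1" || { echo "missing $key" >&2; return 1; }
    done
}

unit="${XDG_CONFIG_HOME:-$HOME/.config}/systemd/user/container.service"
if check_unit "$unit" 2>/dev/null; then
    echo "$unit looks usable"
else
    echo "$unit is missing, unreadable, or incomplete"
fi
```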

Now, we can load and start the service:

$ systemctl --user daemon-reload
$ systemctl --user start container.service
$ systemctl --user status container.service
● container.service - Podman in Systemd
   Loaded: loaded (/home/valentin/.config/systemd/user/container.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-18 15:32:56 CET; 1min 5s ago
  Process: 189705 ExecStartPre=/usr/bin/rm -f //run/user/1000/container.service-pid //run/user/1000/container.service-cid (code=exited, status=0/SUCCESS)
  Process: 189706 ExecStart=/usr/bin/podman run --conmon-pidfile //run/user/1000/container.service-pid --cidfile //run/user/1000/container.service-cid -d busybox:latest top (code=exited, status=0/SUCCESS)
 Main PID: 189731 (conmon)
   CGroup: /user.slice/user-1000.slice/user@1000.service/container.service
       	├─189724 /usr/bin/fuse-overlayfs [...]
       	├─189726 /usr/bin/slirp4netns [...]
       	├─189731 /usr/bin/conmon [...]
       	└─189737 top

$ podman ps
CONTAINER ID  IMAGE                        	COMMAND  CREATED     	STATUS         	PORTS  NAMES
f20988d59920  docker.io/library/busybox:latest  top  	12 seconds ago  Up 11 seconds ago     	funny_zhukovsky

Great! Systemd started the service successfully, and Podman reports the container as running as well. Note that I trimmed parts of the output above for brevity. An important part is the Main PID, which points to the correct conmon process. Without explicitly pointing systemd to the correct process via the PIDFile option, systemd might wrongly choose another process in this cgroup as the main process. There are a few other processes listed (namely fuse-overlayfs, slirp4netns, and top), and they all run in the same cgroup. Fuse-overlayfs is an implementation of the overlay filesystem in user space via FUSE, and slirp4netns allows unprivileged networking. Both of these tools are essential for running rootless containers with Podman.

Before properly stopping the service via systemctl --user stop container.service, let’s test the restart policy, which is set to on-failure. We can cause such a failure by killing the top process (i.e., 189737):

$ kill -9 189737
$ systemctl --user status container.service
● container.service - Podman in Systemd
   Loaded: loaded (/home/valentin/.config/systemd/user/container.service; disabled; vendor preset: enabled)
   Active: active (running) since Mon 2019-11-18 16:09:38 CET; 1min 3s ago [...]
Main PID: 191263 (conmon)

We can see that the Main PID has changed from 189731 to 191263. That’s an expected outcome, as we killed the container process, which hence exited non-zero. Conmon exited with the same exit code and systemd correctly restarted the service. Note that the service will also be restarted when we manually stop the container via podman stop $container, because the top binary in the busybox:latest container exits with 143 when stopped with SIGTERM. The top binary in some other images exits with 0 after SIGTERM, in which case systemd would not restart the service. Such behavioral differences are extremely important to consider when writing systemd services, so we need to be careful when setting the restart policy.
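
The value 143 follows a common shell convention: a process terminated by a signal is reported with status 128 plus the signal number, and SIGTERM is signal 15. The sketch below demonstrates that convention with a stand-in process. It involves neither Podman nor a container, status_after_signal is a hypothetical helper, and the numbers assume a shell such as bash or dash.

```shell
# Hypothetical helper: start a stand-in process, signal it, and print the
# exit status the shell reports for it. $1: signal name (e.g. TERM).
status_after_signal() {
    sleep 30 &
    pid=$!
    kill -s "$1" "$pid"
    rc=0
    wait "$pid" || rc=$?
    echo "$rc"
}

term_status="$(status_after_signal TERM)"   # SIGTERM is signal 15
kill_status="$(status_after_signal KILL)"   # SIGKILL is signal 9
echo "SIGTERM: $term_status, SIGKILL: $kill_status"
```

In bash and dash this reports 143 for SIGTERM and 137 for SIGKILL. Both are non-zero, which is what makes an on-failure restart policy kick in when such a status reaches systemd.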

Back to work

The nice thing about the generic systemd service file presented in this article is that it is backwards compatible with versions of Podman running in the wild. Whether on Red Hat Enterprise Linux, Ubuntu, or another distribution, users can immediately follow the suggested format. Nonetheless, the Podman team is continuing to improve the support and user experience when running containers in systemd services. Try it out!

