With Docker moving all of their official images to Alpine, base image size is a hot topic. Sure, having sane and minimal base images is important, but software supply chain hygiene is equally (if not more) important - interested to understand why?
Among other things, it's important in a production container environment to have provenance (i.e. knowledge of where your container images came from). Using Dockerfiles is a great way to track and enforce provenance policies. Each Dockerfile has a FROM line which specifies its upstream image. In a production environment, there are three basic strategies that we can employ:
- Wild West: let people build their Dockerfiles using a FROM line that specifies any image on the Internet anywhere
- Black List: let people build their Dockerfiles using a FROM line that specifies any image except those from a few known bad places
- White List: let people build their Dockerfiles using a FROM line that specifies only known good images
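To make the White List strategy a bit more concrete, here is a minimal sketch of a check that could run before a build. It is an illustration only: the `APPROVED_BASES` set and the function names are hypothetical, and it assumes simple `FROM image[:tag]` lines (anything it cannot recognize, such as a FROM line with extra flags, simply fails the check).

```python
import re

# Hypothetical allow-list of approved core builds; substitute your own.
APPROVED_BASES = {
    "registry.access.redhat.com/rhel7",
    "registry.access.redhat.com/rhscl/python-34-rhel7",
}

# Matches the first token after FROM on a Dockerfile line.
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)", re.IGNORECASE)


def base_images(dockerfile_text):
    """Return the image named on each FROM line, without tag or digest."""
    images = []
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if not match:
            continue
        ref = match.group(1).split("@", 1)[0]  # drop a digest, if present
        name, sep, tag = ref.rpartition(":")
        # A colon followed by a slash is a registry port, not a tag.
        if sep and "/" not in tag:
            ref = name
        images.append(ref)
    return images


def is_allowed(dockerfile_text):
    """True only if there is at least one FROM line and all are approved."""
    images = base_images(dockerfile_text)
    return bool(images) and all(image in APPROVED_BASES for image in images)


if __name__ == "__main__":
    good = "FROM registry.access.redhat.com/rhel7:latest\nRUN yum -y install lighttpd\n"
    bad = "FROM example.com/random/image:latest\n"
    print(is_allowed(good), is_allowed(bad))
```

A check like this only looks at the name on the FROM line; it says nothing about whether the image behind that name has been scanned or approved, which still has to happen when the image enters the environment.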
My sysadmin genes twitch at the first two strategies, so let's go with number three. Strategy three seems to provide the best software supply chain hygiene for several reasons:
- We can scan and approve images as they come into the environment
- We can limit our attack surface from a content perspective
- We can limit the size of our on disk cache on each container host
I addressed the first two points in my article on DCI, but if you can minimize the number of genesis images (i.e. core builds), you really should be able to get the size of the on-disk Docker cache down to something like:
Core Build * N + Software Layers * M
Going back to my old college days in computer science - with a fairly low number of core builds (N), the first term stays small and roughly constant, so your on-disk image size should come down to approximately the size of the software layers (M of them) themselves. This is kind of like Big O of M. If you let people pull images from all over the green earth, then yes, you can (and likely will) have a problem with image sprawl, but if you practice good hygiene, you should be able to reduce your on-disk image size significantly. So, while small base images are useful for demos, having a wild number of base image permutations all over the container environment will actually expose you to more disk usage and a larger attack surface.
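As a rough illustration of that estimate, the sketch below adds up a host's cache by counting each distinct base image once and every software layer once. The function name and all of the sizes are made up for the example; they are not measurements from a real host.

```python
def cache_size_mb(images):
    """Estimate on-disk cache size in MB.

    images: iterable of (base_name, base_size_mb, layer_size_mb) tuples,
    one per application image.
    """
    bases = {}
    layers = 0.0
    for base_name, base_mb, layer_mb in images:
        bases[base_name] = base_mb  # a shared base is stored only once
        layers += layer_mb          # software layers always add up
    return sum(bases.values()) + layers


if __name__ == "__main__":
    # Hypothetical host: 50 applications with 20 MB of software layers each.
    shared = [("rhel7", 200, 20)] * 50                   # one 200 MB core build
    sprawl = [(f"base-{i}", 50, 20) for i in range(50)]  # 50 different 50 MB bases
    print(cache_size_mb(shared))  # 1200.0
    print(cache_size_mb(sprawl))  # 3500.0
```

With these invented numbers, 50 smaller but unshared bases cost about three times as much disk as one larger shared core build, which is the image sprawl effect in miniature.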
Supply Chain Hygiene
Notice in the output below that lighttpd-rhel7 and python-34-rhel7 are both Builder Images, which use the Source-to-Image tooling to produce child images. This enforces hygiene and guarantees that the rhel7 base image will only ever be downloaded and cached once on any given Docker host.
docker run --privileged -v /var/run/docker.sock:/var/run/docker.sock nate/dockviz images -i -t registry.access.redhat.com/rhel7:latest
└─6c3a84d798dc Virtual Size: 201.7 MB Tags: registry.access.redhat.com:443/rhel7/rhel:latest, registry.access.redhat.com/rhel7:latest
  ├─eb1c00f5bd90 Virtual Size: 0.0 B
  │ └─309edfe8c834 Virtual Size: 0.0 B
  │   └─e55678196329 Virtual Size: 10.6 MB
  │     └─c0cfe103f6a8 Virtual Size: 26.9 MB
  │       └─e2a8aa8fc36d Virtual Size: 0.0 B
  │         └─013730cb9ca4 Virtual Size: 0.0 B
  │           └─e87b59df8cb9 Virtual Size: 1.2 KB
  │             └─3d001d53fb17 Virtual Size: 1.2 KB
  │               └─ec15cf3e6a54 Virtual Size: 447.0 B
  │                 └─2562d5e87e75 Virtual Size: 447.0 B
  │                   └─5de8fddd4822 Virtual Size: 0.0 B
  │                     └─fbc86b0dcde9 Virtual Size: 0.0 B
  │                       └─20e1f1bdc9cf Virtual Size: 0.0 B Tags: lighttpd-rhel7:latest
  │                         └─99a442cabd6a Virtual Size: 298.0 B Tags: lighttpd-test-app:latest
  └─ce709b84e064 Virtual Size: 173.6 MB
    └─bae1743ada78 Virtual Size: 86.8 MB Tags: registry.access.redhat.com/rhscl/python-34-rhel7:latest
      └─cc9b43e09c6e Virtual Size: 743.5 KB Tags: python-34-rhel7-app:latest
Put differently, in a more quantitative way: if the Python image layer is 86MB and the underlying RHEL 7 base image is 200MB, then each of your hosts should use exactly 200MB of storage for the base image, no matter how many images are derived from it. Scaled across tens or hundreds of applications all standardized on a rhel7 base image, the 200MB for the base image "fades away". Since this base image should be shared across most or all derived images, its size has very little impact on your clusters. I would instead optimize for functionality (yum, strace, SystemTap) rather than image size. Another byproduct is that the developer can offload the updating of things like glibc to the systems administrators.
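To see how quickly the base image "fades away", here is a small back-of-the-envelope sketch. It uses the rounded 200MB and 86MB figures from above and assumes, purely for illustration, that every application adds about 86MB of its own layers on top of the shared base; `base_share` is a made-up helper name.

```python
BASE_MB = 200.0   # rounded rhel7 base image size from the text
LAYER_MB = 86.0   # assumption: each application adds roughly this much


def base_share(num_apps):
    """Fraction of a host's image storage taken up by the shared base."""
    total = BASE_MB + LAYER_MB * num_apps
    return BASE_MB / total


if __name__ == "__main__":
    for num_apps in (1, 10, 100):
        print(num_apps, f"{base_share(num_apps):.1%}")
```

Under these assumptions the base image is most of the storage for a single application, but only a couple of percent by the time a host carries a hundred of them.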
Moral of today's story: hygiene matters (a lot)... so build all (or most) of your images off of a core build. Core builds can limit your attack surface across your entire environment, provide your developers with increased flexibility, and provide operations with a more manageable container environment.