Manage Python application dependencies in the cloud with this open source tool

Project Thoth uses artificial intelligence to help architects identify dependencies that could create problems in production.

Project Thoth (also known as AIDevSecOps) is an open source community project sponsored by Red Hat that provides a public cloud-based resolver for Python application dependencies. The resolver uses reinforcement learning techniques to recommend a set of packages that fit the application's needs.

A reinforcement learning algorithm considers package metadata, even outside Python packaging standards, to satisfy user requests and provide a reliable application stack. Thoth can provide targeted software stack recommendations according to user needs, focusing on different aspects such as stability, security, or performance.

Architects incorporating Python applications into their designs may want to use Thoth to identify dependencies that could create problems in production.

Resolve dependencies in the cloud

Most popular Python dependency resolvers, such as pip, Pipenv, or Poetry, resolve dependencies locally, using a backtracking algorithm that favors the latest compatible versions of packages. By contrast, Thoth resolves dependencies in the cloud, using its reinforcement learning-based algorithm to recommend the most suitable application stack it can find for the user's needs.
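To make the contrast concrete, here is a minimal sketch of latest-first backtracking. It is a toy model, not the implementation used by pip, Pipenv, or Poetry: the INDEX data, the package names, and the resolve function are all hypothetical, and versions are plain integers.

```python
# Hypothetical toy index: package -> {version: {dependency: allowed versions}}.
INDEX = {
    "app-lib": {
        2: {"core": {1}},       # the newest app-lib only works with core 1
        1: {"core": {1, 2}},
    },
    "core": {2: {}, 1: {}},
}


def resolve(requirements, pinned=None):
    """Pin every requirement, trying the newest version first.

    requirements is a list of (name, allowed_versions_or_None) pairs.
    Returns a {name: version} dict, or None when no combination fits.
    """
    pinned = dict(pinned or {})
    if not requirements:
        return pinned
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in pinned:
        # Already chosen earlier: the existing pin must satisfy this constraint.
        if allowed is not None and pinned[name] not in allowed:
            return None
        return resolve(rest, pinned)
    for version in sorted(INDEX[name], reverse=True):  # newest first
        if allowed is not None and version not in allowed:
            continue
        dependencies = list(INDEX[name][version].items())
        result = resolve(rest + dependencies, {**pinned, name: version})
        if result is not None:
            return result
    return None  # dead end: the caller backtracks to its next candidate
```

In this sketch, resolve([("core", None), ("app-lib", None)]) pins core 2 and then has to back off to app-lib 1, while listing app-lib first yields app-lib 2 with core 1. Both stacks are valid, and nothing in the algorithm says which one is better for a given use case.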

Resolving the latest package versions is not always optimal for building a reliable software stack adapted to particular use cases. For example, a data scientist using Python for quick model prototyping and testing might want to optimize the software stack of the developed application to focus on performance.

Thoth can provide a recommendation based on performance criteria, considering the user's hardware (such as CPU or GPU) and the software in the runtime environment (including the operating system, Python version, base container image, and CUDA version). When an application needs to be deployed in production, you can set Thoth's recommendation type (a configuration option that tells the resolver what to optimize for) to "security" to obtain a software stack free of known vulnerabilities, based on previously aggregated knowledge. Additional inputs to the resolution process help the resolver produce a stack of the desired quality.
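The effect of the recommendation type can be illustrated with a small sketch. This is not Thoth's code or data model: the Candidate fields, the recommend function, and the scores are assumptions made for illustration. Only the "performance" and "security" type names come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical, already-resolved software stack with aggregated facts."""
    name: str
    performance: float           # higher is better (made-up scale)
    known_vulnerabilities: int
    needs_cuda: bool = False


def recommend(candidates, recommendation_type, cuda_available):
    """Pick one stack according to the recommendation type and the hardware."""
    # The runtime environment is an input: drop stacks it cannot run.
    usable = [c for c in candidates if cuda_available or not c.needs_cuda]
    if recommendation_type == "security":
        # Only stacks without known vulnerabilities qualify.
        usable = [c for c in usable if c.known_vulnerabilities == 0]
    elif recommendation_type != "performance":
        raise ValueError(f"unsupported recommendation type: {recommendation_type}")
    if not usable:
        raise LookupError("no stack satisfies the request")
    return max(usable, key=lambda c: c.performance)


stacks = [
    Candidate("gpu-stack", performance=0.9, known_vulnerabilities=1, needs_cuda=True),
    Candidate("cpu-fast", performance=0.7, known_vulnerabilities=2),
    Candidate("cpu-safe", performance=0.5, known_vulnerabilities=0),
]
```

With these made-up numbers, a "performance" request on a CUDA machine selects gpu-stack, the same request without CUDA selects cpu-fast, and a "security" request selects cpu-safe on either machine.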

Figure: The Thoth adviser recommends Python libraries (Frido Pokorny, Maya Costantini, CC BY-SA 4.0)

The Thoth adviser uses hardware and software requirements to determine which application dependencies to resolve, as shown in the diagram above.

Heal Python applications

Thoth's resolution process is designed as a pipeline built from different types of pipeline units that can adjust the package resolution. These units can add new packages to the dependency graph, fix underpinning issues or missing packages in a release, remove packages and package versions, or fix overpinning issues. They can also provide alternatives for specific cases, such as resolving tensorflow-gpu instead of intel-tensorflow for GPU-enabled environments, or supply additional metadata to consumers.
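The pipeline idea can be sketched as a chain of small functions, each adjusting the list of candidate packages. The unit names, their signatures, and the libfoo and libbar packages are hypothetical and do not mirror Thoth's actual unit types. The version numbers are illustrative.

```python
def drop_release(name, version):
    """Build a unit that removes one release considered broken."""
    def unit(packages):
        return [p for p in packages if p != (name, version)]
    return unit


def add_missing(dependent, missing):
    """Build a unit that adds a dependency a release forgot to declare."""
    def unit(packages):
        names = {n for n, _ in packages}
        if dependent in names and missing[0] not in names:
            return packages + [missing]
        return packages
    return unit


def use_alternative(old, new):
    """Build a unit that swaps a package for an alternative build.

    Keeping the same version number is a simplification for this sketch.
    """
    def unit(packages):
        return [(new, v) if n == old else (n, v) for n, v in packages]
    return unit


def run_pipeline(packages, units):
    """Apply each unit in order; every unit returns a new package list."""
    for unit in units:
        packages = unit(packages)
    return packages


candidates = [("intel-tensorflow", "2.8.0"), ("libfoo", "1.0"), ("libfoo", "1.1")]
gpu_pipeline = [
    drop_release("libfoo", "1.1"),
    add_missing("libfoo", ("libbar", "0.3")),
    use_alternative("intel-tensorflow", "tensorflow-gpu"),
]
adjusted = run_pipeline(candidates, gpu_pipeline)
```

Because each unit only takes and returns a package list, units can be reordered or left out per request, which is the property that makes a pipeline design attractive here.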

Unlike pip-audit, Thoth can resolve application dependencies according to security policies and act on security issues during the resolution process itself. Resolution pipeline units can be developed in Python or declared in YAML files. Like the Python Packaging Authority's advisory database, these YAML files are machine-consumable, and the cloud resolver uses them to enhance package resolution.

These YAML files, called prescriptions, are publicly available in the thoth-station/prescriptions repository. Prescriptions fill the gap for curated additional package metadata and instrument the resolver to recommend a set of dependencies suitable for application-specific needs. Examples of the information that prescriptions feed into recommendations include:

  • Data from Security Scorecards offered by the Open Source Security Foundation (OpenSSF)
  • Information about the GitHub community maintaining public projects
  • PyPI package meta-information
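The following sketch shows how a declarative rule could steer a resolver. The rule shape is hypothetical and is not the schema used in the thoth-station/prescriptions repository. It is written as the dictionary a YAML loader would produce, and the package name, score delta, and message are made up.

```python
# Hypothetical rule, shown as the dict a YAML loader would return.
LOW_MAINTENANCE_RULE = {
    "name": "ExampleLowMaintenanceRule",
    "match": {"package": "libfoo"},
    "run": {
        "score_delta": -0.5,
        "message": "libfoo has a low maintenance score",
    },
}


def apply_prescriptions(package, score, prescriptions):
    """Adjust a package score with every matching rule and collect messages."""
    messages = []
    for rule in prescriptions:
        if rule["match"]["package"] == package:
            score += rule["run"]["score_delta"]
            messages.append(rule["run"]["message"])
    return score, messages
```

Calling apply_prescriptions("libfoo", 1.0, [LOW_MAINTENANCE_RULE]) lowers the score to 0.5 and returns the message, while any other package passes through unchanged. Keeping rules as data means contributors can add knowledge without touching resolver code.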

Build and extend containerized applications

Thoth's recommendations can also control the quality of Python application container images and provide more robust containerized runtime environments.

The project offers container image analyses that inspect what exists in the container image. The analysis extracts information about different elements, including the operating system, packages present, Python interpreters and their versions, the Application Binary Interface (ABI), libraries, and container image metadata.

This information is automatically extracted from container images, ready to be explored by developers and consumed by the cloud-based Python resolver, which offers recommendations based on the content available in container images. The container image analysis runs in an OpenShift cluster, and the results are computed using Thoth's package-extract component. Thoth supports extending prebuilt container images by adjusting advisories specifically for the containerized environment used.
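To give a feel for the kind of inventory described above, here is a minimal sketch that collects similar facts about the Python environment it runs in, using only the standard library. It is not Thoth's package-extract component and does not inspect container image layers. The describe_environment name and the dictionary keys are assumptions.

```python
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect basic facts about the running interpreter and its packages."""
    packages = sorted(
        (
            (dist.metadata["Name"], dist.version)
            for dist in metadata.distributions()
            if dist.metadata["Name"]  # skip broken installs without a name
        ),
        key=lambda pair: pair[0].lower(),
    )
    return {
        "operating_system": platform.system(),
        "python_version": platform.python_version(),
        "interpreter": sys.executable,
        "packages": packages,
    }


inventory = describe_environment()
```

Run inside a container, a script like this reports what that image provides to Python code, which is the sort of input a resolver needs to tailor advice to the image.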

Get recommendations

Here are a few examples of client tools that allow users to access Thoth's recommendation engine:

Use Thamos, Thoth's command-line interface

Users can access recommendations via the Thamos command-line interface (CLI), available on PyPI. Thamos communicates with Thoth's backend to get advisories on an application software stack and to help manage environments. It is configured with a simple YAML file that specifies the requirements format, the preferred recommendation type, information about the user's runtime environment, and more. Thamos features include:

  • Getting advisories from Thoth's resolver
  • Showing dependency graphs and environments
  • Looking for available Thoth container images
  • Managing created virtual environments
  • Obtaining additional information about dependencies used in the project

Manage dependencies in Jupyter Notebooks

Data scientists who work with Jupyter Notebooks in JupyterLab can use Thoth's jupyterlab-extension to manage their project dependencies with Thoth's recommendations. This extension helps make notebooks reproducible. Its user interface allows searching for packages and installing them, locking both direct and transitive dependencies, and saving them in the notebook metadata. For notebooks with code that already uses dependencies, the extension can analyze the code and suggest the libraries to install.
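Storing pinned dependencies in notebook metadata can be sketched with the standard json module. Notebook files are JSON documents with a top-level "metadata" field, but the "locked_dependencies" key and both function names below are hypothetical and are not the keys the extension actually writes.

```python
import json


def save_locked_dependencies(notebook_text, locked):
    """Return notebook JSON with pinned versions stored in its metadata."""
    notebook = json.loads(notebook_text)
    # "metadata" is a standard notebook field; the key stored inside it
    # is a hypothetical name chosen for this sketch.
    metadata = notebook.setdefault("metadata", {})
    metadata["locked_dependencies"] = dict(sorted(locked.items()))
    return json.dumps(notebook, indent=1)


def load_locked_dependencies(notebook_text):
    """Read the pinned versions back, or an empty dict if none are stored."""
    notebook = json.loads(notebook_text)
    return notebook.get("metadata", {}).get("locked_dependencies", {})


empty_notebook = json.dumps(
    {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
)
pinned_notebook = save_locked_dependencies(
    empty_notebook, {"pandas": "1.4.0", "numpy": "1.22.0"}
)
```

Because the pins travel inside the .ipynb file itself, anyone who receives the notebook can recreate the same environment without a separate requirements file.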

Detect potential vulnerabilities

Developers who rely on packages maintained by external communities know that these packages can be a source of vulnerabilities in their applications. Thoth offers security recommendations to detect potential vulnerabilities in project dependencies. It draws on knowledge aggregated from the Python Packaging Authority's advisory-db and OpenSSF Security Scorecards, together with metrics computed to measure code quality, to score packages on security during the resolution process.
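A minimal sketch of acting on advisories during resolution is to exclude affected versions before choosing one. The ADVISORIES data, the libfoo package, and the function names are hypothetical, and version parsing is simplified to dot-separated integers rather than full Python version rules.

```python
# Hypothetical advisory data: package -> list of (introduced, fixed) ranges.
# A version is affected when introduced <= version < fixed.
ADVISORIES = {"libfoo": [("1.0", "1.2.3")]}


def parse(version):
    """Turn '1.2.3' into (1, 2, 3); assumes purely numeric components."""
    return tuple(int(part) for part in version.split("."))


def is_vulnerable(name, version):
    """Check one release against every known range for its package."""
    parsed = parse(version)
    return any(
        parse(introduced) <= parsed < parse(fixed)
        for introduced, fixed in ADVISORIES.get(name, [])
    )


def safe_versions(name, versions):
    """Keep only the releases that no advisory range covers."""
    return [v for v in versions if not is_vulnerable(name, v)]
```

With this data, safe_versions("libfoo", ["0.9", "1.1", "1.2.3"]) keeps 0.9 and 1.2.3 and drops 1.1. Filtering at this stage means a vulnerable release never enters the recommended stack, instead of being flagged after installation.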

The project also offers security guidance for containerized applications through Clair's static analysis of security vulnerabilities on Quay. Quay lets users create a registry of container images and uses Clair to check the hosted images for vulnerabilities. A background aggregation process provided by the Thoth team collects the results of Clair's container security scans.

The security and efficiency of Python applications are as critical to architects as they are to developers. Architects who familiarize themselves with Thoth and other developer tools strengthen their ability to guide enterprises in the right direction.

Reach out to the Thoth team

As part of Project Thoth, we are accumulating knowledge to help Python developers create healthy applications. To keep track of updates, please subscribe to our YouTube channel or follow @ThothStation on Twitter. Because Thoth is offered as a public service, you can try it yourself by following this tutorial, which walks you through Thoth's resolver features.

The project is in its early stages, and we continuously improve its stability and reliability. We would be happy to receive any feedback. To send us feedback or get involved in improving the Python ecosystem, please open an issue in the Thoth Station support repository or reach out directly to the Thoth team on Twitter. You can report issues you've spotted in open source Python libraries to the support repository, or write prescriptions for the resolver and send them to our prescriptions repository. By participating in these ways, you can help the Python cloud-based resolver develop better recommendations for the whole Python community.

Architects can provide unique feedback from a different perspective, so you are strongly encouraged to participate in the conversation.


This article is adapted from a series of articles previously published on the Red Hat Developer blog.

Fridolín Pokorný

Fridolín works as a Senior Software Engineer in Red Hat's Office of the CTO's emerging technologies group. His background is in the security domain and his interests include security, machine learning, C/C++, Python, and distributed systems. He also loves nature, road cycling, and good books.

Maya Costantini

Maya is a Software Engineer in Red Hat's Emerging Technologies Security team. She is passionate about Python, open source, and software supply chain security.
