When people talk about artificial intelligence (AI), they’re usually talking about the combination of a chat bot, providing input and output, and a large language model (LLM), providing data that the chat bot can use to form sentences. AI without an LLM isn’t very useful, and that’s why much of the conversation around the legalities and ethics of AI is concerned with what’s being used to build the “knowledge” used by generative AI (gen AI). How can you be sure that the data a gen AI uses to formulate its answers is reliable, trustworthy, and unencumbered by copyright? The best way to either audit or specialize the knowledge base of AI is to use open source, and that’s what the InstructLab project makes possible.
What is InstructLab?
InstructLab is an open source AI project that promotes universal modeling with open contribution. Its stated goal is to enable anyone to shape gen AI, whether you need an open source LLM due to concerns over intellectual property and copyright, privacy, reliability, subject matter expertise, accessibility or anything else. Designing a complete LLM is a big task, so the best way to build an open LLM is to build it in the open. Because InstructLab is open source, you can contribute to it and help make open source language models the best choice for gen AI. Here are three ways you can get started with InstructLab today.
Share your expertise
AI uses probability to construct its responses, and it bases each answer on factual information serving as a model. The collection of facts used by AI is part of an LLM. For InstructLab to be the best basis of AI-powered content, it must provide an exhaustive LLM. Building an LLM requires the construction of a data bank of reliable content. In InstructLab terminology, this is called a taxonomy, which includes the two primary categories of skill and knowledge.
A skill in InstructLab is performative. When you create a skill for InstructLab, you teach it how to do something specific, like rearranging words in a sentence while maintaining the same meaning, finding two words that rhyme or converting a string to camel case.
Knowledge is a collection of facts, with citation of a reliable source. When you create knowledge for a language model, you provide the model data it can use to answer direct questions.
Both skill and knowledge are stored as YAML (short for “YAML Ain’t Markup Language”), a minimalist file format consisting of key and value pairs (a “mapping”) and lists (a “sequence”). Here’s a simple example of knowledge expressed in YAML:
---
version: 2
created_by: tux
domain: flowers
seed_examples:
  - answer: 'A carnation is a herbaceous perennial plant.'
    question: 'What kind of plant is a carnation?'
  - answer: 'Dianthus caryophyllus'
    question: 'What is the scientific name for a carnation?'
task_description: 'teach a language model about carnations'
document:
  repo: https://github.com/juliadenham/Summit_knowledge
  commit: 195fc4d83a40d8a1b60062e66e06cfc0bc9c8d35
  patterns:
    - dianthus_caryophyllus.md
Here’s a simple example of a skill expressed as YAML:
---
version: 2
task_description: 'Teach the model how to rhyme.'
created_by: juliadenham
seed_examples:
  - question: What are 5 words that rhyme with horn?
    answer: warn, torn, born, thorn, and corn.
  - question: What are 5 words that rhyme with cat?
    answer: bat, gnat, rat, vat, and mat.
  - question: What are 5 words that rhyme with poor?
    answer: door, shore, core, bore, and tore.
  - question: What are 5 words that rhyme with bank?
    answer: tank, rank, prank, sank, and drank.
  - question: What are 5 words that rhyme with bake?
    answer: wake, lake, steak, make, and quake.
Compare the YAML examples of knowledge and skill. Knowledge contains verifiable data on a specific topic. A skill contains examples of a specific task.
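To make that comparison concrete, here is a minimal Python sketch that checks a qna-style entry after it has been parsed into a dictionary. The required keys are inferred only from the two examples above, and check_entry is a hypothetical helper, not part of InstructLab. The project's real rules live in its JSON schema, which may differ.

```python
# Sketch only: the required keys below are inferred from the two examples in
# this article, not taken from InstructLab's actual JSON schema.
COMMON_KEYS = {"version", "created_by", "seed_examples", "task_description"}
KNOWLEDGE_KEYS = {"domain", "document"}
DOCUMENT_KEYS = {"repo", "commit", "patterns"}


def check_entry(entry: dict) -> list:
    """Return a list of problems found in a parsed qna-style mapping."""
    problems = [f"missing key: {key}" for key in sorted(COMMON_KEYS - set(entry))]

    # Every seed example needs a non-empty question and answer.
    for index, example in enumerate(entry.get("seed_examples", [])):
        for field in ("question", "answer"):
            if not str(example.get(field, "")).strip():
                problems.append(f"seed example {index}: empty or missing {field}")

    # Treat the entry as knowledge if it has either knowledge-only key.
    if KNOWLEDGE_KEYS & set(entry):
        problems += [f"missing key: {key}" for key in sorted(KNOWLEDGE_KEYS - set(entry))]
        document = entry.get("document")
        if isinstance(document, dict):
            missing = sorted(DOCUMENT_KEYS - set(document))
            problems += [f"document is missing: {key}" for key in missing]

    return problems


skill = {
    "version": 2,
    "task_description": "Teach the model how to rhyme.",
    "created_by": "juliadenham",
    "seed_examples": [
        {"question": "What are 5 words that rhyme with cat?",
         "answer": "bat, gnat, rat, vat, and mat."},
    ],
}

knowledge = {
    "version": 2,
    "created_by": "tux",
    "domain": "flowers",
    "task_description": "teach a language model about carnations",
    "seed_examples": [
        {"question": "What kind of plant is a carnation?", "answer": ""},
    ],
    # The "document" mapping is deliberately left out to trigger a finding.
}

print(check_entry(skill))
print(check_entry(knowledge))
```

The skill passes with no findings, while the knowledge entry is flagged for its empty answer and its missing source document. That mirrors the distinction above: knowledge has to point at something verifiable.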
After reading the contribution guide, you can create a qna.yaml file of your own, and submit it to InstructLab for inclusion in the LLM. You may have to revise your work to ensure it can be processed and integrated into the project, and getting familiar with tools like yamllint is useful, but with just a little effort, you can make a meaningful contribution to open source AI.
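To give a feel for what a linter looks for, here is a rough Python sketch of three whitespace checks of the kind YAML linters commonly report. It's a stand-in for illustration, not yamllint itself. The quick_lint function is hypothetical, and the 80-character limit is an assumption, because a project's lint configuration may set a different one.

```python
def quick_lint(text: str, max_length: int = 80) -> list:
    """Flag a few whitespace problems that YAML linters commonly report."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Leading whitespace only; YAML does not allow tabs for indentation.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            findings.append(f"line {number}: tab used for indentation")
        if line != line.rstrip():
            findings.append(f"line {number}: trailing whitespace")
        if len(line) > max_length:
            findings.append(f"line {number}: longer than {max_length} characters")
    return findings


# Line 2 ends with a stray space, and line 4 is indented with a tab.
sample = (
    "version: 2\n"
    "created_by: tux \n"
    "seed_examples:\n"
    "\t- question: 'What is a carnation?'\n"
)

for finding in quick_lint(sample):
    print(finding)
```

For a real submission, run yamllint against your file. It covers far more rules than this sketch.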
Run an AI locally with the ilab command
Setting up an AI is a fairly complex and manual process, but with InstructLab it’s easier than you might expect. You need to be familiar with Python tools like virtual environments and pip, and you must be comfortable in a terminal environment such as Bash. You also must have CUDA (or a similar parallel computing framework) set up on your system, and plenty of drive space (the LLM is 5 GB, and growing).
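Before you start, it can help to check those prerequisites. The following Python sketch reports whether you're inside a virtual environment, how much drive space is free, and whether the nvidia-smi tool is on your PATH. The preflight function is hypothetical, and the 10 GB threshold is an assumption chosen to leave headroom over the model size. Finding nvidia-smi only hints that NVIDIA drivers are present. It doesn't prove that CUDA is correctly configured.

```python
import shutil
import sys


def preflight(path: str = ".", needed_gb: float = 10.0) -> dict:
    """Collect a few facts worth knowing before installing a local model."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "free_gb": round(free_gb, 1),
        "enough_space": free_gb >= needed_gb,
        # Only a hint: the tool being on PATH does not prove CUDA works.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


for name, value in preflight().items():
    print(f"{name}: {value}")
```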
Follow the install guide in the InstructLab repository, interact with the InstructLab model, and then report any bugs you find or features you'd like to see.
Contribute code
At the moment, the InstructLab project consists of 12 repositories. There’s the command-line interface ilab, a Python library for synthetic data generation, design documents, taxonomy files, the JSON schema for the taxonomy YAML, and more. If you’re a programmer, then you might find issues or feature requests in unclosed bug reports that you could help resolve.
For your first contribution, it often makes sense to solve a minor issue in anticipation that you’ll use the bulk of your time understanding the development team’s process. Bugs requiring only a simple fix are tagged with good first issue, so use is:open is:issue label:"good first issue" as a filter when looking for a good entry point. There’s also a guide for first-time contributors that explains in detail how to set up your dev environment and, just as importantly, how to test your new code before requesting a merge.
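If you'd rather bookmark that filter than retype it, this small Python sketch turns it into a link to a repository's issue list. The repository address is an assumption, so swap in whichever InstructLab repository you want to search. The issue_search_url function is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: the main repository lives at this address.
REPO_URL = "https://github.com/instructlab/instructlab"
FILTER = 'is:open is:issue label:"good first issue"'


def issue_search_url(repo_url: str = REPO_URL, query: str = FILTER) -> str:
    """Build a link to a repository's issue list with the filter applied."""
    # urlencode percent-escapes the colons and quotes so the URL stays valid.
    query_string = urlencode({"q": query})
    return f"{repo_url.rstrip('/')}/issues?{query_string}"


print(issue_search_url())
```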
Open source AI is within reach, and as with any form of open source it stands to place the control and terms of AI into the hands of users. If you deal in a specialized domain, general AI may not have the knowledge or skill required to be useful to your users. If you deal with sensitive data, then general AI may not even have access to the information your users need. With InstructLab, you can help build a universal and open LLM, or even build your own. Whatever your goal, get started with InstructLab today!
About the author
Seth Kenlon is a Linux geek, open source enthusiast, free culture advocate, and tabletop gamer. Between gigs in the film industry and the tech industry (not necessarily exclusive of one another), he likes to design games and hack on code (also not necessarily exclusive of one another).