When people talk about artificial intelligence (AI), they’re usually talking about the combination of a chat bot, which handles input and output, and a large language model (LLM), which provides the data the chat bot uses to form sentences. AI without an LLM isn’t very useful, which is why much of the conversation around the legalities and ethics of AI is concerned with what’s being used to build the “knowledge” behind generative AI (gen AI). How can you be sure that the data a gen AI uses to formulate its answers is reliable, trustworthy, and unencumbered by copyright? The best way to either audit or specialize the knowledge base of AI is to use open source, and that’s what the InstructLab project makes possible.
What is InstructLab?
InstructLab is an open source AI project that promotes universal modeling with open contribution. Its stated goal is to enable anyone to shape gen AI, whether you need an open source LLM due to concerns over intellectual property and copyright, privacy, reliability, subject matter expertise, accessibility or anything else. Designing a complete LLM is a big task, so the best way to build an open LLM is to build it in the open. Because InstructLab is open source, you can contribute to it and help make open source language models the best choice for gen AI. Here are three ways you can get started with InstructLab today.
Share your expertise
AI uses probability to construct its responses, and it bases each answer on factual information that serves as its model. The collection of facts used by AI is part of an LLM. For InstructLab to be the best basis for AI-powered content, it must provide an exhaustive LLM. Building an LLM requires building a data bank of reliable content. In InstructLab terminology, this is called a taxonomy, and it has two primary categories: skill and knowledge.
A skill in InstructLab is performative. When you create a skill for InstructLab, you teach it how to do something specific, like rearranging words in a sentence while maintaining the same meaning, finding two words that rhyme or converting a string to camel case.
Knowledge is a collection of facts, with citation of a reliable source. When you create knowledge for a language model, you provide the model data it can use to answer direct questions.
Both skill and knowledge are stored as YAML (a recursive acronym for “YAML Ain’t Markup Language”), a minimalist file format consisting of key and value pairs (a “mapping”) and lists (a “sequence”). Here’s a simple example of knowledge expressed in YAML:
---
version: 2
created_by: tux
domain: flowers
seed_examples:
  - answer: 'A carnation is a herbaceous perennial plant.'
    question: 'What kind of plant is a carnation?'
  - answer: 'Dianthus caryophyllus'
    question: 'What is the scientific name for a carnation?'
task_description: 'teach a language model about carnations'
document:
  repo: https://github.com/juliadenham/Summit_knowledge
  commit: 195fc4d83a40d8a1b60062e66e06cfc0bc9c8d35
  patterns:
    - dianthus_caryophyllus.md
Here’s a simple example of a skill expressed as YAML:
---
version: 2
task_description: 'Teach the model how to rhyme.'
created_by: juliadenham
seed_examples:
  - question: What are 5 words that rhyme with horn?
    answer: warn, torn, born, thorn, and corn.
  - question: What are 5 words that rhyme with cat?
    answer: bat, gnat, rat, vat, and mat.
  - question: What are 5 words that rhyme with poor?
    answer: door, shore, core, bore, and tore.
  - question: What are 5 words that rhyme with bank?
    answer: tank, rank, prank, sank, and drank.
  - question: What are 5 words that rhyme with bake?
    answer: wake, lake, steak, make, and quake.
Compare the YAML examples of knowledge and skill. Knowledge contains verifiable data on a specific topic. A skill contains examples of a specific task.
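The difference also shows up in the structure of the two files. Here is a rough sketch in Python, assuming the files have already been parsed into dictionaries. The function name `kind_of` is hypothetical, and the rule it applies is a guess drawn only from the two examples above, not from the project's actual schema.

```python
# Hypothetical helper: guess whether a parsed qna.yaml mapping is
# knowledge or a skill. Assumption drawn only from the two examples
# above: knowledge carries a `domain` and a `document` source reference,
# while a skill has neither.
def kind_of(entry: dict) -> str:
    if "domain" in entry and "document" in entry:
        return "knowledge"
    return "skill"


# Trimmed copies of the two examples, as a YAML parser would load them.
carnation = {
    "version": 2,
    "created_by": "tux",
    "domain": "flowers",
    "seed_examples": [
        {
            "question": "What kind of plant is a carnation?",
            "answer": "A carnation is a herbaceous perennial plant.",
        },
    ],
    "task_description": "teach a language model about carnations",
    "document": {
        "repo": "https://github.com/juliadenham/Summit_knowledge",
        "commit": "195fc4d83a40d8a1b60062e66e06cfc0bc9c8d35",
        "patterns": ["dianthus_caryophyllus.md"],
    },
}

rhyme = {
    "version": 2,
    "task_description": "Teach the model how to rhyme.",
    "created_by": "juliadenham",
    "seed_examples": [
        {
            "question": "What are 5 words that rhyme with cat?",
            "answer": "bat, gnat, rat, vat, and mat.",
        },
    ],
}

print(kind_of(carnation))
print(kind_of(rhyme))
```

The point of the sketch is the source reference: a knowledge file has to say where its facts come from, while a skill file only needs examples of the task.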
After reading the contribution guide, you can create a qna.yaml file of your own and submit it to InstructLab for inclusion in the LLM. You may have to revise your work to ensure it can be processed and integrated into the project, and it helps to get familiar with tools like yamllint, but with just a little effort, you can make a meaningful contribution to open source AI.
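Before reaching for a linter, a few lines of Python can catch the most obvious gaps. The following is only a sketch under stated assumptions: it works on a file that has already been parsed into a dictionary, the function name `find_problems` is made up for illustration, and the rules it applies (a `created_by`, a `task_description`, and a non-empty question and answer in every seed example) are inferred from the examples in this article rather than taken from the project's JSON schema.

```python
def find_problems(entry: dict) -> list[str]:
    """Return human-readable problems found in a parsed qna.yaml mapping.

    The checks are assumptions based on the examples in this article,
    not the project's real validation rules.
    """
    problems = []
    # Top-level keys that both example files have in common.
    for key in ("created_by", "task_description", "seed_examples"):
        if not entry.get(key):
            problems.append(f"missing or empty key: {key}")
    # Every seed example should pair a question with an answer.
    for number, example in enumerate(entry.get("seed_examples") or [], start=1):
        for field in ("question", "answer"):
            value = example.get(field)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"seed example {number}: no {field}")
    return problems


# A draft skill whose second seed example is missing its answer.
draft = {
    "version": 2,
    "created_by": "tux",
    "task_description": "Teach the model how to rhyme.",
    "seed_examples": [
        {
            "question": "What are 5 words that rhyme with cat?",
            "answer": "bat, gnat, rat, vat, and mat.",
        },
        {"question": "What are 5 words that rhyme with bake?"},
    ],
}

for problem in find_problems(draft):
    print(problem)
```

A check like this complements a YAML linter rather than replacing it: the linter looks at syntax and style, while this looks at whether the content a reviewer expects is there at all.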
Run an AI locally with the ilab command
Setting up an AI is a fairly complex and manual process, but with InstructLab it’s easier than you might expect. You need to be familiar with Python tools like virtual environments and pip, and you must be comfortable in a terminal environment such as Bash. You also must have CUDA (or a similar parallel computing framework) set up on your system, and plenty of drive space (the LLM is 5 GB, and growing).
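Two of those prerequisites are easy to check from Python itself before you start. This is a minimal sketch using only the standard library. The function names are hypothetical, the 5 GB threshold is simply the figure quoted above, and the sketch makes no attempt to detect CUDA or any other parallel computing framework.

```python
import shutil
import sys

# The article puts the model at about 5 GB; treat that as a minimum.
REQUIRED_BYTES = 5 * 1000**3


def in_virtual_environment() -> bool:
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


def has_room_for_model(path: str = ".") -> bool:
    # Free space on the drive that holds `path`, compared to the minimum.
    return shutil.disk_usage(path).free >= REQUIRED_BYTES


print("virtual environment active:", in_virtual_environment())
print("at least 5 GB free here:", has_room_for_model())
```

Since the model is described as growing, treat a bare pass on the drive space check as a reason to free up more room, not as a guarantee.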
Follow the install guide in the InstructLab repository, interact with AI and the InstructLab model, and then report bugs and request features.
Contribute code
At the moment, the InstructLab project consists of 12 repositories. There’s the command-line interface ilab, a Python library for synthetic data generation, design documents, taxonomy files, the JSON schema for the taxonomy YAML, and more. If you’re a programmer, you might find issues or feature requests in open bug reports that you could help resolve.
For your first contribution, it often makes sense to solve a minor issue, on the expectation that you’ll spend the bulk of your time understanding the development team’s process. Bugs requiring only a simple fix are tagged with good first issue, so use is:open is:issue label:"good first issue" as a filter when looking for a good entry point. There’s also a guide for first-time contributors that explains in detail how to set up your dev environment and, just as importantly, how to test your new code before requesting a merge.
Open source AI is within reach, and as with any form of open source, it stands to place the control and terms of AI into the hands of users. If you deal in a specialized domain, general AI may not have the knowledge or skill required to be useful to your users. If you deal with sensitive data, then general AI may not even have access to the information your users need. With InstructLab, you can help build a universal and open LLM, or even build your own. Whatever your goal, get started with InstructLab today!
About the author
Seth Kenlon is a Linux geek, open source enthusiast, free culture advocate, and tabletop gamer. Between gigs in the film industry and the tech industry (not necessarily exclusive of one another), he likes to design games and hack on code (also not necessarily exclusive of one another).