Apache Kafka has had a major impact in a short time.
With the project reporting more than 60% of Fortune 100 companies using it today to modernize their data architecture, Kafka has proven to be a popular event streaming platform across a range of industries.
Apache Kafka is an open source software platform developed by the Apache Software Foundation that can publish, subscribe to, store, and process streams of records in real time. Some use cases of Apache Kafka include messaging, website activity tracking, metrics, and log aggregation.
Basically, if you want to move massive amounts of data quickly and at scale, you want Apache Kafka. But that’s not to say you won’t encounter some challenges when using it.
Let’s take a look at some of the common challenges of using Apache Kafka in this article—and what options are available to help you deal with them.
1) Apache Kafka is hard to set up (and learn)
Setting up and managing Apache Kafka is no cakewalk.
Will you run it on physical machines or in the cloud, and what are the tradeoffs of each? You have to work out networking requirements, set up the right interfaces, and segregate workloads for security.
And then, when you have it all up and running, you still have to tackle Day 2 operations and know how to diagnose and resolve problems as they arise. It’s complicated.
2) Apache Kafka isn’t super developer-friendly
Developers new to Apache Kafka might find it difficult to grasp the concept of Kafka brokers, clusters, partitions, topics, logs, and so on. The learning curve is steep. You’ll need extensive training to learn Kafka’s basic foundations and the core elements of an event streaming architecture.
To better understand this particular challenge, we first need to familiarize ourselves with how Kafka works.
At a high level, Apache Kafka streams data from sources (called producers) to targets (called consumers). Producers can push data into (and consumers can pull data out of) what we call Kafka topics, where our data is stored and published.
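To make that flow concrete, here is a minimal in-memory sketch of the idea in plain Python (no Kafka libraries, and the names are illustrative): a topic behaves like an append-only log that producers write records into and consumers read from at their own offsets.

```python
class Topic:
    """Toy model of a Kafka topic: an append-only log of key-value records."""

    def __init__(self, name):
        self.name = name
        self.log = []  # records are kept in arrival order

    def produce(self, key, value):
        self.log.append((key, value))  # a producer pushes a record
        return len(self.log) - 1       # offset assigned to the new record

    def consume(self, offset):
        return self.log[offset:]       # a consumer pulls from its own offset


topic = Topic("page-views")
topic.produce("user-1", "/home")
topic.produce("user-2", "/pricing")

# A consumer starting from offset 0 sees every record published so far.
records = topic.consume(0)
print(records)  # [('user-1', '/home'), ('user-2', '/pricing')]
```

In real Kafka the log is split into partitions and replicated across brokers, and each consumer group tracks its own offsets; the sketch captures only the append-and-read-from-offset model.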
Each piece of data in a Kafka topic is a key-value pair, where the value can be serialized into data formats like Avro, JSON, or Protobuf. The structure of the data format is what we call a schema.
A problem developers might run into when using different schemas to encode data is that consumers can no longer understand producers: your downstream consumers will start breaking if the producer's schema changes. In other words, Kafka itself performs no data verification.
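A toy illustration of that failure mode, using plain JSON and hypothetical field names: a producer renames a field in its schema, and a consumer still reading the old field name breaks, because nothing in between validates the payload.

```python
import json


def consume(raw):
    record = json.loads(raw)
    return record["user_id"]  # consumer was written against the old schema


# Records produced under the old schema work fine.
print(consume(json.dumps({"user_id": 42, "page": "/home"})))  # 42

# The producer quietly renames the field to "userId" in a new schema.
try:
    consume(json.dumps({"userId": 42, "page": "/home"}))
except KeyError as exc:
    print("consumer broke:", exc)  # Kafka never inspected the payload
```

This is the gap schema registries are designed to close: producers and consumers agree on a versioned, well-defined schema up front instead of discovering mismatches at read time.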
Another area where developers might have trouble working with Kafka is in protocol support. Because Kafka works in a Java Virtual Machine (JVM) ecosystem, the main programming language of the client is Java. This could be a problem if your preferred language is Python or C, for example.
While open source clients are available in other languages, they don't come with Kafka itself. You'll have to install and update these clients manually for full protocol support.
3) Spinning up Kafka connectors takes time and energy
To move large collections of data into and out of Apache Kafka, there's a tool called Kafka Connect. Kafka Connect lets you build and run connectors (which define where data should be copied from and to) for your Kafka cluster.
Sounds great, right?
The problem is, manually creating or running connectors for Kafka Connect clusters can take up a lot of operational bandwidth that you might not have. Your team will have to spin up these connectors, provision the right infrastructure for them, and generally deal with the day-to-day operations of the cluster. Focusing on larger business challenges becomes difficult when you and your teams have to deal with running and operating Kafka Connect on a rudimentary level.
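For a sense of what "spinning up a connector" involves, this is the kind of JSON you POST to a Kafka Connect worker's REST API (port 8083 by default). The example uses the FileStreamSource connector that ships with Kafka for demos; the connector name, file path, and topic are illustrative.

```json
{
  "name": "local-file-source",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "tasks.max": "1",
    "file": "/var/log/app/events.log",
    "topic": "app-events"
  }
}
```

And that is only the declaration: you still have to provision and operate the Connect cluster the connector runs on, which is where the operational cost comes in.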
What can help solve these challenges?
Red Hat OpenShift Streams for Apache Kafka is designed to alleviate the pain of doing all the administration work for Kafka.
With OpenShift Streams for Apache Kafka, you don't have to spend time getting Kafka up and running. You can get started with Kafka right away with a free trial (no strings attached or credit card required) and connect to it from any application.
And if you decide to stick with OpenShift Streams for Apache Kafka, you also don't need to worry about day-to-day maintenance. Red Hat's 24/7 global Site Reliability Engineering team fully manages daily operations, including:
- Monitoring
- Logging
- Upgrades
- Patching

This lets the team proactively address issues and quickly solve problems.
Red Hat Cloud Services delivers a streamlined developer experience and ensures consistency in data handling across applications. Red Hat OpenShift Service Registry, for example, enables development teams to publish, discover, and communicate using well-defined data schemas with Apache Kafka.
Lastly, while OpenShift Streams for Apache Kafka does not yet handle running and operating Kafka connectors, this capability is on the roadmap. Be sure to check the OpenShift Streams for Apache Kafka homepage for updates and more resources.
About the author
Bill Cozens is a recent UNC-Chapel Hill grad interning as an Associate Blog Editor for the Red Hat Blog.