订阅内容

In my first article, Part I: Like a Rainbow In The Dark, I described which problems distributed tracing can help you solve. I also provided some strategies for  adopting this observability superpower without getting overwhelmed. As you continue down that path, though, there are some myths that need to be busted, and complex concepts demystified.

Let's start with a common phrase that sounds good at first but ultimately isn't useful. This statement can both come from the engineering trenches or whoever is calling the shots: “I want great observability without touching a line of code”.

That’s not going to happen. There are observability tools and agents that work without touching your code base, but without proper instrumentation, it’s not as effective as it could be. If you're afraid of the cost of instrumenting your code, remember that when you add an inoffensive debug log to your code, you're instrumenting it. Instrumenting code isn't something to be avoided, it's just something you need to do right.

Another common phrase is “With a mesh, I have tracing and that’s all the observability I need”.

A service mesh can be thought of as a no-code solution for everything. What you need to understand is how far it can carry you, and where you need to go after you've gotten started with it. Spoiler alert: in our next article, we'll explore different approaches and recommendations to instrument your code.

This is a mesh

service mesh is a wonderful net to throw your microservices into. In the same way Kubernetes has those tedious lifecycle checks and mechanisms to restart your applications when they go down, the mesh helps you avoid implementing retries, load balancing, mTLS certificate management, and many more networking headaches.

Service mesh

One good first approach is to think of a mesh as a set of networking service definitions, rules, and sidecars managed by a bunch of controllers. A service mesh can have many different implementations and can serve many use cases. Two examples:

  • Load balancing: When you leverage the service mesh, there’s only one server to talk to at the other side (from an application’s point of view). There's just one connection to handle. The mesh can split that connection between many other services without you touching a line of code for your legacy application
  • In-cluster mTLS communications: A mesh can make your developing life easier by absolving you of tasks you'd normally have to code manually. You only need to implement services that talk in plain HTTP to the mesh, which is the component doing all the real networking work (including certificate management)

This can be pretty convenient when you don’t want to implement the same thing over and over again in all your microservices.

Luckily from an end-user perspective, using a service mesh can be as simple as annotating your workloads (literally adding one line to their definition). When these workloads are scheduled for installation, the service mesh detects them and injects a sidecar into them. This sidecar gets in the middle of every communication you want and handles a lot of little details for you.

How can a service mesh help with observability?

A mesh has the capability to intervene in an internal network, and to apply configurations from a central location. This is powerful, but it can also become very complex. The Kiali project provides a console to manage  Istio, one of the most common service mesh implementations.

Dependency graph in Kiali

It didn’t take long for users to realize that observability is crucial to understand the mesh, so Kiala has invested a lot in providing an observability platform for Kiala and OpenShift consoles. Providing seamless integrations with common open source components is a great step towards a real democratization of this technology, which can otherwise be both complex and expensive. 

A pillar of observability is distributed tracing, and conveniently Istio is instrumented with distributed tracing so it's possible to observe the service mesh itself!

Connecting the dots

It's crucial for you to be able to observe all your networking, as well as your applications inside the network. With the tools described so far, you've already got a lot of information, with useful and fancy service maps, success rates, and SLOs. But now it's time to understand what Istio does, and what you need to do to be able to observe your applications.

Applications -> Traces view in Kiali

As I explained in my previous article, the main component of distributed tracing is context propagation both between services and inside an application. That's the secret that enables you to track all the paths traveled by your requests into the application. The propagation mechanism for HTTP is defined in the W3C TraceContext specification in the form of standardized headers.

Because Istio injects a sidecar in your workloads running on the service mesh, it's able to intercept all communications going through it. This provides the ability to handle distributed tracing context propagation:

  • Is there a trace-id? Search for standardized distributed tracing headers
  • Propagate the header if present, or create a new one
  • Create its own span pertaining to this trace, so you can easily identify this transaction in the larger scheme of things
Distributed tracing header injection in Istio

Yes, the service mesh does the tracing for you! Except there’s still a missing piece in this puzzle: If you don’t touch your code, then your application isn't propagating the context.

Consider the example we used in the first picture to describe the mesh:

  • The sidecar of App 1 may receive (or not) a standardized tracing context. Upon reception, this proxy sidecar creates a span and attaches it to the existing trace-id. If a tracing context is not received, a new one is created. This context is sent to the next in line: The actual App 1
  • App 1 receives the message, processes it, and sends a request to App 2. Because we didn’t instrument App 1, no tracing headers were sent
  • App 2 proxy sidecar does not detect a trace-id. It creates a span, believing it to be the first span because there's been no context. This new trace is sent to the next application, and so on
Context propagation in a not instrumented application in a service mesh with distributed tracing

When all these spans are sent to the backend, they're not connected, so they're visualized as separate transactions instead of a single one divided into several steps.

To unleash the true value of distributed tracing, this problem needs to be resolved.

The service mesh is able to propagate or even create some traces and spans but  without propagating the context, the story is still disconnected. This is why the  “Traces view in Kiali” graph in the screenshot above is not very useful: There are two different sets of dots (nearly forming a solid line) when there should only be one. The ones at the top correspond to App 1, and the bottom ones to App 2. If you were propagating context, only one set of traces would have been shown.

But this still adds a lot of value. By leveraging the first steps in the service mesh, you can:

  • Easily deploy and configure a minimal observability stack
  • Generate useful spans
  • Have a view of the service mesh topology
  • Generate metrics for the traffic traveling between our services

Very useful tutorials can be found in the official Kiali docs.

Key takeaways

In the next article, we'll stop with the theory and demonstrate just how easy it can be to propagate the context! For now, here are four key takeaways about the service mesh and observability:

  1. There are great solutions to get started in modern observability.
  2. One of those approaches is to enable instrumentation in the service mesh.
  3. Propagating the context is critical to have a full understanding of a transaction.
  4. Kiali is a useful tool that helps you to manage and observe a service mesh and applications deployed into it, integrating perfectly with any modern observability stack.

关于作者

Jose is a Senior Product Manager at Red Hat OpenShift, with a focus on Observability and Sustainability. His work is deeply related to manage the OpenTelemetry, distributed tracing and power monitoring products in Red Hat OpenShift.

His expertise has been built from previous gigs as a Software Architect, Tech Lead and Product Owner in the telecommunications industry, all the way from the software programming trenches where agile ways of working, a sound CI platform, best testing practices with observability at the center have presented themselves as the main principles that drive every modern successful project.

With a heavy scientific background on physics and a PhD in Computational Materials Engineering, curiousity, openness and a pragmatic view are always expected. Beyond the boardroom, he is a C++ enthusiast and a creative force, contributing symphonic and electronic touches as a keyboardist in metal bands, when he is not playing videogames or lowering lap times at his simracing cockpit.

Read full bio
UI_Icon-Red_Hat-Close-A-Black-RGB

按频道浏览

automation icon

自动化

有关技术、团队和环境 IT 自动化的最新信息

AI icon

人工智能

平台更新使客户可以在任何地方运行人工智能工作负载

open hybrid cloud icon

开放混合云

了解我们如何利用混合云构建更灵活的未来

security icon

安全防护

有关我们如何跨环境和技术减少风险的最新信息

edge icon

边缘计算

简化边缘运维的平台更新

Infrastructure icon

基础架构

全球领先企业 Linux 平台的最新动态

application development icon

应用领域

我们针对最严峻的应用挑战的解决方案

Original series icon

原创节目

关于企业技术领域的创客和领导者们有趣的故事