Linkerd’s role as a service mesh makes it a great source of data about system performance and runtime behavior. This is especially true in polyglot or heterogeneous environments, where instrumenting each language or framework can be quite difficult. Rather than instrumenting each of your apps directly, the service mesh can provide a uniform, standard layer of application tracing and metrics data, which can be collected by systems like Zipkin and Prometheus.
We’re happy to announce that, one year after version 0.1.0 was released, Linkerd has processed over 100 billion production requests in companies around the world. Happy birthday, Linkerd! Let’s take a look at all that we’ve accomplished over the past year.
Today we’re happy to release Linkerd 0.9.0, our best release yet! This release is jam-packed with internal efficiency upgrades and major improvements to the admin dashboard. We also took this opportunity to make some backwards-incompatible changes to simplify Linkerd configuration. See the bottom of this post for a detailed guide on what changes you’ll need to make to your config to upgrade from 0.8.* to 0.9.0.
The development of distributed systems is full of strange paradoxes. The reasoning we develop as engineers working on a single computer can break down in unexpected ways when applied to systems made of many computers. In this article, we’ll examine one such case—how the introduction of an additional network hop can actually decrease the end-to-end response time of a distributed system.
Cross-posted on the Cloud Native Computing Foundation blog.
Today, the Cloud Native Computing Foundation’s (CNCF) Technical Oversight Committee (TOC) voted to accept linkerd as its fifth hosted project, alongside Kubernetes, Prometheus, OpenTracing and Fluentd.
One of the inevitabilities of moving to a microservices architecture is that you’ll start to encounter partial failures—failures of one or more instances of a service. These partial failures can quickly escalate to full-blown production outages. In this post, we’ll show how circuit breaking can be used to mitigate this type of failure, and we’ll give some example circuit breaking strategies and show how they affect success rate.
Staging new code before exposing it to production traffic is a critical part of building reliable, low-downtime software. Unfortunately, with microservices, the addition of each new service increases the complexity of the staging process, as the dependency graph between services grows quadratically with the number of services. In this article, we’ll show you how one of linkerd’s most powerful features, per-request routing, allows you to neatly sidestep this problem.
Linkerd, our service mesh for cloud-native applications, needs to handle very high volumes of production traffic over extended periods of time. In this post, we’ll describe the load testing strategies and tools we use to ensure linkerd can meet this goal. We’ll review some of the problems we faced when trying to use popular load testers. Finally, we’ll introduce slow_cooker, an open source load tester written in Go, which is designed for long-running load tests and lifecycle issue identification.
We’re happy to announce that we’ve released linkerd 0.8.4! Two things are worth noting about this release. First, Kubernetes and Consul support are now officially production-grade features, which is overdue, since they’re already widely used in production. Second, this release features some significant improvements to linkerd’s HTTP/2 and gRPC support, especially around backpressure and request cancelation.