Industrial-strength operability for cloud-native applications

One of the inevitabilities of moving to a microservices architecture is that you’ll start to encounter partial failures—failures of one or more instances of a service. These partial failures can quickly escalate to full-blown production outages. In this post, we’ll show how circuit breaking can be used to mitigate this type of failure, and we’ll give some example circuit breaking strategies and show how they affect success rate.

In March 2016 at KubeCon EU, I gave my first public talk on linkerd. At the end of that talk, as in most of the other 20+ talks I gave in 2016, I presented a high-level linkerd roadmap that aspirationally included HTTP/2 & gRPC integration. As we enter 2017, I’m pleased to say that we’ve reached this initial goal. Let me take this opportunity to summarize what I think is novel about these technologies and how they relate to the future of linkerd service meshes.

Staging new code before exposing it to production traffic is a critical part of building reliable, low-downtime software. Unfortunately, with microservices, the addition of each new service increases the complexity of the staging process, as the dependency graph between services grows quadratically with the number of services. In this article, we’ll show you how one of linkerd’s most powerful features, per-request routing, allows you to neatly sidestep this problem.

Risha Mars, 6 January 2017

Linkerd, our service mesh for cloud-native applications, needs to handle very high volumes of production traffic over extended periods of time. In this post, we’ll describe the load testing strategies and tools we use to ensure linkerd can meet this goal. We’ll review some of the problems we faced when trying to use popular load testers. Finally, we’ll introduce slow_cooker, an open source load tester written in Go, which is designed for long-running load tests and lifecycle issue identification.

We’re happy to announce that we’ve released linkerd 0.8.4! Two things are worth noting about this release. First, Kubernetes and Consul support are now officially production-grade features, which is overdue, since they’re already widely used in production. Second, this release features some significant improvements to linkerd’s HTTP/2 and gRPC support, especially around backpressure and request cancellation.

In this post we’ll show you how to use a service mesh of linkerd instances to handle ingress traffic on Kubernetes, distributing traffic across every instance in the mesh. We’ll also walk through an example that showcases linkerd’s advanced routing capabilities by creating a dogfood environment that routes certain requests to a newer version of the underlying application, e.g. for internal, pre-release testing.

Risha Mars, 18 November 2016

Beyond service discovery, top-line metrics, and TLS, linkerd also has a powerful routing language, called dtabs, that can be used to alter the ways that requests—even individual requests—flow through the application topology. In this article, we’ll show you how to use linkerd as a service mesh to do blue-green deployments of new code as the final step of a CI/CD pipeline.

Sarah Brown, 4 November 2016

In this article, we’ll show you how to use linkerd as a service mesh to add TLS to all service-to-service HTTP calls, without modifying any application code.

Alex Leong, 24 October 2016

In our recent post about linkerd on Kubernetes, A Service Mesh for Kubernetes, Part I: Top-line Service Metrics, observant readers noticed that linkerd was installed using DaemonSets rather than as a sidecar process. In this post, we’ll explain why (and how!) we do this.

Alex Leong, 14 October 2016

In our previous post, linkerd as a service mesh for Kubernetes, we showed you how to use linkerd on Kubernetes for drop-in service discovery and monitoring. In this post, we’ll show you how to get the same features on DC/OS, and discuss how this compares with DNS-based solutions like Mesos-DNS.