Understanding Service Mesh and Istio

As a software developer, I sometimes find infrastructure networking quite confusing. In this article, I take a deeper dive into service mesh and Istio. A service mesh is a dedicated infrastructure layer for handling service-to-service communication. It’s responsible for the reliable delivery of requests through the complex topology of services that comprise a modern, cloud native application.

In practice, the service mesh is typically implemented as an array of lightweight network proxies that are deployed alongside application code, without the application needing to be aware of them. A service mesh is not a mesh of services but rather a mesh of proxies into which services can plug, abstracting away the network layer. The service mesh fulfils three primary functions: enriched flow control, security and observability.

The control plane is the flow of management information and configuration from a central logical unit to the proxies that make up the mesh. The data plane is the flow of application data between microservices in the mesh, facilitated by the sidecar proxies.

One of the most commonly used service meshes is Istio. Istio is an open source project started by Google and IBM in partnership with Lyft (the creators of the Envoy proxy). It provides a transparent infrastructure layer to manage inter-service communication on Kubernetes, abstracting elements like flow control and security away from engineers. It works by hijacking network traffic bound for a pod and proxying it through an intelligent layer 7 proxy (Envoy), which runs as a sidecar to the main container. It applies advanced routing and policy rules to manage traffic, improving security and resiliency by using mutual TLS between peers and implementing various timeout and retry logic on network endpoints.

Immediate gains are to be made in the three core functions of a service mesh: flow control, security and observability. However, the pattern is not without its downsides. It adds complexity and operational overhead, and staff need to become well acquainted with it. Each hop through a proxy adds latency on the order of milliseconds, which may not be acceptable for some workloads. It requires platform adaptation, as it doesn’t necessarily just work out of the box. It also runs a large number of proxies, which consume extra resources such as memory and CPU.

The Istio control plane is a broad term covering all functions of Istio concerned with the configuration of the data plane, the management and implementation of Istio’s rules-based API, and ensuring that workloads in the mesh are kept up to date.

The control plane runs as a deployment, istiod, that fulfills the following functions: 1. Service discovery and proxy configuration with Istio Pilot (discovery). 2. Proxy bootstrap, lifecycle management and certificate rotation with Istio Pilot (agent). 3. Certificate generation, validation and rotation with istio-citadel. 4. API resource and configuration validation with istio-galley. 5. Istio proxy (Envoy) injection. In older versions of Istio, the Pilot, Citadel and Galley functions were split into their own deployments, but they have since been condensed into one deployment in istiod.

The Istio data plane consists simply of all Envoy proxy instances running on clusters that communicate with the Istio control plane. Once proxies are configured and synchronised by the control plane, they manage all relevant inbound and outbound network traffic to the pod, applying advanced Layer 7 routing and policy rules. All routing and policy configuration is set through the control plane’s rules-based API, which the control plane then compiles into Envoy-specific configuration before pushing it over gRPC streams to the proxy instances. The compilation involves reducing the abstract model stored in the control plane down to Envoy filters.

In order to be entirely transparent to application developers, Istio needs to process some or all inbound and outbound network traffic without developers needing to make any changes to their application. This is achieved via traffic hijacking: Istio manipulates the iptables rules of all pods in the service mesh and effectively reroutes all traffic to Envoy. All inbound traffic is routed to the proxy port 15006 and all outbound traffic is routed to the proxy port 15001. There are two different ways this manipulation is done. The first is through an init container (istio-init) and the second is through Istio’s Container Network Interface (CNI) plugin.
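To make the redirection concrete, here is a toy Python model of the decision the redirect rules make. It is only a sketch of the behaviour described above, not Istio's actual rule set. The `Connection` type and the `from_proxy` flag are hypothetical names introduced for illustration; the flag stands in for the assumption that Envoy's own traffic has to be exempt, otherwise it would loop back into the proxy.

```python
from dataclasses import dataclass

# Proxy ports taken from the text above.
INBOUND_PROXY_PORT = 15006
OUTBOUND_PROXY_PORT = 15001


@dataclass(frozen=True)
class Connection:
    direction: str            # "inbound" or "outbound", relative to the pod
    original_port: int        # port the caller originally targeted
    from_proxy: bool = False  # hypothetical flag: connection opened by Envoy itself


def effective_port(conn: Connection) -> int:
    """Return the port that actually receives the connection inside the pod."""
    if conn.from_proxy:
        # Assumption: the proxy's own traffic is exempt from redirection,
        # otherwise it would be sent straight back into Envoy.
        return conn.original_port
    if conn.direction == "inbound":
        return INBOUND_PROXY_PORT
    if conn.direction == "outbound":
        return OUTBOUND_PROXY_PORT
    raise ValueError(f"unknown direction: {conn.direction!r}")


if __name__ == "__main__":
    print(effective_port(Connection("inbound", 8080)))
    print(effective_port(Connection("outbound", 443)))
    print(effective_port(Connection("outbound", 443, from_proxy=True)))
```

The application still believes it is talking to port 8080 or 443; only the delivery port changes, which is what keeps the proxy transparent.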

Flow control capabilities in Istio can be divided into three areas: 1. Request routing and traffic diversion, e.g. canary rollouts of new services. 2. Resilience with circuit breakers, timeouts, and retries. 3. Debugging with traffic mirroring and fault injection. Istio exposes a number of rules-based APIs, managed by the control plane, through which a user or operator can configure traffic management in the mesh.
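As a rough illustration of the first two areas, the sketch below builds a VirtualService manifest as a plain Python dictionary and prints it as JSON. The service name `reviews`, the subset names `stable` and `canary`, the weights and the timeout values are all assumptions made up for this example, and the `networking.istio.io/v1beta1` API version may differ on your Istio release.

```python
import json


def canary_virtual_service(host: str, stable_weight: int, canary_weight: int) -> dict:
    """Build a VirtualService that splits traffic between two subsets."""
    if stable_weight + canary_weight != 100:
        raise ValueError("route weights must add up to 100")
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": f"{host}-canary"},  # hypothetical resource name
        "spec": {
            "hosts": [host],
            "http": [
                {
                    # Request routing: weighted split for a canary rollout.
                    "route": [
                        {"destination": {"host": host, "subset": "stable"},
                         "weight": stable_weight},
                        {"destination": {"host": host, "subset": "canary"},
                         "weight": canary_weight},
                    ],
                    # Resilience: overall timeout plus bounded retries.
                    "timeout": "5s",
                    "retries": {"attempts": 3, "perTryTimeout": "2s"},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(canary_virtual_service("reviews", 90, 10), indent=2))
```

The printed JSON can be saved to a file and applied like any other Kubernetes manifest. Shifting the rollout forward is then just a matter of changing the two weights.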

For flow control, destination rules are applied after virtual service rules are evaluated, so that they apply to the traffic’s real destination, for example after a rewrite or when routing to a subset. They apply a traffic policy to a host service, optionally scoped to a number of matching subsets. Traffic policies can cover load balancing strategy, connection pool settings, outlier detection, TLS settings and port-specific settings.
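A DestinationRule covering most of these policy areas might look roughly like the following sketch. In the DestinationRule API, named groups of endpoints are called subsets and are selected by pod labels. The host name, the subset names, the `version` labels and every numeric limit here are illustrative assumptions rather than recommended values.

```python
import json


def destination_rule(host: str) -> dict:
    """Build a DestinationRule with a traffic policy and two labelled subsets."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": f"{host}-policy"},  # hypothetical resource name
        "spec": {
            "host": host,
            "trafficPolicy": {
                # Load balancing strategy.
                "loadBalancer": {"simple": "ROUND_ROBIN"},
                # Connection pool settings.
                "connectionPool": {
                    "tcp": {"maxConnections": 100},
                    "http": {"http1MaxPendingRequests": 10},
                },
                # Outlier detection: eject hosts that keep returning 5xx.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "30s",
                    "baseEjectionTime": "60s",
                },
                # TLS settings: use Istio-issued certificates for mTLS.
                "tls": {"mode": "ISTIO_MUTUAL"},
            },
            # Subsets map a name to pods carrying the given labels.
            "subsets": [
                {"name": "stable", "labels": {"version": "v1"}},
                {"name": "canary", "labels": {"version": "v2"}},
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(destination_rule("reviews"), indent=2))
```

Connection pool limits and outlier detection together are what give the circuit breaker behaviour: the pool caps how much load is sent, and outlier detection removes unhealthy endpoints for a while.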

Service Entries are used to manually add a host to Istio’s internal service registry, such as an RDS instance running outside the cluster. This allows traffic to that destination to be enriched with timeouts, retries and so on. Sidecars are a way to define the set of ports and protocols that a workload proxy accepts, or to limit the set of services that a given Envoy can reach.
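Both resources can be sketched in the same dictionary style. The database host name `orders-db.example.com`, the port, the resource names and the namespace are hypothetical placeholders. The Sidecar shown only demonstrates the second use mentioned above, limiting which services a proxy can reach.

```python
import json


def external_db_service_entry(host: str, port: int) -> dict:
    """Register an external TCP host in the mesh's service registry."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "ServiceEntry",
        "metadata": {"name": "external-db"},  # hypothetical resource name
        "spec": {
            "hosts": [host],
            "location": "MESH_EXTERNAL",  # the host lives outside the mesh
            "ports": [{"number": port, "name": "tcp-db", "protocol": "TCP"}],
            "resolution": "DNS",
        },
    }


def namespace_only_sidecar(namespace: str) -> dict:
    """Limit proxies in a namespace to services in that namespace and istio-system."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Sidecar",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {
            # "./*" means every service in the Sidecar's own namespace.
            "egress": [{"hosts": ["./*", "istio-system/*"]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(external_db_service_entry("orders-db.example.com", 5432), indent=2))
    print(json.dumps(namespace_only_sidecar("shop"), indent=2))
```

Narrowing the egress hosts also reduces the amount of configuration pushed to each Envoy, which helps with the proxy memory overhead mentioned earlier.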

Istio exposes a Security API (formerly authentication) to configure policy at varying levels of granularity. The three modes of TLS communication that Istio allows are: 1. DISABLE, where connections are strictly plaintext. 2. PERMISSIVE, where a connection can be either plaintext or mTLS. 3. STRICT, where the connection is mTLS and a client certificate must be presented. The default is PERMISSIVE if unset. When looking through Istio request metrics, you can see whether a request was proxied over mTLS by looking at the ‘Connection Security Policy’ label.
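The mode is set with a PeerAuthentication resource. The sketch below builds one for a single namespace and rejects anything other than the three modes listed above. The namespace `shop` is a made-up example, and the resource name `default` is an assumption following the common convention for a namespace-wide policy.

```python
import json

# The three modes described in the text above.
MTLS_MODES = ("DISABLE", "PERMISSIVE", "STRICT")


def peer_authentication(namespace: str, mode: str = "PERMISSIVE") -> dict:
    """Build a namespace-wide PeerAuthentication with the given mTLS mode."""
    if mode not in MTLS_MODES:
        raise ValueError(f"mode must be one of {MTLS_MODES}, got {mode!r}")
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": namespace},
        "spec": {"mtls": {"mode": mode}},
    }


if __name__ == "__main__":
    # Require mTLS for every workload in the (hypothetical) "shop" namespace.
    print(json.dumps(peer_authentication("shop", "STRICT"), indent=2))
```

A common rollout path is to start in PERMISSIVE, confirm from the metrics that traffic is already using mTLS, and only then switch to STRICT.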

Overall, this has been a brief dive into service mesh and Istio. There is still a lot more to explore. Thank you for reading this article and let’s keep learning!