What it takes to run Stateful Workloads on Kubernetes: Thinking Beyond Just Storage
In the past few years, we have seen the emergence of close to 30 different CNCF-certified vendors providing storage for stateful applications running on Kubernetes. With such broad vendor support, one would expect to see many organizations successfully running the most commonly used stateful workloads, such as Splunk, Elasticsearch, Postgres, Cloudera, MongoDB, and Oracle, on Kubernetes. But beyond some simple lab deployments, we rarely see production deployments of these workloads. Why?
Real challenges for running Stateful workloads on Kubernetes
If attaching storage to pods and keeping it available when a node or pod fails were all it took to run stateful workloads in production, one could arbitrarily pick any of the 30 vendors offering a CSI-integrated storage stack for Kubernetes and be all set. However, stateful workloads are long-running, and running them in production requires managing their lifecycle across (1) data protection, (2) data availability, (3) data security, and (4) performance SLA needs. These fall into the realm of Data Management services that one needs to provide to successfully run stateful workloads on Kubernetes. There are two challenges with the approach taken by most CSI storage solutions:
- Per-volume Data Management: When snapshots, clones, backups, etc. are done per volume, they are neither application-consistent nor crash-consistent. When a database uses more than one volume (which is the case with most databases), the CSI solution iterates over the volumes and snapshots them one at a time, so each volume’s snapshot is taken at a slightly different time. This architectural approach, taken by most CSI storage solutions, means that when a database is rolled back to a previous snapshot, the filesystem or the database has to perform time-consuming and expensive fsck or crash recovery operations before the rollback can complete. Similarly, at the time of a snapshot there may be dirty data in the application’s caches that has not yet made it to disk. With uncoordinated per-volume, per-pod snapshots, one partition of the database might have flushed its cache to disk while another has not. The result is an application-inconsistent snapshot. Using that snapshot to roll back, clone, or back up could lose transactions or leave each partition of the database with a slightly different view of the data. The correct way to snapshot stateful workloads is to have the storage subsystem create a multi-volume consistency group and run a two-phase protocol with pre-snapshot and post-snapshot stages, in which pending dirty data across volumes and pods is flushed to disk and every volume is snapshotted at the same logical point in time. The Robin Storage stack uses this architectural approach because it was built from the ground up to understand distributed multi-volume databases. Alternative offerings from other vendors, by contrast, build their snapshotting on single-volume, single-host Btrfs technology, which offers no way to intercept I/Os so as to create a distributed logical consistency group spanning multiple volumes and the multiple hosts on which those volumes are mounted.
- Lack of Application-centric Data Management adds complexity and hurts productivity: Storage-only data management, that is, volume-level snapshots, clones, backups, etc., has existed in the storage industry for over two decades. Redoing it in the context of Kubernetes with the same architectural models used for VMs is hardly innovative. Moreover, Kubernetes offers a way for developers to treat infrastructure, including storage, as invisible. Developers build their apps to be infrastructure-agnostic and treat infrastructure as code. In this model, they programmatically define how their application should scale or be made highly available, with the expectation that the underlying infrastructure knows how to assemble itself to meet the desired end state for that application. This model has brought a level of operational simplicity to DevOps teams running stateless workloads, allowing smaller teams to manage a large number of them with relative ease. Extending the model to stateful workloads, such as databases and Big Data, is highly desirable, because the same simplicity would yield even bigger gains in agility and efficiency. However, snapshotting and cloning storage volumes is only the starting point. One also needs to snapshot the other constructs, such as application metadata, configuration, and SLA policies, along with the volumes in order to quickly roll back an entire application to a previous state, or to clone it into a fully functional running database from a previously taken snapshot. Volume-only data management means that one still has to reconfigure and rewire all the other components around the volumes before the application is usable. This goes against the agility and efficiency expected of a platform like Kubernetes.
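The two-phase, consistency-group approach described in the first point can be illustrated with a minimal Python sketch. Everything here is hypothetical: the `Volume` class is an in-memory stand-in and `snapshot_group` is an illustrative name, not Robin's implementation or any CSI API. The point is only the ordering: every volume in the group is flushed and frozen before any volume is copied, and all volumes are thawed afterwards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Volume:
    """Hypothetical in-memory stand-in for one database volume."""
    name: str
    cache: List[str] = field(default_factory=list)  # dirty writes not yet on disk
    disk: List[str] = field(default_factory=list)   # durable writes
    frozen: bool = False

    def write(self, record: str) -> None:
        if self.frozen:
            raise RuntimeError(f"{self.name} is frozen; write rejected")
        self.cache.append(record)

    def flush_and_freeze(self) -> None:
        # Pre-snapshot stage: flush dirty data, then hold new writes.
        self.disk.extend(self.cache)
        self.cache.clear()
        self.frozen = True

    def thaw(self) -> None:
        # Post-snapshot stage: let writes resume.
        self.frozen = False


def snapshot_group(volumes: List[Volume]) -> Dict[str, List[str]]:
    """Snapshot every volume in the group at one logical point in time."""
    try:
        for v in volumes:  # phase 1: quiesce the whole group first
            v.flush_and_freeze()
        # phase 2: copy only after every volume is flushed and frozen
        return {v.name: list(v.disk) for v in volumes}
    finally:
        for v in volumes:  # always thaw, even if a step above fails
            v.thaw()


data = Volume("data")
wal = Volume("wal")
data.write("row-1")
wal.write("txn-1")
snap = snapshot_group([data, wal])
```

A per-volume loop that flushed and copied one volume at a time would let `wal` accept new writes after `data` had already been copied, which is the inconsistency the text warns about.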
Therefore, what one needs is a CSI-compliant storage stack that natively understands the notion of an application and performs data management on entire applications, not just storage volumes. Achieving this requires a radically different “application-aware” architecture at the storage layer. But there is a problem that one needs to address first: Kubernetes does not have a proper notion of an application that the storage stack can leverage.
Defining An “Application” in Kubernetes
Kubernetes provides many useful constructs such as Pods, Controllers, PersistentVolumes etc. to help you manage your applications. However, there is no construct for an “Application”, i.e. a single entity that consists of all the resources that form an application. Users have to manually map the resources to an application and manage each resource individually or through label selectors for any lifecycle operation. The lack of a proper Application construct in Kubernetes poses a problem when it comes to performing operations that encompass a group of resources.
Frameworks such as Helm and Operators try to solve this problem by packaging resources together, but they do not solve it beyond the initial deployment. For example, how would one snapshot, clone, or back up an entire Helm release that spans PersistentVolumeClaims, Secrets, ConfigMaps, StatefulSets, Pods, Services, etc.? Or how about snapshotting a web tier, an app tier, and a database tier, each deployed separately using three different kubectl manifest files or Helm charts?
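To make the manual mapping concrete, here is a minimal Python sketch of grouping resources into an "application" with an equality-based label selector. It operates on plain dictionaries shaped like Kubernetes manifests rather than on a live cluster. The function names are hypothetical, and the sketch assumes every resource carries the `app.kubernetes.io/instance` label; that label is one of the Kubernetes recommended labels, but charts and manifests are not required to set it.

```python
from typing import Any, Dict, List

Resource = Dict[str, Any]


def matches(resource: Resource, selector: Dict[str, str]) -> bool:
    """True if the resource has every label in the selector (equality-based)."""
    labels = resource.get("metadata", {}).get("labels", {})
    return all(labels.get(key) == value for key, value in selector.items())


def application_resources(
    resources: List[Resource], selector: Dict[str, str]
) -> List[Resource]:
    """Collect all resources that belong to one application."""
    return [r for r in resources if matches(r, selector)]


def manifest(kind: str, name: str, instance: str) -> Resource:
    """Hypothetical helper that builds a minimal manifest for the example."""
    return {
        "kind": kind,
        "metadata": {
            "name": name,
            "labels": {"app.kubernetes.io/instance": instance},
        },
    }


manifests = [
    manifest("StatefulSet", "mongo", "mongo"),
    manifest("PersistentVolumeClaim", "data-mongo-0", "mongo"),
    manifest("Secret", "mongo-auth", "mongo"),
    manifest("Service", "web", "web"),
]

mongo_app = application_resources(
    manifests, {"app.kubernetes.io/instance": "mongo"}
)
kinds = [r["kind"] for r in mongo_app]
```

The selector only identifies the members of the group. It does not by itself give a way to snapshot or clone them together, which is the gap the text describes.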
To facilitate this, Robin supercharges Kubernetes with the notion of an Application. An Application is a collection of Kubernetes resources that form a single unit on which a DevOps engineer can perform Data Management operations. Robin has the most extensible constructs to define an application. The following different types of applications are supported by Robin:
Robin Storage: Manage App and Data as a Single Entity
Robin Storage is a purpose-built container-native storage solution that brings advanced data management capabilities to Kubernetes. It is a CSI-compliant block storage solution with bare-metal performance that seamlessly integrates with Kubernetes-native administrative tooling such as kubectl, Helm Charts, and Operators through standard APIs.
Robin Storage is application-aware. The “Application” construct, as defined above, provides the context for all Robin Storage operations. All lifecycle operations are performed by treating app and data as a single entity. For example, when you snapshot a MongoDB application, Robin Storage captures the entire application topology and its configuration (i.e., specs of Pod, Service, StatefulSet, Secrets, ConfigMaps, etc), and all data volumes (PersistentVolumeClaims) to create a point-in-time application checkpoint.
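As a rough illustration of what "app and data as a single entity" could mean as a data structure, the following Python sketch bundles copies of resource specs with per-volume snapshot identifiers. The `AppCheckpoint` type, the `checkpoint` function, and the snapshot ID strings are all hypothetical; this is not Robin Storage's format or API, only a sketch of the idea that configuration and volume state are captured together.

```python
import copy
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AppCheckpoint:
    """Hypothetical point-in-time record of an application."""
    app_name: str
    resource_specs: List[Dict[str, Any]]  # Pod, Service, StatefulSet, ... specs
    volume_snapshots: Dict[str, str]      # PVC name -> snapshot identifier


def checkpoint(
    app_name: str,
    specs: List[Dict[str, Any]],
    pvc_snapshots: Dict[str, str],
) -> AppCheckpoint:
    # Deep-copy the specs so later edits to the live objects
    # cannot change what the checkpoint recorded.
    return AppCheckpoint(app_name, copy.deepcopy(specs), dict(pvc_snapshots))


live_specs = [
    {"kind": "StatefulSet", "metadata": {"name": "mongo"}, "spec": {"replicas": 3}},
    {"kind": "Service", "metadata": {"name": "mongo"}, "spec": {"clusterIP": "None"}},
]
cp = checkpoint("mongo", live_specs, {"data-mongo-0": "snap-0001"})

# The live application is scaled after the checkpoint was taken.
live_specs[0]["spec"]["replicas"] = 5
recorded_replicas = cp.resource_specs[0]["spec"]["replicas"]
```

Rolling back with such a record would restore both the recorded specs and the referenced volume snapshots, rather than the volumes alone.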
By creating the “App and Data” single entity, Robin Storage brings advanced data management capabilities to Kubernetes. It seamlessly integrates with Kubernetes-native administrative tooling such as Helm Charts to register Helm Releases as “Apps”. Robin Storage provides automated provisioning, point-in-time snapshots, backup and recovery, application cloning, QoS guarantee, and multi-cloud migration for stateful applications on Kubernetes.