In 2015, two months before the release of Docker 1.9, which included container networking, we needed a distributed metadata storage solution. The inner workings of libnetwork required information to be accessible to Docker engines in a distributed fashion, in order to discover and manipulate libnetwork objects (create/update/delete/discover overlay networks).

Considering the time constraints, we discussed and settled on a few possibilities:

  • Embed etcd (an existing distributed key/value store) directly inside the engine.

    The overkill solution. We thought about embedding the whole k/v store, with commands to bootstrap etcd hidden behind Docker flags. The idea was not very appealing because the dependency was huge.

  • Create a distributed object store with etcd's core Raft library (the central piece that makes everything work together).

    The elegant solution. Time-consuming, but also the most humanly bearable considering the very few engineers contributing to the existing Swarm store backends. Having a unified distributed store accessible to a Docker engine would have allowed us to contain the maintenance effort to a single component, rather than many. This also had the advantage of a tiny footprint compared to embedding etcd.

  • Use the Swarm (legacy) codebase to access existing key/value stores like Consul, etcd and ZooKeeper through their APIs. The goal was to create a single unifying API to rule them all.

    The not-so-elegant solution. Technically feasible and the most time-efficient, while also maintaining compatibility with the same key/value stores Swarm already supported (to handle node discovery in the cluster). Technical issues would arise though: key namespacing and formatting were a problem, and watches were handled very differently from one key/value store to another.

I was personally strongly leaning towards creating our own distributed object store using etcd's Raft library (which finally happened at a later stage with SwarmKit). This was interesting at the time for two reasons:

  • The Raft subsystem of etcd had recently been overhauled by a contractor at CoreOS (Blake Mizerany). As a result, etcd was much more stable than in its early versions, which were tainted by Raft instabilities. The library was slightly more complex to use than a from-scratch Raft implementation, but projects such as CockroachDB had started to back it and contribute to it, so it stood a good chance of becoming increasingly usable and stable. An alternative was Consul's Raft library, but it was gathering very few contributions at the time.

  • The footprint of the library was contained compared to the full etcd codebase, which is large and includes different API versions, deprecated flags, etc.

The rationale was convenience: it is hard and impractical to maintain multiple backends, especially with so few engineers taking care of that part at the time. Ultimately, the idea was not well received. Time constraints and a lack of expertise on the topic (myself included) made everybody uneasy about introducing such a critical piece into the engine.

Embedding etcd was not well received either, for obvious reasons, so we settled on developing an abstraction library out of the existing Swarm codebase (from the individual Consul, etcd and ZooKeeper pieces). libkv was born.

libnetwork and libkv were quite controversial. Articles started appearing soon after the Docker 1.9 release, mainly criticizing their APIs as well as the fact that they relied on Strong Consistency guarantees (using Consensus protocols).

For example, see Why Kubernetes doesn't use libnetwork or Weave's Docker networking technical deep dive.

In retrospect, both articles had valid points:

  • The libkv API was quite low-level: not for simple calls such as Put/Get, but for Lock/Watch and the like. It also didn't make much sense for Kubernetes to use libnetwork: relying on various key/value stores was impractical, and their choice to stick with etcd proved to be a key part of the project's success.

  • Using Strong Consistency is often unnecessary. It trades Availability for Consistency, which means that the application or its data could be rendered unavailable by network partitions or critical machine failures. Use Consensus protocols only when necessary.
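To give a feel for why watches were the hard part of the API, here is a toy in-memory store with a channel-based Watch, in the spirit of (but not identical to) libkv's interface. Every name here is illustrative; real backends differ in exactly the semantics this sketch glosses over, such as whether a slow consumer blocks writers or drops updates.

```go
package main

import (
	"fmt"
	"sync"
)

// KVPair mirrors the kind of struct a key/value abstraction returns.
type KVPair struct {
	Key   string
	Value []byte
}

// memoryStore is a toy in-memory store with a libkv-style Watch:
// each Put on a watched key is delivered to subscribers over a channel.
type memoryStore struct {
	mu       sync.Mutex
	data     map[string][]byte
	watchers map[string][]chan *KVPair
}

func newMemoryStore() *memoryStore {
	return &memoryStore{
		data:     make(map[string][]byte),
		watchers: make(map[string][]chan *KVPair),
	}
}

// Put stores the value and notifies any watchers of that key.
func (s *memoryStore) Put(key string, value []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	for _, ch := range s.watchers[key] {
		// Non-blocking send: a slow consumer silently drops updates.
		// This is exactly the kind of semantic choice that real
		// key/value stores make differently from one another.
		select {
		case ch <- &KVPair{Key: key, Value: value}:
		default:
		}
	}
}

// Watch returns a buffered channel that receives updates for key.
func (s *memoryStore) Watch(key string) <-chan *KVPair {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan *KVPair, 16)
	s.watchers[key] = append(s.watchers[key], ch)
	return ch
}

func main() {
	s := newMemoryStore()
	updates := s.Watch("config/leader")
	s.Put("config/leader", []byte("node-1"))
	pair := <-updates
	fmt.Printf("%s -> %s\n", pair.Key, pair.Value)
}
```

Even in this simplified form, questions like delivery guarantees, buffering, and missed events leak into the API, which is why unifying watches across Consul, etcd and ZooKeeper was so painful.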

The interesting part was that even though libkv was designed for specific tasks such as metadata management, distributed watches and locks, Leader Election, etc., projects started using it simply for its "Put/Get" interface to access multiple key/value stores. A perfect example of such a project is Traefik. The most complex and "low-level" parts of the library were left unused.
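The subset that projects actually relied on boils down to something this small. The sketch below is my own minimal rendition of that "Put/Get" shape with an in-memory backend, not the library's exact signatures; the Traefik-style key is just an illustrative example.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is roughly the "simple" subset of a libkv-style API that most
// consumers ended up using (names are illustrative, not exact).
type Store interface {
	Put(key string, value []byte) error
	Get(key string) ([]byte, error)
}

// ErrKeyNotFound is returned by Get when the key is absent.
var ErrKeyNotFound = errors.New("key not found")

// memStore is a trivial in-memory backend implementing Store.
type memStore struct {
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: make(map[string][]byte)}
}

func (s *memStore) Put(key string, value []byte) error {
	s.data[key] = value
	return nil
}

func (s *memStore) Get(key string) ([]byte, error) {
	v, ok := s.data[key]
	if !ok {
		return nil, ErrKeyNotFound
	}
	return v, nil
}

func main() {
	var kv Store = newMemStore()
	kv.Put("traefik/backends/web", []byte("10.0.0.2:80"))
	v, err := kv.Get("traefik/backends/web")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(v))
}
```

Swap memStore for a Consul- or etcd-backed implementation and the calling code stays the same, which is the whole appeal of the abstraction, and why the low-level parts could go unused.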

In the meantime, Docker finally got an internal distributed store using etcd's Raft library. libkv was deprecated (although no mention of it was made) and only used for backward compatibility with legacy Docker Swarm and libnetwork during the transition to the new store.

So today, I decided to give it a small kick and merge contributions onto my fork, such as a shiny new Redis backend (thanks to hsinhoyeh), as well as other patches. I'm not sure how all this is going to evolve, but I'm willing to review and merge new contributions. The logic is: as long as this is useful to somebody, it's worth keeping alive, even if that just means reviewing and merging contributions once in a while.

If you are one of those projects using libkv through a private or public fork, let me know what you'd like to see included or fixed.

The fork is located here.