Introducing zetcd (coreos.com)
264 points by rtp on May 19, 2017 | 34 comments



A big use case of this that we are thinking about is enabling people to use the etcd Operator[1], which makes it simple to run etcd clusters on Kubernetes, to back their ZooKeeper applications.

The neat thing about the etcd Operator is that you define a cluster declaratively and the Operator takes care of normal operations through the Kubernetes API.

  apiVersion: "etcd.coreos.com/v1beta1"
  kind: "Cluster"
  metadata:
    name: "example-etcd-cluster"
  spec:
    size: 5
    version: "3.1.8"
Pretty neat!
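
For anyone curious, once the etcd Operator itself is deployed in the cluster, creating that etcd cluster should just be a matter of applying the manifest with kubectl (the file name here is only an example):

  kubectl create -f example-etcd-cluster.yaml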

Anyways, the zetcd project is still super young, but we would love more folks to try it out. As the post says, folks have already tried it with Kafka, Mesos, and others.

[1] https://coreos.com/blog/introducing-the-etcd-operator.html
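
If you want to kick the tires, running the proxy in front of a local etcd looks roughly like this (flag names as in the zetcd README, so treat it as a sketch), after which you point your ZooKeeper clients at localhost:2181 as usual:

  zetcd --zkaddr 0.0.0.0:2181 --endpoints localhost:2379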


Who are you targeting with this? The "hesitant" zookeeper folks that already depend on etcd? Are you hoping to unseat CDH here? Pardon the naive question - I never bump into k8s while selling to traditional enterprise Hadoop customers.

I'd also never pick kubernetes for my "from scratch" cluster due to already being reliant on the JVM stack. I actually like the idea of giving an IT department that already understands zookeeper a mesos cluster with DC/OS.

That being said - k8s has a ton of momentum, but it seems to be mainly with startups or maybe niche teams outside of Google (prove me wrong here?). It would be great to understand what you guys are looking at for things like this. Right now it feels like k8s and a lot of the other startups in this space, like Pachyderm, are trying to compete with the Hadoop ecosystem (which is great! competition forces innovation, which is good for the ecosystem as a whole).


Kubernetes is actually getting a solid amount of large-tech and early-adopter enterprise deployment. That's still pretty nascent, but it's picking up quickly. Happy to discuss the adoption we're seeing in more detail offline.

The reason you don't bump into k8s while selling to Hadoop users is that Hadoop isn't something you'd run on a container-based stack (at least not right now, and IMO it won't be). There are lots of Hadoop users who run containers for their application infra (as opposed to data infra). Pachyderm's whole pitch is that containerized data infra can be really powerful, that enterprises will want to unify their stack to all be containerized, and that k8s is THE answer for the orchestration layer.

P.S. Despite all my opinions above, I actually agree with your initial question around who zetcd is actually targeting. I don't have a clear picture of that.


>"Pachyderm's whole pitch is that containerized data infra can be really powerful and that enterprises will want to unify their stack to all be containerized and k8s is THE answer for the orchestration layer."

Doesn't Pachyderm predate K8s though? Is this a recent development? Have they shifted focus then?


Sure! Feel free to reach out. I'm just commenting on a wider trend I'm seeing: parallels to the Hadoop ecosystem popping up, written in Go and container based. I agree you don't tend to run Hadoop and co on containers. We tend to see the app side as well, though; we do microservices as well as Hadoop infra.


Non-startup companies are adopting Kubernetes. You can see some of their stories on the Tectonic Summit website[1]: Ticketmaster, eBay, Concur, SAP, BNY Mellon, MLS, etc.

I will try to reply in the morning in depth on the other points.

[1] https://coreos.com/summit/


Appreciated! I'm wondering if these are just one-off teams, though. We have "enterprise adoption" for our software, but it doesn't mean company-wide. One thing Hadoop has been able to do is actually get deployed at scale. You can have small teams within companies using k8s for their apps, while other parts of those companies are too conservative to actually deploy new tech. "Nascent" adoption usually means innovation labs and one-off deployments for certain teams.

What I'm trying to gauge here is k8s as an actual "company wide platform". I would love for it to be something I can depend on to be at an enterprise in a few years. It's great technology but still feels like it needs to be beaten up a bit yet.


I work on OpenShift (which is k8s with tenancy) and there's a good mix of "production apps", "dense development clusters", and "single app experimentation" out there. Like all things about the future, it's here, just not evenly distributed.

You'd be surprised how many services you interact with on a daily basis are running on k8s (whole or in part).

It's still early, and many of the adopters today in large companies just happened to be making modernization efforts of their app-dev / app-deploy pipelines and moved to k8s or OpenShift. That said, it's certainly not ubiquitous yet.


This sounds more palatable to me. I definitely know it has traction, and I wouldn't be surprised to see it powering quite a few of the bigger services, but it still feels like a big part of the early-adopter phase. This is in line with what I have seen. I know it's "out there", but it's not exactly "RHEL" yet ;).


So I can run a Kubernetes cluster on Mesos and have the ZooKeeper for Mesos deployed on the Kubernetes cluster using the etcd Operator and zetcd?

Joking, of course.


The funny thing about the situation you describe is that there are real world examples of similar circular dependencies.

I recall GitHub having an issue like that where their build pipeline used Bower, which is hosted on GitHub. When shit hit the fan and a build broke the site, they couldn't build the fix because Bower didn't work.


My own experience working at CoreOS is that many of our projects exploit self-referentiality as it's a particularly useful property.

Off the top of my head:

- Quay.io, our registry service, is built and deployed by itself

- Clair, our static analysis tool for detecting security vulnerabilities, analyzes itself

- Tectonic, our enterprise Kubernetes distro, is "self-driving" and manages itself

- discovery.etcd.io, a service we run to make it easier to bootstrap new etcd quorums, is just a quorum of etcd nodes


I think you are missing the point. It's like running a Docker registry on Kubernetes.

If for some reason the cluster goes down, bootstrapping it might be a bit difficult.


Yes that's the exact point I was trying to make. Things are fine until they're not, at which point it's surgery and tribal knowledge to fix them.


Of course you are joking, but keep in mind that _in the real world_ you would also operate the etcd that Kubernetes needs via the etcd Operator!


Oh, and here is a video explanation of the etcd Operator https://youtu.be/Uf7PiHXqmnw?t=11


I did some quick testing with Apache Curator (note: I'm the main author of Curator) and it looks like zetcd isn't implementing the create2 opcode and several others. If the goal is to really be ZK compatible it has a long way to go. I'm not sure who the target of this is. But, I'll keep following and try to get the Curator tests to run when it's further along.



Interesting. Maybe hashicorp should release an etcd compatible layer for Consul. Or CockroachDB even. Both have one thing etcd does not...inbuilt WAN support.


etcd can cross WAN links if you tune it for the expected latencies. Tuning is required so the leader election algorithm knows how long to wait before treating the leader as failed[1].

What is your use case?

[1] https://coreos.com/etcd/docs/latest/tuning.html
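
Concretely, the knobs are the heartbeat interval and election timeout flags; for a high-latency link you raise both, e.g. something like the following (the values are only an illustration, the tuning doc above has the actual guidance):

  etcd --heartbeat-interval=200 --election-timeout=2000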


Yes, you can sort of detune the whole cluster. That's not quite the same as Consul's and CockroachDB's specific WAN awareness. Those two take different approaches, but they specifically understand WAN topology and compensate for it.

>What is your use case?

Contract work, so use case varies. I agree that etcd is often the right answer.


Right, etcd's entire focus is on being a consensus database for distributed systems needing coordination, locking, etc. So, eventually consistent WAN replication hasn't really been a focus.

I do think this sort of cross-cluster key replication is useful and we offer it as a userspace external tool called make-mirror[1].

[1] https://github.com/coreos/etcd/blob/master/etcdctl/README.md...
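
For reference, a make-mirror run looks roughly like this, with the source cluster given as the endpoint and the destination as the argument (the endpoints here are placeholders):

  ETCDCTL_API=3 etcdctl --endpoints=http://cluster-a:2379 make-mirror cluster-b:2379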


Worth noting that CockroachDB isn't using an eventually consistent model. Yes, it's not the same thing as etcd, but I can see some potential use case overlap.


There is some overlap, but we should choose the solution wisely :P.

Here is a doc [https://github.com/coreos/etcd/blob/master/Documentation/lea...] comparing etcd with other systems, including CockroachDB.

I work on etcd.


We might have just been talking about that


"We" as in Hashicorp?


Yeah, a few people internally would like to tackle this.


Great idea. We only run ZooKeeper because of Kafka; everything else is in Consul, so the idea of this existing is pretty neat.


Same here: Storm and Kafka, but everything else is in Consul - so two "single sources of truth"... well :)


So how does zetcd handle the zookeeper session? It's in zetcd and then the "ephemeral nodes" are removed from etcd if the zk client session expires? This is handled by the proxy?

This seems to be the major compatibility question between the two: the ZooKeeper protocol has session state and so supports ephemeral nodes. If a GC pauses the JVM for a long time, the client session will expire. In etcd there are TTLs/leases instead (ZooKeeper always lacked TTLs since they weren't needed).


A goroutine is spawned that runs the etcd client's KeepAlive method to tie a ZooKeeper session to an etcd lease in the proxy. The code is here: https://github.com/coreos/zetcd/blob/d33e3b836a2a2de8a8ec077...

Disclaimer: The person who wrote this is AFK at the moment so I might be completely wrong ;)
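
For anyone who wants to see the shape of the mechanism, the etcd v3 client primitive behind it is a lease plus a KeepAlive stream. This is only a standalone sketch of that primitive, not zetcd's actual code; the endpoint, key, and TTL are made up:

  package main

  import (
      "context"
      "log"
      "time"

      "github.com/coreos/etcd/clientv3"
  )

  func main() {
      // Connect to a local etcd; the endpoint is just an example.
      cli, err := clientv3.New(clientv3.Config{
          Endpoints:   []string{"localhost:2379"},
          DialTimeout: 5 * time.Second,
      })
      if err != nil {
          log.Fatal(err)
      }
      defer cli.Close()

      // Grant a lease with a 10s TTL; keys attached to it disappear when it
      // expires, which is roughly what a ZooKeeper ephemeral node maps to.
      lease, err := cli.Grant(context.Background(), 10)
      if err != nil {
          log.Fatal(err)
      }
      if _, err := cli.Put(context.Background(), "/ephemeral/example", "x",
          clientv3.WithLease(lease.ID)); err != nil {
          log.Fatal(err)
      }

      // KeepAlive refreshes the lease in the background for as long as the
      // "session" lives; stop the client or cancel the context to let it expire.
      ch, err := cli.KeepAlive(context.Background(), lease.ID)
      if err != nil {
          log.Fatal(err)
      }
      for range ch {
          // each message on the channel is a successful lease refresh
      }
  }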


> zookeeper always lacked ttl

ZooKeeper 3.5.3 supports TTLs


Another reason to use zetcd over ZooKeeper is security. I don't believe it is possible yet to use TLS for ZooKeeper-to-ZooKeeper communication (clients can connect to ZooKeeper using TLS); the Jira covering this feature is at [0]. This will increasingly be a problem for those running ZooKeeper for Mesos, Kafka, etc. from a security and risk point of view.

[0] https://issues.apache.org/jira/browse/ZOOKEEPER-236


A few years ago I tried to reimplement the server connections directly in Zookeeper to be HTTP connections and a REST-like API and try to use long-polling and some other things. I got some code written, but no prototype up and I had to put it aside. This was initially before etcd was really going (I think it was at version 0.2 when I started the work). I had always hoped somebody else would try the same thing, but no such luck.



