Jepsen result for etcd 3.4.3
alexis richardson
Big round of applause to everyone involved in this. Please do share any thoughts back with CNCF on what can be done to help etcd continue to be a trusted component of the K8s environment.

One side question from me -- I think it would be good to understand more about recommended etcd set-ups at different scales of k8s cluster (10, 50, 150, 500+ nodes) and how to deal with n/w partitions.

alexis

On Thu, Jan 30, 2020 at 3:28 PM Brandon Philips <brandon.philips@...> wrote:
Brandon Philips <brandon.philips@...>
On Thu, Jan 30, 2020 at 7:40 AM Alexis Richardson <alexis@...> wrote:
> One side question from me -- I think it would be good to understand
> more about recommended etcd set-ups at different scales of k8s cluster
> (10, 50, 150, 500+ nodes) and how to deal with n/w partitions.

There are many variables on how much workload an API server puts on etcd, but there is a ballpark guess doc here: https://etcd.io/docs/v3.4.0/op-guide/hardware/

What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

Thanks,

Brandon
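For context on the sizing question: the reason recommended etcd clusters stay small regardless of how many Kubernetes nodes they serve is majority-quorum arithmetic — writes commit only when a majority of members agree, so fault tolerance grows only with every second member added. A quick sketch of the arithmetic (my addition, not from the thread):

```shell
#!/bin/sh
# Majority quorum arithmetic: an n-member etcd cluster commits writes only
# when floor(n/2)+1 members can reach each other, so it tolerates losing
# (or being partitioned away from) n - (floor(n/2)+1) members.
for n in 1 2 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerated_failures=$tolerated"
done
```

Note that even member counts add no fault tolerance over the odd count below them (4 members tolerate one failure, same as 3), which is why 3- and 5-member clusters are the usual recommendations.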
Bart Smykla <bartek@...>
I’m happy to see the report. Great job!
Bart Smykla
alexis richardson
Brandon,
On Fri, Jan 31, 2020 at 1:00 AM Brandon Philips <brandon.philips@...> wrote:
> There are many variables on how much workload an API server puts on etcd but there is a ballpark guess doc here:
> https://etcd.io/docs/v3.4.0/op-guide/hardware/

This is great, I don't know how I missed it before. QQ: Are your example configs each using only one VM per cluster? That is not totally clear from the docs.

> What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

Thank-you. I've seen this --- I guess what I am asking about is practical guidance on how often to expect different types of network failures, and what to expect, how long etc. I understand this is hard to do in a general manner.

alexis
Brandon Philips <brandon.philips@...>
On Fri, Jan 31, 2020 at 1:35 AM Alexis Richardson <alexis@...> wrote:
> This is great, I don't know how I missed it before. QQ: Are your
> example configs each using only one VM per cluster? That is not
> totally clear from the docs.

The example configs are for the etcd nodes needed to support a Kubernetes cluster of X size. Because etcd doesn't scale horizontally, the guide, I believe, covers recommended cluster sizes up to 5 nodes. Happy to take a PR to clarify something though.

> Thank-you. I've seen this --- I guess what I am asking about is
> practical guidance on how often to expect different types of network
> failures, and what to expect, how long etc.

That all depends on the network, not on etcd. All etcd can guarantee is that it will tolerate a network partition of arbitrary length of time and will only allow writes on the side of the partition where the majority of members can still talk to each other. Once the partition recovers health checks

As far as tuning for a particular network, see the tuning guide: https://etcd.io/docs/v3.4.0/tuning/

Thank You,

Brandon
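For readers following the tuning-guide link above: the two flags it centers on are the raft heartbeat interval and election timeout. The values below are illustrative only, not recommendations — a sketch of what adapting etcd to a higher-latency network looks like:

```shell
# Illustrative only (flags and defaults per the etcd tuning guide):
#   --heartbeat-interval  ms between leader heartbeats (default 100)
#   --election-timeout    ms a follower waits with no heartbeat before
#                         starting a leader election (default 1000)
# The guide suggests a heartbeat around the round-trip time between
# members, and an election timeout several times the heartbeat.
etcd --heartbeat-interval=300 --election-timeout=3000
```

Setting these too low on a slow network causes spurious leader elections; too high delays recovery after a real leader failure.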
alexis richardson
OK - thanks! If we see anything useful in the user base we'll file a PR.
On Fri, Jan 31, 2020 at 4:11 PM Brandon Philips <brandon.philips@...> wrote:
Bart Smykla <bartek@...>
It's great to read the report! Good job and thank you to everyone involved!
Bart Smykla