Jepsen result for etcd 3.4.3


alexis richardson
 

Big round of applause to everyone involved in this. Please do share
any thoughts back with CNCF on what can be done to help etcd continue
to be a trusted component of the K8s environment.

One side question from me -- I think it would be good to understand
more about recommended etcd set-ups at different scales of k8s cluster
(10, 50, 150, 500+ nodes) and how to deal with n/w partitions.

alexis

On Thu, Jan 30, 2020 at 3:28 PM Brandon Philips
<brandon.philips@...> wrote:

> Hello Kubernetes Dev-
>
> In the last few weeks the etcd maintainers have been working with Kyle at Jepsen to test the project. Please read the blog post from Xiang for the full results: https://etcd.io/blog/jepsen-343-results/
>
> Cheers,
>
> Brandon



Brandon Philips <brandon.philips@...>
 

On Thu, Jan 30, 2020 at 7:40 AM Alexis Richardson <alexis@...> wrote:

> One side question from me -- I think it would be good to understand
> more about recommended etcd set-ups at different scales of k8s cluster
> (10, 50, 150, 500+ nodes) and how to deal with n/w partitions.

There are many variables in how much workload an API server puts on etcd, but there is a ballpark guess doc here:

https://etcd.io/docs/v3.4.0/op-guide/hardware/
What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

Thanks,

Brandon


Bart Smykla <bartek@...>
 

I’m happy to see the report. Great job!

 

Bart Smykla

 

From: "kubernetes-dev@..." <kubernetes-dev@...> on behalf of Brandon Philips <brandon.philips@...>
Date: Friday, 31 January 2020 at 02:00
To: Alexis Richardson <alexis@...>
Cc: Alexis Richardson via cncf-toc <cncf-toc@...>, Kubernetes developer/contributor discussion <kubernetes-dev@...>
Subject: Re: Jepsen result for etcd 3.4.3

 

On Thu, Jan 30, 2020 at 7:40 AM Alexis Richardson <alexis@...> wrote:

One side question from me -- I think it would be good to understand
more about recommended etcd set-ups at different scales of k8s cluster
(10, 50, 150, 500+ nodes) and how to deal with n/w partitions.

 

There are many variables on how much workload an API server puts on etcd but there is a ballpark guess doc here:

 

 

What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

 

Thanks,

 

Brandon

--
You received this message because you are subscribed to the Google Groups "Kubernetes developer/contributor discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-dev+unsubscribe@....
To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/CAHHNuYeQV%2B3EMVfxbJUPbnn_nhbBZQd51Lw%2BxT3F_XXwU%2BPspw%40mail.gmail.com.


alexis richardson
 

Brandon,


On Fri, Jan 31, 2020 at 1:00 AM Brandon Philips
<brandon.philips@...> wrote:

> There are many variables in how much workload an API server puts on etcd, but there is a ballpark guess doc here:
>
> https://etcd.io/docs/v3.4.0/op-guide/hardware/

This is great; I don't know how I missed it before. QQ: are your
example configs each using only one VM per cluster? That is not
totally clear from the docs.

> What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

Thank you. I've seen this --- I guess what I am asking about is
practical guidance on how often to expect different types of network
failures, what they look like, and how long they last. I understand
this is hard to do in a general manner.

alexis


Brandon Philips <brandon.philips@...>
 

On Fri, Jan 31, 2020 at 1:35 AM Alexis Richardson <alexis@...> wrote:

> > https://etcd.io/docs/v3.4.0/op-guide/hardware/
>
> This is great; I don't know how I missed it before. QQ: are your
> example configs each using only one VM per cluster? That is not
> totally clear from the docs.

The example configs are for the etcd nodes needed to support a Kubernetes cluster of X size. Because etcd doesn't scale horizontally, the guide, I believe, only covers recommended cluster sizes up to 5 nodes. Happy to take a PR to clarify anything, though.
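To make the "doesn't scale horizontally" point concrete: adding members raises fault tolerance, not write capacity, since every write must still be acknowledged by a quorum of members. A quick illustrative sketch of the arithmetic (mine, not from the guide):

package main

import "fmt"

func main() {
	// An n-member etcd cluster commits a write once a quorum of
	// floor(n/2)+1 members has it, so it survives n-quorum failures.
	for _, n := range []int{1, 3, 5, 7} {
		quorum := n/2 + 1
		fmt.Printf("members=%d quorum=%d failures tolerated=%d\n",
			n, quorum, n-quorum)
	}
	// members=1 quorum=1 failures tolerated=0
	// members=3 quorum=2 failures tolerated=1
	// members=5 quorum=3 failures tolerated=2
	// members=7 quorum=4 failures tolerated=3
}

Going from 5 to 7 members only buys one more tolerated failure while adding two more replicas every write must fan out to, which is why the recommendations stop at 5.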
 
> What do you mean by dealing with network partitions? Does the failure guide section on network partitions help? https://etcd.io/docs/v3.4.0/op-guide/failures/

> Thank you. I've seen this --- I guess what I am asking about is
> practical guidance on how often to expect different types of network
> failures, what they look like, and how long they last. I understand
> this is hard to do in a general manner.

That all depends on the network, not on etcd. All etcd can guarantee is that it will tolerate a network partition of arbitrary length and will only allow writes on the side of the partition where a majority of members can still talk to each other. Once the partition recovers, health checks pass again and the previously partitioned members catch up automatically.
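On the "how to deal with it" side, the main client-level tactic is to bound every request with a context deadline, so a member on the minority side fails calls quickly instead of blocking. A minimal sketch with the Go clientv3 package (the endpoint addresses and timeout values here are placeholders, not recommendations):

package main

import (
	"context"
	"fmt"
	"time"

	"go.etcd.io/etcd/clientv3"
)

func main() {
	// Placeholder endpoints; point these at your own members.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"10.0.0.1:2379", "10.0.0.2:2379", "10.0.0.3:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// A linearizable write needs quorum; on the minority side of a
	// partition it fails with context.DeadlineExceeded rather than
	// hanging for the (arbitrary) length of the partition.
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()
	if _, err := cli.Put(ctx, "sample_key", "sample_value"); err != nil {
		fmt.Println("put failed, possibly partitioned:", err)
	}
}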

As far as tuning for a particular network see the tuning guide: https://etcd.io/docs/v3.4.0/tuning/
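For reference, the rule of thumb in that guide (if I'm remembering it right) is a heartbeat interval around the round-trip time between members and an election timeout of roughly 10x that; etcd's defaults are 100ms and 1000ms. A rough sketch with an illustrative RTT:

package main

import (
	"fmt"
	"time"
)

func main() {
	// Illustrative measured peer round-trip time; use your own.
	rtt := 20 * time.Millisecond

	// Tuning-guide rule of thumb: heartbeat around the RTT,
	// election timeout about 10x that (defaults: 100ms / 1000ms).
	heartbeat := rtt
	election := 10 * rtt

	// Both flags take milliseconds.
	fmt.Printf("--heartbeat-interval=%d --election-timeout=%d\n",
		heartbeat.Milliseconds(), election.Milliseconds())
	// Prints: --heartbeat-interval=20 --election-timeout=200
}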

Thank You,

Brandon

 


alexis richardson
 

OK - thanks! If we see anything useful in the user base, we'll file a PR.



Bart Smykla <bartek@...>
 

It's great to read the report! Good job, and thank you to everyone involved!

Bart Smykla
