Checking In with etcd folks! (was Re: [k8s-steering] Worrying state of Etcd community)


Davanum Srinivas
 

Marek, Sahdev, 
It has been a few months since this email about the shortage of hands to help with etcd. Has the situation improved at all?

Paris, RichiH,
Any feedback you are hearing with your Dev Rep hats on?

ChrisA,
Looks like this came up in both TOC and GB meetings, but we have feedback that the folks working hard on etcd are not really seeing changes in their day-to-day work. Anything we can do from the CNCF side to help? 

Do we all want to meet on a call? I can offer up an upcoming TOC call to talk about this. Please don't wait for the call to discuss it; feel free to send your thoughts/ideas/status here on this thread.

thanks,
Dims

On Mon, Mar 7, 2022 at 1:11 PM 'Marek Siarkowicz' via steering <steering@...> wrote:

We (@serathius, @ptabor) are reaching out to the K8s steering committee to bring to its attention recent changes in, and the current state of, the etcd community.


In the last few months, primary maintainers Gyuho Lee (@gyuho, Amazon, announcement) and Sam Batschelet (@hexfusion, Red Hat) have stopped actively participating in the project. This leaves the project with only one active and two occasionally reviewing maintainers: Marek Siarkowicz (@serathius, Google) and Piotr Tabor (@ptabor, Google), both relatively new to the project (1 month and 1 year of tenure, respectively), and Sahdev P Zala (@spzala, IBM). Other maintainers are either dormant or have had very minimal activity over the last six months. The project is effectively unmaintained.


This lack of maintainers is impacting the community:

  • Important project decisions (such as conflict resolution) cannot be made under the governance rules, because they require a supermajority of maintainers to agree. This has an especially bad impact on the design process, where major proposals don’t get enough feedback and scrutiny. Due to lack of maintainer activity, we cannot introduce a proper approval process, so important features get reviewed by only one maintainer. For example, #13168 was reviewed only by @ptabor (a relatively new maintainer) and @lilic (a reviewer, no longer active in the project).

  • Unable to reliably triage issues and release bug fixes. Fixes for critical bugs can take months to be released, causing users to lose trust and not adopt new releases. For example, v3.5 was released with multiple critical bugs (#13196, #13192), and it took the community over a quarter to release fixes, leaving v3.5 unusable in production in the meantime. As of v1.23.3, Kubernetes still recommends the mostly broken etcd version v3.5.0 (#106589).

  • Slowed or blocked contributions. In theory, all changes should be reviewed by 2 maintainers before submitting. A second viewpoint is especially important for etcd, to ensure the security and correctness of changes, as they can be difficult to verify. We have been forced to break this rule and rely on lazy consensus, making the whole process error-prone. If a mistake slips through, we can only catch it via production releases (which are 2 years apart). There is no healthy feedback loop because maintainers change too frequently.


Etcd is a critical dependency of Kubernetes. If the situation in etcd doesn’t improve it will create a significant risk for the future of the K8s project. This may impede improvements in K8s reliability or other areas that require changes on the etcd side. It may also lead to a situation where a severe etcd bug, like data corruption, gets detected after it’s already present in tens or hundreds of thousands of Kubernetes clusters around the globe. This could irreparably break users' trust in Kubernetes.

We're hoping that by bringing this to your attention we can start discussing and planning proper steps to mitigate the issue.

Thanks,
Marek


--
Davanum Srinivas :: https://twitter.com/dims