toggle quoted messageShow quoted text
I would be weary of requiring quantification of scalability or performance as a graduation project requirement, mainly because in my experience it's virtually impossible to accurately do so given the number of different configurations and deployment targets (on prem, IaaS, different HW types, etc.) that end users end up deploying the software to.
I would much rather see us get crisper around the underlying attributes that we care about and be more clear about the requirements (not leave so many things up to the "judgement" of the TOC). These are attributes like security auditing and CVE procedures, CI, test coverage % and type of test coverage (unit tests, integration tests, fuzz tests, etc.), code review procedures, stability of the master branch, PR review/merge throughput, number of end users, scale of end users, number of full-time maintainers, number of maintainer orgs, governance and dispute resolution model, etc.
I have lots of opinions on how I would personally quantify some of the things I listed above as a graduation requirement, but it's probably more prudent to start with a list of things that we care to quantify.
I think “production ready” has a lot to do with known, accurate and well-documented limitations, as opposed to no limitations at all, or vague claims, or an absolute set of metrics that need to be met (e.g. around scalability) for all projects.
As a contrived concrete example, a project that reliably and demonstrably scales to say 100 nodes, and clearly publishes data supporting that fact, might be perfectly production-ready for a user that has no intention of ever exceeding that scale for their
use of that project (and vastly more appealing than another project that makes vague and exaggerated claims about scalability, which turn out not to be true in practical use cases).
For that reason I like the CII model, which is more about clearing articulating what’s there, and what’s not, than it is about checking off a bunch of “must-have" checkboxes. Clearly there will be at least a few “must-have” checkboxes, but I think there
will be vastly more “do we understand and clearly document this limitation” type items. And then the overall question around whether, given the known limitations, a project is useful for a sufficiently significant set of production use cases.
Until now, we have tended to use the number and size of claimed or actual production use cases as an approximation of the answer to the aforementioned question.
Welcome new TOC members!
I didn't participate in some recent project graduation votes because I didn't feel I had adequate information to make a decision. In one case, due diligence that had been performed hadn't been documented or presented. In another, the content
of the application (basically a checklist and a list of users) didn't seem sufficient, despite nominally meeting our criteria.
Our current criteria are here:
There is a proposal to add a security audit to the requirements, which is a good step:
But I think we need to start with revisiting what we want graduation to mean to users, and then ensure that the criteria ensure those attributes. I should also add that whatever criteria we come up with, we should ensure the CNCF helps projects meet those
Our criteria imply that we want users to be able to use the projects in relatively critical (probably should be defined) so-called "production" use cases. How should we ensure that is the case?
I've recently heard from a user that they didn't think most software in the ecosystem was usable in production due to lack of scalability, reliability, security, and other issues. I also heard from a security engineer that they wouldn't trust most open source
due to lack of rigorous review processes, especially of dependencies. Within Kubernetes, we've found that CVEs don't appear to be tracked for Golang libraries
Does wide usage of a project suggest that these issues have been overcome? That's not clear to me, particularly since Kubernetes itself needs plenty of improvement.
I've started to look more stringent CII criteria:
One possible approach is for us to require the gold standard, and then work with CII to ensure it covers some of the relevant criteria, or to define an even more rigorous "platinum" level.
We also might want a scalability standard. Is 100 nodes/instances/something sufficiently scalable? 1000?
I also assume we want users to value the CNCF graduated status. As is, it's hard for an external observer to tell whether we're a rubber stamp or made a well informed decision. Perhaps it's worth providing a rationale/justification statement rather than