Re: Operator Framework questions
I am an early engineer on the operator idea, and drove the inception of the SDK, OLM, etc. at CoreOS.
OperatorHub is designed to be operator-first, which means users need to understand the operator's details and discover the operator itself before they get to the CRDs/services. I feel one thing we should try is to discover the CRD first (the resource that describes the actual service), and then verify (or install) the corresponding operator (through Helm or whatever package manager is popular). That might be what most users need, I guess.
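To make the CRD-first idea a bit more concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `OperatorEntry`, the in-memory catalog, the chart paths, and the `example.com` CRD names are hypothetical and do not correspond to any OperatorHub, OLM, or Helm API. It only shows the lookup direction: start from the CRD the user wants, then verify whether a providing operator is already installed or pick one to install.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OperatorEntry:
    """Hypothetical catalog record; not an OperatorHub or OLM type."""
    name: str
    package: str  # hypothetical package reference, e.g. a Helm chart path
    owned_crds: frozenset = field(default_factory=frozenset)


def build_crd_index(catalog):
    """Invert the catalog: CRD name -> list of operators that provide it."""
    index = {}
    for entry in catalog:
        for crd in entry.owned_crds:
            index.setdefault(crd, []).append(entry)
    return index


def resolve(crd, index, installed):
    """Start from the CRD, then verify or choose the operator behind it.

    Returns (action, operator_name), where action is "present" if a
    providing operator is already installed, "install" if one is known
    but not installed, or "unknown" if no catalog entry provides the CRD.
    """
    providers = index.get(crd, [])
    if not providers:
        return ("unknown", None)
    for entry in providers:
        if entry.name in installed:
            return ("present", entry.name)
    return ("install", providers[0].name)


# Hypothetical catalog contents, for illustration only.
catalog = [
    OperatorEntry("wordpress-operator", "charts/wordpress-operator",
                  frozenset({"wordpresses.example.com"})),
    OperatorEntry("mysql-operator", "charts/mysql-operator",
                  frozenset({"mysqlclusters.example.com"})),
]
index = build_crd_index(catalog)

print(resolve("wordpresses.example.com", index, installed={"mysql-operator"}))
print(resolve("mysqlclusters.example.com", index, installed={"mysql-operator"}))
print(resolve("caches.example.com", index, installed=set()))
```

The user never has to name an operator here; the operator shows up only as the answer to "what provides this resource?".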
------------------------------------------------------------------
From: Matt Farina <matt@...>
Sent At: 2019 Oct. 15 (Tue.) 07:36
To: Daniel Messer <dmesser@...>; cncf-sig-app-delivery <cncf-sig-app-delivery@...>
Subject: Re: [cncf-sig-app-delivery] Operator Framework questions

Daniel, thanks for the added color. I have more questions...

> Operators are meant to be used by end users to drive value - so there needs to be an on-cluster mechanism to discover which Operators with which services are available.

This and some of the other comments sound similar to a service catalog along with elements of OLM (e.g., the metering and chargeback operators). I'm curious how much of the service catalog ideals are tied to all of this.

If you don't mind, I'd like to look at a couple of examples that are different.

WordPress is an all too common example, so I'll use that. Let's say someone is operating a WordPress-as-a-service site. A customer comes along to their UI and signs up for a new site. Their web app tells an operator in Kubernetes to create a new site. When that site is ready, it's shared with the new customer. The lifecycle of this WordPress site is managed by the operator, while the UI, billing, and other elements are handled by customer-facing applications.

For a second example, what if someone needs just one instance of an application but they want that instance to be fully managed? That includes failure handling that is usually encoded in a runbook.

How would these examples fit with the Operator SDK and OLM?

> There are a bunch of Operator lifecycle aspects, mainly around CRDs, that OLM takes care of so you don't have to. E.g. what happens when you attempt to install an Operator and there is already a CRD on cluster installed - is it safe to proceed? Is there maybe another Operator already present owning these CRDs? How would you discover this relationship? What happens when a CRD gets an update, e.g. in a new version of a CRD or a versioned CRD?
> Does this break existing workloads?

Are the details documented on this anywhere? I would love to read them. For example, let's assume you have two of these WordPress operators/controllers running in a cluster. They are namespace scoped and do not overlap. How do you handle the CRD being cluster scoped? How do upgrades of the CRD work to make sure it works for all operators/controllers? Or, are some assumptions made that situations like this will not occur?

If you have some time, I would appreciate digging into these kinds of situations. This is in part because people need guidance on working with CRDs, and some of that falls into the Kubernetes SIG Apps domain, where I spend some of my time.

Thanks,
Matt Farina

On Tue, Oct 15, 2019, at 5:39 AM, Daniel Messer wrote:

On Tue, Oct 15, 2019 at 1:39 AM Matt Farina <matt@...> wrote:

> Please excuse my ignorance on some of this software. I have a number of questions now that I'm starting to look around.
>
> > OLM is a package manager for operators
>
> Out of curiosity, why create another package manager? What features are missing from Helm? If some feature was missing I'd love to talk about it to learn the context and assumptions around it.

There are a bunch of Operator lifecycle aspects, mainly around CRDs, that OLM takes care of so you don't have to. E.g. what happens when you attempt to install an Operator and there is already a CRD on cluster installed - is it safe to proceed? Is there maybe another Operator already present owning these CRDs? How would you discover this relationship? What happens when a CRD gets an update, e.g. in a new version of a CRD or a versioned CRD?
Does this break existing workloads?

Such as dependent Operators: in order to drive developers to work towards APIs (as they do with Kubernetes), it would be nice to express a dependency on another Operator's CRDs, not the Operator itself.

Operators are meant to be used by end users to drive value - so there needs to be an on-cluster mechanism to discover which Operators with which services are available. Since these users are usually not cluster-admins, there should be a safe way to do that without these privileges.

Operators also support the concept of watching N namespaces while being installed in a different namespace. This is a configuration that prevents privilege escalation where you'd normally just be able to hijack the potentially very privileged ServiceAccount that got deployed in your namespace. This configuration needs to be expressed in some standardized way.

OLM also comes with a first-class concept of automatic over-the-air updates of Operators in the background as part of regularly updated catalogs. This is due to the long-lived nature of Operator workloads themselves.

Hope this sheds some light on the intentions of OLM. It is a package manager as well, but primarily it provides the additional guardrails needed in order to deploy, discover and run Operators from a central point on cluster. I think it's worth pursuing designs where the packaging format of Helm can be re-used, though. The use case is different, since Operators are not regular applications but behave more like kernel extensions.

> When I start to look at OLM I see things like Install Plans and Subscriptions. This looks a lot like Kubernetes Service Catalog. Is there any kind of write-up comparing the two?
>
> You can do a lot with operators.
> If OLM has a lot of similarities to Service Catalog and has a bit of that intent in mind, what about the use cases that aren't related to the service catalog type of setup?
>
> Should I ask these questions over on the TOC issue instead?
>
> Thanks,
> Matt

On Mon, Oct 14, 2019, at 6:03 PM, Michael Hrivnak wrote:

On Mon, Oct 14, 2019 at 4:50 PM Jay Pipes <jaypipes@...> wrote:

> On Mon, 2019-10-14 at 16:41 -0400, Michael Hrivnak wrote:
> > The "app store" is merely the discovery mechanism; how many different tools will a cluster owner want to be involved in managing and upgrading the workloads on their cluster? Different cluster owners will have different tolerances, but many will want to standardize on one mechanism for controlling what versions of software are running in their cluster.
>
> Agreed. This is why I think separating the method of discovery and lifecycle management from the operator-sdk is critical.

Creating an operator and creating its packaging are at least logically separate, even if one tool can do both things. When you create an operator with operator-sdk, it does not have any packaging unless you explicitly run "operator-sdk olm-catalog gen-csv" with some arguments. Adding OLM packaging is optional and does not happen by default.

> Perhaps a developer would like to use the operator-sdk to develop their Operator, but provide a non-OLM way of packaging that code. Decoupling the SDK from the method of packaging and managing the code that is produced by the SDK seems to enable that, no?

This is already easy to do. To start, after you've made an operator with operator-sdk, you can install it manually by running "kubectl create -f" on a few of the manifests that operator-sdk creates for you. They just contain the CRD, RBAC resources, and a Deployment. You could then turn those manifests into a Helm chart that installs your operator, or use any other tool you choose.

--
Principal Software Engineer, RHCE
He / Him / His
Product Manager OpenShift
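
The pre-install questions raised in the thread (is a CRD already on the cluster, does another Operator own it, and would a new CRD version be needed?) can be illustrated with a small sketch. This is not OLM's actual algorithm; the data shapes, the operator names, the `example.com` CRD names, and the policy of treating a CRD owned by a different operator as a conflict are all assumptions made for illustration.

```python
def plan_crd_install(operator, wanted, cluster):
    """Decide, per CRD, what installing `operator` would have to do.

    operator: name of the operator being installed (hypothetical).
    wanted:   dict of CRD name -> set of versions the operator needs.
    cluster:  dict of CRD name -> {"owner": str or None, "versions": set},
              a hypothetical snapshot of CRDs already on the cluster.

    Verdicts (an assumed policy, not OLM behavior):
      "create"   - the CRD is absent and can be created
      "conflict" - the CRD exists and is owned by a different operator
      "reuse"    - the CRD exists and already serves every needed version
      "update"   - the CRD exists but lacks a needed version
    """
    plan = {}
    for name, versions in wanted.items():
        existing = cluster.get(name)
        if existing is None:
            plan[name] = "create"
        elif existing["owner"] not in (None, operator):
            plan[name] = "conflict"
        elif versions <= existing["versions"]:
            plan[name] = "reuse"
        else:
            plan[name] = "update"
    return plan


# Hypothetical cluster state: one CRD owned by another operator, one unowned.
cluster = {
    "wordpresses.example.com": {"owner": "wordpress-operator-a",
                                "versions": {"v1alpha1"}},
    "backups.example.com": {"owner": None, "versions": {"v1"}},
}
wanted = {
    "wordpresses.example.com": {"v1alpha1"},
    "backups.example.com": {"v1", "v2"},
    "sites.example.com": {"v1"},
}

plan = plan_crd_install("wordpress-operator-b", wanted, cluster)
for crd, verdict in sorted(plan.items()):
    print(crd, "->", verdict)
```

The two-WordPress-operators case from the thread shows up here as the "conflict" verdict: both operators are namespace scoped, but the CRD they share is cluster scoped, so some component has to decide who owns it and who may change its versions.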