Pre-proposal idea: introduce a new flag for helm uninstall "--fail-on-error" and a new release status "failed-uninstall"


mikeshngdev@...
 

Hi all,

Following the HIP guide. I am looking for feedback regarding my idea below:

The current Helm uninstall implementation is marking release record as uninstalled even if Helm detected errors during the uninstall process. See:

https://github.com/helm/helm/blob/v3.4.2/pkg/action/uninstall.go#L122-L127

https://github.com/helm/helm/blob/v3.4.2/pkg/action/uninstall.go#L144-L150


If the release record is marked as uninstalled then re-attempting to helm uninstall doesn't actually delete any resources (just purge the release record) . See:

https://github.com/helm/helm/blob/v3.4.2/pkg/action/uninstall.go#L81-L91

I think it's very useful for scripts or automation tools to be able to re-attempt a helm uninstall that resulted in error.

That's why I am sharing my idea of adding a new flag to helm uninstall "--fail-on-error".
If this flag is enabled, and the helm uninstall is successful then it doesn't change any behavior.

If this flag is enabled, and the helm uninstall completed with errors then update the release record to "failed-uninstall" status.

This allows the next helm uninstall attempt to actually perform resources delete.

For reference, I initially suggested https://github.com/helm/helm/issues/9201 but after thinking about it I think this "--fail-on-error" idea is much clearer and cleaner.

Thank you for reading and please provide some feedback. I am willing to drive the implementation if needed.

Mike Ng


Joshua Packer
 
Edited

Hi Mike,

This seems like it could be very useful, especially during dev where there are more likely to be issues in the uninstall of the resources. Keeping the release would allow would give us a starting point to work from to recover, as you mention in your idea. Being an opt-in strategy, remove adverse affects for those with processes that expect the existing behavior.

Joshua Packer


Joe Lanford
 

Big +1 from me!

When trying to automate chart uninstalls (e.g. in the context of operator-sdk's helm-operator), it is fairly unintuitive and complex to set things up in such a way to guarantee resource cleanup. Off the top of my head, the following must happen:
1. All uninstall attempts must set `--keep-history=true`
2. Prior to making each subsequent uninstall attempt, the latest release status must be set to something other than `Uninstalled`.