Troubleshooting a Helm installation


bert.laverman@...
 

Hello Helm!
I have a Helm installation set up to install some of our DevOps toolchain, which generally works fine and as such tends to get "ignored" on the Ops side. As a result helm commands are only run rarely (once every few months).

My current task started with "Let's add a PostgreSQL installation" and then ran into "Bad Certificate" errors. Unfortunately there is no Helm-for-Helm, and the documentation describes how to set up and install Helm securely pretty well, but not so much how to find out why it stopped working. All applications I have installed work fine, but Helm itself refuses to do anything. My last bit of work was refreshing the certificates (which naturally expired unnoticed), and I could reinstall our Ingress Controller after that, but four weeks on it refuses to budge.

"helm list --tls --debug" simply responds with
[debug] Created tunnel using local port: '55563'
[debug] SERVER: "127.0.0.1:55563"
[debug] Key="/Users/bertlaverman/.helm/key.pem", Cert="/Users/bertlaverman/.helm/cert.pem", CA="/Users/bertlaverman/.helm/ca.pem"
Error: remote error: tls: bad certificate

This does not really tell me anything beyond that I have some kind of SSL-related issue. It is unclear which certificate is in trouble, Helm's or Tiller's, or what exactly Helm's beef with it is.
Who can help me with some basic troubleshooting guidelines?

I don't mind cleaning the resulting story up for addition to the docs, but for now I'm kind of looking at having to spend days on trying to find out what happened here.

Cheers,
Bert


Joe Thompson <kensey@...>
 

On Oct 30, 2019, at 05:26, bert.laverman@... wrote:

> My current task started with "Let's add a PostgreSQL installation" and then ran into "Bad Certificate" errors. Unfortunately there is no Helm-for-Helm, and the documentation describes how to setup/install Helm securely pretty well, but not so much how to find out why it stopped working. All applications I have installed work fine, but Helm itself refuses to do anything. My last bit of work was on refreshing the certificates (which naturally ran out unnoticed), and I could reinstall our Ingress Controller after that, but 4 weeks onwards it refuses to budge.
>
> "helm list --tls --debug" simply responds with
> [debug] Created tunnel using local port: '55563'
> [debug] SERVER: "127.0.0.1:55563"
> [debug] Key="/Users/bertlaverman/.helm/key.pem", Cert="/Users/bertlaverman/.helm/cert.pem", CA="/Users/bertlaverman/.helm/ca.pem"
> Error: remote error: tls: bad certificate
>
> This does not really tell me anything beyond that I have some kind of SSL related issue. Which certificate is in trouble, Helm or Tiller, is unknown, nor what exactly is Helm's beef with it.
> Who can help me with some basic troubleshooting guidelines?

First place I’d start would be watching the Tiller pod logs, just to see if there’s anything obvious there (`kubectl logs -f [tiller pod]` while you’re trying to use the Helm client). From Tiller’s perspective I suspect everything is fine, and this is the client complaining *about* Tiller. But it might also be Tiller complaining that the client cert is bad in some way (e.g. not signed by a trusted CA).
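
A minimal sketch of that log-watching step, assuming Tiller was installed by `helm init` into the `kube-system` namespace and still carries the `app=helm,name=tiller` labels. Both are assumptions, not something stated in this thread, and `tiller_logs` is just an illustrative name.

```shell
# Sketch only: the namespace and label selector below are assumed defaults.
TILLER_NAMESPACE="${TILLER_NAMESPACE:-kube-system}"
TILLER_SELECTOR="${TILLER_SELECTOR:-app=helm,name=tiller}"

# Look up the first pod matching the selector and follow its logs.
tiller_logs() {
  tiller_pod="$(kubectl get pods -n "$TILLER_NAMESPACE" -l "$TILLER_SELECTOR" \
    -o jsonpath='{.items[0].metadata.name}')" || return 1
  [ -n "$tiller_pod" ] || { echo "no Tiller pod found" >&2; return 1; }
  kubectl logs -f -n "$TILLER_NAMESPACE" "$tiller_pod"
}
```

Run `tiller_logs` in one terminal and `helm list --tls --debug` in another, then see whether Tiller logs anything at the moment the client fails.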

So the next step is to check the certificates Tiller is using to identify itself and the CA it’s using to verify client certificates. Tiller mounts a Secret with the necessary data, so you should see that Secret and its mount info referenced in the pod spec if you inspect it with `kubectl get pod -o yaml [tiller pod]`; then you can inspect and decode the Secret and see if what’s there is correct. Make sure the CA that generated your `helm` client certs is one Tiller trusts, and that the CA that generated the Tiller certs is one the client trusts.
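
The checks above could be sketched roughly as follows. This assumes the Secret is named `tiller-secret` with keys `ca.crt` and `tls.crt` (what `helm init --tiller-tls` creates by default, as far as I know; confirm against your pod spec), and that the client files live in `~/.helm` as in the debug output. The function names are made up for illustration.

```shell
# Sketch only: Secret name, key names and namespace are assumptions.
TILLER_NAMESPACE="${TILLER_NAMESPACE:-kube-system}"
TILLER_SECRET="${TILLER_SECRET:-tiller-secret}"
HELM_HOME="${HELM_HOME:-$HOME/.helm}"

# Decode one key of the Tiller Secret into a local PEM file.
# Dots in the key name must be escaped for kubectl's jsonpath.
fetch_secret_key() {  # usage: fetch_secret_key ca.crt out.pem
  escaped_key="$(printf '%s' "$1" | sed 's/\./\\./g')"
  kubectl get secret "$TILLER_SECRET" -n "$TILLER_NAMESPACE" \
    -o "jsonpath={.data.$escaped_key}" | base64 --decode > "$2"
}

# Print whose certificate this is, who signed it, and when it expires.
show_cert() {
  openssl x509 -in "$1" -noout -subject -issuer -enddate
}

# Succeeds only if certificate $2 verifies against CA file $1.
check_trust() {
  openssl verify -CAfile "$1" "$2"
}

check_tiller_tls() {
  # Only certificates are written here, never the private key.
  workdir="$(mktemp -d)" || return 1
  fetch_secret_key ca.crt  "$workdir/tiller-ca.pem"   || return 1
  fetch_secret_key tls.crt "$workdir/tiller-cert.pem" || return 1
  show_cert "$workdir/tiller-cert.pem"
  show_cert "$HELM_HOME/cert.pem"
  # Tiller must trust the client certificate ...
  check_trust "$workdir/tiller-ca.pem" "$HELM_HOME/cert.pem"
  # ... and the client must trust Tiller's certificate.
  check_trust "$HELM_HOME/ca.pem" "$workdir/tiller-cert.pem"
}
```

If either `check_trust` call fails, or an expiry date printed by `show_cert` is in the past, that is the side to fix.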

Past that point there are some more advanced things you can do like dumping the Tiller cert directly with `openssl s_client`, but I’ll bet that after checking the above you’ll have already found the problem and know how to fix it. — Joe
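
For reference, a rough sketch of the `openssl s_client` approach. It assumes the default deployment name `tiller-deploy` in `kube-system` and Tiller's usual gRPC port 44134; `forward_tiller` and `dump_tiller_cert` are illustrative names only.

```shell
# Sketch only: deployment name and namespace are assumed defaults.
TILLER_NAMESPACE="${TILLER_NAMESPACE:-kube-system}"
HELM_HOME="${HELM_HOME:-$HOME/.helm}"
TILLER_PORT=44134  # port Tiller normally listens on

# Terminal 1: forward a local port to the Tiller deployment.
forward_tiller() {
  kubectl port-forward -n "$TILLER_NAMESPACE" deployment/tiller-deploy \
    "$TILLER_PORT:$TILLER_PORT"
}

# Terminal 2: attempt a TLS handshake using the Helm client's cert and key.
# The output shows the certificate Tiller presents and any handshake alert.
dump_tiller_cert() {
  openssl s_client -connect "127.0.0.1:$TILLER_PORT" \
    -cert "$HELM_HOME/cert.pem" -key "$HELM_HOME/key.pem" \
    -CAfile "$HELM_HOME/ca.pem" </dev/null
}
```

In the `s_client` output, the "Verify return code" line tells you whether the client's CA accepts Tiller's certificate; a handshake alert instead suggests Tiller is rejecting the client certificate.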


Josh Dolitsky
 

I might also recommend bypassing your remote Tiller entirely to reduce a lot of ops headache. I've been using the helm-tiller plugin which starts a Tiller server locally for the lifetime of your command.

You just prefix all of your helm commands with "helm tiller run" - for example "helm tiller run helm list". This should ideally pick up your existing releases.

Beyond that, you could also try to start using Helm 3 (rc1), and convert your releases using the helm-2to3 plugin with the --tiller-out-cluster flag.
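
A rough sketch of the Helm 3 route, assuming a Helm 3 binary is installed alongside the Helm 2 client under the name `helm3` (that name, and the function names, are assumptions for illustration). Try it on a non-critical release first.

```shell
# Sketch only: `helm3` is an assumed name for a Helm 3 binary.
HELM3="${HELM3:-helm3}"

# Install the 2to3 plugin into Helm 3.
install_2to3() {
  "$HELM3" plugin install https://github.com/helm/helm-2to3
}

# Preview the conversion, then convert one Helm 2 release
# without going through the in-cluster Tiller.
convert_release() {  # usage: convert_release my-release
  "$HELM3" 2to3 convert "$1" --tiller-out-cluster --dry-run &&
    "$HELM3" 2to3 convert "$1" --tiller-out-cluster
}
```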

Josh


Matt Fisher <matt.fisher@...>
 

There's a section about this in the troubleshooting documentation: https://helm.sh/docs/tiller_ssl/#troubleshooting

Let us know if this helps.

Matthew Fisher

Caffeinated Software Engineer

Microsoft Canada

