notes on cluster problems when taking my beta exam tonight :-(
Hi, I finally took my beta exam. With about 1h remaining, while I was working my way to question 29, the cluster stopped responding to all kubectl commands. cluster-info showed that only the master was up, with no sign of etcd, so I suppose the whole thing was not going to play ball anymore.
I tried to take a screenshot to show you guys here, but the proctor freaked out immediately and threatened to terminate my exam. I tried to explain what was going on, but it seems proctors have zero context about the exam, its content, or its environment... bummer.
So, the bottom line is that I think I did well, with about 21 questions answered. As for the remaining ones, question 29 was answered on disk in /home/student/q29.conf but never applied to the cluster because of the problem above. I had left two other fairly easy questions (one to sort pods by CPU usage and another I can't recall now) for last, which in hindsight was dumb, since I could have answered both without trouble. All the others I had left for last (basically the troubleshooting ones) were left behind for good...
I don't know how this will be graded, nor whether anything I did to the cluster caused the problem. I tried to look around for some Juju and Snap fu tricks to restore etcd from a working unit in the cluster, but I guess the machine we log in to is set up differently from the Juju units deployed via CDK. I think CDK protects "local" snaps like etcd, so there is no way to bork the cluster from inside, but I think people will quite easily bork the outer environment (where /home/student lives) if it stays the way it is today when the exam goes live!
— Caio Begotti
Thanks for reporting this. It sounds like an issue that hasn't been reported before, and I'm sure it was both stressful and annoying. We'll definitely grade around it for your exam so that you're not penalized, and I'll pass this info to the relevant team so they can triage what happened (we keep the machines alive for a day or two after the exam, so we should be able to do a thorough analysis).
As for your second note about etcd, I'm sending you a separate (off-list) note to introduce you to the engineer who can work through it with you.
On 09/07/2017 09:18 PM, Caio Begotti wrote:
--
Clyde Seepersad
General Manager, Training & Certification
e: cseepersad@...
p: 404 964 6973