I would like to share a personal horror story from when I was a newly minted sysadmin. I was hired into my first sysadmin job with some Linux skills, but no system administration experience. It was a true junior position, and I was learning on the job while being trained by senior staff. The small team I was on supported the primary web servers for a university and its central IT department, to which my team belonged.
I’d been there no longer than a couple of weeks when I was asked to make a small change to the Apache server configuration hosting the IT department’s website. The senior staff had trained me for exactly this situation. I knew how to make the change, commit it to our version control, and then roll the change out to the servers. I opened the Apache documentation too, just to be sure, and had it in front of me. I could do this!
I made the change, and double-checked it against the documentation. I committed it to Subversion, and rolled the change out to the servers. Satisfied at a job well done, I added my notes to the ticket, and then closed the request.
Also, I forgot an angle bracket.
A few minutes later, the change propagated to the servers, Apache restarted (or tried to), and the website for our department came crashing down. As I frantically tried to roll back my changes—I didn’t know what I’d broken, and could not, in the heat of the moment, remember how to get older versions out of Subversion—I could hear two coworkers talking in a cube nearby.
"Is our website down?"
"Haha, yeah, it looks like it."
Eye roll, snarky comment, nudge nudge wink wink.
I sank lower in my cube, both shame and embarrassment adding to my panic as I finally retrieved the older version, rolled it out to the web servers, and verified that the site had come back up.
Later that afternoon, I was still in my cube, shame and embarrassment still in full effect, but panic replaced by fear. Two or three weeks into my job, I’d taken down one of our highest traffic sites, having been trained and trusted to do the job. Certainly, I was not going to be employed for long. That fear was only compounded when my boss’ boss’ boss showed up in my cube.
"Don’t worry," she said. I can only wonder what face of pure terror I made when she came into view.
"Don’t worry. We’re not mad at you. You made a mistake, and you fixed it. And now you have learned and you will not make that mistake again."
She was right. Sure, this was a simple typo, but I did not do a syntax check. This lesson has followed me. I never again committed code or configuration to a production service without testing.
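For what it's worth, the check I skipped is a single command: `apachectl configtest` parses the configuration and exits non-zero if it finds a syntax error. Below is a minimal sketch of how that check could gate a rollout. It is not the tooling my team actually used; the function names are hypothetical, and it assumes `apachectl` is on the `PATH` of the machine where the check runs.

```python
import subprocess
import sys


def config_is_valid(check_cmd):
    """Run a syntax-check command and return (ok, combined_output).

    ok is True only when the command exits with status 0.
    """
    try:
        result = subprocess.run(check_cmd, capture_output=True, text=True)
    except OSError as exc:
        # The command is missing or cannot be executed: treat that as a
        # failed check rather than letting the rollout continue.
        return False, f"could not run {check_cmd[0]}: {exc}"
    return result.returncode == 0, (result.stdout + result.stderr).strip()


def main():
    # Hypothetical entry point, e.g. for a pre-commit hook or deploy script.
    ok, output = config_is_valid(["apachectl", "configtest"])
    print(output)
    if not ok:
        print("Apache config check failed; not rolling out.", file=sys.stderr)
    sys.exit(0 if ok else 1)
```

Calling `main()` from a pre-commit hook or as the first step of a deploy script would have stopped my missing angle bracket before it reached the servers.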
I’ve shared this story before, most notably in an article about DevOps and in support of a culture of continual learning and experimentation. This same response from management is a partial example of a blameless postmortem, a key feature of the Google Site Reliability Engineering culture as well.
Humans make mistakes. Failures will occur. In a safe, blame-free culture, team members can learn from their mistakes, and as a result, services can be hardened against the same problems, and the team and organization as a whole can grow.
About the author
Chris Collins is an SRE at Red Hat and a Community Moderator for Opensource.com. He is a container and container orchestration, DevOps, and automation evangelist, and will talk with anyone interested in those topics for far too long and with much enthusiasm.