One chestnut from my history in lottery game development:
While our security staff was tight-knit and did a generally good job, their level of paranoia was often off the charts.
Once they went around hot-gluing shut all of the “unnecessary” USB ports on our PCs under the premise of mitigating data theft via thumb drive, while ignoring that we were all Internet-connected, that VPNs are a thing, and that every machine had a rewritable optical drive.
Over 150 Major Incidents in a single month.
Formerly, I was on the Major Incident Response team for a national insurance company. IT Security has sat in its own ivory tower at every company I’ve worked for, but this company’s IT Security department was the worst case I’ve seen before or since.
They refused to file changes or discuss any type of change control with the rest of IT. I get that Change Management is a bitch for most of IT, but if you want to avoid major outages, file a fucking Change record and follow the approval process. The security directors would get some harebrained idea in a meeting in the morning and assign one of their barely competent techs to implement it that afternoon. They’d bring down whatever system they were fucking with. Then my team had to spend hours, usually after business hours, figuring out why a system that had not seen a change control in two weeks suddenly stopped working.

Would security send someone to the MI meeting? Of course not. What would happen is, we would call the IT Security response team and ask if anything had changed on their end. Suddenly, 20 minutes later, everything was back up and running, with the MI team not having done anything. We would try to talk to security and ask what they had changed. They answered “nothing” every goddamn time.
They got their asses handed to them when they brought down a billing system that took in over $10 billion (yes, with a “B”) a year, leaving people unable to pay their bills. That outage went straight to the CIO, and even the CEO sat in on that call. All of a sudden there was a hard change freeze for a month, and security was required to file changes in the common IT record system, which was ServiceNow at the time.
We went from 150 major outages (defined as having financial or reputational impact to the company) in a single month to 4 or 5.
Fuck IT Security. It’s a very important part of every IT department, but it is almost always filled with the most narcissistic, incompetent asshats in the entire industry.
Jesus Christ, I never thought I’d be happy to have a change control process.
Lots of safety measures really suck. But they generally get implemented because the alternative is far worse.
At my current company, all changes have to happen via a GitHub PR and commit because we use GitOps (e.g., ArgoCD with Kubernetes). Any changes you make manually are immediately overwritten when ArgoCD notices the config drift.

This makes development more annoying sometimes, but I’m so damn glad I can immediately look at GitHub for an audit trail and source of truth.
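For anyone unfamiliar, the bit that makes manual changes get reverted is ArgoCD’s automated sync policy with self-heal turned on. A minimal sketch of an Application manifest (the repo URL, paths, and names here are hypothetical):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: billing-service            # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-config.git  # hypothetical repo
    targetRevision: main
    path: apps/billing-service
  destination:
    server: https://kubernetes.default.svc
    namespace: billing
  syncPolicy:
    automated:
      prune: true     # remove resources that were deleted from Git
      selfHeal: true  # revert manual edits that drift from what Git declares
```

With `selfHeal` enabled, anything tweaked by hand with `kubectl edit` gets stomped back to the Git state on the next reconciliation loop, which is exactly why the commit history doubles as an audit trail.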
It wasn’t InfoSec in this case, but I had an annoying tech lead who would merge to main without telling people, so anytime something broke I had his GitHub activity bookmarked and could rule that out first.
You can also lock down the repo to require approvals before merging into the main branch to avoid this.
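If you want that rule itself in version control, one option (assuming the third-party “Settings” Probot app is installed on the org) is a `.github/settings.yml` whose `protection` block mirrors GitHub’s branch-protection API; the values below are illustrative:

```yaml
branches:
  - name: main
    protection:
      required_pull_request_reviews:
        required_approving_review_count: 2   # illustrative count
        dismiss_stale_reviews: true
      enforce_admins: true    # apply the rules to repo admins too
      required_status_checks: null
      restrictions: null
```

`enforce_admins` is the interesting flag: without it, repo admins can merge right past the approval requirement.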
Since we were on the platform team, we were all GitHub admins 😩, so it all relied on trust. I’m sure we could have self-policed better.
Hm, can’t say. I’m using Bitbucket, and it does block admins, though they all have the ability to go into settings and remove the approval requirement. No one does, though, because then the bad devs would be able to get changes in without reviews.
That sounds like a good idea. I’ll take another look at GitHub settings. Thanks!
The past several years I have been working more as a process engineer than a technical one. I’ve worked in Problem Management, Change Management, and currently Incident Management for a major defense contractor (yes, you’ve heard of it). So I’ve been on both sides. Documenting an incident is a PITA. File a Change record to restart a server that is in an otherwise healthy cluster? You’re kidding, right? What the hell is a “Problem” record and why do I need to mess with it?
All things I’ve heard and even thought over the years. What it comes down to is this: the difference between a mom-and-pop operation with limited scalability and a full enterprise environment that can support a multi-billion-dollar business… is documentation. That’s what those numbnuts at that insurance company were too stupid to understand.
You poor man. I’ve worked with those exact fukkin’ bozos.