Wednesday, July 22, 2015

Never trust a subcontractor

It all started with a phone call. "The whole network at [customer redacted] is down and they have no power - they need your help."

My blood ran cold. The engineer calling me sounded panicked, and for good reason. [Customer redacted] has an enormous natural gas facility in South Texas, too far from civilization to get enough power off the grid. We designed and built an onsite natural gas power plant for them - a big one, capable of supplying 40+ MW of power at peak load. They could run the facility for a short while without the power plant, but not long - and shutting down the facility meant losing 7 figures per hour. By the time I was informed, they had 6 hours until they had to shut down.

As the guy who had designed and installed said network, I was naturally the guy to call when it had problems, which had never happened before. It was a pretty simple network, honestly - just switches, cat5 cables and fiber. Since this was the network all the PLCs, relays, meters and whatnot ran on, it was airgapped & isolated, no routers. Not much to go wrong.

I quickly get on the phone and walk the guy on their end through plugging in a laptop and running some simple tests. Check lights on things, ping this, ping that. Everything seems good, though. The network is emphatically not down. So I send him a remote app and take control of his laptop to see for myself.

Log into switches, check things, nope, the network's not down. When I log into the HMI system, though, I see a big red error message: "Network Error: Cannot connect to database". The database server is up, though. I log into the database server (Windows Server 2012 running MSSQL) and that's where I find the problem: SQL isn't running. I try to start it and it immediately shuts back off.

Now this is very bad for a couple of reasons. This server provides databases for a couple of very critical things in this plant. Included are the PLC systems, which explains why the plant shut down - the PLCs weren't running and the monitoring system had shut down the turbines. Additionally, this was not a problem with the network, this was a problem with a server, and we had subbed out the systems work to [PLC Contractor], so I had no idea how all this was supposed to function and no documentation on it. I'm flying blind here.

A couple of phone calls later I find out that [PLC Contractor] had subbed out procurement and setup of the server to [incompetent morons], who was on vacation. [PLC Contractor] doesn't want to touch it and tells me to call [incompetent morons]. Woohoo. 4 hours to shutdown and this has now become my multi-million dollar problem.

Back at the database server I start digging to find out why SQL won't start. It didn't take me very long to find the problem: "Invalid license data. Reinstall is required." Turns out that [incompetent morons] had bought the appropriate licenses for SQL 2012, but never bothered to retrieve or use the license key. The plant had been running on a trial install of SQL this entire time, and went down because the trial period ended. I wish, I really wish I was making this up, but I am not.

Of course, I can't do anything to jeopardize the integrity of the database, including reinstalling, and I don't have the license key. The next hour or so consisted of lots of angry phone calls until someone at [PLC Contractor] finally dug up the documentation from the purchase, and the hour after that consisted of, roughly:

  1. Impersonate the CEO of an energy company ([incompetent morons] put him down, personally, on all the Microsoft paperwork), to get into the MSDN account [incompetent morons] had inexplicably set up in his name.

  2. Find out that MS does not just give you a license key anyway, it's embedded in the install disk. Which means that [incompetent morons] actually bought SQL and then proceeded to set it up using a trial copy.

  3. Sweat bullets waiting for the install disk to download.

  4. Extract license key from install disk.

  5. Use a procedure I found through the powers of Google to update the license information without reinstalling.

  6. Reboot everything, sweat more bullets.

After the database server was again serving databases, the PLC came back online and the techs were able to restart the turbines. We avoided shutdown by about 30 minutes.

I took the rest of the day off.

Source: reddit
