USPTO says a little more about the cause of the system crashes

Here is a statement dated December 24.  You can see the original here.

Statement of USPTO Acting Chief Communications Officer Patrick Ross

On Tuesday night, December 22, a major power disruption to the USPTO’s data center resulted in the shutdown of our public filing, searching, and payment systems, as well as the core systems our patent and trademark examiners use. Since then, dedicated teams have been working around the clock with our service providers to assess the situation and safely stabilize and restore those systems. Repair estimates remain the same—that the USPTO will be impacted at least through December 25.

Power that comes into the USPTO’s main building feeds two power filtration systems that provide steady, “filtered” power so systems don’t suffer from damaging surges or drops in power supply. A malfunction in the power supply lines feeding these two systems caused significant damage to both systems. This is what we believe caused our systems to go down on Tuesday night.

Because of their size, these large and highly complex power filtration systems cannot be easily replaced. We are working with service providers to obtain a source of uninterrupted conditioned power to the data center as soon as possible.

The USPTO will continue to provide updates, such as yesterday’s announcement of our filing deadline flexibility, through our systems status webpage (www.uspto.gov/blog/ebiz/) and Facebook account (www.facebook.com/uspto.gov). With that information, users can make informed decisions about how best to allocate their own time and efforts while the problem is being addressed.

Our IP system is vital to our 21st century knowledge economy. Therefore, having timely and efficient public access to all of our agency’s filing, searching, and payment systems is also vital. The USPTO is mindful of our customer’s needs and appreciates the continuing patience.

So let’s try to put this into plain language.  Some years ago, USPTO spent oodles of money installing redundant “filtration systems” that are intended to protect USPTO’s e-commerce servers from problems with the electricity provided by the electric company.  On December 22, a problem happened with the electricity provided by the electric company.  Every one of USPTO’s e-commerce servers promptly crashed.  I can’t quite put my finger on what sounds wrong about that.

 

It is now December 25.  About ¾ of USPTO’s e-commerce servers (the least mission-critical servers) are now back online.  So clearly somebody has figured out how to get power reconnected to servers.  Yet, though somebody has figured out how to get power reconnected to ¾ of the servers, still the EFS-Web and TEAS and IFW servers are broken.

The statement is not as clear as it might be, but I guess the situation is that the power that has been connected to the non-mission-critical servers is power that cannot be said to be “uninterrupted conditioned power”.  And until USPTO is able to restore “uninterrupted conditioned power”, USPTO is nervous about flipping the power switches to turn the mission-critical servers (such as EFS-Web and TEAS and IFW) back on.

Too bad that USPTO had not followed suggestions from years ago to move the “contingency” EFS-Web server to a geographically diverse location.

Anyway, “uninterrupted conditioned power” is actually quite easy to get.  USPTO could go on Amazon and place orders for two or three dozen of the biggest UPSs and they would be delivered the next day.  Or USPTO could dispatch employees to stop by all of the nearby Best Buy stores and buy all of the big UPSs.

Two or three dozen big UPSs would be more than enough to power the USPTO mission-critical systems.
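The sizing claim above is easy to sanity-check with a little arithmetic. Here is a minimal sketch, assuming hypothetical figures throughout: the 1500 VA unit rating, the 0.9 power factor, the 80% loading limit, and the 20 kW load in the usage lines are placeholders, not anything USPTO has published. The function names are likewise made up for illustration.

```python
import math


def usable_watts_per_unit(unit_va, power_factor=0.9, headroom=0.8):
    """Real power (watts) one UPS can carry under the stated assumptions.

    unit_va      -- rated apparent power of one unit, in volt-amperes (assumed)
    power_factor -- watts delivered per VA of rating (assumed)
    headroom     -- fraction of rated capacity we are willing to load (assumed)
    """
    if unit_va <= 0:
        raise ValueError("unit_va must be positive")
    if not 0 < power_factor <= 1:
        raise ValueError("power_factor must be in (0, 1]")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    return unit_va * power_factor * headroom


def ups_units_needed(load_watts, unit_va, power_factor=0.9, headroom=0.8):
    """Smallest number of identical UPS units that can carry load_watts."""
    if load_watts < 0:
        raise ValueError("load_watts must be non-negative")
    per_unit = usable_watts_per_unit(unit_va, power_factor, headroom)
    return math.ceil(load_watts / per_unit)


def fleet_capacity_watts(units, unit_va, power_factor=0.9, headroom=0.8):
    """Total real power a fleet of identical units can carry."""
    return units * usable_watts_per_unit(unit_va, power_factor, headroom)


if __name__ == "__main__":
    # Hypothetical inputs: a 20 kW load and 1500 VA consumer-grade units.
    print(ups_units_needed(20_000, 1500))
    # Capacity of "three dozen" such units under the same assumptions.
    print(fleet_capacity_watts(36, 1500))
```

Plugging in a real load figure, if USPTO ever publishes one, would show whether two or three dozen units actually suffice. Note that this covers steady-state capacity only, not battery runtime or how the units would be wired to the racks.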

4 Replies to “USPTO says a little more about the cause of the system crashes”

  1. Wishful thinking… unless they have implemented a GSA schedule with one or more suppliers, they will likely put out a call for tender, ultimately taking many more months until this problem has been addressed.

  2. Based on the information they’ve provided, here’s what I’m guessing they have and what happened. They are running true online double-conversion UPS’s. These UPS’s take AC power, convert it to DC power, and then convert it back to AC power. Such UPS’s are the best of the best, because any under-voltage or over-voltage conditions (or frequency deviations) are compensated for. By converting AC to DC and back to AC, the AC that’s output is a perfect 60 Hz sine wave at 120 V.

    Now, even industrial grade servers typically only have MOVs as their surge protectors. MOVs are sacrificial devices, and they degrade over time. Little surges slowly chip away at their ability to absorb surges. Further, MOVs can’t handle a huge surge, like a lightning strike, or a transformer failing and sending a voltage orders of magnitude higher down the line.

    I’m guessing the latter happened here, and it has damaged all the servers. So, presumably, they have to get new servers and restore from backups. They should have a few backup servers in boxes so they can begin this process immediately, but it appears they don’t, and had to order them.

    Obviously, all of this would be beside the point if they had geographically diverse servers.

    The solution going forward is to get series mode surge protectors that are non-sacrificial. The company that holds the patents to this is Zero Surge, and they license their technology to some other companies (SurgeX, Brickwall):

    http://www.zerosurge.com/

    Industrial grade true online double-conversion UPS’s are huge, and are not something you can just buy from Amazon. Eaton is a primary supplier. Another possibility is that the servers are OK, but they don’t want to bring the servers online until the UPS’s have been replaced, because once the UPS’s arrive, they would have to shut down all the servers to insert the UPS’s between power and the servers.

    In this case, they should get Liebert MicroPods. These are neat little devices that permit you to flip a switch to bypass a UPS so that you can replace or service the UPS, without taking down the server.

    http://www.emersonnetworkpower.com/en-US/Products/ACPower/RackmountUPS/Pages/LiebertMicroPODMaintenanceBypassandOutputDistributionAccessory.aspx

    Really, this is just poor planning on the part of the IT department at the PTO. The stuff I noted above is well known in the industry. But, again, the primary problem is the lack of geographical diversity. The fact that this wasn’t done is a huge indication that the IT department doesn’t know what it’s doing.

    1. “…is a huge indication that the IT department doesn’t know what it’s doing.”

      Apparently, the USPTO legal department doesn’t know what it is doing either by declaring a Federal holiday instead of declaring the USPTO to be closed.

      A SPE once told me (after reversing a rejection in his art unit) that he was generally disappointed in his examiners and that the USPTO just doesn’t attract top quality professionals.
