Understanding “message digests”

Both EFS-Web and Patentcenter make use of what USPTO calls a “message digest”.  You see the “message digest” in the Acknowledgment Receipt after e-filing a document.  What is a message digest?  Why do we care? 

The point of the message digest in an Acknowledgment Receipt is to “lock in” the particular file that was uploaded to the USPTO e-filing system.  The message digest makes it practically impossible for the filer to later claim that the file that the filer actually uploaded was some other file rather than the file that now appears in IFW.  It likewise makes it practically impossible for anyone at the USPTO to tamper with the contents of IFW by slipping in a different file in place of the file that was actually uploaded.  And in the event that the USPTO were to claim that it did not actually receive some file that the filer claims to have e-filed, the message digest permits the USPTO to trust the copy that the filer later provides of what the filer claims to have e-filed.

How exactly does this message digest work?

Look at the excerpt quoted below from the Acknowledgment Receipt in one of our firm’s cases.  If you wish to follow along, this is a published patent application that you can see here.  What we e-filed is this PDF file.  When we e-filed the PDF file, the ack receipt said this:

MESSAGE DIGEST (SHA-512)

065B59D9452A47ED54C19C1B9ACEB3F9452B14C9ABBBEA57ADF8A29B68C52A8CE2BDEC6355FFB09648DC02DB77EE310BBD9E1BFBF8EFF89A16307DA6134CB076

What this means is that the USPTO took our entire PDF file, which was exactly 15080 bytes in size, and passed it through a function called the SHA-512 function.  The SHA-512 function is a “hash” function.  The output is what you see here, a string of 128 hexadecimal characters (512 bits) starting with “065B” and ending with “B076”.  One of the desired features of a well designed hash function is that any change in the input, even a change in only one character (one byte) among the 15080 bytes in the file, will change almost everything about the output of the function.  A second desired feature of a well designed hash function is that if someone hands you an output (such as the output quoted above), you would find it practically impossible to construct an input file that would generate that particular output.

Among the people who do this stuff for a living (cryptographers), the SHA-512 function is generally thought to be pretty good, coming pretty close to satisfying both desired features fully.
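The first desired feature is easy to see for yourself.  Here is a minimal sketch in Python, using the standard hashlib module.  The input bytes are a made-up stand-in (not the PDF file discussed in this post), and the helper names sha512_hex and differing_bits are merely illustrative.  It flips a single bit of the input and counts how many of the 512 output bits change.

```python
import hashlib


def sha512_hex(data: bytes) -> str:
    """Return the SHA-512 digest of data as uppercase hex, the style seen in the ack receipt."""
    return hashlib.sha512(data).hexdigest().upper()


def differing_bits(a: bytes, b: bytes) -> int:
    """Count the bit positions at which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Hypothetical stand-in for the contents of a file.
original = b"Response to restriction requirement. " * 100

# Make a copy and flip exactly one bit in the first byte.
altered = bytearray(original)
altered[0] ^= 0x01
altered = bytes(altered)

d1 = hashlib.sha512(original).digest()
d2 = hashlib.sha512(altered).digest()

print(sha512_hex(original))
print(sha512_hex(altered))
print(f"{differing_bits(d1, d2)} of 512 output bits differ")
```

With a well designed hash function one would expect the count to land somewhere in the neighborhood of half of the 512 bits, so the two hex strings look unrelated even though the inputs differ by a single bit.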

So here is what you can do yourself if you wish to follow along.  Download this very PDF file from this blog.  You will see that it is a response to a restriction requirement.  

Next, surf the Internet and select at random one of the many web sites that offer to calculate a SHA-512 hash from a file that you upload to the web site.  One example of such a web site is https://hash.online-convert.com/sha512-generator .  But pick a different web site.  Or maybe you have a program on your own computer that calculates SHA-512 hashes, in which case, use your own program.

In any event, pick one.  And upload this very PDF file into that hash function.  Look at the answer that it gives to you.  I think it will start with “065B” and will end with “B076”.  If you were to go to the trouble to check all of the digits in the middle, I think they would match those listed in the ack receipt quoted above.
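If you would rather use a program on your own computer than a web site, here is one way it might look.  This is only a sketch, assuming Python 3 and its standard hashlib module.  The function names sha512_of_file and matches_receipt are my own inventions for illustration, and the digest constant is simply copied from the ack receipt quoted above.

```python
import hashlib

# Digest copied from the ack receipt quoted in this post (128 hex characters).
RECEIPT_DIGEST = (
    "065B59D9452A47ED54C19C1B9ACEB3F9"
    "452B14C9ABBBEA57ADF8A29B68C52A8C"
    "E2BDEC6355FFB09648DC02DB77EE310B"
    "BD9E1BFBF8EFF89A16307DA6134CB076"
)


def sha512_of_file(path, chunk_size=65536):
    """Hash the raw bytes of a file and return uppercase hex."""
    h = hashlib.sha512()
    with open(path, "rb") as f:  # binary mode: hash the exact bytes on disk
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def matches_receipt(path, receipt_digest):
    """Compare a file's SHA-512 hash with a digest copied from an ack receipt.

    Whitespace and letter case in the copied digest are ignored, since
    copy-and-paste from a PDF receipt may introduce line breaks.
    """
    normalized = "".join(receipt_digest.split()).upper()
    return sha512_of_file(path) == normalized
```

To use it, you would call something like matches_receipt("response.pdf", RECEIPT_DIGEST), where "response.pdf" is a hypothetical name for wherever you saved the downloaded file.  A result of True means every one of the 128 hexadecimal characters matched, not just the first four and the last four.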

From all of this, we can see a number of desirable results.

Non-repudiation on the part of the filer.  Suppose a filer were for some reason to get the idea of trying to convince the USPTO that the file that now appears in IFW is not actually what the filer e-filed.  Suppose the filer were to offer up to the USPTO some other file that happens to have the same file name and happens to have the same file size (15080 bytes).  The filer’s attempt to get away with substituting this other file would be found out instantly.  A USPTO person could simply run the purported “same file” through a SHA-512 function and the resulting hash would not match what is listed in the ack receipt.

Non-repudiation on the part of the Patent Office.  Suppose there were some system crash at the USPTO, or some other event leading to a loss at the USPTO of what had been e-filed.  (It is not idle to imagine this kind of problem;  see Patentcenter trouble ticket P30.)  What might happen next is that the filer might find it necessary to proffer a copy of the ack receipt to the USPTO as proof of what the filer uploaded.  The filer would also proffer a copy of the file itself that had been uploaded.  Even if a USPTO person were for some reason skeptical as to whether or not the proffered file was indeed the actual file that had been uploaded, the SHA-512 hash value in the ack receipt would come very close to eliminating any doubt.  If the hash of the proffered file matches the hash listed in the ack receipt then the matter would, as a practical matter, be settled.

The supreme importance of preserving the exact file that had been uploaded.  It will thus be appreciated that for the filer, it is of supreme importance to maintain, in perpetuity, the exact file that the filer had uploaded into EFS-Web or into Patentcenter. 

It is no good, for example, to print it out and scan it and to maintain a scanned copy.  The result of the scan would be some other PDF file that is, for all practical purposes, guaranteed not to generate the same SHA-512 result.

It is no good, for example, to preserve merely the word processor file that had been used to generate the PDF file.  Ten minutes later, or ten years later, any attempt to generate a PDF file from that word processor file would almost certainly generate a PDF file that is non-identical to the originally uploaded PDF file.  (PDF-generating software typically writes things like a creation timestamp into the file, so even two exports made minutes apart usually differ.)  The new and old PDF files might look exactly the same when printed on a printer, but that’s not the point.  The point is that mathematically, the ones and zeros in the two PDF files would almost certainly be non-identical.  A PDF file generated at some later time is thus nearly guaranteed not to generate the same SHA-512 result.

Only the exact actual file that the filer had uploaded will be capable of serving the purposes described here.
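For those who like to automate such things, here is one possible sketch of a preservation step, again assuming Python 3.  Everything about the layout is hypothetical: the function name archive_filed_copy, the archive folder, the file-naming scheme, and the SHA512SUMS.txt manifest are illustrative choices, not anything the USPTO prescribes.  The idea is simply to make a byte-for-byte copy of the uploaded file, confirm that the copy hashes the same as the original, and write the hash down next to it.

```python
import hashlib
import shutil
from pathlib import Path


def sha512_of_file(path: Path) -> str:
    """Hash the raw bytes of a file and return uppercase hex."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest().upper()


def archive_filed_copy(uploaded: Path, archive_dir: Path) -> Path:
    """Store an untouched copy of an uploaded file and record its digest."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    digest = sha512_of_file(uploaded)

    # Prefix the name with part of the digest so that two different files
    # with the same file name cannot overwrite one another.
    target = archive_dir / f"{digest[:16]}_{uploaded.name}"
    if not target.exists():
        shutil.copyfile(uploaded, target)  # copies the bytes unchanged

    # Confirm the archived copy is bit-for-bit what was uploaded.
    if sha512_of_file(target) != digest:
        raise RuntimeError(f"archived copy of {uploaded.name} does not match")

    # Append a line to a hypothetical manifest: digest, two spaces, file name.
    with open(archive_dir / "SHA512SUMS.txt", "a", encoding="utf-8") as m:
        m.write(f"{digest}  {target.name}\n")
    return target
```

Run right after e-filing, this leaves a copy that can later be checked against the digest in the ack receipt.  The archived copy should then never be opened and re-saved, merged with other documents, or otherwise edited, since any such step would produce a different file.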

3 Replies to “Understanding “message digests””

  1. Carl, you are in effect recommending a folder dedicated to preserving, without change, any file ever uploaded to the PTO, correct?

    1. Well that would be one approach. Another approach would be to maintain individual folders broken down by client, or still more individual folders broken down by individual files in which e-filings had taken place.

  2. Sometimes I merge the downloaded EAR file with the uploaded document files, for example when the IFW does not yet show the thing filed at the time I am reporting out the thing filed to a client. While I save the combined document to a different file, I have not consciously made an effort to preserve, untouched, the document I filed, for the purposes you mention. Hmm. Time to rethink processing!
