(Update: it is time for you, dear reader, to consider signing another letter. See blog posting.)
Until now, it has been optional for a practitioner to file a US patent application in DOCX format rather than in PDF format. But USPTO now proposes to charge a $400 penalty for filing a patent application in non-DOCX format. This is a very bad idea, for reasons that I will discuss in detail. Only if USPTO were to make fundamental changes in its way of receiving DOCX files would it be acceptable for USPTO to impose a penalty for filing in a non-DOCX format.
USPTO needs to follow WIPO’s example, permitting the practitioner to file a “pre-conversion format” version of a patent application along with the DOCX file. In the event of some later problem with USPTO’s rendering of the DOCX file, the practitioner would be permitted to point to the pre-conversion format, which would control in the event of any discrepancy.
The normal way to file US patent applications is in PDF format. With PDF format, the applicant has complete control over the appearance of characters and symbols.
Some years ago, the USPTO began beta-testing a system that would permit a practitioner to file a patent application in DOCX format instead of in PDF format. Yours truly was among the very first of the beta-testers of USPTO’s system for DOCX filings. As implemented by the USPTO, the practitioner would upload a DOCX file, and USPTO would render the DOCX file in a human-readable PDF image format. As part of the e-filing process, the practitioner was expected to proofread the rendered image as provided by the USPTO’s e-filing system. The notion was that the practitioner would be obliged to catch any instances of USPTO’s system rendering the DOCX file differently from the way the practitioner’s word processor had rendered that same DOCX file. If, for example, some math equation or chemical formula had gotten corrupted in USPTO’s system, the practitioner would be expected to catch this prior to clicking “submit”.
A first difficulty about this is that there is no single unambiguous thing called “DOCX” format. The history may be seen in this Wikipedia article. DOCX exists in many variants, and in particular Microsoft has a history of making poorly documented changes over time to the ways that Microsoft Word implements DOCX formatting of documents.
USPTO inaccurately characterizes DOCX as if one could be sure that any word processor will implement DOCX in the same way as any other word processor. For example, USPTO says:
There are several word processors that can create and save in DOCX format, including Google Docs, Microsoft Word 2007 or higher, Office Online, LibreOffice, and Pages for Mac.
That statement is disingenuous at best, and borders upon falsity given that there is no single unambiguous DOCX format. A more accurate statement would be:
There are several word processors that can create and save documents in variants of DOCX formats, including Google Docs, Microsoft Word 2007 or higher, Office Online, LibreOffice, and Pages for Mac.
USPTO also says:
DOCX is stable and governed by two international standards (ECMA-376 and ISO/IEC 29500).
This statement is simply false. There is no single DOCX standard to which Microsoft Word and the other word processors are all compliant.
To give a simple example, consider this math equation in a patent application that I recently filed as a PDF-based PCT application using LibreOffice:
As an experiment I uploaded the DOCX file of this PCT application to EFS-Web as if I were filing a domestic US patent application. The way the USPTO has designed EFS-Web, what happens next is that the practitioner sees this message in red letters:
The PDF(s) have been generated from the docx file(s). Please review the PDF(s) for accuracy. By clicking the continue button, you agree to accept any changes made by the conversion and that it will become the final submission.
It is easy to see that this filing procedure, as contemplated by USPTO, imposes an enormous professional liability risk on the practitioner. The practitioner is obligated to proofread the entire patent application, from top to bottom, for any corruption introduced by the USPTO’s rendering system.
Here is how the USPTO rendered this math equation:
The alert reader will notice that the USPTO inserted a spurious digit “1” into the math equation. Had I overlooked this corruption of the document by the USPTO, I might then have clicked “continue”, at which point it would have been USPTO’s position that I had agreed to accept USPTO’s change of “0.2” to “10.2”. TYFNIL the accused infringer would be able to seize upon this.
There are a dozen other places in this patent application where USPTO corrupted math equations; Equation 14 is merely the most striking so that is the one that I quoted here.
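To show why a one-character corruption like this is so easy to miss by eye and yet easy to flag mechanically, here is a minimal sketch in Python. It assumes that the practitioner has already extracted plain text from both the original document and the USPTO-rendered PDF by some other means, which is itself an error-prone step. The function name `find_discrepancies` and the sample strings are hypothetical. The sample strings are not the actual Equation 14, and nothing like this is part of EFS-Web.

```python
import difflib


def find_discrepancies(source_text: str, rendered_text: str):
    """Return (operation, source_fragment, rendered_fragment) tuples
    for every place where the rendered text differs from the source."""
    matcher = difflib.SequenceMatcher(
        None, source_text, rendered_text, autojunk=False
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, source_text[i1:i2], rendered_text[j1:j2]))
    return changes


if __name__ == "__main__":
    # Hypothetical fragments standing in for text taken from the
    # practitioner's own rendering and from the USPTO's rendering.
    source = "x = 0.2 y"
    rendered = "x = 10.2 y"
    for change in find_discrepancies(source, rendered):
        print(change)
```

A character-level comparison of this kind would report the inserted “1” as a single insertion. It would not help with equations that are rendered as images, and it is no substitute for the practitioner having a controlling copy of what was actually filed.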
As a beta-tester of USPTO’s DOCX systems, I have used a pretty simple way of choosing which of my patent applications I am willing to subject to the risks of filing in DOCX. Basically if there is any math equation or chemical formula, or anything other than very simple alphanumerical characters, I don’t take the risk. Every now and then, on a whim, I will experiment with something like this “Equation 14” document, but I don’t risk any actual substantive rights of a client by actually clicking “submit” in such a case.
But USPTO’s proposed rulemaking would put me in the untenable position of having to pay a $400 penalty for every case that I file that has a math equation or chemical formula in it.
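My screening rule (no DOCX filing if the application contains any math equation) can be roughly approximated in code. The following is a minimal sketch in Python, assuming that the DOCX file is an ordinary ZIP package whose main body is stored as `word/document.xml`, and that equations are stored as Office Math (`m:oMath`) elements in the namespace shown. The function name `count_equations` is hypothetical. Equations embedded as images or as older embedded equation objects would not be counted, nor would unusual symbols in ordinary text, so a count of zero does not mean that a file is safe.

```python
import zipfile
import xml.etree.ElementTree as ET

# Office Math namespace commonly used for equations in DOCX files.
# Assumption: files saved in the "strict" variant may use a different
# namespace URI, which this sketch does not handle.
OMML_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"


def count_equations(docx_file) -> int:
    """Count m:oMath elements in the main body of a DOCX file.

    docx_file may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    return sum(1 for _ in root.iter(f"{{{OMML_NS}}}oMath"))


def safe_to_consider_docx(docx_file) -> bool:
    """Apply the simple rule: any equation at all means do not file in DOCX."""
    return count_equations(docx_file) == 0
```

A call such as `safe_to_consider_docx("application.docx")` (the file name is a placeholder) would return `False` for any file in which the sketch finds at least one equation.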
If USPTO wants to pursue this, USPTO should follow the example of the World Intellectual Property Organization (WIPO). Like the USPTO, WIPO of course encourages practitioners to e-file using characters rather than images. Clearly all forward-thinking patent offices need to consider ways to try to collect characters, because that is more efficient in later workflow than collecting page images.
But what does WIPO do so that practitioners are protected from the kind of risks that we see above with Equation 14? WIPO permits the applicant, at the time of filing an international patent application, to provide not only the character-based version of the patent application (XML, in the case of PCT), but also the “pre-conversion format” of the document. You can see this in Section 706 of the PCT Administrative Instructions. The idea is that if later it turns out that some flaw arose in the generation of the XML file, or some flaw in the way the XML got rendered into human-readable form, the applicant would be able to point to what the application looked like in its “pre-conversion format”.
It’s clear from this that the simple thing USPTO would need to do, as a precondition to imposing a $400 penalty for non-DOCX filings, is to make a provision for the practitioner to be able to provide a PDF version of the patent application being filed, along with the DOCX file. This PDF version would serve as the controlling version in the event that (for example) the USPTO ended up inserting a spurious “1” into a math equation.
We can then circle around to the USPTO’s disingenuous statements about DOCX. If it were really true that there is some single unambiguous DOCX standard, then this spurious “1” would never have gotten inserted into the rendered patent specification in EFS-Web. The very fact that this happened proves that USPTO is wrong when it suggests that there is some single thing called DOCX that means the same thing in EFS-Web and in all word processors.
Thank you for bringing this problem to my attention. This is a very serious issue and one which will lead to serious problems for practitioners who accept the representations of the USPTO at face value.
It is also a serious issue for practitioners who have been assuming that pdf documents created from docx files can be trusted when making submissions to the USPTO. Does anyone know if any docx-pdf conversion software has been certified for reliability?
In the case of mathematical and chemical formulae, would it not be safer to insert those as graphic image files into the docx document, so that any problems arising from conversion would be all or nothing (the image is either there or not) to simplify proofreading the pdf document?
This is indeed scary. I had flashback nightmares of trying to fix formulas in those old .xml files we had to upload. Took hours to proof just one application…..
In reply to Robert Leikes, would you trust the USPTO’s software to handle an embedded graphic correctly?
An alternative might be to include one or more sheets of equations or formulae as drawings, cross referenced to the text by equation number. I remember the days back in London when we were required to provide formula drawings for any in-line formulae and equations, so that the printers had a clean master to work from.
Thanks for this Carl.
Yet again, the USPTO refuses to learn from anyone else, tries to go it alone, and screws it up, badly.
It’s bad enough that, currently, when I submit a searchable pdf file, the USPTO takes that file, makes it unsearchable, and at the same time makes a much larger file in terms of number of bytes. And that, often, it takes decent-looking pdf drawings and renders them unintelligible.
But to expect us to trust them to do the conversions properly, and to put the onus on us to proofread a document that they’ve converted on their end, and have the chutzpa to charge us if we choose not to go along with that?!
They’ll be getting my comments.
Hmm, it is possible to copy pdfs as images into a Word document. I wonder if you could avoid the fee and the risk by submitting a docx file that contained only pdf images. I usually like to follow both the spirit and the letter of the law, but not if rules are unfair.
I never succeeded in filing a docx, so I don’t know if the original docx is kept and downloadable by the filer? If it is, and you download it and inspect the xml (unzip it), is the equation the same as filed? If so, then the docx file in PAIR should serve as the Fail-Safe for the incorrectly rendered pdf…?
Yes I am sure USPTO secretes a copy of the DOCX file somewhere. They never throw away anything like this. But no, USPTO’s policy is that once the practitioner has signed the adhesion contract (quoted in the blog article) in red letters, then the PDF rendering is the official document.
Are figures required to be in DocX format to avoid the $400 extortion fee? Can I file text as ‘figures’ and then delete in a preliminary amendment – this might give me a backstop.
Thoughts?
How often are you seeing errors like this, Carl? It would be one thing if the equations just didn’t show up at all, or were translated into gibberish, but to have them almost imperceptibly change, that’s something that even the closest of proofreading is not going to catch.
I think Moshe has more or less the right idea. For equation-heavy applications, I would probably just move the equations into the drawings.
The USPTO proposal to mandate DOCX filing is fundamentally misguided in that it removes applicants’ ability to control the accuracy of their specifications, claims, and abstracts. In any system of filing structured text, applicants must retain certainty in knowing that filed documents are accurate.
Applicants should not be penalized with increased fees for choosing to guarantee the accuracy of applications by filing a PDF as the official application generated under their own control. Instead, the Office should reduce fees for those who file an ISO 19005-1 compliant PDF/A document, which is fully text searchable and accessible. The Office should further reduce fees for those who, additional to their own PDF, file a DOCX version of the application with a certification of its accuracy. The supplemental DOCX file would provide the Office with their structured text without jeopardizing the official application filed in PDF. No need exists for the Office to engage in the practice of DOCX to PDF conversion.