The USPTO published a Federal Register notice entitled Setting and Adjusting Patent Fees during Fiscal Year 2020, dated August 2, 2020 (85 FR 46932). This is the FR notice that communicates the USPTO’s conclusion that if we are going to force applicants to change from what they were doing in the past, and in particular if we are going to force them henceforth to hand in some particular format for US patent applications, then we at the USPTO know what’s best, and what’s best is not some particular flavor of PDF. What’s best (according to the USPTO) is Microsoft Word DOCX format.
The Federal Register notice said, in four places:
The USPTO conducted a yearlong study of the feasibility of processing text in PDF documents. The results showed that searchable text data is available in some PDFs, but the order and accuracy of the content could not be preserved.
As soon as we saw this, many members of the practitioner community wondered what was in the “yearlong study”? What was there in this “yearlong study” that led to a conclusion that Microsoft Word DOCX format was supposedly the better format to try to force applicants and practitioners to file, rather than some particular PDF format?
One member of the practitioner community filed a FOIA request at the USPTO, asking for a copy of the “yearlong study”. This was ten months ago. The people at the USPTO whose job it is to comply with the FOIA law have fought tooth and nail to keep from having to hand over the “yearlong study”, and have not handed it over even now, after ten months. Indeed, almost everything about the USPTO’s way of forcing the Microsoft DOCX format upon applicants and practitioners has led to an adversarial relationship between the USPTO and a substantial portion of the practitioner community.
So it was very much a breath of fresh air when, earlier today, at my request, Acting Commissioner Andrew Faile sent me a copy of the “yearlong study”. I think Acting Commissioner Faile is trying to be more open and candid with the practitioner community now in recent months.
I have done a quick read of the yearlong study and you can read my initial conclusions here.
7 Replies to “What is in the “yearlong study” that supposedly says DOCX is the right path?”
Carl, thanks for getting the document, thanks for reading it, thanks for making it available, and thanks for your analysis of it.
Unfortunately, that analysis leaves me more than a little p.o.ed – not at you, of course, but at the PTO, for being such unbelievable boneheads about this for so long, and wasting so much of our time with what is at best incompetence and at worst malice. And at the people who wrote the study, who, if they actually knew that there was such a thing as PDF/A, could have told that to the PTO, but didn’t because they wanted to collect their fee for conducting the “study” (can we in theory find out via a FOIA request who conducted the “study” and how much it cost?).
But at least now there’s a basis for getting a PI or TRO if the PTO attempts to force DOCX down our throats. Hopefully it won’t come to that, and Mr. Faile will prevail upon his colleagues to put a stop to the PTO’s stupidity in this matter.
When I tried to read the study, it was only one page. Is there more?
Oh, sorry. It was provided to me as DOCX. I opened it in LibreOffice and then tried to save it as PDF. Somehow it lost 23 of the 24 pages. I tried again. Hopefully now all 24 pages are there.
24 pages now. Many thanks.
Thank you for commenting to let me know of the problem in the first place! You are most welcome.
I believe the PTO’s failure to consider some specific flavor or variant of PDF is a red herring, and is at most cumulative of the generally disingenuous nature of the PTO’s inquiry about how to obtain text of patent application initial filings and follow-on submissions for further processing within the Office.
Even without using PDF/A, PDF/UA, or some variant thereof, the vast majority of PDF documents produced from conventional word processors already include text that can be extracted in normal order.
There are some features that would render the extraction imperfect, the most relevant to patent applications being in-text mathematical or chemical formulae, and the like. (There are a variety of other word-processing features that could also cause problems, but they are mostly not used in non-provisional patent applications, because that material is usually consigned to the drawings.)
For the many patent applications filed with a description that contains nothing but text, there’s a very good chance that all of the text could be extracted from a PDF document produced by a conventional word processor.
Even if a filer files in DOCX format, there’s no guarantee that formulae or other special formatting can be correctly extracted as structured text, due (at least) to (a) incompatibilities among word processing programs and (b) filers who may choose to incorporate the formulae (or other special features) as images, to avoid the risk of introducing an error when transcribing the formula (etc.) from the inventor’s disclosure materials.
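To make point (b) concrete, here is a minimal sketch in Python, using only the standard library. It assumes nothing about how the USPTO actually processes filings. The helper names build_sample_docx and extract_paragraphs are hypothetical, and the sample package contains only the main document part, so it is not a complete, valid DOCX file. The sketch pulls the text runs out of word/document.xml and shows that a paragraph holding only an embedded picture contributes no text at all.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_docx() -> bytes:
    """Build a tiny in-memory package holding only word/document.xml.

    Hypothetical sample for illustration: the middle paragraph stands in
    for a formula that the filer pasted as an image.
    """
    document_xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>The compound has the formula</w:t></w:r></w:p>"
        "<w:p><w:r><w:drawing/></w:r></w:p>"
        "<w:p><w:r><w:t>as shown above.</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def extract_paragraphs(docx_bytes: bytes) -> list[str]:
    """Return the plain text of each paragraph, in document order."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # Only w:t elements carry text; a drawing contributes nothing.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


if __name__ == "__main__":
    for number, text in enumerate(extract_paragraphs(build_sample_docx()), 1):
        print(number, repr(text))
```

In this sample the second paragraph comes back as an empty string: the formula is simply absent from the extracted text, even though the filing is in DOCX.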