get ready to sign this letter about DOCX please

Update:  the comment letter got finalized with one hundred seven signatures.  The signed letter got sent to the USPTO within the due date of August 7, 2023.  You can see the letter as sent here.

I also sent in a separate comment letter so as to enter three documents into the record.  You can see that separate comment letter, dated August 6, 2023, here.


Hello folks.  Today, Monday, August 7, 2023, is the last possible day to file comments in response to a Federal Register Notice entitled Agency Information Collection Activities; Submission to the Office of Management and Budget (OMB) for Review and Approval; Comment Request; DOCX Submission Requirements.  You can see the Notice here.  The Notice asks patent applicants and practitioners to comment on the estimated burden that is imposed upon them by the USPTO’s DOCX filing initiative.

The idea is that the Office of Management and Budget, under the Paperwork Reduction Act, will consider whether the burden imposed upon USPTO’s customers by this DOCX initiative is appropriate when compared with the benefit that the USPTO gains from the initiative.

Here is a comment letter that I will send in at the end of the day today.  Thank you very much to colleagues including Dana Stangel and David Boundy, who provided very cogent suggestions and corrections.  The letter is now locked.  It will not change again between now and when we submit it, except possibly to correct typographical errors or the like.  Please consider signing the letter if you have not already done so.  To do that, click here.


This is a comment by Carl Oppedahl and other signers from the EFS-Web listserv community, in response to DOCX Submission Requirements, 88 FR 37039 (June 6, 2023).  The signers of this letter have, directly or through their firms or corporations, filed more than <28000> US patent applications in the past ten years. The signers have, directly or through their firms or corporations, paid more than <$40> million in fees to the USPTO in the past ten years.

The USPTO invites comments as to the burden imposed upon patent applicants and patent practitioners by USPTO’s requirement that patent applications be filed in a USPTO-defined subset of a DOCX word processor format instead of a PDF format.

Background regarding PDF filing. It is helpful to provide a bit of background so that the reader may appreciate how the patent application filing process works from the point of view of the patent applicant or patent practitioner.

For about the past twenty years, nearly all patent applications at the USPTO have been filed in PDF format. Patent applicants and practitioners are comfortable with PDF format for patent application filing — not just because of familiarity, but because it has proven to be exceptionally reliable.

Everybody involved in the patent application process, both on the side of the patent applicants and practitioners, and on the USPTO side, is in general agreement that it is helpful if the applicant or practitioner can provide characters in a computer-readable format, rather than mere image-based pages, to the USPTO.  If the USPTO works from images, the USPTO must incur the cost of OCR (optical character recognition) to get characters:

Optical character recognition of image-based filings costs the Office approximately $3.15 per new submission.

(Setting and Adjusting Patent Fees During Fiscal Year 2020, 84 FR 37398 at 37413 (July 31, 2019).)

The majority of PDF patent applications filed at the USPTO are generated by the applicant or practitioner by “printing to PDF” from a word processor. Such PDF patent applications already contain computer-readable characters. One way to describe such PDF files is to call them “text-rich PDF” files. It is true that some relatively small fraction of PDF patent applications filed at the USPTO are generated by “scanning” physical pages in a physical scanner to generate an image-based PDF file. Nowhere in the present Notice does the USPTO state what that fraction is, but the signers of this letter believe that fraction to be below 5%. It is appreciated, however, that for the small fraction of patent applications filed at the USPTO that are filed with image-based PDFs, the USPTO really does need to incur its $3.15 OCR cost.

But as was just mentioned, the majority of PDF patent applications filed at the USPTO already contain computer-readable characters.  They are already “text-rich”.  Remarkably, and disappointingly, the need for the USPTO to perform OCR on such patent applications is largely due to a problem of USPTO’s own making, as will now be explained. USPTO’s own internal patent application system, called “Image File Wrapper” (IFW), takes any e-filed PDF document and “flattens” it into a series of TIF images, one per PDF page, and stores the TIF images. (“TIF” means “tagged image file format”, and a TIF file is merely an image.  Nothing in the TIF format preserves any computer-readable character content.  The USPTO chooses to store its TIF images at a relatively low resolution, which reduces the accuracy of any OCR activity.)  Thus the native storage format in USPTO’s IFW system is an image-based format. (This is easy to see from the very name of that system, which is “Image File Wrapper”.)

For purposes of the present discussion of paperwork burdens which the USPTO imposes upon patent applicants, the important thing to appreciate is that this “flattening” process actively discards permanently the very computer-readable characters that the applicant or practitioner provided to the USPTO when the applicant or practitioner filed the text-rich PDF-formatted patent application.

The subsequent workflows within the USPTO which draw upon the images in the Image File Wrapper system require the USPTO to do OCR (optical character recognition) upon those images. The characters resulting from the OCR activity get used for many internal purposes within the USPTO including a publication of the application at 18 months and the issuance of the patent grant.

It is this OCR activity that costs the USPTO $3.15 per patent application.

The alert reader immediately realizes that if only the USPTO were to discontinue the “flattening” mentioned above, then for the majority of filed patent applications that are “text-rich”, that is, the applications that are “printed to PDF” from a word processor, the USPTO would not need to do OCR. The USPTO could get the computer-readable characters that it needs directly from the PDF file that had been “printed to PDF” from the word processor.  The $3.15 cost per patent application would be avoided for all but the 5% of PDF patent applications that are not already “text-rich”.

Background regarding filing date requirements.  Another important area of background is the patent application filing process.  Most tasks that people carry out with other people, with corporations, and with government agencies, are things that have some prospect of being fixed or corrected later. The filing of a patent application is strikingly different from almost all other such tasks, not merely in degree but in kind. The precise content of a patent application, on the day it was filed, controls nearly everything about what can or cannot take place later during prosecution of the patent application, and what can or cannot appear in the granted patent that might issue from it. At such time as the patent is litigated, large amounts of money may change hands due to the outcome of the litigation, and the outcome of the litigation may turn on a single word or a single character or digit that appeared in the patent application as it was originally filed.

A second important thing to appreciate about patent applications is that they tend to differ from day-to-day business and legal documents by making frequent use of mathematical equations, chemical formulas, tables, Greek letters, and other special characters.  This makes the use of DOCX even riskier for patent applications than for other day-to-day documents.

Twenty years of PDF e-filing, and literally millions of patent applications during those twenty years, have reminded everyone in the patent community that the PDF format provides a clear and reliable place to look to see exactly what mathematical equation or chemical formula or table, or exactly which Greek letter or special character appeared, in the patent application as it was filed on its filing date.  In contrast, careless electronic handling of a DOCX file within the USPTO, or alteration or loss of fidelity of the DOCX file within the USPTO, cannot be fixed later.

For the patent practitioner, these two factors give rise to profound professional liability risk in the process of filing a patent application. If the practitioner were to file a patent application on some particular date, and if it were to turn out later that something went wrong in the patent application filing process, the practitioner could face enormous liability. Suppose that a Greek letter mu (“µ”) became an “m” in a numerical prefix in a math formula, as happened to one of us when we filed a patent application in DOCX format rather than PDF format. The prefix changing from µ (which means “one millionth”) to m (which means “one thousandth”) changed an important number by a factor of one thousand.
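To make the size of that error concrete, here is a minimal sketch in Python. The names `SI_PREFIX` and `in_base_units` are illustrative, and the value 5 is an arbitrary stand-in, since the actual number in the affected application is not given here.

```python
from fractions import Fraction

# SI prefixes as exact fractions of the base unit.
SI_PREFIX = {
    "µ": Fraction(1, 10**6),  # micro: one millionth
    "m": Fraction(1, 10**3),  # milli: one thousandth
}

def in_base_units(value, prefix):
    """Express a prefixed quantity in base units."""
    return value * SI_PREFIX[prefix]

# The value 5 is an arbitrary stand-in, not a figure from the application.
as_filed = in_base_units(5, "µ")     # what the practitioner wrote
as_rendered = in_base_units(5, "m")  # what the e-filing system displayed

error_factor = as_rendered / as_filed
print(error_factor)  # 1000
```

A single substituted character is enough to change the stated quantity by three orders of magnitude, whatever the underlying value.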

In the DOCX patent application just mentioned, the practitioner involved had the good fortune to discover the defect in the patent office e-filing system before midnight arrived, and was able to e-file a workaround. The potentially enormous professional liability risk was averted. A different practitioner might not have been so fortunate and might have discovered the defect in the patent office e-filing system only later, or it might only be discovered at the time of litigation of the granted patent.

Background regarding PDF and DOCX.  The “quality, utility and clarity of the information to be collected” by USPTO’s DOCX initiative is very poor, especially when compared with the high reliability of PDF.

It is commonplace for two different people to open a single DOCX file on two different computers and to see non-identical renderings on their computer screens. A particular person, opening a single DOCX file now and again a year from now, might see non-identical renderings on the two occasions. In the patent filing project mentioned above, a Greek letter mu (“µ”) as rendered in the practitioner’s own computer became an “m” when rendered in the patent office’s e-filing system.

In contrast, there is the accumulated experience of by now some twenty years during which applicants and practitioners have filed literally millions of PDF patent applications at the USPTO. There have been literally millions of opportunities for a Greek letter mu (“µ”) in a PDF patent to become an “m” if the Greek letter mu (“µ”) were somehow inclined to “misbehave” in this way in the PDF file, and for the applicant and practitioner community to report such a disastrous event among themselves and to report it to the USPTO, were it to happen. Twenty years have come and gone, and literally millions of PDF patent applications have been filed at the USPTO, and the applicant and practitioner community is not abuzz with anecdotes of instances of, for example, a Greek letter mu (“µ”) becoming an “m” when PDF was used as the e-filing format.  This is because the path through most word processors to generate a PDF is the same path as is used for displaying characters on the screen, and for printing — a path that is simple and well-tested.

The USPTO has in mind generally trying to force the applicant and practitioner community to give up the PDF format that has accumulated twenty years of trust, and has in mind forcing the use of DOCX.

Almost nobody else uses DOCX.  The USPTO is an outlier in its forced migration to DOCX.  The PDF format is the most widely used format for other legal documents in both law and business.  PDF format is routinely used for other government filings, including forms and court filings, and is widely accepted in the US and internationally for legal documents such as business contracts.

Continuing-legal-education (CLE) webinars have taken place about the use of DOCX for the filing of patent applications at the USPTO. Here are some of the dates and titles:

The three e-filing paths that will be available, as planned by the USPTO, are the following:

    • File in PDF. This is the format that has been trusted for the past twenty years. This is the format that has been employed literally millions of times without incident. To make use of this filing format, the applicant or practitioner will need to pay a $400 penalty. (The penalty would be smaller for “small entity” or “micro entity” filers.) This gives three sub-paths:
      • File in PDF for an undiscounted filer.
      • File in PDF for a small entity filer.
      • File in PDF for a micro entity filer.
    • File in DOCX. This is the very risky format that has prompted CLE programs such as those mentioned above.
    • File in DOCX and also file a supplemental PDF file. The USPTO had resisted this approach for the past two years, but a recent Federal Register notice has made a sort of settlement offer to the applicant and practitioner community. See Extension of the Option for Submission of a PDF With a Patent Application Filed in DOCX Format, June 6, 2023, 88 FR 37036.

The “file in PDF” approach is well understood by all members of the patent community and by the USPTO. The USPTO is for example accustomed to its obligation to issue a Certificate of Correction if a patent owner points out a mistake made by the USPTO in the issuance of a patent grant, relative to the content of the PDF-formatted patent application as originally filed. Twenty years of USPTO’s answering for its mistakes, when they occur, have led to mutual trust between all parties concerned.

The “file in DOCX” approach is fraught with risk. It is, of course, up to any individual applicant or practitioner to decide whether to gamble that the version of the USPTO’s DOCX rendering engine that is in effect at the time of issuance will “get it right”, or whether its rendering might differ from what the USPTO’s rendering engine yielded at the time of the application e-filing process. (The USPTO’s engine for rendering DOCX files into human-readable form was, by December 22, 2022, up to version 18, as revealed by USPTO Director Vidal’s blog, and by now has gone through at least one more version change.  On practitioner email listservs, new bugs in the USPTO’s DOCX e-filing system are reported every few weeks, and it is not clear that reliability is improving over time.)

One guesses that some applicants and practitioners who wish to avoid having to pay the $400 penalty but who wish to avoid at least some of the risks of the “file in DOCX” approach will be tempted to make use of the third path, namely “File in DOCX and also file a supplemental PDF file.” It is important to keep in mind that this path is not itself risk-free, for several reasons.

In the USPTO’s present DOCX initiative, it can be very hard to establish whether a mistake was introduced in the e-filing process, or to establish whose fault the mistake was, and the USPTO systems do not currently provide adequate safeguards.

The USPTO has not offered any assurances that it will preserve the supplemental PDF file intact. Indeed, test filings by one of the signers of this letter indicate that the supplemental PDF gets halftoned, which degrades the quality of the file. The USPTO ought to preserve the supplemental PDF without degradation, for example by saving it to the Supplemental Content (SCORE) system.

The USPTO has not offered any assurances that it will provide in the acknowledgment receipt a Message Digest that faithfully memorializes the actual supplemental PDF file that was uploaded by the filer. In the present-day DOCX e-filing system, the USPTO fails to provide in the acknowledgment receipt a Message Digest that faithfully memorializes the actual DOCX file that was uploaded by the filer. This makes it impossible for the filer to prove what was in the actual DOCX file that the filer uploaded to the USPTO e-filing system. This misdesign of the DOCX aspect of USPTO’s e-filing system leads to a concern that now or at some future time, the USPTO might do a similar misdesign with respect to uploaded PDF files.  Thus there is a concern that in the future, the USPTO might fail to provide in the acknowledgment receipt a Message Digest that faithfully memorializes the actual supplemental PDF file that was uploaded by the filer. The USPTO needs to commit to providing, in the acknowledgment receipt, a Message Digest that faithfully memorializes the actual supplemental PDF file that was uploaded by the filer.
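For readers unfamiliar with the term, a Message Digest is a cryptographic hash of the exact bytes of a file. The following is a minimal sketch of how a filer might compute one locally for comparison against an acknowledgment receipt. The choice of SHA-512 is an assumption made for illustration, as are the function name `message_digest` and the stand-in file contents.

```python
import hashlib
import os
import tempfile

def message_digest(path, algorithm="sha512", chunk_size=65536):
    """Return the hex digest of the file at `path`, read in chunks."""
    h = hashlib.new(algorithm)  # algorithm choice is an assumption
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Two stand-in files that differ by a single character.
with tempfile.TemporaryDirectory() as tmp:
    uploaded = os.path.join(tmp, "as_uploaded.bin")
    stored = os.path.join(tmp, "as_stored.bin")
    with open(uploaded, "wb") as f:
        f.write("5 µm".encode("utf-8"))
    with open(stored, "wb") as f:
        f.write("5 mm".encode("utf-8"))
    digest_uploaded = message_digest(uploaded)
    digest_stored = message_digest(stored)

print(digest_uploaded == digest_stored)  # False
```

If the receipt reports the digest of the file exactly as uploaded, the filer can later show, by recomputing the digest from a retained copy, what was actually filed. A digest computed over some other, system-generated file proves nothing about the filer's upload.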

The USPTO has not actually said that the supplemental PDF file “controls” or is “authoritative”. In the legacy PDF e-filing approach with its twenty-year history, there is no question that the PDF file controls and is authoritative. Any request for a Certificate of Correction is based directly upon the PDF file. In contrast, in the present DOCX e-filing user interface, the filer can proceed to a submission only after clicking an agreement that a USPTO-generated DOCX file is the controlling document. This does not seem to leave room for the supplemental PDF file to somehow trump the USPTO-generated DOCX file as the controlling document.  It is also noted that the USPTO has offered no burden estimates for petitions to seek correction based upon the supplemental PDF file.

The “accuracy of the agency’s estimate of the burden of the proposed collection of information” is off by about a factor of six.  Having provided this background, we can now provide estimates of the burdens imposed by the USPTO upon applicants and practitioners.

We will start with estimates of burdens as they relate to exemplary individual patent applications. We will then take USPTO’s estimates of the number of applications that might get filed in a year, that would fall into various categories, and multiply the numbers together, to arrive at overall burden estimates.

    • File in PDF for an undiscounted filer, if not filed pro se. The government-fee portion of this burden is easy to quantify, being $400. For a practitioner, there is the further burden of having to spend time to explain to the client why the trusted and safe PDF path is needed rather than the riskier DOCX path, along with the time to pay the government fee to the USPTO during the e-filing process, and the time and paperwork to bill the government fee through to the client. I estimate this time and paperwork burden at a minimum of $150. The total burden (for a case that is not filed pro se) is at least $550.
    • File in PDF for a small entity filer, if not filed pro se. The government-fee portion of this burden is easy to quantify, being $160. For a practitioner, there are the just-mentioned time and paperwork burdens of at least $150. The total burden (for a case that is not filed pro se) is at least $310.
    • File in PDF for a micro entity filer, if not filed pro se. The government-fee portion of this burden is easy to quantify, being $80. For a practitioner, there are the just-mentioned time and paperwork burdens of at least $150. The total burden (for a case that is not filed pro se) is at least $230.
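The three per-filing totals above can be reproduced with a short sketch. The names `SURCHARGE`, `PRACTITIONER_OVERHEAD` and `pdf_path_burden` are illustrative only. The dollar amounts are the ones stated in the list.

```python
# Government surcharge for filing in PDF, by entity status (from the list above).
SURCHARGE = {"undiscounted": 400, "small": 160, "micro": 80}

# Minimum time-and-paperwork burden for a represented filer (estimate above).
PRACTITIONER_OVERHEAD = 150

def pdf_path_burden(entity, pro_se=False):
    """Per-filing burden of the PDF path; pro se filers bear only the fee."""
    fee = SURCHARGE[entity]
    return fee if pro_se else fee + PRACTITIONER_OVERHEAD

pdf_burdens = {entity: pdf_path_burden(entity) for entity in SURCHARGE}
print(pdf_burdens)  # {'undiscounted': 550, 'small': 310, 'micro': 230}
```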

The burden imposed for the DOCX path or the DOCX-plus-supplemental-PDF path is substantial, as will now be discussed.

The USPTO seems to estimate only the burdens connected with the e-filing process itself. By this, the USPTO seems to mean an extra half hour of time that is inserted into the e-filing process. The idea is that the filer spends time entering information into a USPTO patent e-filing system (for example Patentcenter) and uploading a patent application. Then the filer downloads the USPTO’s modified DOCX file and reviews it, hoping to catch any defects that the USPTO’s validation and rendering engine might have introduced. The filer also optionally uploads a “supplemental PDF” version of the patent application, in the hopes that at some later time, perhaps when the issued patent is being litigated, if some problem in the USPTO’s handling of the DOCX file becomes apparent, the supplemental PDF file might be of some help to the patent owner in overcoming the problem.

The USPTO estimate of this burden, as just mentioned, is the value of one-half hour of a patent practitioner’s time during the e-filing process. The practitioner’s time is estimated to be worth $435 per hour and so the burden is estimated by the USPTO at $217.50 per patent application filed.

This estimate by the USPTO fails to take into account the many later steps that the applicant or practitioner will need to carry out if the filing path selected was a DOCX filing path rather than PDF.

If the filing path selected was DOCX, then when the USPTO carries out the 18-month publication, the USPTO will use its then-current version of its DOCX rendering engine. As mentioned above, on December 22, 2022 (according to Director Vidal’s blog) the USPTO’s DOCX rendering engine was up to version 18. When the time comes for a particular patent application to get its 18-month publication, the rendering engine will have changed many times and will be at some version number much higher than 18.  (Let’s say it is version 30.)

Under the legacy PDF filing approach, which has been in use for twenty years, and which has been employed in literally millions of patent application filings without incident, the error rate for which the applicant is liable is zero — the PDF document is the “best evidence” copy (Federal Rule of Evidence 1002), and if the published patent application has an error, there’s no harm.  But under DOCX filing, the USPTO will insist that the copy that the USPTO generated “on the fly” during the e-filing process becomes the authoritative copy.  (It is this copy that the application e-filer is required to agree will be authoritative by checking a box, for the application “submit” button to work.)

The applicant or practitioner will have little choice but to cross-check the 18-month publication, perhaps character by character, against what the applicant or practitioner thought they originally filed. Given that the USPTO’s own DOCX rendering engine will be at version 30, the result of the rendering might or might not be the same.

Some applications are very simple and contain no math equations, no chemical formulas, no tables, and no Greek letters, and are only a few pages in length. The cross-check might only take an hour. Other applications are more complex and have a higher page count. As I think of the variety of applications I have filed in recent years, I estimate the average number of hours required to be three hours. Using USPTO’s figure for the value of a practitioner’s time ($435), this is a burden per patent application of $1305.

When a patent issues, the then-current version of USPTO’s DOCX rendering engine will have changed several times again from version 30.  (Let’s say it is up to version 40.)  There will also likely have been amendments at least to the claims and perhaps to other parts of the patent application. The applicant or practitioner will have little choice but to cross-check the issued patent, perhaps character by character, against what the applicant or practitioner expected. Again I estimate the time required at three hours. This again yields a burden per patent application of $1305. Not every application leads to an issued patent. The actual allowance rate averages around 50%. So the burden per patent application when this factor is taken into account is about $652.50.

The total burden (the USPTO’s estimated half hour during the e-filing process, plus the burden relating to the 18-month publication, plus the burden relating to the issued patent) is $2175.
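The arithmetic behind that total can be laid out as a brief sketch. The variable names are illustrative. The inputs are the USPTO's hourly figure and the hour and allowance-rate estimates given above.

```python
HOURLY_RATE = 435          # USPTO's figure for a practitioner's time
ALLOWANCE_RATE = 0.50      # approximate share of applications that issue

filing_review = 0.5 * HOURLY_RATE                 # USPTO's half hour at filing
publication_check = 3 * HOURLY_RATE               # cross-check at 18-month publication
issue_check = 3 * HOURLY_RATE * ALLOWANCE_RATE    # cross-check at issuance, weighted

docx_burden_per_application = filing_review + publication_check + issue_check
print(docx_burden_per_application)  # 2175.0
```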

We can now do the math, following for sake of discussion the approach laid out in the Notice.

The USPTO assumes for sake of discussion that about 411817 patent applications get filed per year. We will use that same number for sake of discussion.

Cases in which filer chooses to pay penalty. The USPTO projects that about 40% of applications would get filed with the penalty, that is to say, that in about 40% of cases, the filer would stick with the same safe and trusted PDF filing approach that has been used for the past twenty years. The USPTO projects that about 9% would be micro entity, about 29% would be small entity, and the rest would be undiscounted. I assume that half of the filings would be pro se.

    • Non-discounted entity, pro se. 51047 filings at $400, burden $20,419,000.
    • Small entity, pro se. 23703 filings at $160, burden $3,792,480.
    • Micro entity, pro se. 7282 filings at $80, burden $582,600.
    • Non-discounted entity, represented. 51047 filings at $550, burden $28,076,125.
    • Small entity, represented. 23703 filings at $310, burden $7,347,930.
    • Micro entity, represented. 7282 filings at $230, burden $1,674,975.

The total estimated burden for those who choose the safe and trusted path (who choose to pay the penalty) is an annual burden of $61,893,110.
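The six line items can be summed with a short sketch. One assumption is labeled in the code: the dollar figures for the undiscounted and micro entity rows correspond to counts of 51047.5 and 7282.5 (halving an odd number of filings), which the list shows as whole numbers. That half-filing is inferred from the dollar figures, not stated in the list.

```python
# (category, filings, burden per filing). The .5 counts are inferred from
# the listed dollar figures; the list itself shows whole-number counts.
rows = [
    ("undiscounted, pro se",      51047.5, 400),
    ("small entity, pro se",      23703,   160),
    ("micro entity, pro se",       7282.5,  80),
    ("undiscounted, represented", 51047.5, 550),
    ("small entity, represented", 23703,   310),
    ("micro entity, represented",  7282.5, 230),
]

burdens = [count * rate for _, count, rate in rows]
pdf_path_total = sum(burdens)
print(f"{pdf_path_total:,.0f}")  # 61,893,110
```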

Cases in which filer chooses to follow the new DOCX path. The USPTO projects that in about 60% of the applications, the filer would choose to follow one of the DOCX filing paths (with or without the supplemental PDF). This works out to about 247751 applications annually. The burden per application at filing time is estimated by the USPTO at $217.50. This number is surely unrealistically low but for purposes of discussion I leave this USPTO-provided number undisturbed. We then add the estimated burden for cross-checking and proof-reading at the time of 18-month publication and at the time of patent issuance, and the total estimated burden per application is $2175.  Multiplying 247751 by $2175 yields an annual burden of $538,858,425.

Adding these two numbers (burden on the filers who pay extra to keep the safe and trusted filing path, and burden on the filers who choose to follow the newer DOCX path) yields an estimated annual burden of $600,751,535.
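For anyone who wishes to check the multiplication and the addition, here is a minimal sketch using the figures stated above. The variable names are illustrative.

```python
PDF_PATH_TOTAL = 61_893_110     # annual burden on filers who pay the penalty
DOCX_FILINGS = 247_751          # projected annual DOCX-path filings
DOCX_BURDEN_PER_FILING = 2175   # dollars per application

docx_path_total = DOCX_FILINGS * DOCX_BURDEN_PER_FILING
combined_burden = PDF_PATH_TOTAL + docx_path_total

print(f"{docx_path_total:,}")   # 538,858,425
print(f"{combined_burden:,}")   # 600,751,535
```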

Comparing estimated burden with estimated benefits to the USPTO.

The alert reader might choose to compare the estimated burden of the DOCX initiative, as presently envisaged by the USPTO, with the estimated benefit to the USPTO.

The estimated burden on the applicants and practitioners is $600,751,535. The estimated benefit to the USPTO is the avoided OCR cost which is $3.15 times 247751 or about $780,415. This ratio is around 769 to 1.

It is respectfully suggested that the USPTO ought not to be permitted to proceed with a program that places a burden on the USPTO’s customers that is estimated to be 769 times the magnitude of the benefit to the USPTO of that program.
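The burden-to-benefit comparison can be sketched as follows, using only the figures given above. The variable names are illustrative.

```python
TOTAL_BURDEN = 600_751_535   # estimated annual burden on filers, dollars
OCR_COST = 3.15              # USPTO's OCR cost per submission, dollars
DOCX_FILINGS = 247_751       # projected annual DOCX-path filings

# Benefit to the USPTO: OCR cost avoided on DOCX-path filings.
benefit = OCR_COST * DOCX_FILINGS
ratio = TOTAL_BURDEN / benefit

print(f"benefit: ${benefit:,.2f}")
print(f"burden-to-benefit ratio: {ratio:.1f} to 1")
```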

The way for the USPTO to “minimize the burden of the collection of information on those who are to respond, including through the use of automated collection techniques” is to scrap the part of its existing internal workflow that flattens text-rich PDF files so as to discard the computer-readable characters within those files.  If the USPTO were to do so, it would not need to spend the $3.15 to carry out the OCR task, for the 95% of filed PDF applications that are text-rich. The benefit to the USPTO of its DOCX initiative would then drop to about $39,021.  The cost to get there would be zero.

The USPTO would actually come out ahead by cutting loose its DOCX albatross — because of the faulty initial design choice, and Microsoft’s ongoing mutation of DOCX, the USPTO can never debug its DOCX validation and rendering engine to a state of reliability.  Failing to cut it loose would be a constant drain.

With the scrapping of the part of USPTO’s existing internal workflow that flattens text-rich PDF files so as to discard the computer-readable characters, the ratio of burden to benefit would then become about 15396 to 1.
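The same comparison, with the benefit limited to the roughly 5% of filings that are image-based, can be sketched as follows. The variable names are illustrative, and the 5% share is the signers' estimate stated earlier, not a USPTO figure. The result comes out at roughly 15,400 to 1.

```python
TOTAL_BURDEN = 600_751_535   # estimated annual burden on filers, dollars
OCR_COST = 3.15              # USPTO's OCR cost per submission, dollars
DOCX_FILINGS = 247_751       # projected annual DOCX-path filings
IMAGE_BASED_SHARE = 0.05     # signers' estimate of filings that are not text-rich

# If text-rich PDFs were no longer flattened, only image-based filings
# would still require OCR, so only that share counts as a DOCX benefit.
residual_benefit = OCR_COST * DOCX_FILINGS * IMAGE_BASED_SHARE
residual_ratio = TOTAL_BURDEN / residual_benefit

print(f"residual benefit: ${residual_benefit:,.2f}")
print(f"burden-to-benefit ratio: {residual_ratio:,.0f} to 1")
```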

Respectfully submitted,

(signers)


Please consider signing the letter if you have not already done so.  To do that, click here.