PDF generated by Aspose.Words does not use the PDF 2.0 standard date-time format and causes an exception during ABBYY OCR

We use this DOCX file,
Pia_Petersen_Entgeltnachweis.docx (216,9 KB), with Aspose.Words for Java to generate a PDF, and then run ABBYY OCR on the generated PDF file.

However, ABBYY OCR throws the following exception:

Error code: 260014, Timestamp : Tue Oct 10 11:49:55 CEST 2023, Message: error while analyzing document and creating ocr in OcrProvider, internal msg: error while analyzing document and creating ocr in OcrProvider: Finereader Engine failed to create export file.
java.lang.Throwable: Finereader Engine failed to create export file.
    at de.forcont.addon.frecli.CommandLineInterface.processDocument(CommandLineInterface.java:537)
    at de.forcont.addon.frecli.CommandLineInterface.main(CommandLineInterface.java:122)
Caused by: com.abbyy.FREngine.EngineException: The creation date "2020-10-21T15:02:00Z" cannot be written in the document. Please specify the date in the correct format.
    at com.abbyy.FREngine.IFRDocument.Export(Native Method)
    at de.forcont.addon.frecli.CommandLineInterface.processDocument(CommandLineInterface.java:522)
    ... 1 more
DocID: null | Pia_Petersen_Entgeltnachweis.docx.pdf

The ABBYY website has an article about this exception:

https://support.abbyy.com/hc/en-us/articles/360011978019--The-creation-date-cannot-be-written-in-the-document-Please-specify-the-date-in-the-correct-format-error-in-FineReader-Engine-12

The declared error is related to the PDF export, which is expected in case of incorrect dates. In FineReader Engine 12 R3 and newer, the creation and modification dates can be viewed and changed. For that, only the dates in correct format can be written into the documents. In case of the error, the date should be specified in a correct format or the writing mode should be changed (WriteCreationDate property of the DocumentContentInfoWritingParams Object).

The output document must have a valid date format: D:YYYYMMDDHHmmSSOHH'mm, as specified by the PDF 2.0 standard.
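For illustration, here is a minimal Java sketch of what a date in that D:YYYYMMDDHHmmSSOHH'mm layout looks like when built from a `java.time` value. The class and method names are hypothetical and are not part of Aspose.Words or the ABBYY API. The sketch writes a zero offset as `+00'00`, which is an assumption made for simplicity.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not part of Aspose.Words or FineReader Engine.
class PdfDateFormat {
    // Local date and time part: YYYYMMDDHHmmSS
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss", Locale.ROOT);

    // Builds D:YYYYMMDDHHmmSSOHH'mm, where O is the sign of the UTC offset.
    static String toPdfDate(OffsetDateTime value) {
        int offsetSeconds = value.getOffset().getTotalSeconds();
        char sign = offsetSeconds < 0 ? '-' : '+';
        int abs = Math.abs(offsetSeconds);
        return String.format(Locale.ROOT, "D:%s%c%02d'%02d",
                LOCAL_PART.format(value), sign, abs / 3600, (abs % 3600) / 60);
    }

    public static void main(String[] args) {
        // The creation date from the exception message above.
        System.out.println(toPdfDate(OffsetDateTime.parse("2020-10-21T15:02:00Z")));
        // prints D:20201021150200+00'00
    }
}
```

This layout applies to date strings in the PDF document information dictionary. It says nothing about dates stored elsewhere in the file.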

This looks like a bug in Aspose.Words.

@zwei Aspose.Words writes the date in the correct D:YYYYMMDDHHmmSSOHH'mm format. Could you please attach the PDF document produced on your side that causes the problem? We will check it and provide you with more information.

Pia_Petersen_Entgeltnachweis.docx.pdf (83,8 KB)

Thank you very much, Alexey. Here is the PDF document that was generated by Aspose and triggered the ABBYY exception.

@alexey.noskov I think we have found the reason: it appears to be a bug in our own module, and this exception has nothing to do with Aspose.

Thank you again, Спасибо!

@zwei It is great that you managed to find the cause of the problem. Please feel free to ask if you run into any issues; we are always glad to help.

Sorry, it looks like this is a bug in Aspose after all.

Here is our DOCX test file:
Pia_Petersen_Entgeltnachweis.docx (216,9 KB)

Here is the PDF file generated by Aspose:
abbyyDatumException.pdf (84,5 KB)

If you open this PDF file with Notepad++, you will find the following XML in the XMP section:
<xmp:CreateDate>2020-10-21T15:02:00Z</xmp:CreateDate>

And yes, this XML caused the ABBYY exception, because the date is not in the D:YYYYMMDDHHmmSSOHH'mm format.
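The same check can be done in code instead of a text editor. The following is a rough Java sketch that scans the raw bytes of a PDF for an `xmp:CreateDate` element. The class and method names are hypothetical, and the sketch assumes the XMP packet is stored uncompressed and uses the element form, as in the file above. It does not handle compressed metadata streams or the attribute form of the property.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the first <xmp:CreateDate> element in raw PDF bytes.
class XmpCreateDateFinder {
    private static final Pattern CREATE_DATE =
            Pattern.compile("<xmp:CreateDate>([^<]*)</xmp:CreateDate>");

    static Optional<String> findCreateDate(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to one char, so binary content cannot break decoding.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher matcher = CREATE_DATE.matcher(text);
        return matcher.find() ? Optional.of(matcher.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = "<xmp:CreateDate>2020-10-21T15:02:00Z</xmp:CreateDate>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findCreateDate(sample).orElse("not found"));
        // prints 2020-10-21T15:02:00Z
    }
}
```

For a real file, the bytes could be loaded with `java.nio.file.Files.readAllBytes` and passed to `findCreateDate`.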

@zwei The date format is correct for XMP metadata in a PDF. Here is a quote from the XMP specification:

Date
A date-time value which is represented using a subset of ISO RFC 8601 formatting, as described in
http://www.w3.org/TR/Note-datetime.html. The following formats are supported:
YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DDThh:mmTZD
YYYY-MM-DDThh:mm:ssTZD
YYYY-MM-DDThh:mm:ss.sTZD

YYYY = four-digit year
MM = two-digit month (01=January)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)

The time zone designator is optional in XMP. When not present, the time zone is unknown, and software should not assume anything about the missing time zone.
It is recommended, when working with local times, that you use a time zone designator of +hh:mm or
-hh:mm instead of Z, to aid human readability. For example, if you know a file was saved at noon on
October 23 a timestamp of 2004-10-23T12:00:00-06:00 is more understandable than
2004-10-23T18:00:00Z.
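The two timestamps in the quoted example denote the same moment and differ only in how the offset is written. A small Java sketch can confirm this with `java.time.OffsetDateTime`. The class name is hypothetical, and only the full date-time forms with a time zone designator are handled here, not the shorter forms such as YYYY or YYYY-MM.

```java
import java.time.OffsetDateTime;

// Hypothetical helper: compares two XMP date-time strings that carry a time zone designator.
class XmpInstantCheck {
    static boolean sameInstant(String first, String second) {
        // OffsetDateTime.parse accepts both "Z" and "+hh:mm" / "-hh:mm" designators.
        return OffsetDateTime.parse(first).toInstant()
                .equals(OffsetDateTime.parse(second).toInstant());
    }

    public static void main(String[] args) {
        System.out.println(sameInstant("2004-10-23T12:00:00-06:00", "2004-10-23T18:00:00Z"));
        // prints true
    }
}
```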

However, it causes an exception in ABBYY, and the ABBYY exception says that the date format is wrong... Perhaps it is a bug in ABBYY?

@zwei Most likely it is a bug in ABBYY, since other PDF consumers and validators consider the PDF produced by Aspose.Words to be valid.

Thanks, I will contact ABBYY Helpdesk for this issue.

The time zone written by Aspose is not the one we expect: the designator is "Z", which denotes UTC (zero offset), but we are in Germany, so the local offset should be used.
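To make the time zone point concrete: the designator "Z" denotes UTC (zero offset), so the same instant expressed as German local time carries a +02:00 offset in October. The Java sketch below shows that conversion. The class name and the choice of `Europe/Berlin` are assumptions for illustration only, and it is not known from this thread whether Aspose.Words does anything similar internally.

```java
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: re-expresses an XMP date-time in the local offset of a given zone.
class XmpLocalOffset {
    static String inZone(String xmpDate, String zoneId) {
        OffsetDateTime local = OffsetDateTime.parse(xmpDate)
                .atZoneSameInstant(ZoneId.of(zoneId))
                .toOffsetDateTime();
        // ISO_OFFSET_DATE_TIME always writes the seconds, matching YYYY-MM-DDThh:mm:ssTZD.
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(local);
    }

    public static void main(String[] args) {
        System.out.println(inZone("2020-10-21T15:02:00Z", "Europe/Berlin"));
        // prints 2020-10-21T17:02:00+02:00
    }
}
```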

@zwei
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26111

The issues you have found earlier (filed as WORDSNET-26111) have been fixed in this Aspose.Words for Java 23.12 update.