We use this DOCX file, Pia_Petersen_Entgeltnachweis.docx (216.9 KB), and Aspose.Words for Java to generate a PDF, and then run ABBYY OCR on the generated PDF file.
However, ABBYY OCR throws the following exception:
Error code: 260014, Timestamp : Tue Oct 10 11:49:55 CEST 2023, Message: error while analyzing document and creating ocr in OcrProvider, internal msg: error while analyzing document and creating ocr in OcrProvider: Finereader Engine failed to create export file.
java.lang.Throwable: Finereader Engine failed to create export file.
at de.forcont.addon.frecli.CommandLineInterface.processDocument(CommandLineInterface.java:537)
at de.forcont.addon.frecli.CommandLineInterface.main(CommandLineInterface.java:122)
Caused by: com.abbyy.FREngine.EngineException: The creation date "2020-10-21T15:02:00Z" cannot be written in the document. Please specify the date in the correct format.
at com.abbyy.FREngine.IFRDocument.Export(Native Method)
at de.forcont.addon.frecli.CommandLineInterface.processDocument(CommandLineInterface.java:522)
... 1 more
DocID: null | Pia_Petersen_Entgeltnachweis.docx.pdf
The ABBYY website gives the following explanation of this exception:
The declared error is related to the PDF export, which is expected in case of incorrect dates. In FineReader Engine 12 R3 and newer, the creation and modification dates can be viewed and changed. For that, only the dates in correct format can be written into the documents. In case of the error, the date should be specified in a correct format or the writing mode should be changed (WriteCreationDate property of the DocumentContentInfoWritingParams Object).
The output document must have valid format: D:YYYYMMDDHHmmSSOHH'mm, as specified by the PDF 2.0 standard.
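For comparison with the ISO-style value in the exception, here is a minimal Java sketch of how a timestamp could be rendered in the D:YYYYMMDDHHmmSSOHH'mm form. The class and method names (PdfDateSketch, toPdfDate) are hypothetical and are not part of Aspose.Words or FineReader Engine. Writing UTC as a bare Z is an assumption; some producers append 00'00 after it.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper, not an Aspose or ABBYY API.
final class PdfDateSketch {
    // Date and time digits without separators: YYYYMMDDHHmmSS.
    private static final DateTimeFormatter LOCAL_PART =
            DateTimeFormatter.ofPattern("uuuuMMddHHmmss", Locale.ROOT);

    // Renders a timestamp as D:YYYYMMDDHHmmSSOHH'mm.
    static String toPdfDate(OffsetDateTime value) {
        String local = LOCAL_PART.format(value);
        int offsetSeconds = value.getOffset().getTotalSeconds();
        if (offsetSeconds == 0) {
            // Assumption: UTC is written as a bare "Z".
            return "D:" + local + "Z";
        }
        char sign = offsetSeconds < 0 ? '-' : '+';
        int offsetMinutes = Math.abs(offsetSeconds) / 60; // any seconds part of the offset is dropped
        return String.format(Locale.ROOT, "D:%s%c%02d'%02d",
                local, sign, offsetMinutes / 60, offsetMinutes % 60);
    }

    public static void main(String[] args) {
        // The value from the exception message above.
        System.out.println(toPdfDate(OffsetDateTime.parse("2020-10-21T15:02:00Z")));
        // A timestamp with a non-zero offset.
        System.out.println(toPdfDate(OffsetDateTime.parse("2023-10-10T11:49:55+02:00")));
    }
}
```

With the value from the exception, this prints D:20201021150200Z, which makes the difference from "2020-10-21T15:02:00Z" easy to see.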
@zwei Aspose.Words writes the date in the correct D:YYYYMMDDHHmmSSOHH'mm format. Could you please attach the PDF document produced on your side that causes the problem? We will check it and provide more information.
@zwei The date format is correct for XMP PDF metadata. Here is a quote from the XMP specification:
Date
A date-time value which is represented using a subset of ISO RFC 8601 formatting, as described in http://www.w3.org/TR/Note-datetime.html. The following formats are supported:
YYYY
YYYY-MM
YYYY-MM-DD
YYYY-MM-DDThh:mmTZD
YYYY-MM-DDThh:mm:ssTZD
YYYY-MM-DDThh:mm:ss.sTZD
YYYY = four-digit year
MM = two-digit month (01=January)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)
The time zone designator is optional in XMP. When not present, the time zone is unknown, and software should not assume anything about the missing time zone.
It is recommended, when working with local times, that you use a time zone designator of +hh:mm or -hh:mm instead of Z, to aid human readability. For example, if you know a file was saved at noon on October 23 a timestamp of 2004-10-23T12:00:00-06:00 is more understandable than 2004-10-23T18:00:00Z.
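To make the quoted format list concrete, here is a rough Java sketch that checks whether a string has the shape of one of the XMP date forms listed above. The names XmpDateSketch and looksLikeXmpDate are hypothetical. It is a shape check only: it does not reject impossible calendar dates such as February 31, and it treats the time zone designator as optional, as the quoted text allows.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any XMP, Aspose or ABBYY library.
final class XmpDateSketch {
    // YYYY, YYYY-MM, YYYY-MM-DD, then an optional time part:
    // Thh:mm, optional :ss, optional .s fraction, optional TZD (Z or +hh:mm or -hh:mm).
    private static final Pattern XMP_DATE = Pattern.compile(
            "\\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\\d|3[01])"
            + "(T([01]\\d|2[0-3]):[0-5]\\d(:[0-5]\\d(\\.\\d+)?)?"
            + "(Z|[+-]([01]\\d|2[0-3]):[0-5]\\d)?)?)?)?");

    // True when the whole string matches one of the listed forms.
    static boolean looksLikeXmpDate(String text) {
        return XMP_DATE.matcher(text).matches();
    }

    public static void main(String[] args) {
        // The value from the exception message: has the XMP shape.
        System.out.println(looksLikeXmpDate("2020-10-21T15:02:00Z"));
        // A PDF-style date string: does not have the XMP shape.
        System.out.println(looksLikeXmpDate("D:20201021150200Z"));
    }
}
```

The creation date from the exception, "2020-10-21T15:02:00Z", passes this check, while a PDF-style string such as D:20201021150200Z does not. The two notations are distinct and are not interchangeable.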
@zwei
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-26111
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.