I am testing the conversion of PDF files to PDF/A-2U.
When using the convert method of aspose.pdf.java (16.12) to convert a PDF to the PDF/A-2U format, the words in the PDF/A-2U file are spaced out and overlap each other. I have attached screenshots to show the issue.
I have tried this with PDFs converted using aspose.words.java (16.4), via the save method from a DOCX file, and with aspose.email.java (4.7), using the MSG to MHTML to PDF method.
However, when converting from a DOC file or from other PDF files, there is no such issue.
Is converting from PDF the only way to produce the PDF/A-2U format, or can DOCX or MSG files be converted to PDF/A-2U directly?
How can I fix this issue?
Please advise.
Thanks,
Dustin.
Screenshot: PDF converted from a DOCX file
Screenshot: after converting to the PDF/A-2U format
@dustin00,
There is no direct way to convert an MSG file into PDF format. You can convert an MSG file to MHT with the Aspose.Email API, and then convert the MHT document to PDF/A-2U with the Aspose.Pdf API.
Do you require the original files?
Can you provide a code sample showing how to convert an MHT document to PDF/A-2U directly?
And can we convert Word documents to PDF/A-2U directly too?
@dustin00,
We have tested your source PDF documents with the latest version 17.7 of Aspose.Pdf for Java API and the output PDF (A-2U) documents are fine.
@dustin00,
You can convert an HTML document to PDF/A-2U using the following code. Kindly list all the Office file formats you need, and we will assist you accordingly.
[Java]
import com.aspose.pdf.*;

// load HTML document
HtmlLoadOptions optsLoad = new HtmlLoadOptions("base path here");
Document document = new Document("html file path here", optsLoad);
// convert HTML to PDF/A-2U; conversion errors are written to the log file
PdfFormatConversionOptions opts = new PdfFormatConversionOptions("c:\\temp\\outLog.txt", PdfFormat.PDF_A_2U, ConvertErrorAction.Delete);
document.convert(opts);
document.save("c:\\temp\\outFile.pdf");
Hi, may I ask whether other formats, such as XLS and XLSX for Excel documents and PPT and PPTX for PowerPoint documents, can be converted to the PDF/A-2U format directly, or is it the same as for Word documents, where we have to convert to PDF first and then to PDF/A-2U?
Also, I have tested using version 17.7 of Aspose.PDF and the issue mentioned earlier no longer occurs.
@dustin00,
There is no direct way to convert Excel and PowerPoint documents to PDF/A-2U. You can convert Excel and PowerPoint documents to PDF with the Aspose.Cells and Aspose.Slides APIs, and then use the Aspose.Pdf API to convert the PDF to PDF/A-2U.
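As a summary of the routes described in this thread, the following minimal Java sketch maps a file extension to the chain of Aspose products the file passes through before, and including, the final PDF/A-2U conversion. `PdfA2uRoute` and `routeFor` are hypothetical helpers for illustration only; they perform no conversion and call no Aspose API.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class PdfA2uRoute {
    // Final step shared by every route: Aspose.Pdf converts to PDF/A-2U.
    private static final String FINAL = "Aspose.Pdf (convert to PDF/A-2U)";

    // Hypothetical routing table summarising the replies in this thread.
    private static final Map<String, List<String>> ROUTES = Map.ofEntries(
        Map.entry("doc",  List.of("Aspose.Words (save as PDF)", FINAL)),
        Map.entry("docx", List.of("Aspose.Words (save as PDF)", FINAL)),
        Map.entry("xls",  List.of("Aspose.Cells (save as PDF)", FINAL)),
        Map.entry("xlsx", List.of("Aspose.Cells (save as PDF)", FINAL)),
        Map.entry("ppt",  List.of("Aspose.Slides (save as PDF)", FINAL)),
        Map.entry("pptx", List.of("Aspose.Slides (save as PDF)", FINAL)),
        Map.entry("msg",  List.of("Aspose.Email (save as MHT)", FINAL)),
        Map.entry("mht",  List.of(FINAL)),   // Aspose.Pdf loads MHT itself
        Map.entry("html", List.of(FINAL)),   // Aspose.Pdf loads HTML itself
        Map.entry("pdf",  List.of(FINAL))
    );

    /** Returns the conversion chain for a file name, based on its extension. */
    public static List<String> routeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            throw new IllegalArgumentException("No extension: " + fileName);
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        List<String> route = ROUTES.get(ext);
        if (route == null) {
            throw new IllegalArgumentException("Unsupported format: " + ext);
        }
        return route;
    }

    public static void main(String[] args) {
        for (String f : new String[] {"report.docx", "mail.msg", "sheet.xlsx"}) {
            System.out.println(f + ": " + String.join(" then ", routeFor(f)));
        }
    }
}
```

The table only records which product produces the intermediate PDF (or MHT); the actual conversion code for each step comes from the respective Aspose product.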
It is good to hear that the problem has been resolved.
Could you advise on how to convert a TXT file to a PDF file? I tried using the code from this page: https://docs.aspose.com/display/pdfjava/Converting+Text+File+to+PDF
However, the Pdf, Section and Text classes cannot be resolved to a type. I am currently testing with Aspose.Pdf 17.9.
I found that the PDF version seems to default to 1.4 or 1.5 after converting from Word/Excel/PowerPoint file formats. Using the method you provided for converting HTML files, the PDF version is 1.7.
Is there a way to set the PDF version and set the format to PDF/A-2U in PdfFormatConversionOptions? Or to set the PDF version when saving Word/Excel/PowerPoint files as PDFs?
I have tested converting images to PDF by first converting the image file (PNG/JPG/TIFF/BMP) to a searchable PDF using Google Tesseract-OCR. The PDF is then converted to the PDF/A-2U format. However, after conversion to PDF/A-2U, the words in the PDF cannot be highlighted. As I am using a trial version, the watermark can be highlighted, so I am unsure whether the PDF has become unsearchable. This does not occur for the other file formats such as DOC, DOCX, XLS, PPTX, HTML, MSG, etc. I have attached a pair of the PDFs.
Please advise on this issue.
The com.aspose.pdf.generator package is a legacy approach; please use the new DOM approach and try the TextFragment class to convert a text file to a PDF document. Please refer to this help topic: Convert text file to PDF format.
The Document class offers a validate method to change the PDF version. Please try the following code:
[Java]
import com.aspose.pdf.*;

// load HTML document
HtmlLoadOptions optsLoad = new HtmlLoadOptions("base path here");
Document document = new Document("html file path here", optsLoad);
// convert HTML to PDF/A-2U; conversion errors are written to the log file
PdfFormatConversionOptions opts = new PdfFormatConversionOptions("c:\\temp\\outLog.txt", PdfFormat.PDF_A_2U, ConvertErrorAction.Delete);
document.convert(opts);
// print the PDF version the document reports after conversion
System.out.println(document.getVersion());
// validate the document against PDF 1.7; results are written to the log file
document.validate("C:\\temp\\outlog.log", PdfFormat.v_1_7);
document.save("c:\\temp\\outFile.pdf");
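To confirm which version a saved file actually declares, one option is to read the `%PDF-x.y` header directly from the output file. The following is a minimal JDK-only sketch (Java 9+ for `readNBytes`); `PdfHeaderVersion` is a hypothetical helper, not part of Aspose.Pdf. It assumes the header appears within the first 1024 bytes. Note that a PDF's document catalog may carry a `/Version` entry that overrides the header, so this check is only an approximation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PdfHeaderVersion {
    // Matches the "%PDF-x.y" marker at the start of a PDF file.
    private static final Pattern HEADER = Pattern.compile("%PDF-(\\d\\.\\d)");

    /** Returns the header version found in the given bytes, or null if none. */
    public static String headerVersion(byte[] firstBytes) {
        // ISO-8859-1 maps each byte to one char, so binary bytes decode safely.
        String head = new String(firstBytes, StandardCharsets.ISO_8859_1);
        Matcher m = HEADER.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    /** Reads up to the first 1024 bytes of a file and extracts the header version. */
    public static String headerVersion(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            byte[] buf = new byte[1024];
            int n = in.readNBytes(buf, 0, buf.length);
            return headerVersion(Arrays.copyOf(buf, n));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java PdfHeaderVersion <file.pdf>");
            return;
        }
        System.out.println(args[0] + ": PDF " + headerVersion(Paths.get(args[0])));
    }
}
```

For example, running it against `c:\temp\outFile.pdf` after the code above prints the version written into that file's header.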
We have tested your source PDF (jpeg-pdf.pdf) with the latest version 17.9 of Aspose.Pdf for Java API and managed to reproduce the problem: the target word is not highlighted after a search. It has been logged under the ticket ID PDFJAVA-37135 in our bug tracking system. We have linked your post to this ticket and will keep you informed of any available updates.