Resolution mismatch in extracted PDF

Mahi39 · July 21, 2022, 5:50am

Hi Team,

I’ve extracted an image from a document (.docx). In the PDF produced with Aspose, the image resolution was very low (around 200 dpi). When I manually converted the DOCX to PDF using Acrobat Pro, the image resolution was high (around 700 dpi). Could you please help me retain the resolution when extracting with Aspose?

Thanks in advance.

Aspose pdf: Fig0003.pdf (120.9 KB)

Manual pdf: acro-pages-11.pdf (1.7 MB)

alexey.noskov · July 21, 2022, 7:22am

@Mahesh39 By default, Aspose.Words downsamples images when saving to PDF in order to reduce the output document size. You can disable downsampling using PdfSaveOptions.DownsampleOptions. See the following code example:

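import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;

Document doc = new Document("C:\\Temp\\in.docx");

PdfSaveOptions options = new PdfSaveOptions();
// Keep images at their original pixel dimensions instead of downsampling them.
options.getDownsampleOptions().setDownsampleImages(false);

doc.save("C:\\Temp\\out.pdf", options);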
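To make the effect of downsampling concrete, here is a small sketch of the usual arithmetic: an image's effective resolution is its pixel width divided by its placed width in inches, and downsampling reduces the pixel width until that figure matches a target. This is an illustration under assumptions, not Aspose.Words' actual algorithm; the class and method names are hypothetical, and the 220 ppi target is the default that the DownsampleOptions documentation lists for getResolution() (worth checking against the version in use).

```java
// Hypothetical illustration of downsampling arithmetic; not Aspose.Words code.
final class DownsampleMath {

    // Effective resolution (pixels per inch) of an image drawn at a given width on the page.
    static double effectivePpi(int pixelWidth, double placedWidthInches) {
        return pixelWidth / placedWidthInches;
    }

    // Pixel width an image would be reduced to so that it renders at targetPpi.
    // Images already at or below the target are left unchanged.
    static int downsampledWidth(int pixelWidth, double placedWidthInches, double targetPpi) {
        if (effectivePpi(pixelWidth, placedWidthInches) <= targetPpi) {
            return pixelWidth;
        }
        return (int) Math.round(placedWidthInches * targetPpi);
    }

    public static void main(String[] args) {
        // A 4200-pixel-wide scan placed 6 inches wide is effectively 700 ppi.
        System.out.println(effectivePpi(4200, 6.0));          // 700.0
        // Reduced to a 220 ppi target, it keeps only 1320 pixels across.
        System.out.println(downsampledWidth(4200, 6.0, 220)); // 1320
    }
}
```

With setDownsampleImages(false), no such reduction is applied, which is why it is the first option to try.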

Mahi39 · July 27, 2022, 12:02pm

Hi @alexey.noskov,

I’m sorry for the trouble. This code is not functioning correctly.

Input: Input.zip (1.3 MB)

Resolution details (original dpi to dpi in the converted PDF):
Bookmark 1.docx --> 300 dpi to 219 dpi
Bookmark 2.docx --> 350 dpi to 220 dpi
Bookmark 3.docx --> 400 dpi to 220 dpi
Bookmark 4.docx --> 450 dpi to 219 dpi
Bookmark 5.docx --> 500 dpi to 219 dpi
Bookmark 6.docx --> 600 dpi to 219 dpi

Konstantin.Kornilov · July 27, 2022, 1:08pm

@Mahesh39 Do I understand correctly that you want to retain the resolution value stored in the image file when saving the document to PDF? Unfortunately, when PdfImageCompression.AUTO is used, most images are stored in the PDF file with the Flate (/FlateDecode) compression filter. In this case, only the raw image data is compressed and stored, and additional information such as the resolution is lost. This is a peculiarity of the PDF format. You could try PdfImageCompression.JPEG instead: images are then stored in the PDF document in JPEG format along with their additional information, so the resolution value should be retained.

P.S. In all the input documents you provided, the TIFF images have a resolution of 220 dpi.

Mahi39 · July 27, 2022, 2:19pm

Hi @Konstantin.Kornilov,

Thanks for your reply.

Could you please advise:

How can I retain the resolution using PdfImageCompression.JPEG?
How can I find the image format (JPEG or TIFF) in the document?

I would appreciate it if you could share the sample code.

Konstantin.Kornilov · July 27, 2022, 3:33pm

@Mahesh39 To retain the resolution using PdfImageCompression.JPEG, you could use the following code:

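import com.aspose.words.Document;
import com.aspose.words.PdfImageCompression;
import com.aspose.words.PdfSaveOptions;

Document doc = new Document("C:\\Temp\\in.docx");

PdfSaveOptions options = new PdfSaveOptions();
// Do not downsample images.
options.getDownsampleOptions().setDownsampleImages(false);
// Store images as JPEG so their additional information (e.g. resolution) is kept.
options.setImageCompression(PdfImageCompression.JPEG);

doc.save("C:\\Temp\\out.pdf", options);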
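To confirm whether the images in the saved out.pdf were really written as JPEG, one rough check is to look for /DCTDecode, the PDF filter name for JPEG-compressed data. The sketch below is a heuristic, not a PDF parser: it scans the raw file bytes as text and counts that name. This works in the common case because image XObjects are streams, and stream dictionaries cannot be stored inside compressed object streams; encrypted files, inline images, or unusual name encodings may defeat it. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: a rough heuristic, not a PDF parser.
final class PdfJpegCheck {

    private static final String DCT = "/DCTDecode";

    // Counts occurrences of the /DCTDecode filter name (JPEG image data) in raw PDF bytes.
    static int countDctDecode(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary stream data is harmless.
        String text = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(DCT, from)) != -1) {
            count++;
            from += DCT.length();
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path pdf = Paths.get(args.length > 0 ? args[0] : "C:\\Temp\\out.pdf");
        int jpegStreams = countDctDecode(Files.readAllBytes(pdf));
        System.out.println(jpegStreams + " /DCTDecode occurrence(s) found in " + pdf);
    }
}
```

A count of zero after saving with PdfImageCompression.JPEG would suggest the images were still written with another filter.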

To check the image format manually, you could unzip the .docx file and look in the word\media folder.
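The manual check can also be scripted, since a .docx file is a ZIP package. A minimal sketch using only java.util.zip, assuming the images are stored under word/media/ as in typical Word packages: it lists those entries so their extensions (.jpeg, .png, .tiff and so on) show the stored formats. The class name and default path are placeholders.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: lists the media parts stored inside a .docx (ZIP) package.
final class DocxMediaLister {

    static List<String> listMedia(InputStream docx) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Part names use forward slashes, e.g. "word/media/image1.tiff".
                if (!entry.isDirectory() && entry.getName().startsWith("word/media/")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "C:\\Temp\\in.docx";
        try (InputStream in = new FileInputStream(path)) {
            listMedia(in).forEach(System.out::println);
        }
    }
}
```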
To get the image format programmatically, you could use the following code:

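import com.aspose.words.*;
Iterable<Shape> shapes = doc.getChildNodes(NodeType.SHAPE, true);
for (Shape shape : shapes) {
    if (shape.hasImage()) {
        // getImageType() returns an int constant from the ImageType class.
        System.out.println(shape.getImageData().getImageType());
    }
}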
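ImageType constants may not name every format of interest (whether TIFF has its own constant is worth checking in the ImageType reference for the version in use), so another option is to inspect the first bytes of the data returned by shape.getImageData().getImageBytes(). The sketch below checks a few well-known file signatures; the class and method names are hypothetical and only a handful of formats are covered.

```java
// Hypothetical helper: guesses an image format from its leading "magic" bytes.
final class ImageSignature {

    static String detect(byte[] b) {
        if (startsWith(b, 0xFF, 0xD8, 0xFF)) return "JPEG";
        if (startsWith(b, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) return "PNG";
        // TIFF: "II*\0" (little-endian) or "MM\0*" (big-endian).
        if (startsWith(b, 'I', 'I', 0x2A, 0x00) || startsWith(b, 'M', 'M', 0x00, 0x2A)) return "TIFF";
        if (startsWith(b, 'G', 'I', 'F', '8')) return "GIF";
        if (startsWith(b, 'B', 'M')) return "BMP";
        return "UNKNOWN";
    }

    // True if data begins with the given byte values (each given as 0..255).
    private static boolean startsWith(byte[] data, int... sig) {
        if (data == null || data.length < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((data[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }
}
```

Inside the loop above, it could be called as ImageSignature.detect(shape.getImageData().getImageBytes()).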

Mahi39 · July 27, 2022, 3:58pm

Thanks, @Konstantin.Kornilov.

For the documents with TIFF images, is it possible to retain the resolution?

Konstantin.Kornilov · July 27, 2022, 5:40pm

@Mahesh39 If the image resolution is important in your case, it would be better to perform the TIFF to JPEG conversion in the Aspose.Words (AW) DOM yourself, to be sure. The JPEG image data will then be stored in the output PDF as is. You can get the image bytes with the Shape.getImageData().getImageBytes() method and set the converted image bytes with the Shape.getImageData().setImageBytes() method.
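A minimal sketch of the conversion step using the JDK's javax.imageio, assuming Java 9 or later (the JDK's built-in TIFF reader arrived in Java 9) and assuming that writing the target dpi into the JPEG's JFIF header is what should carry the resolution into the PDF. The class name, the toJpeg method, and the 0.95 quality are illustrative choices. The image is first drawn onto an RGB canvas because the JDK JPEG writer does not accept an alpha channel. The returned bytes would then be passed to setImageBytes().

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageOutputStream;

// Hypothetical helper: re-encodes image bytes (e.g. TIFF) as JPEG with an explicit dpi.
final class JpegConverter {

    static byte[] toJpeg(byte[] imageBytes, int dpi) throws IOException {
        BufferedImage src = ImageIO.read(new ByteArrayInputStream(imageBytes));
        if (src == null) throw new IOException("No ImageIO reader for this image format");

        // JPEG has no alpha channel, so draw onto a plain RGB canvas (white background).
        BufferedImage rgb = new BufferedImage(src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(src, 0, 0, Color.WHITE, null);
        g.dispose();

        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(0.95f);

        // Put the resolution into the JFIF (APP0) header: resUnits 1 means dots per inch.
        IIOMetadata meta = writer.getDefaultImageMetadata(ImageTypeSpecifier.createFromRenderedImage(rgb), param);
        String format = "javax_imageio_jpeg_image_1.0";
        IIOMetadataNode root = (IIOMetadataNode) meta.getAsTree(format);
        IIOMetadataNode jfif = (IIOMetadataNode) root.getElementsByTagName("app0JFIF").item(0);
        jfif.setAttribute("resUnits", "1");
        jfif.setAttribute("Xdensity", Integer.toString(dpi));
        jfif.setAttribute("Ydensity", Integer.toString(dpi));
        meta.mergeTree(format, root);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ImageOutputStream ios = ImageIO.createImageOutputStream(out)) {
            writer.setOutput(ios);
            writer.write(null, new IIOImage(rgb, null, meta), param);
        } finally {
            writer.dispose();
        }
        return out.toByteArray();
    }
}
```

Re-encoding as JPEG is lossy, so a high quality setting is used here; whether the dpi stored this way is what a given PDF viewer reports is not guaranteed by this sketch.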