Aspose resizing images when converting to and from PDF

Hi,

I am using Aspose.Words 24.4.0 to convert images to PDFs, run some processing, and then convert them back to their original format.

However, when I convert an image back from PDF into its native format, it often comes out much smaller. I thought sizing the image explicitly would fix this, but it doesn’t appear to. What can I do to fix this?

Here is the code for image → PDF

import io

import aspose.words as aw
from PIL import Image

with io.BytesIO(input_bytes_str) as input_bytes:
  
    # Load image into doc
    page = aw.Document()
    page_builder = aw.DocumentBuilder(page)
  
    # Determine image size in pixels
    img = Image.open(input_bytes)
    img_sizes = img.size
  
    # Insert the image at the top left corner of the page
    # Disabled pylint as there are multiple overloads of
    # insert_image method which accept the given args
    # pylint: disable=unexpected-keyword-arg
    page_builder.insert_image(
        stream=input_bytes,
        horz_pos=aw.drawing.RelativeHorizontalPosition.PAGE,
        left=0,
        vert_pos=aw.drawing.RelativeVerticalPosition.PAGE,
        top=0,
        width=img_sizes[0],
        height=img_sizes[1],
        wrap_type=aw.drawing.WrapType.SQUARE,
    )
  
    # Set page size to match image dimensions
    page_builder.page_setup.page_width = img_sizes[0]
    page_builder.page_setup.page_height = img_sizes[1]

Here is the code for PDF → image

doc = aw.Document(io.BytesIO(input_stream[0]))
with io.BytesIO() as out_stream:
    doc.save(out_stream, save_format)
    return out_stream.getvalue()

It seems that you are experiencing issues with image size when converting images to PDF and then back to their original format using Aspose.Words. This can happen if the image resolution is altered during the conversion process, leading to smaller images upon reverting.

To address this issue, you can try the following steps:

  1. Check Image Resolution: Ensure that the resolution of the images is maintained during the conversion. When inserting the image into the document, you are already setting the width and height from the original image size; note that Aspose.Words interprets these values in points (1/72 inch), not pixels. You should also ensure that the PDF save options do not downsample the images.

  2. Configure PDF Save Options: When saving the document as a PDF, you can configure PdfSaveOptions to limit downsampling. This will help maintain the original quality and size of the images. Here’s an example of how to set this up:

    pdf_save_options = aw.saving.PdfSaveOptions()
    pdf_save_options.downsample_options.resolution = 300  # Target resolution (ppi) for any image that is downsampled
    pdf_save_options.downsample_options.resolution_threshold = 300  # Images below 300 ppi are left untouched
    doc.save(out_stream, pdf_save_options)
    
  3. Verify Output Format: When converting back from PDF to the original image format, ensure that the output format is correctly specified and that the library is not altering the dimensions during this process.

By ensuring that the image resolution is preserved and configuring the save options appropriately, you should be able to maintain the original size of the images when converting to and from PDF.

If you continue to face issues, please provide more details about the output format and any specific settings you are using during the conversion process.


Thank you so much! I am running this program on a variety of images of all different sizes, and I won’t know in advance what size each will be. Is there a flexible way to set the resolution?

@acn Have you tried converting the PDF to images directly using the Converter.convert_to_images method?

Also, when converting the image to PDF, try disabling image downsampling:

opt = aw.saving.PdfSaveOptions()
opt.downsample_options.downsample_images = False
doc.save("C:\\Temp\\out.pdf", opt)

By default, Aspose.Words downsamples images when rendering a document to PDF.
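
On the question of a flexible resolution: Aspose.Words measures page and shape sizes in points (1/72 inch), so the pixel size of a rendered page is points × dpi / 72. Below is a minimal sketch of that arithmetic in plain Python, with no Aspose calls. The helper names are hypothetical and not part of the Aspose.Words API.

```python
POINTS_PER_INCH = 72.0


def pixels_to_points(pixels: int, dpi: float) -> float:
    """Physical size in points of `pixels` pixels at the given resolution."""
    return pixels * POINTS_PER_INCH / dpi


def render_dpi(original_pixels: int, page_points: float) -> float:
    """Resolution at which a page of `page_points` renders to `original_pixels`."""
    return original_pixels * POINTS_PER_INCH / page_points


# A 3000 px wide image scanned at 300 dpi is 720 points (10 inches) wide.
page_width_pt = pixels_to_points(3000, 300)

# Rendering that 720-point page at 300 dpi gives back 3000 pixels.
dpi_needed = render_dpi(3000, page_width_pt)

# If the page width was set to the raw pixel count (4000 px -> 4000 points),
# 72 dpi is the resolution that reproduces the original pixel width.
dpi_for_pixel_sized_page = render_dpi(4000, 4000.0)
```

The computed value could then be assigned per image to ImageSaveOptions.resolution when saving the page back to an image, so no fixed resolution has to be hard-coded.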

It seems like even with that change, my original image is 7 MB while the converted and reverted image is 2.1 MB. Is there something else I’m missing?

I’m also noticing that the original metadata shows a color space of CMYK, while the converted and reverted image shows a color space of RGB. What is the reason for that? Can I preserve the original color space?

@acn I am afraid there is no way to keep the original image after an image->PDF->image roundtrip. The PDF format has its own requirements for the images that can be stored in a PDF document. When converting PDF to an image, the PDF page is rendered to a new image; the image embedded in the PDF document is not extracted. So it is expected that the image file size is not the same as the original one.

And there’s no way to improve it even partially?

@acn You can experiment with ImageSaveOptions, but I am afraid there is no way to retain the original image after such a roundtrip.
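
As a rough sketch of what tuning ImageSaveOptions might look like, assuming a single-page PDF and JPEG output: the function name and its parameters are hypothetical, and the right dpi and quality values depend on the source image.

```python
import io


def render_pdf_to_jpeg(pdf_bytes: bytes, dpi: float, jpeg_quality: int = 100) -> bytes:
    """Render a single-page PDF to JPEG bytes at the given resolution."""
    # Imported inside the function so this module can still be imported
    # where the aspose-words package is not installed.
    import aspose.words as aw

    doc = aw.Document(io.BytesIO(pdf_bytes))

    options = aw.saving.ImageSaveOptions(aw.SaveFormat.JPEG)
    options.resolution = dpi             # dots per inch of the rendered page
    options.jpeg_quality = jpeg_quality  # 0-100; higher means less compression

    with io.BytesIO() as out_stream:
        doc.save(out_stream, options)
        return out_stream.getvalue()
```

A higher resolution and JPEG quality should bring the pixel dimensions and file size closer to the source, but the result is still a newly rendered image rather than the original file.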