Aspose.Words Losing Text / Spacing of PDF When Merging

jcgrigg2 · June 25, 2022, 10:37pm

Hi. I feel like this might be an easy fix, but I haven’t been able to figure out how to get Aspose.Words to correctly merge a DOCX and a PDF without cramming or dropping some of the text in the PDF fields. It also seems to add a random page break to the final output. I’ve tried the 3 different import format modes but had no luck with any of them. Any help is much appreciated!

alexey.noskov · June 26, 2022, 4:51am

@jcgrigg2 Could you please attach your problematic input document here for testing? We will check the issue and provide you with more information.

jcgrigg2 · June 26, 2022, 4:26pm

Hi. Thanks for the quick reply. Attached is the blank (unfilled) version of the form. The form is ultimately saved as a regular PDF (not XFA) and pretty much any text is enlarged outside of its box or bleeds out into surrounding areas. Thanks!
af911.pdf (79.0 KB)

alexey.noskov · June 26, 2022, 6:54pm

@jcgrigg2 Could you please clarify your scenario? Do you fill the attached PDF document with data programmatically and then convert the resulting document to DOCX? Could you please provide the code that will allow us to reproduce the problem? Also, it would be great if you could attach the problematic output document too. This will allow us to better understand the problem.

jcgrigg2 · June 26, 2022, 8:19pm

Not a problem. The user manually fills out the AF911 form and uploads it to our platform (filled-out version attached). Our system then combines that file with another file it already has on file (VMPF template DOCX attached), makes sure both of them are PDFs, and we download the final version. However, in the final PDF version, the AF911 becomes scrambled (PDF Error attachment). Our convert-and-merge code is included below as well. Hope this helps. Thanks again!
af911_FILLED.pdf (282.6 KB)
PDF Error.pdf (216.8 KB)
VMPF Template.docx (13.6 KB)

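import base64
from io import BytesIO

import aspose.words as aw

# `sw_context` and `files` are defined elsewhere in our platform code (not shown).
def convert_and_merge(word_doc):
    # Apply the Aspose.Words license passed in through the platform inputs.
    aw.License().set_license(BytesIO(sw_context.inputs['license'].encode()))

    # Start from an empty document so it holds only the appended content.
    output = aw.Document()
    output.remove_all_children()

    # Append the Word template first.
    word_doc_pdf = aw.Document(word_doc)
    output.append_document(word_doc_pdf, aw.ImportFormatMode.KEEP_SOURCE_FORMATTING)

    # Append each uploaded file (base64-encoded), including the filled AF911 PDF.
    for f in files:
        if sw_context.inputs[f]:
            doc = aw.Document(BytesIO(base64.b64decode(sw_context.inputs[f][0]['base64'])))
            output.append_document(doc, aw.ImportFormatMode.KEEP_SOURCE_FORMATTING)

    # Save the merged result as PDF and rewind the stream for the caller.
    final_pdf = BytesIO()
    output.save(final_pdf, aw.SaveFormat.PDF)
    final_pdf.seek(0)
    return final_pdf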
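
A minimal sketch of one way to tell PDF uploads apart from Word uploads before anything is loaded into Aspose.Words, since the scrambled output described here concerns the PDF input. The helper names (`decode_upload`, `is_pdf`, `split_uploads`) are hypothetical; the input shape mirrors `sw_context.inputs[f][0]['base64']` from the code above, and the check relies only on the `%PDF-` signature that PDF files start with.

```python
import base64
from io import BytesIO

PDF_SIGNATURE = b"%PDF-"  # PDF files begin with this header


def decode_upload(entry):
    """Decode one upload entry shaped like {'base64': '...'} into raw bytes."""
    return base64.b64decode(entry["base64"])


def is_pdf(data):
    """Return True when the bytes start with the PDF header signature."""
    return data.startswith(PDF_SIGNATURE)


def split_uploads(inputs, keys):
    """Split uploads into (pdfs, others), each a list of (key, BytesIO) pairs.

    `inputs` is assumed to map each key to a list of upload entries
    (or to an empty/None value when nothing was uploaded).
    """
    pdfs, others = [], []
    for key in keys:
        entries = inputs.get(key)
        if not entries:
            continue  # nothing uploaded under this key
        data = decode_upload(entries[0])
        target = pdfs if is_pdf(data) else others
        target.append((key, BytesIO(data)))
    return pdfs, others
```

With the uploads separated this way, the PDF inputs can be inspected or handled on their own instead of going through the same `append_document` path as the Word files.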

alexey.noskov · June 27, 2022, 4:58am

@jcgrigg2 Thank you for the additional information. I have managed to reproduce the problem. For the sake of correction, it has been logged as WORDSNET-24032. We will keep you updated and let you know once the problem is resolved or we have more information.

jcgrigg2 · July 7, 2022, 2:37pm

Hi. We were doing some more testing and found something interesting that might be related. We tried converting our PDF to a PNG first and then attaching it to our first PDF file. This worked, but only through the Web Converter and the Web Merger (Convert Files Online - Word, PDF, HTML, JPG And Many More).

However, when we used the Web “Python” (Convert Word, PDF And Many Other File Formats Using Python), it didn’t work; it added some spacing, etc.

I’m not sure what the difference is between Web and Web Python, but maybe that will shed some more light on it.

Thanks!

alexey.noskov · July 7, 2022, 3:24pm

@jcgrigg2 The problem is that when you load a PDF document into an Aspose.Words Document object, the PDF is converted to a flow format, which is the natural model for MS Word documents and the one Aspose.Words is designed to work with.
In the .NET version there is a way to convert a PDF document directly to fixed-page formats such as images; this conversion is more accurate in terms of layout. I have logged a feature request, WORDSNET-24092, to add this feature to the Python version too. We will keep you informed and let you know once it is available.
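
One possible workaround that follows from this explanation, sketched under assumptions: convert only the DOCX to PDF with Aspose.Words and concatenate the result with the already-finished PDF at the PDF level, so the form is never turned into a flow document. This swaps in the third-party `pypdf` package for the merge step, which is not something proposed in this thread; `docx_to_pdf`, `merge_pdfs`, and the `writer` parameter are hypothetical names. Filled form fields may need separate checking after a page-level merge.

```python
from io import BytesIO


def docx_to_pdf(docx_stream):
    """Render a Word document stream to a PDF stream with Aspose.Words."""
    import aspose.words as aw  # lazy import; requires the aspose-words package

    doc = aw.Document(docx_stream)
    out = BytesIO()
    doc.save(out, aw.SaveFormat.PDF)
    out.seek(0)
    return out


def merge_pdfs(pdf_streams, writer=None):
    """Concatenate PDF streams in order without re-laying-out their pages.

    `writer` is optional; by default a pypdf PdfWriter is used
    (assumption: pypdf 3.x or later is installed).
    """
    if writer is None:
        from pypdf import PdfWriter  # lazy import of the swapped-in merger

        writer = PdfWriter()
    for stream in pdf_streams:
        stream.seek(0)         # make sure each input is read from the start
        writer.append(stream)  # copies the pages as they are
    merged = BytesIO()
    writer.write(merged)
    merged.seek(0)
    return merged
```

Under these assumptions, `merge_pdfs([docx_to_pdf(word_doc), BytesIO(uploaded_pdf_bytes)])` would return the combined file as a stream, where `uploaded_pdf_bytes` stands for the decoded AF911 upload.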

aspose.notifier · August 3, 2022, 5:59am

The issues you have found earlier (filed as WORDSNET-24032) have been fixed in this Aspose.Words for .NET 22.8 update also available on NuGet.

aspose.notifier · July 9, 2024, 4:38am

The issues you have found earlier (filed as WORDSNET-24092) have been fixed in this Aspose.Words for .NET 24.7 update also available on NuGet.