Help appending a Word document converted from PDF by Aspose.PDF

Hello Team,

We are trying to add a few documents at runtime at a specific location in a main document. The documents we are adding were converted from PDF to DOC using Aspose.PDF. They are added successfully, but the margins come out wrong, which breaks the formatting.

Please advise how to do this correctly.

Thanks
Hardik

@HardikS

Could you please attach the following resources here for testing?

  • Your input Word document.
  • The output Word file that shows the undesired behavior.
  • The expected output Word file that shows the desired behavior.
  • A standalone console application (Aspose.Words’ source code without compilation errors) that helps us reproduce your problem on our end.

As soon as you have these ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

We are preparing the items you asked for and will upload them once they are ready.

@HardikS

Thanks for your cooperation. Please share the requested resources here for testing. We will investigate the issue and provide you more information on it.

Please find the attached zip file. The folder Files contains “OriginalFile.docx”, which is our main file, and a PDF file, “PDFFile.pdf”.

We want to append this PDF to the Word document, so we first converted the PDF to Word using Aspose.PDF. The conversion to DOC succeeds and the result has no rendering issues.

However, when we append this newly converted Word document to the original file, the layout gets distorted. The rendered file is available in the SaveFiles folder.

The folder Aspose Word Append contains the C# source code of the solution. Please check it and help us get an appended document with the layout intact.

The link is available at the path below.

Thanks
Hardik

@HardikS

You are using a very old version of Aspose.Words. Please upgrade to the latest version, Aspose.Words for .NET 21.10, and use the following modified code to achieve your requirement. With the latest version of Aspose.Words, you do not need to use Aspose.PDF.

var document = new Document(PDFFile);
document.Save(ReportSavePath + DestWord);

_Docreplacement = Path.Combine(ReportSavePath + DestWord);
FindReplaceOptions findReplaceOptions = new FindReplaceOptions();
findReplaceOptions.ReplacingCallback = new InsertDocumentAtReplaceHandler();
srcMainDoc.Range.Replace(new Regex("%ENTITY_ATTACHMENT%"), "", findReplaceOptions);

string Dest = "21.10.doc";
srcMainDoc.Save(ReportSavePath + Dest, SaveFormat.Doc);


private class InsertDocumentAtReplaceHandler : IReplacingCallback
{
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs args)
    {
        Document subDoc = new Document(_Docreplacement);
        DocumentBuilder builder = new DocumentBuilder((Document)args.MatchNode.Document);
        // Insert a document after the paragraph, containing the match text.
        Paragraph para = (Paragraph)args.MatchNode.ParentNode;
        builder.MoveTo(para);
        builder.InsertDocument(subDoc, ImportFormatMode.KeepSourceFormatting);

        // Remove the paragraph with the match text.
        para.Remove();
        return ReplaceAction.Skip;
    }
}

Hi Tahir,

Thanks for the update. We downloaded the mentioned version and tried the code you provided. However, it is still not working properly. The first step, PDFTOWORD, works perfectly fine, but when we insert the converted document into the main document, the layout gets distorted.

Please find attached final word document.
21.1020211008-152343.zip (112.1 KB)

@HardikS

We have tested the scenario and managed to reproduce the same issue on our side. There are some formatting issues in the output DOCX generated from the PDF using Aspose.Words. We have logged this problem in our issue tracking system as WORDSNET-22833. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

@HardikS

We have closed the issue WORDSNET-22833 as ‘Not a Bug’. Please use the PageSetup.LeftMargin property as shown below to avoid this issue.

var document = new Document(PDFFile);

// Add this to make left margin in destination document the same as in source document.
srcMainDoc.FirstSection.PageSetup.LeftMargin = document.FirstSection.PageSetup.LeftMargin;

Hi Tahir,

It resolved the issue partly. Please find attached the document converted using the line of code you provided. 21.10_20211018-181850.zip (112.1 KB)

A few issues:

  1. The replacement tag is not being replaced with an empty string.
  2. In the second part of the page, the last paragraph of the appended PDF has a distorted layout.
  3. Appending this document should not change the margin of the original document. If you look at the top two lines, the margin has changed there as well.

Please check and help us with a solution. We are planning to upgrade to this version of Aspose.Words if it resolves our issues.

Thanks for support.

Thanks
Hardik

@HardikS

Please use the following modified code to get the desired output.

private class InsertDocumentAtReplaceHandler : IReplacingCallback
{
    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;

        // The first (and maybe the only) run can contain text before the match.
        // In this case it is necessary to split the run.
        if (e.MatchOffset > 0)
            currentNode = SplitRun((Run)currentNode, e.MatchOffset);

        ArrayList runs = new ArrayList();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.Match.Value.Length;
        while (
            (remainingLength > 0) &&
            (currentNode != null) &&
            (currentNode.GetText().Length <= remainingLength))
        {
            runs.Add(currentNode);
            remainingLength = remainingLength - currentNode.GetText().Length;

            // Select the next Run node. 
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do
            {
                currentNode = currentNode.NextSibling;
            }
            while ((currentNode != null) && (currentNode.NodeType != NodeType.Run));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0))
        {
            SplitRun((Run)currentNode, remainingLength);
            runs.Add(currentNode);
        }

        Document subDoc = new Document(_Docreplacement);
        subDoc.LastSection.Body.LastParagraph.ParagraphFormat.LeftIndent = 0.0;
        subDoc.LastSection.Body.LastParagraph.ParagraphFormat.RightIndent = 0.0;
        DocumentBuilder builder = new DocumentBuilder((Document)e.MatchNode.Document);
        builder.MoveTo((Run)runs[0]);

        builder.InsertDocument(subDoc, ImportFormatMode.KeepSourceFormatting);

        builder.InsertBreak(BreakType.SectionBreakContinuous);

        foreach (Run run in runs)
            run.Remove();

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.Skip;
    }

    private static Run SplitRun(Run run, int position)
    {
        Run afterRun = (Run)run.Clone(true);
        afterRun.Text = run.Text.Substring(position);
        run.Text = run.Text.Substring(0, position);
        run.ParentNode.InsertAfter(afterRun, run);
        return afterRun;
    }
}

var document = new Document(PDFFile);
document.Save(ReportSavePath + DestWord);
_Docreplacement = Path.Combine(ReportSavePath + DestWord);
FindReplaceOptions findReplaceOptions = new FindReplaceOptions();
findReplaceOptions.ReplacingCallback = new InsertDocumentAtReplaceHandler();
srcMainDoc.Range.Replace(new Regex("%ENTITY_ATTACHMENT%"), "", findReplaceOptions);
srcMainDoc.FirstSection.PageSetup.LeftMargin = document.FirstSection.PageSetup.LeftMargin;
string Dest = "21.10.docx";
srcMainDoc.Save(ReportSavePath + Dest, SaveFormat.Docx);

Please note that Aspose.Words mimics the behavior of MS Word. If you insert the converted DOCX into the main document using MS Word, you will get the same output.

For this case, please set the left and right indent of the last paragraph of the inserted document to 0.0 to avoid this issue.

Hi Tahir,

Thank you for the support. I now have a payment-related query: we purchased Aspose.Words but have not renewed it for the last 5 years.

What should we do now? I see three options: 1. Renew, 2. Upgrade, and 3. New Purchase.

What is the minimum cost of getting the latest version of Aspose.Words?

Please help.

Thanks
Hardik

We found one issue when appending a PDF that contains a large image. The image is not scaled down to fit A4 size, which causes rendering issues when viewing the final document.

Please find attached the PDF we tried and the output Word document showing how it looks.

We also wanted to know which other file types can be added to a Word document the way PDF can. Can an MSG file be appended to a Word document as we did with the PDF?

SamplePDFImage.zip (566.5 KB)

Thanks
Hardik

@HardikS

Please post your query in the Aspose.Purchase forum, where our sales team will answer it appropriately.

We have tested the scenario using the latest version of Aspose.Words for .NET 21.10 and could not reproduce the reported issue. Please use Aspose.Words for .NET 21.10. We have attached the output DOCX to this post for your reference.
21.10.docx (44.1 KB)

Hi Tahir,

I downloaded the document, but it is the same as the one we sent: the image is not sized properly for A4. We want the image and the PDF content to have the same margins, which is not the case. You can verify this by trying to print the document.

Thanks
Hardik

@HardikS

This is the expected behavior of Aspose.Words. If you want to change the paper size, you can use the PageSetup.PaperSize property with the value PaperSize.A4.
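
A minimal sketch of the PaperSize suggestion, assuming the converted document is loaded on its own before being inserted. The file names are hypothetical and only for illustration. Note that this only changes the page dimensions of each section; as far as we can tell it does not by itself rescale an image that is larger than the page.

```csharp
using Aspose.Words;

class PaperSizeSketch
{
    static void Main()
    {
        // Hypothetical file name: the document produced from the PDF.
        Document doc = new Document("Converted.docx");

        // Page setup is per section, so apply A4 to every section.
        foreach (Section section in doc.Sections)
        {
            section.PageSetup.PaperSize = PaperSize.A4;
        }

        // Hypothetical output name.
        doc.Save("Converted_A4.docx");
    }
}
```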

Hi Tahir,

Can you please help us with an answer on this? We need to use an MSG file with Aspose.Words. Is that possible?

@HardikS You can use Aspose.Email to convert the MSG file to MHTML. Then simply load the MHTML document into Aspose.Words.Document and append it to another document.
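
A rough sketch of the MSG to MHTML to Word route described above, assuming both Aspose.Email and Aspose.Words are referenced in the project. The file names are hypothetical, and the sketch relies on Aspose.Words detecting the MHTML format from the stream rather than being told the format explicitly. It appends the email at the end of the main document; inserting at a placeholder would instead go through a replacing callback like the one shown earlier in this thread.

```csharp
using System.IO;
using Aspose.Words;

class MsgAppendSketch
{
    static void Main()
    {
        // Hypothetical file names, for illustration only.
        Document mainDoc = new Document("OriginalFile.docx");

        // Load the Outlook message with Aspose.Email (fully qualified to avoid name clashes).
        Aspose.Email.MailMessage message = Aspose.Email.MailMessage.Load("Message.msg");

        using (MemoryStream mhtml = new MemoryStream())
        {
            // Convert the message to MHTML in memory.
            message.Save(mhtml, Aspose.Email.SaveOptions.DefaultMhtml);
            mhtml.Position = 0;

            // Load the MHTML into Aspose.Words and append it to the main document.
            Document emailDoc = new Document(mhtml);
            mainDoc.AppendDocument(emailDoc, ImportFormatMode.KeepSourceFormatting);
        }

        mainDoc.Save("OriginalFile_WithEmail.docx");
    }
}
```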

So we need to purchase Aspose.Email in addition to the Aspose.Words license that we already have. Can’t Aspose.Words accept an MSG file the way it does EML or PDF?

Thanks
Hardik

@HardikS Aspose.Words supports a wide range of document formats, but unfortunately it does not accept MSG as an input file format.