Copying content from one Word file to another behaves differently when a license is applied

paulo.cacao · September 8, 2023, 11:06am

When I copy part of the text (all paragraphs delimited by two specific paragraphs) from one Word document to another without a license, everything works fine.
When I use the temporary license, the copied text is completely different and is not the expected result.
The code is the same; the only difference is whether the license is applied.

alexey.noskov · September 8, 2023, 12:17pm

@paulo.cacao The problem might be caused by the watermark and evaluation message added by Aspose.Words in evaluation mode. Aspose.Words injects an evaluation message as the very first paragraph in the document, so original document paragraphs’ indexes are shifted. Please make sure the indexes of the delimiter paragraphs are correct.
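
One way to check the delimiter indexes is to list every paragraph together with its index, once with the license applied and once without, and compare the two listings. The following is only a minimal sketch: the class name ParagraphIndexDump is hypothetical, and the file path is assumed to be the same one used later in this thread.

```csharp
using System;
using Aspose.Words;

// Hypothetical diagnostic helper, not part of the original code in this thread.
class ParagraphIndexDump
{
    static void Main()
    {
        // Assumed path; replace with the actual location of the source document.
        Document doc = new Document(@"c:\csa\copy\original.docx");

        // Enumerate all paragraphs in the whole document, in document order.
        NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

        int index = 0;
        foreach (Paragraph p in paragraphs)
        {
            // Print the 0-based index next to the plain text of the paragraph.
            string text = p.ToString(SaveFormat.Text).Trim();
            Console.WriteLine("{0}: {1}", index, text);
            index++;
        }
    }
}
```

If the delimiter paragraphs appear at different indexes in the two runs, the hard-coded or computed indexes are the likely cause.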

paulo.cacao · September 8, 2023, 2:40pm

Hi Alexey,

I am trying to read the content between two paragraphs in the source document (original.docx) and copy the text into the target document (EmptyTemplate.docx).

Just take a look at the code provided. It works as expected without licensing information. As soon as I include licensing, it does not generate the desired output. The source document does not have any Aspose watermark or evaluation message.

original.docx (139.0 KB)
EmptyTemplate.docx (93.5 KB)

private static void CopyDataFromWord()
{
    CssCCM_Aspose f = new CssCCM_Aspose();
    byte[] License = File.ReadAllBytes(@"c:\csa\Aspose.WordsProductFamily.lic");
    byte[] SourceFile = File.ReadAllBytes(@"c:\csa\copy\original.docx");
    byte[] TemplateFile = File.ReadAllBytes(@"c:\csa\copy\EmptyTemplate.docx");


    f.MssExtractText(SourceFile, TemplateFile, "MARKINGS", "ALTERATIONS", "b_Markings", License);

    Console.WriteLine("Done");
}

public static void ExtractText(byte[] SourceFile, byte[] TemplateFile, string StartSearchString, string EndSearchString, string BookmarkName, byte[] License)
{
    if (SourceFile == null) { throw new Exception("No source file was provided"); }
    if (SourceFile.Length == 0) { throw new Exception("An invalid source file was provided"); }
    if (TemplateFile == null) { throw new Exception("No template file was provided"); }
    if (TemplateFile.Length == 0) { throw new Exception("An invalid template file was provided"); }
    if (BookmarkName == null) { throw new Exception("No Bookmark was provided"); }
    if (BookmarkName.Length == 0) { throw new Exception("An invalid Bookmark dataset was provided"); }

    LicenseManager.SetAsposeLicense(License);

    int startIndex = 0;
    int endIndex = 1;
    int seed = 1;

    using (MemoryStream inputStream = new MemoryStream(SourceFile))
    {
        // Load the document
        Document docSourceFile = new Document(inputStream);
        using (MemoryStream inputStreamTemplate = new MemoryStream(TemplateFile))
        {
            // Load the document
            Document docTemplateFile = new Document(inputStreamTemplate);

            NodeCollection paragraphs = docSourceFile.GetChildNodes(NodeType.Paragraph, true);
            foreach (Paragraph p in paragraphs)
            {
                string paraText = p.ToString(SaveFormat.Text).Trim();
                // You can use StartsWith, Contains, EndsWith methods or Regular expression to check paragraph text.
                if (paraText.StartsWith(StartSearchString))
                {
                    startIndex = seed - 1;
                }
                if (paraText.StartsWith(EndSearchString))
                {
                    endIndex = seed - 1;
                }
                seed++;
            }

            // Gather the nodes (the GetChild method uses 0-based index)
            Paragraph startPara = (Paragraph)docSourceFile.FirstSection.Body.GetChild(NodeType.Paragraph, startIndex, true);
            Paragraph endPara = (Paragraph)docSourceFile.FirstSection.Body.GetChild(NodeType.Paragraph, endIndex, true);

            // Extract the content between these nodes in the document. Include these markers in the extraction.
            ArrayList extractedNodes = Common.ExtractContent(startPara, endPara, true);
            if (extractedNodes.Count > 0)
            {
                extractedNodes.RemoveAt(0); // remove first line that contains the start delimiter
                extractedNodes.RemoveAt(extractedNodes.Count - 1); // remove last line that contains the end delimiter
            }

            // Insert the content into a new document and save it to disk.
            Document dstDoc = GenerateDocument(docSourceFile, extractedNodes, docTemplateFile, BookmarkName);
            dstDoc.Save(@"c:\csa\copy\Updated-Merged.docx");
        }
    }
}

Thanks,
Paulo

alexey.noskov · September 8, 2023, 5:20pm

@paulo.cacao Please try modifying your code like this:

public static void ExtractText(byte[] SourceFile, byte[] TemplateFile, string StartSearchString, string EndSearchString, string BookmarkName, byte[] License)
{
    if (SourceFile == null) { throw new Exception("No source file was provided"); }
    if (SourceFile.Length == 0) { throw new Exception("An invalid source file was provided"); }
    if (TemplateFile == null) { throw new Exception("No template file was provided"); }
    if (TemplateFile.Length == 0) { throw new Exception("An invalid template file was provided"); }
    if (BookmarkName == null) { throw new Exception("No Bookmark was provided"); }
    if (BookmarkName.Length == 0) { throw new Exception("An invalid Bookmark dataset was provided"); }

    LicenseManager.SetAsposeLicense(License);

    Paragraph startPara = null;
    Paragraph endPara = null;
    using (MemoryStream inputStream = new MemoryStream(SourceFile))
    {
        // Load the document
        Document docSourceFile = new Document(inputStream);
        using (MemoryStream inputStreamTemplate = new MemoryStream(TemplateFile))
        {
            // Load the document
            Document docTemplateFile = new Document(inputStreamTemplate);

            NodeCollection paragraphs = docSourceFile.GetChildNodes(NodeType.Paragraph, true);
            foreach (Paragraph p in paragraphs)
            {
                string paraText = p.ToString(SaveFormat.Text).Trim();
                // You can use StartsWith, Contains, EndsWith methods or Regular expression to check paragraph text.
                if (paraText.StartsWith(StartSearchString))
                    startPara = p;
                if (paraText.StartsWith(EndSearchString))
                    endPara = p;
            }

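            // Fail early if either delimiter paragraph was not found in the source document.
            if (startPara == null || endPara == null)
                throw new Exception("Start or end delimiter paragraph was not found");

            // Extract the content between these nodes in the document. Include these markers in the extraction.
            ArrayList extractedNodes = Common.ExtractContent(startPara, endPara, true);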
            if (extractedNodes.Count > 0)
            {
                extractedNodes.RemoveAt(0); // remove first line that contains the start delimiter
                extractedNodes.RemoveAt(extractedNodes.Count - 1); // remove last line that contains the end delimiter
            }
            // Insert the content into a new document and save it to disk.
            Document dstDoc = GenerateDocument(docSourceFile, extractedNodes, docTemplateFile, BookmarkName);
            dstDoc.Save(@"c:\csa\copy\Updated-Merged.docx");
        }
    }
}

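Instead of accessing the paragraphs by index, the start and end paragraphs are captured directly while looping through all paragraphs. This avoids using indexes that were counted over every paragraph in the document but then applied only to the body of the first section.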
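
For anyone who wants to confirm whether index-based lookup is affected in a given document, here is a minimal diagnostic sketch. It compares the number of paragraphs in the whole document with the number inside the first section's body; the class name ParagraphCountCheck is hypothetical and the file path is assumed from the code above.

```csharp
using System;
using Aspose.Words;

// Hypothetical diagnostic helper, not part of the original code in this thread.
class ParagraphCountCheck
{
    static void Main()
    {
        // Assumed path; replace with the actual location of the source document.
        Document doc = new Document(@"c:\csa\copy\original.docx");

        // Paragraphs in the entire document (all sections and stories).
        int allParagraphs = doc.GetChildNodes(NodeType.Paragraph, true).Count;

        // Paragraphs inside the body of the first section only.
        int bodyParagraphs = doc.FirstSection.Body.GetChildNodes(NodeType.Paragraph, true).Count;

        Console.WriteLine("Whole document: {0} paragraphs", allParagraphs);
        Console.WriteLine("First section body: {0} paragraphs", bodyParagraphs);
    }
}
```

If the two counts differ, an index computed over the first collection does not necessarily point at the same paragraph in the second, which is why holding on to the Paragraph node itself is the safer choice.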

paulo.cacao · September 19, 2023, 9:03am

Hi Alex,

It works! Thanks a lot!

Best regards,
Paulo Cação

paulo.cacao · September 27, 2023, 3:53pm

Hi Alexey,

We are importing almost everything correctly, but some text in the original file with font size 10 is converted to font size 11 in the new file.
So far this seems to happen only with font size 10; the other sizes look fine.
The code is the same as the code posted above.

image.png (96.3 KB)

image.png (89.8 KB)
original_FontSize.docx (100.5 KB)
EmptyTemplate.docx (93.5 KB)

Thanks,
Paulo

alexey.noskov · September 27, 2023, 5:05pm

@paulo.cacao Please try using ImportFormatMode.KeepSourceFormatting instead of ImportFormatMode.UseDestinationStyles.
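
As a rough illustration of where the import mode is specified, here is a minimal sketch that imports a single paragraph with source formatting preserved. The class name, the choice of the first body paragraph, and the output file name are assumptions made for illustration only; the input paths are assumed from the code earlier in this thread.

```csharp
using Aspose.Words;

// Hypothetical sketch, not part of the original code in this thread.
class KeepSourceFormattingSketch
{
    static void Main()
    {
        // Assumed paths; replace with the actual file locations.
        Document srcDoc = new Document(@"c:\csa\copy\original.docx");
        Document dstDoc = new Document(@"c:\csa\copy\EmptyTemplate.docx");

        // Take the first paragraph of the source body as an example node
        // (this is null if the body contains no paragraphs).
        Paragraph srcPara = srcDoc.FirstSection.Body.FirstParagraph;
        if (srcPara != null)
        {
            // The third argument selects how styles and formatting are carried over.
            Node imported = dstDoc.ImportNode(srcPara, true, ImportFormatMode.KeepSourceFormatting);
            dstDoc.FirstSection.Body.AppendChild(imported);
        }

        // Hypothetical output name, chosen so the files above are not overwritten.
        dstDoc.Save(@"c:\csa\copy\KeepSourceFormatting-Test.docx");
    }
}
```

When many nodes are imported, a NodeImporter created once with the same mode is the usual alternative to calling ImportNode on the document for each node.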

paulo.cacao · September 28, 2023, 8:35am

Hi Alexey,

Yes, we are already using it and still have the issue:

public static Document GenerateDocument(Document srcDoc, ArrayList nodes, Document docToInsert, string BookmarkName)
{
    // Find bookmark in the document
    Bookmark bookmark = docToInsert.Range.Bookmarks[BookmarkName];
    if (bookmark == null)
    {
        throw new ArgumentException("Bookmark not found.");
    }
    Node insertionDestination = bookmark.BookmarkStart.ParentNode;

    // Import each node from the list into the new document. Keep the original formatting of the node.
    //NodeImporter importer = new NodeImporter(srcDoc, docToInsert, ImportFormatMode.KeepSourceFormatting);
    if (insertionDestination.NodeType == NodeType.Paragraph || insertionDestination.NodeType == NodeType.Table)
    {
        CompositeNode destinationParent = insertionDestination.ParentNode;

        NodeImporter importer = new NodeImporter(srcDoc, docToInsert, ImportFormatMode.KeepSourceFormatting);
        foreach (Node node in nodes)
        {
            if (node.NodeType == NodeType.Paragraph)
            {
                Paragraph para = (Paragraph)node;
                if (para.IsEndOfSection && !para.HasChildNodes)
                    continue;
            }

            Node newNode = importer.ImportNode(node, true);

            destinationParent.InsertAfter(newNode, insertionDestination);
            insertionDestination = newNode;
        }
    }
    else
    {
        throw new ArgumentException("The destination node should be either a paragraph or table.");
    }

    // Return the generated document.
    return docToInsert;
}

alexey.noskov · September 28, 2023, 10:54am

@paulo.cacao
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-26000

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

paulo.cacao · September 28, 2023, 11:21am

Hi Alexey,

We have a paid license; if you want, I can share the details with you.
Do you have an ETA for fixing this issue?

alexey.noskov · September 28, 2023, 1:08pm

@paulo.cacao The issue is currently in the queue for analysis, so at the moment we cannot provide any estimates. Once the analysis is done, we will be able to provide you with more information or possibly a fix. Please accept our apologies for the inconvenience.

paulo.cacao · September 28, 2023, 2:09pm

Hi @alexey.noskov, thanks. We will wait for an estimate.

paulo.cacao · November 6, 2023, 4:35pm

Hi @alexey.noskov, do you have any news?
Thank you!

alexey.noskov · November 6, 2023, 5:14pm

@paulo.cacao Unfortunately, there is no news regarding the issue yet. We will be sure to let you know once the issue is resolved or we have more information for you.

paulo.cacao · January 29, 2024, 11:00am

Hi Alexey,
Do you have any news?
Best regards,
Paulo

alexey.noskov · January 29, 2024, 11:59am

@paulo.cacao I am afraid there is still no news regarding the issue. I have asked the responsible developer to take a look at it shortly. Please accept our apologies for the inconvenience.

aspose.notifier · March 6, 2024, 9:32am

The issues you found earlier (filed as WORDSNET-26000) have been fixed in the Aspose.Words for .NET 24.3 update, which is also available on NuGet.