When I copy part of the text (all paragraphs between two specific delimiter paragraphs) from one Word document to another without applying a license, everything works fine.
When I apply the temporary license, the copied text is completely different from the expected result.
The code is the same; the only difference is whether the license is applied.
@paulo.cacao The problem might be caused by the watermark and evaluation message added by Aspose.Words in evaluation mode. Aspose.Words injects an evaluation message as the very first paragraph in the document, so original document paragraphs’ indexes are shifted. Please make sure the indexes of the delimiter paragraphs are correct.
Hi Alexey,
I am trying to read the content between two paragraphs in the source document (original.docx) and copy it into the target document (empty-template.docx).
Please take a look at the code provided. It works as expected without licensing information. As soon as I include licensing, it does not generate the desired output. The source document does not have any Aspose watermark or evaluation message.
original.docx (139.0 KB)
EmptyTemplate.docx (93.5 KB)
private static void CopyDataFromWord()
{
    CssCCM_Aspose f = new CssCCM_Aspose();
    byte[] License = File.ReadAllBytes(@"c:\csa\Aspose.WordsProductFamily.lic");
    byte[] SourceFile = File.ReadAllBytes(@"c:\csa\copy\original.docx");
    byte[] TemplateFile = File.ReadAllBytes(@"c:\csa\copy\EmptyTemplate.docx");
    f.MssExtractText(SourceFile, TemplateFile, "MARKINGS", "ALTERATIONS", "b_Markings", License);
    Console.WriteLine("Done");
}
public static void ExtractText(byte[] SourceFile, byte[] TemplateFile, string StartSearchString, string EndSearchString, string BookmarkName, byte[] License)
{
    if (SourceFile == null) { throw new Exception("No source file was provided"); }
    if (SourceFile.Length == 0) { throw new Exception("An invalid source file was provided"); }
    if (TemplateFile == null) { throw new Exception("No template file was provided"); }
    if (TemplateFile.Length == 0) { throw new Exception("An invalid template file was provided"); }
    if (BookmarkName == null) { throw new Exception("No Bookmark was provided"); }
    if (BookmarkName.Length == 0) { throw new Exception("An invalid Bookmark dataset was provided"); }
    LicenseManager.SetAsposeLicense(License);
    int startIndex = 0;
    int endIndex = 1;
    int seed = 1;
    using (MemoryStream inputStream = new MemoryStream(SourceFile))
    {
        // Load the document
        Document docSourceFile = new Document(inputStream);
        using (MemoryStream inputStreamTemplate = new MemoryStream(TemplateFile))
        {
            // Load the document
            Document docTemplateFile = new Document(inputStreamTemplate);
            NodeCollection paragraphs = docSourceFile.GetChildNodes(NodeType.Paragraph, true);
            foreach (Paragraph p in paragraphs)
            {
                string paraText = p.ToString(SaveFormat.Text).Trim();
                // You can use StartsWith, Contains, EndsWith methods or Regular expression to check paragraph text.
                if (paraText.StartsWith(StartSearchString))
                {
                    startIndex = seed - 1;
                }
                if (paraText.StartsWith(EndSearchString))
                {
                    endIndex = seed - 1;
                }
                seed++;
            }
            // Gather the nodes (the GetChild method uses 0-based index)
            Paragraph startPara = (Paragraph)docSourceFile.FirstSection.Body.GetChild(NodeType.Paragraph, startIndex, true);
            Paragraph endPara = (Paragraph)docSourceFile.FirstSection.Body.GetChild(NodeType.Paragraph, endIndex, true);
            // Extract the content between these nodes in the document. Include these markers in the extraction.
            ArrayList extractedNodes = Common.ExtractContent(startPara, endPara, true);
            if (extractedNodes.Count > 0)
            {
                extractedNodes.RemoveAt(0); // remove first line that contains the start delimiter
                extractedNodes.RemoveAt(extractedNodes.Count - 1); // remove last line that contains the end delimiter
            }
            // Insert the content into a new document and save it to disk.
            Document dstDoc = GenerateDocument(docSourceFile, extractedNodes, docTemplateFile, BookmarkName);
            dstDoc.Save(@"c:\csa\copy\Updated-Merged.docx");
        }
    }
}
Thanks,
Paulo
@paulo.cacao Please try modifying your code like this:
public static void ExtractText(byte[] SourceFile, byte[] TemplateFile, string StartSearchString, string EndSearchString, string BookmarkName, byte[] License)
{
    if (SourceFile == null) { throw new Exception("No source file was provided"); }
    if (SourceFile.Length == 0) { throw new Exception("An invalid source file was provided"); }
    if (TemplateFile == null) { throw new Exception("No template file was provided"); }
    if (TemplateFile.Length == 0) { throw new Exception("An invalid template file was provided"); }
    if (BookmarkName == null) { throw new Exception("No Bookmark was provided"); }
    if (BookmarkName.Length == 0) { throw new Exception("An invalid Bookmark dataset was provided"); }
    LicenseManager.SetAsposeLicense(License);
    Paragraph startPara = null;
    Paragraph endPara = null;
    using (MemoryStream inputStream = new MemoryStream(SourceFile))
    {
        // Load the document
        Document docSourceFile = new Document(inputStream);
        using (MemoryStream inputStreamTemplate = new MemoryStream(TemplateFile))
        {
            // Load the document
            Document docTemplateFile = new Document(inputStreamTemplate);
            NodeCollection paragraphs = docSourceFile.GetChildNodes(NodeType.Paragraph, true);
            foreach (Paragraph p in paragraphs)
            {
                string paraText = p.ToString(SaveFormat.Text).Trim();
                // You can use StartsWith, Contains, EndsWith methods or Regular expression to check paragraph text.
                if (paraText.StartsWith(StartSearchString))
                    startPara = p;
                if (paraText.StartsWith(EndSearchString))
                    endPara = p;
            }
            // Extract the content between these nodes in the document. Include these markers in the extraction.
            ArrayList extractedNodes = Common.ExtractContent(startPara, endPara, true);
            if (extractedNodes.Count > 0)
            {
                extractedNodes.RemoveAt(0); // remove first line that contains the start delimiter
                extractedNodes.RemoveAt(extractedNodes.Count - 1); // remove last line that contains the end delimiter
            }
            // Insert the content into a new document and save it to disk.
            Document dstDoc = GenerateDocument(docSourceFile, extractedNodes, docTemplateFile, BookmarkName);
            dstDoc.Save(@"c:\csa\copy\Updated-Merged.docx");
        }
    }
}
Instead of accessing the paragraphs by index, the start and end paragraphs are captured directly while looping over all paragraphs.
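To illustrate why holding on to the node itself can be more robust than remembering its position, here is a minimal, library-free sketch. The `Para` class and the "Injected paragraph" entry are hypothetical stand-ins and are not Aspose.Words types; the sketch only assumes that something adds an extra item to the collection after the index was recorded.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a document paragraph; not an Aspose.Words type.
class Para
{
    public string Text { get; }
    public Para(string text) { Text = text; }
}

static class Demo
{
    static void Main()
    {
        var paragraphs = new List<Para>
        {
            new Para("Intro"),
            new Para("MARKINGS"),
            new Para("Body text"),
            new Para("ALTERATIONS"),
        };

        // Remember the start delimiter both ways: by position and by reference.
        int startIndex = paragraphs.FindIndex(p => p.Text.StartsWith("MARKINGS"));
        Para startPara = paragraphs[startIndex];

        // Simulate an extra paragraph appearing at the top of the collection.
        paragraphs.Insert(0, new Para("Injected paragraph"));

        // The stored index now points at a different paragraph...
        Console.WriteLine(paragraphs[startIndex].Text); // prints "Intro"
        // ...while the stored reference still points at the delimiter.
        Console.WriteLine(startPara.Text);              // prints "MARKINGS"
    }
}
```

The same reasoning applies whenever an index is computed against one collection and then used against another: the position may no longer identify the same paragraph, whereas a reference always does.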
Hi Alex,
It works! Thanks a lot!
Best regards,
Paulo Cação
Hi Alexey,
We are importing almost everything correctly. However, some text that has font size 10 in the original file is converted to font size 11 in the new file.
So far this seems to happen only with font size 10; the other sizes look fine.
The code is the same as the one posted above.
image.png (96.3 KB)
image.png (89.8 KB)
original_FontSize.docx (100.5 KB)
EmptyTemplate.docx (93.5 KB)
Thanks,
Paulo
@paulo.cacao Please try using ImportFormatMode.KeepSourceFormatting instead of ImportFormatMode.UseDestinationStyles.
Hi Alexey,
Yes, we are already using it and we still have the issue:
public static Document GenerateDocument(Document srcDoc, ArrayList nodes, Document docToInsert, string BookmarkName)
{
    // Find bookmark in the document
    Bookmark bookmark = docToInsert.Range.Bookmarks[BookmarkName];
    if (bookmark == null)
    {
        throw new ArgumentException("Bookmark not found.");
    }
    Node insertionDestination = bookmark.BookmarkStart.ParentNode;
    // Import each node from the list into the new document. Keep the original formatting of the node.
    //NodeImporter importer = new NodeImporter(srcDoc, docToInsert, ImportFormatMode.KeepSourceFormatting);
    if (insertionDestination.NodeType == NodeType.Paragraph || insertionDestination.NodeType == NodeType.Table)
    {
        CompositeNode destinationParent = insertionDestination.ParentNode;
        NodeImporter importer = new NodeImporter(srcDoc, docToInsert, ImportFormatMode.KeepSourceFormatting);
        foreach (Node node in nodes)
        {
            if (node.NodeType == NodeType.Paragraph)
            {
                Paragraph para = (Paragraph)node;
                if (para.IsEndOfSection && !para.HasChildNodes)
                    continue;
            }
            Node newNode = importer.ImportNode(node, true);
            destinationParent.InsertAfter(newNode, insertionDestination);
            insertionDestination = newNode;
        }
    }
    else
    {
        throw new ArgumentException("The destination node should be either a paragraph or table.");
    }
    // Return the generated document.
    return docToInsert;
}
@paulo.cacao
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): WORDSNET-26000
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
Hi Alexey,
We have a paid license. If you want, I can share the details of our license with you.
Do you have an ETA for fixing this issue?
@paulo.cacao The issue is currently in the queue for analysis, so at the moment we cannot provide you with any estimates. Once the analysis is done, we will be able to provide you with more information or possibly a fix. Please accept our apologies for the inconvenience.
@paulo.cacao Unfortunately, there is no news regarding the issue yet. We will be sure to let you know once the issue is resolved or we have more information for you.
Hi Alexey,
Do you have any news?
Best regards,
Paulo
@paulo.cacao I am afraid there is still no news regarding the issue. I have asked the responsible developer to take a look at it shortly. Please accept our apologies for the inconvenience.
The issues you found earlier (filed as WORDSNET-26000) have been fixed in the Aspose.Words for .NET 24.3 update, which is also available on NuGet.