Extracting the OOXML of an object from Word

Hi,
I have a range from Excel pasted into Word. Now I want to extract only that range's OOXML. How do I do that?
Range.zip (27.6 KB)

@Adhirath Aspose.Words has no direct way to extract the OOXML representation of a particular node or object from a document. However, you can copy the node into a separate document and save it in the MS Word 2007 XML (FlatOpc) format, then parse the resulting XML and extract the required part.

Can you tell me how to do that?

@Adhirath Something like this:

// Open source document
Document doc = new Document(@"C:\Temp\in.docx");
// Create a dummy document and copy the required node to it.
Document tmp = (Document)doc.Clone(false);
tmp.EnsureMinimum();
// As an example copy the first paragraph.
tmp.FirstSection.Body.RemoveAllChildren();
tmp.FirstSection.Body.AppendChild(tmp.ImportNode(doc.FirstSection.Body.FirstParagraph, true, ImportFormatMode.UseDestinationStyles));
// Save the temporary document as FlatOpc
tmp.Save(@"C:\Temp\out.xml", new OoxmlSaveOptions(SaveFormat.FlatOpc) { PrettyFormat = true });
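
For the "parse the resulting XML" step, here is a minimal sketch that uses only System.Xml.Linq, not Aspose.Words. It assumes the saved file follows the usual Flat OPC layout, where each package part is a `pkg:part` element whose payload sits inside `pkg:xmlData`. The `FlatOpcReader` class and `GetPartXml` method are hypothetical names introduced for illustration.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class FlatOpcReader
{
    // Namespace of the Flat OPC wrapper elements (pkg:package, pkg:part, pkg:xmlData).
    private static readonly XNamespace Pkg =
        "http://schemas.microsoft.com/office/2006/xmlPackage";

    // Returns the root XML element stored in the part with the given name
    // (for example "/word/document.xml"), or null if there is no such part
    // or the part has no XML payload.
    public static XElement GetPartXml(XDocument flatOpc, string partName)
    {
        XElement part = flatOpc.Root
            ?.Elements(Pkg + "part")
            .FirstOrDefault(p => (string)p.Attribute(Pkg + "name") == partName);

        return part?.Element(Pkg + "xmlData")?.Elements().FirstOrDefault();
    }
}
```

Calling `FlatOpcReader.GetPartXml(XDocument.Load(@"C:\Temp\out.xml"), "/word/document.xml")` should return the `w:document` element. The markup of the copied node is then the content of its `w:body` child, apart from the trailing `w:sectPr`.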

How about a range pasted into Word? Will that show up in the first paragraph?

@Adhirath In the Aspose.Words document object model, content is represented as nodes. Please see our documentation for more information:
https://docs.aspose.com/words/net/aspose-words-document-object-model/

So you should move the required nodes into a temporary document and then save it as XML.
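
As a rough sketch of that idea for a pasted range: the code below assumes the Excel range was pasted as an ordinary Word table and is the first table in the document. If it was pasted as an embedded object or a picture, it would be a different node type and this would not find it. The `PastedRangeExporter` class and its method name are hypothetical.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

public static class PastedRangeExporter
{
    // Copies the first table of the source document into a temporary
    // document and saves that document as Flat OPC XML.
    // Assumption: the pasted Excel range is an ordinary Word table.
    public static void SaveFirstTableAsFlatOpc(string inputPath, string outputPath)
    {
        Document doc = new Document(inputPath);

        // Search the whole document (isDeep = true) for the first table.
        Table table = (Table)doc.GetChild(NodeType.Table, 0, true);
        if (table == null)
            return; // No table found; the range may be another kind of node.

        // Create an empty temporary document, as in the earlier snippet.
        Document tmp = (Document)doc.Clone(false);
        tmp.EnsureMinimum();
        tmp.FirstSection.Body.RemoveAllChildren();

        // Import the table into the temporary document and append it.
        tmp.FirstSection.Body.AppendChild(
            tmp.ImportNode(table, true, ImportFormatMode.UseDestinationStyles));

        // A document body should end with a paragraph, so add an empty one.
        tmp.FirstSection.Body.AppendChild(new Paragraph(tmp));

        tmp.Save(outputPath, new OoxmlSaveOptions(SaveFormat.FlatOpc) { PrettyFormat = true });
    }
}
```

In the saved file, the table should then appear as a `w:tbl` element inside the `/word/document.xml` part.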


Hi guys, with this code, when I hit the endpoint multiple times, it starts adding extra page breaks and section breaks to the saved file.

    new Aspose.Cells.License().SetLicense();
    Workbook workbook = new Workbook(new MemoryStream(documentContent));
    for (var i = 0; i < workbook.Worksheets.Names.Count; i++)
    {
        var namedRange = workbook.Worksheets.Names[i];
        if (namedRange == null)
        {
            continue;
        }
        else
        {
            if (!string.IsNullOrEmpty(namedRange.Text) && namedRange.Text == RangeName)
            {
                var range = namedRange.GetRange();

                if (range != null)
                {
                    if (range.RowCount == 1 && range.ColumnCount == 1)
                    {
                        var cell = range[0, 0];
                        var value = cell.Value;
                        if (value != null)
                        {
                            return Encoding.UTF8.GetBytes(value.ToString());
                        }
                        return Array.Empty<byte>();
                    }

                    range.Worksheet.IsVisible = true;
                    foreach (Aspose.Cells.Worksheet sheet in workbook.Worksheets)
                    {
                        if (sheet.Index != range.Worksheet.Index)
                        {
                            sheet.IsVisible = false;
                        }
                    }
                    Aspose.Cells.PageSetup pageSetup = range.Worksheet.PageSetup;
                    pageSetup.PrintArea = range.RefersTo;
                    DocxSaveOptions saveOptions = new DocxSaveOptions();
                    using (MemoryStream ms = new MemoryStream())
                    {
                        workbook.Save(ms, saveOptions);
                        ms.Position = 0;
                        new Aspose.Words.License().SetLicense();
                        Aspose.Words.Document Document = new Aspose.Words.Document(ms);
                        // ExtractPages returns a new document, so keep the result (first page only).
                        Document = Document.ExtractPages(0, 1);
                        NodeCollection paragraphs = Document.GetChildNodes(NodeType.Paragraph, true);
                        foreach (Paragraph para in paragraphs)
                        {
                            // If the paragraph has a page break set before, then clear it.
                            if (para.ParagraphFormat.PageBreakBefore)
                                para.ParagraphFormat.PageBreakBefore = false;
                            // Check all runs in the paragraph for page breaks and remove them.
                            foreach (Run run in para.Runs)
                                if (run.Text.Contains(ControlChar.PageBreak))
                                    run.Text = run.Text.Replace(ControlChar.PageBreak, string.Empty);
                        }
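                        // Merge all sections into the last one, using the section index j
                        // (not i, which is the index of the named range in the outer loop).
                        for (int j = Document.Sections.Count - 2; j >= 0; j--)
                        {
                            // Copy the content of the current section to the beginning of the last section.
                            Document.LastSection.PrependContent(Document.Sections[j]);
                            // Remove the copied section.
                            Document.Sections[j].Remove();
                        }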
                        using (MemoryStream memoryStream = new MemoryStream())
                        {
                            Document.Save(memoryStream, new Aspose.Words.Saving.OoxmlSaveOptions(Aspose.Words.SaveFormat.FlatOpc) { PrettyFormat = true });
                            var variableValue = memoryStream.ToArray();
                            return variableValue;
                        }
                    }
                }
            }
        }
    }
    return Array.Empty<byte>();
}

Sometimes I get an extra page, and I want to fix that. If there is an extra page, I just want to save the first page.

@Adhirath Could you please attach the problematic input passed to Aspose.Words and the resulting output here for our reference? We will check the issue and provide you with more information.

I got through that, but there is another issue I am stuck on. I want to get the OOXML of a chart in Excel. How can I do that? I already talked with the Aspose.Cells team, and they told me to talk with the Aspose.Words team. Let me know what can be done.

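@Adhirath There is no direct way to extract the OOXML of a chart using Aspose.Words. You can use the same approach as suggested above: copy the chart's node into a temporary document, save it as FlatOpc, and extract the required part from the resulting XML.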