Need help with Aspose Java APIs

natarajanp · October 5, 2010, 4:22pm

Hi,
Could you please help me with some sample code showing how to read the different sections of an MS Word document and get them as HTML?
Thanks & Regards,
Natarajan P


adam.skelton · October 5, 2010, 7:57pm

Hi there,
Thanks for your inquiry.
You can convert a section of a document to HTML using the code below. To achieve this for all sections you should loop through all sections in the document.

//Open document
Document doc = new Document("Document.doc");
//Create temporary document and remove all sections from it.
Document tempDoc = new Document();
tempDoc.RemoveAllChildren();

//Import first section
Node dstNode = tempDoc.ImportNode(doc.FirstSection, true, ImportFormatMode.KeepSourceFormatting);
//Insert node into the temporary document
tempDoc.AppendChild(dstNode);

//Save HTML to MemoryStream
MemoryStream htmlStream = new MemoryStream();
tempDoc.Save(htmlStream, SaveFormat.Html);
//Get Html string
string htmlString = Encoding.UTF8.GetString(htmlStream.ToArray());

If you have any further queries, please feel free to ask.
Thanks,

natarajanp · October 5, 2010, 11:09pm

Hi,
Thanks for the reply.
Could you please provide code snippets for reading an image inside the section?
Thanks & Regards,
Natarajan P

adam.skelton · October 5, 2010, 11:20pm

Hi there,
Thanks for your inquiry.
I think the code from the "Extracting images from a document" article will help you.
Thanks,

natarajanp · October 5, 2010, 11:47pm

Hi,
Thanks for the response.
Could you please help with Java code for converting sections of a document to HTML?
The code snippet you provided doesn't look like Java code.
Thanks & Regards,
Natarajan P

alexey.noskov · October 6, 2010, 3:29am

Hi

Thanks for your inquiry. You can use code like the following to convert a Section to HTML:

private String SectionToHtml(Section section) throws Exception
{
    // Create an empty document.
    Document doc = new Document();
    doc.removeAllChildren();
    // Import section to an empty document.
    doc.appendChild(doc.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
    // Save temporary document into the stream and get HTML string from stream.
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    doc.save(htmlStream, SaveFormat.HTML);
    // Get HTML string.
    return htmlStream.toString("UTF8");
}
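
The last step above, turning the bytes in the stream back into a String, can be tried on its own without Aspose.Words. The following is only a minimal sketch using JDK classes; StreamToStringSketch, fakeSave and decodeUtf8 are hypothetical names, and fakeSave merely stands in for doc.save(...) by writing UTF-8 bytes into the stream.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class StreamToStringSketch {
    // Hypothetical stand-in for doc.save(stream, ...): writes the UTF-8
    // bytes of the given HTML text into an in-memory stream.
    static ByteArrayOutputStream fakeSave(String html) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        byte[] bytes = html.getBytes(StandardCharsets.UTF_8);
        stream.write(bytes, 0, bytes.length);
        return stream;
    }

    // Decodes everything that was written to the stream as UTF-8 text.
    static String decodeUtf8(ByteArrayOutputStream stream) {
        return new String(stream.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Decoding with an explicit charset constant avoids the checked exception that toString(String) declares, and it keeps non-ASCII characters intact as long as the bytes really were written as UTF-8.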

Hope this helps.
Best regards,

natarajanp · October 6, 2010, 4:30am

Hi,
Thanks for your response.
The document I am using contains images, and I get the following error. Could you please explain how to specify HtmlExportImagesFolder, or how to provide custom streams via the HtmlExportImageSaving event handler?
java.lang.IllegalStateException: Image file cannot be written to disk. When saving the document to a stream either HtmlExportImagesFolder should be specified or custom streams should be provided via HtmlExportImageSaving event handler. Please see documentation for details.

at com.aspose.words.yo.Dk(ImageWriter.java:567)
at com.aspose.words.yo.a(ImageWriter.java:390)
at com.aspose.words.yo.a(ImageWriter.java:127)
at com.aspose.words.km.w(HtmlWriter.java:845)
at com.aspose.words.km.visitShapeStart(HtmlWriter.java:789)
at com.aspose.words.Shape.accept(Shape.java:98)
at com.aspose.words.CompositeNode.acceptChildren(CompositeNode.java:782)
at com.aspose.words.Paragraph.accept(Paragraph.java:335)
at com.aspose.words.CompositeNode.acceptChildren(CompositeNode.java:782)
at com.aspose.words.Body.accept(Body.java:69)
at com.aspose.words.km.a(HtmlWriter.java:292)
at com.aspose.words.km.a(HtmlWriter.java:280)
at com.aspose.words.km.a(HtmlWriter.java:263)
at com.aspose.words.km.writeBody(HtmlWriter.java:202)
at com.aspose.words.km.mm(HtmlWriter.java:140)
at com.aspose.words.km.a(HtmlWriter.java:56)
at com.aspose.words.km.a(HtmlWriter.java:29)
at com.aspose.words.Document.a(Document.java:1472)
at com.aspose.words.Document.save(Document.java:942)
at AsposeAPIDriver4.sectionToHtml(AsposeAPIDriver4.java:43)

Thanks & Regards,
Natarajan P

alexey.noskov · October 6, 2010, 5:17am

Hi

Thanks for your inquiry. You should do exactly what the error message suggests, i.e. specify the folder where images will be stored:

private String SectionToHtml(Section section) throws Exception
{
    // Create an empty document.
    Document doc = new Document();
    doc.removeAllChildren();
    // Import section to an empty document.
    doc.appendChild(doc.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
    // Save temporary document into the stream and get HTML string from stream.
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    // Specify folder where images will be saved.
    doc.getSaveOptions().setHtmlExportImagesFolder("C:\\Temp\\");
    doc.save(htmlStream, SaveFormat.HTML);
    // Get HTML string.
    return htmlStream.toString("UTF8");
}
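
The "C:\\Temp\\" literal above assumes a Windows machine with a writable C:\Temp folder. If that does not fit your environment, one possible alternative is to let the JDK create a temporary folder and pass its path instead. This is only a sketch under that assumption; ImageFolderSketch and createImagesFolder are hypothetical names and are not part of Aspose.Words.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class ImageFolderSketch {
    // Creates a fresh temporary directory and returns its absolute path
    // with a trailing separator, the same shape as the "C:\\Temp\\" literal.
    static String createImagesFolder(String prefix) {
        try {
            Path dir = Files.createTempDirectory(prefix);
            return dir.toAbsolutePath().toString() + File.separator;
        } catch (IOException e) {
            // Rethrow unchecked so callers need no extra throws clause.
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned string could then be passed to setHtmlExportImagesFolder in place of the hard-coded path. The directory is not deleted automatically, so clean it up once the exported images are no longer needed.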

Best regards,

natarajanp · October 6, 2010, 6:41am

Hi,
Thanks for the quick response.
Much appreciated.
Thanks & Regards,
Natarajan P

natarajanp · October 6, 2010, 9:00am

Hi,
Could you please help me with sample code to get the list of table of contents entries, and show how to read the section associated with each entry in the table of contents?
Thanks & Regards,
Natarajan P

natarajanp · October 6, 2010, 1:39pm

Hi,
I wanted to know the feasibility of the request in my previous post.
Is that possible with the Aspose Java APIs?
Thanks & Regards,
Natarajan P

alexey.noskov · October 7, 2010, 1:39am

Hi

Thanks for your inquiry. Actually, TOC entries do not refer to Sections. Please see the following link to learn more about sections in Word documents:
https://docs.aspose.com/words/net/working-with-sections/
In your case, you should check how the TOC is built. It can use headings, for instance. In that case you need to split the document by headings. To achieve this, you can use the same technique as in the code example provided here:
https://forum.aspose.com/t/converting-wordml-to-presentationml/101902/4
The code is in C#, but I think you can easily accomplish the same in Java.
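
To illustrate just the idea of splitting by headings in Java, here is a minimal sketch that does not use the Aspose.Words API at all. It assumes the TOC is built from heading paragraphs and models each paragraph with a hypothetical Para class (text plus an "is heading" flag); with a real document you would have to derive that flag from the paragraph itself.

```java
import java.util.ArrayList;
import java.util.List;

class HeadingSplitSketch {
    // Hypothetical stand-in for a document paragraph; not an Aspose type.
    static final class Para {
        final String text;
        final boolean heading;

        Para(String text, boolean heading) {
            this.text = text;
            this.heading = heading;
        }
    }

    // Groups paragraphs into chunks, starting a new chunk at every heading.
    // Paragraphs that come before the first heading form a leading chunk.
    static List<List<Para>> splitByHeadings(List<Para> paragraphs) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paragraphs) {
            if (p.heading || current == null) {
                current = new ArrayList<>();
                chunks.add(current);
            }
            current.add(p);
        }
        return chunks;
    }
}
```

Each resulting chunk corresponds to one heading and the content under it, which is roughly what a TOC entry points to. Each chunk could then be converted separately, in the same spirit as the section-to-HTML method earlier in this thread.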
Best regards,