Need help with Aspose Java APIs

Hi,
Could you please help me with some sample code showing how to read the different sections of an MS Word document and get them as HTML?
Thanks & Regards,
Natarajan P

Hi there,
Thanks for your inquiry.
You can convert a section of a document to HTML using the code below. To do this for every section, loop through all the sections in the document.

//Open document
Document doc = new Document("Document.doc");
//Create temporary document and remove all sections from it.
Document tempDoc = new Document();
tempDoc.RemoveAllChildren();

//Import first section
Node dstNode = tempDoc.ImportNode(doc.FirstSection, true, ImportFormatMode.KeepSourceFormatting);
//Insert node into the temporary document
tempDoc.AppendChild(dstNode);

//Save HTML to MemoryStream
MemoryStream htmlStream = new MemoryStream();
tempDoc.Save(htmlStream, SaveFormat.Html);
//Get Html string
string htmlString = Encoding.UTF8.GetString(htmlStream.ToArray());

If you have any further queries, please feel free to ask.
Thanks,

Hi,
Thanks for the reply.
Could you please provide code snippets for reading an image inside the section?
Thanks & Regards,
Natarajan P

Hi there,
Thanks for your inquiry.
I think the code from the "Extracting images from a document" article will help you.
Thanks,

Hi,
Thanks for the response.
Could you please help with Java code for converting the sections of a document to HTML?
The code snippet you provided doesn't look like Java code.
Thanks & Regards,
Natarajan P

Hi

Thanks for your inquiry. You can use code like the following to convert a Section to HTML:

private String SectionToHtml(Section section) throws Exception
{
    // Create an empty document.
    Document doc = new Document();
    doc.removeAllChildren();
    // Import section to an empty document.
    doc.appendChild(doc.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
    // Save the temporary document to a stream and get the HTML string from it.
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    doc.save(htmlStream, SaveFormat.HTML);
    // Get HTML string.
    return htmlStream.toString("UTF8");
}
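
The last step of the method, turning the buffered bytes into a string, can be sketched with the standard library alone. `toUtf8String` is a hypothetical helper name, not part of any library. Whether the saved HTML begins with a byte order mark depends on the save settings, so the check below is only a precaution and not a statement about what the library writes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class HtmlDecoding {
    /** Decodes the buffered bytes as UTF-8 and drops a leading byte order mark, if any. */
    public static String toUtf8String(ByteArrayOutputStream stream) {
        String text = new String(stream.toByteArray(), StandardCharsets.UTF_8);
        // A U+FEFF character at position 0 is a byte order mark, not document content.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Simulated output: a UTF-8 byte order mark followed by a small HTML fragment.
        out.write(new byte[] {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF});
        out.write("<p>sample</p>".getBytes(StandardCharsets.UTF_8));
        System.out.println(toUtf8String(out));
    }
}
```

Passing a `Charset` constant instead of a charset name avoids the checked `UnsupportedEncodingException` that the name-based overloads declare.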

Hope this helps.
Best regards,

Hi,
Thanks for your response.
The document I am using has images, and I get the following error. Could you please show me how to specify HtmlExportImagesFolder, or how to provide custom streams via the HtmlExportImageSaving event handler?
java.lang.IllegalStateException: Image file cannot be written to disk. When saving the document to a stream either HtmlExportImagesFolder should be specified or custom streams should be provided via HtmlExportImageSaving event handler. Please see documentation for details.

at com.aspose.words.yo.Dk(ImageWriter.java:567)
at com.aspose.words.yo.a(ImageWriter.java:390)
at com.aspose.words.yo.a(ImageWriter.java:127)
at com.aspose.words.km.w(HtmlWriter.java:845)
at com.aspose.words.km.visitShapeStart(HtmlWriter.java:789)
at com.aspose.words.Shape.accept(Shape.java:98)
at com.aspose.words.CompositeNode.acceptChildren(CompositeNode.java:782)
at com.aspose.words.Paragraph.accept(Paragraph.java:335)
at com.aspose.words.CompositeNode.acceptChildren(CompositeNode.java:782)
at com.aspose.words.Body.accept(Body.java:69)
at com.aspose.words.km.a(HtmlWriter.java:292)
at com.aspose.words.km.a(HtmlWriter.java:280)
at com.aspose.words.km.a(HtmlWriter.java:263)
at com.aspose.words.km.writeBody(HtmlWriter.java:202)
at com.aspose.words.km.mm(HtmlWriter.java:140)
at com.aspose.words.km.a(HtmlWriter.java:56)
at com.aspose.words.km.a(HtmlWriter.java:29)
at com.aspose.words.Document.a(Document.java:1472)
at com.aspose.words.Document.save(Document.java:942)
at AsposeAPIDriver4.sectionToHtml(AsposeAPIDriver4.java:43)

Thanks & Regards,
Natarajan P

Hi

Thanks for your inquiry. You should do exactly what the error message suggests, i.e. specify the folder where the images will be stored:

private String SectionToHtml(Section section) throws Exception
{
    // Create an empty document.
    Document doc = new Document();
    doc.removeAllChildren();
    // Import section to an empty document.
    doc.appendChild(doc.importNode(section, true, ImportFormatMode.KEEP_SOURCE_FORMATTING));
    // Save the temporary document to a stream and get the HTML string from it.
    ByteArrayOutputStream htmlStream = new ByteArrayOutputStream();
    // Specify folder where images will be saved.
    doc.getSaveOptions().setHtmlExportImagesFolder("C:\\Temp\\");
    doc.save(htmlStream, SaveFormat.HTML);
    // Get HTML string.
    return htmlStream.toString("UTF8");
}
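
The hard-coded "C:\\Temp\\" path only works where that folder exists. As a minimal sketch using only the standard library, the folder can be created under the system temp location instead. `prepareImagesFolder` and the "html-export-images" prefix are hypothetical names chosen for illustration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ImagesFolder {
    /**
     * Creates a fresh directory under the system temp location and returns its
     * absolute path with a trailing separator, in the same style as the
     * hard-coded folder string used in the reply.
     */
    public static String prepareImagesFolder() throws IOException {
        Path dir = Files.createTempDirectory("html-export-images");
        return dir.toAbsolutePath().toString() + File.separator;
    }

    public static void main(String[] args) throws IOException {
        String folder = prepareImagesFolder();
        // This value would be passed wherever the images folder is specified.
        System.out.println(folder);
    }
}
```

The returned string would take the place of the literal path in the call that sets the images folder. The directory is not deleted automatically, so clean it up once the HTML and its images are no longer needed.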

Best regards,

Hi,
Thanks for the quick response.
Much appreciated.
Thanks & Regards,
Natarajan P

Hi,
Could you please help me with sample code to get the list of table of contents entries and to read the section associated with each entry?
Thanks & Regards,
Natarajan P

Hi,
I wanted to know whether what I asked about in my previous post is feasible.
Is that possible with the Aspose Java APIs?
Thanks & Regards,
Natarajan P

Hi

Thanks for your inquiry. Actually, TOC entries do not refer to Sections. Please see the following link to learn more about sections in Word documents:
https://docs.aspose.com/words/net/working-with-sections/
In your case, you should check how the TOC is built. It may be built from headings, for instance, in which case you need to split the document by headings. To achieve this, you can use the same technique as in the code example provided here:
https://forum.aspose.com/t/converting-wordml-to-presentationml/101902/4
The code is in C#, but I think you can easily accomplish the same in Java.
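
To illustrate only the idea of splitting by headings, here is a rough Java sketch on a simplified model. It does not use the Aspose API and is not the code from the linked post: `Para` and `splitByHeadings` are hypothetical stand-ins, and in a real document the `heading` flag would have to be derived from each paragraph's style.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingSplitter {
    /** Simplified stand-in for a paragraph: its text and whether it is a heading. */
    public static final class Para {
        public final String text;
        public final boolean heading;

        public Para(String text, boolean heading) {
            this.text = text;
            this.heading = heading;
        }
    }

    /** Groups paragraphs into chunks, starting a new chunk at every heading. */
    public static List<List<Para>> splitByHeadings(List<Para> paragraphs) {
        List<List<Para>> chunks = new ArrayList<>();
        List<Para> current = null;
        for (Para p : paragraphs) {
            // A heading opens a new chunk; so does content that precedes the first heading.
            if (p.heading || current == null) {
                current = new ArrayList<>();
                chunks.add(current);
            }
            current.add(p);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Para> doc = new ArrayList<>();
        doc.add(new Para("Introduction", true));
        doc.add(new Para("Some opening text.", false));
        doc.add(new Para("Details", true));
        doc.add(new Para("More text.", false));
        for (List<Para> chunk : splitByHeadings(doc)) {
            System.out.println(chunk.get(0).text + ": " + chunk.size() + " paragraph(s)");
        }
    }
}
```

Each resulting chunk corresponds to one heading and the content below it, which is the unit a heading-based TOC entry points to.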
Best regards,