Convert document to HTML5

I am trying to build a service that converts documents to HTML5 using the following process:


Document => PDF file => HTML5 (this is a continuous process)

The service should be able to convert either the whole document or only some of its pages, and the HTML output may be split across multiple page files.

I want to run 4 worker threads: when files are posted to a folder, the service should pick them up automatically and convert 4 files at the same time.
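One way the four-worker idea could be sketched in plain Java is shown below. It only covers the scheduling part: `DocumentConverter` and `FolderConversionService` are hypothetical names invented for this illustration, and the actual conversion call of whichever library is chosen would go inside the converter. The sketch processes the files that are in the folder at the time of the call; a long-running service would need to call it repeatedly or react to folder events (for example with `java.nio.file.WatchService`).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FolderConversionService {
    private static final int WORKERS = 4; // at most four conversions run at the same time

    /** Converts every regular file currently in inbox and returns the files that succeeded. */
    static List<Path> convertAll(Path inbox, Path outputDir, DocumentConverter converter)
            throws IOException, InterruptedException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(inbox)) {
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        // One task per file; the fixed pool limits how many run concurrently.
        List<Callable<Void>> tasks = new ArrayList<>();
        for (Path file : files) {
            tasks.add(() -> {
                converter.convert(file, outputDir);
                return null;
            });
        }

        ExecutorService pool = Executors.newFixedThreadPool(WORKERS);
        List<Path> converted = new ArrayList<>();
        try {
            // invokeAll blocks until every task has finished or failed.
            List<Future<Void>> results = pool.invokeAll(tasks);
            for (int i = 0; i < results.size(); i++) {
                try {
                    results.get(i).get();
                    converted.add(files.get(i));
                } catch (ExecutionException e) {
                    // A failed file is reported and skipped; the others are unaffected.
                    System.err.println("Failed: " + files.get(i) + " (" + e.getCause() + ")");
                }
            }
        } finally {
            pool.shutdown();
        }
        return converted;
    }

    public static void main(String[] args) throws Exception {
        Path inbox = Files.createDirectories(Paths.get(args.length > 0 ? args[0] : "inbox"));
        Path out = Files.createDirectories(Paths.get(args.length > 1 ? args[1] : "out"));
        // Stub converter: only prints which worker thread would handle which file.
        DocumentConverter stub = (input, outDir) ->
                System.out.println(Thread.currentThread().getName() + " would convert " + input);
        System.out.println("Converted " + convertAll(inbox, out, stub).size() + " file(s)");
    }
}

/** Hypothetical hook: wrap the conversion call of the chosen library here. */
interface DocumentConverter {
    void convert(Path input, Path outputDir) throws Exception;
}
```

A fixed-size pool is used so that a burst of incoming files queues up instead of starting an unbounded number of conversions.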

Can you suggest:
- A component to use (Java is preferred, but as far as I know it cannot convert only some pages of a document)
- Some example code to convert a document to HTML5

Thank you.

Hi,


Please list all your input file formats. Each Java version of the Aspose APIs is auto-ported from its equivalent .NET version, so the Java version offers the same features as the .NET version. Please let us know if you come across any problem using the Java version of the Aspose APIs.

Secondly, you can convert Word and PDF documents to HTML5 directly using the Aspose.Words and Aspose.Pdf APIs. Please refer to these help topics: Convert a Word to HTML5 and Convert PDF to HTML Format

I want to convert doc, docx, xls, xlsx, ppt, pptx, and pdf files to HTML5.


With Aspose.Words I can convert to HTML5 directly, but I think the output will not be displayed page by page. It will look like the HTML that Word produces when it saves a doc as HTML.

I need to convert only some pages of a document, not the whole document.

Hi,

Thank you for the details. In reference to the Excel (Aspose.Cells API) and PowerPoint (Aspose.Slides API) formats, we have already logged feature requests under ticket IDs CELLSJAVA-42044 and SLIDESJAVA-35704 in our issue tracking system. Your post has also been linked to these tickets. We’ll keep you informed regarding any available updates. We’re sorry for the inconvenience caused.

hiepaof:
With Aspose.Words I can convert to HTML5 directly, but I think the output will not be displayed page by page. It will look like the HTML that Word produces when it saves a doc as HTML.

You can split a Word document into pages by using the page splitter utility built on the Aspose.Words API. A colleague has described this approach in another reply. Please refer to this: Split Word document Pages. Once you have a page in a Document class object, you can convert it to HTML5 as described here: Convert a Word to HTML5

Using the Aspose.Pdf API, you can select a few pages from the input PDF and then perform the PDF to HTML5 conversion on them. Please refer to this help topic: Working with Pages

If I convert Word directly to HTML5, does it keep the same page layout as the original? I’m afraid the conversion keeps only the text and text formatting, not the page layout.


Is there any function that lets me convert, for example, only the first 1-4 pages of a 100-page document?

Hi,

Please check whether the following solution is acceptable for you.

C# code to convert DOCX pages into HTML fixed format

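// MyDir is the folder that holds the input document
Document doc = new Document(MyDir + @"input.docx");

HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
// save one page per output file
options.PageCount = 1;

// PageIndex is zero-based, so 0..3 covers the first four pages
for (int pageIndex = 0; pageIndex < 4; pageIndex++)
{
    options.PageIndex = pageIndex;
    doc.Save(MyDir + "out_" + pageIndex + ".html", options);
}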

Best regards,

Hello. I have a requirement to convert documents to HTML or HTML5.

Files of type doc/docx and xls/xlsx can be converted to HTML using the Aspose Java API, but text on shapes is not converted correctly.

Hi Suixing,


Thank you for contacting support. Please provide your source Word and Excel documents along with the code. Please also highlight the problematic shapes with screenshots. We’ll investigate and reply accordingly. We look forward to your response.

Can I convert pages 1 to 5 into one HTML file with the CSS and images embedded?

Hi,


Thank you for the inquiry. You can select a range of pages in a long Word document and then convert them to a single HTML document as below:

[.NET, C#]
// load a Word document
Document doc = new Document(@"C:\temp\Input.docx");
// set output HTML options
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
// set page count
options.PageCount = 5;
// set index of the first page
options.PageIndex = 0;
// embed CSS
options.ExportEmbeddedCss = true;
// embed images
options.ExportEmbeddedImages = true;
// save HTML document
doc.Save(@"C:\temp\out.html", options);

Please let us know in case of any confusion or questions.

Is it possible with the Java component?

Hi,


Thank you for the inquiry. Yes, it is also possible in Java. The Aspose.Words for Java API is fully auto-ported from its .NET version, so the Java version offers the same features as the .NET version.

[Java]
// load a Word document (backslashes must be escaped in Java string literals)
Document doc = new Document("C:\\temp\\Input.docx");
// set output HTML options
HtmlFixedSaveOptions options = new HtmlFixedSaveOptions();
// set page count
options.setPageCount(5);
// set index of the first page
options.setPageIndex(0);
// embed CSS
options.setExportEmbeddedCss(true);
// embed images
options.setExportEmbeddedImages(true);
// save HTML document
doc.save("C:\\temp\\out.html", options);
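Since users usually think in 1-based page numbers ("pages 1 to 5") while the snippets above pass a zero-based first-page index plus a page count, a small helper can make the mapping explicit. The following is only a sketch in plain Java: `PageRange` is a hypothetical class written for this illustration and is not part of any Aspose API, and the total page count is assumed to be supplied by the caller.

```java
/** Hypothetical helper: maps a 1-based inclusive page range to a zero-based index and a count. */
final class PageRange {
    final int pageIndex; // zero-based index of the first page to export
    final int pageCount; // number of pages to export

    private PageRange(int pageIndex, int pageCount) {
        this.pageIndex = pageIndex;
        this.pageCount = pageCount;
    }

    /** firstPage and lastPage are 1-based and inclusive, as a user would type them. */
    static PageRange of(int firstPage, int lastPage, int totalPages) {
        if (firstPage < 1 || lastPage < firstPage) {
            throw new IllegalArgumentException("Invalid range: " + firstPage + "-" + lastPage);
        }
        if (firstPage > totalPages) {
            throw new IllegalArgumentException("Document has only " + totalPages + " page(s)");
        }
        // Clamp the end of the range to the length of the document.
        int last = Math.min(lastPage, totalPages);
        return new PageRange(firstPage - 1, last - firstPage + 1);
    }

    public static void main(String[] args) {
        PageRange range = PageRange.of(1, 5, 100);
        // These two values would be passed to setPageIndex(...) and setPageCount(...).
        System.out.println("pageIndex=" + range.pageIndex + ", pageCount=" + range.pageCount);
        // prints "pageIndex=0, pageCount=5"
    }
}
```

Keeping the off-by-one conversion in one place avoids repeating the `- 1` wherever a page range is read from user input.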

The issues you have found earlier (filed as SLIDESJAVA-35704) have been fixed in Aspose.Slides for Java 22.8 (ZIP).
You can check all fixes on the Release Notes page.
You can also find the latest version of our library on the Product Download page.