I have an issue converting a Word document (24 pages) to EPUB. The requirement is to have a separate HTML page for each page in the Word document, so ideally I should get 24 HTML pages. I am using the code below:
String dataDir = "/home/nirmalap/workspace-word/wordSample1/";
FileInputStream fstream = new FileInputStream(dataDir + "Aspose.Words.lic");
License license = new License();
license.setLicense(fstream);
Document doc = new Document(dataDir + "01 Keown_Text_MS-10e_Ch01_vim_AJK.docx");
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setEncoding(Charset.forName("UTF-8"));
saveOptions.setDocumentSplitCriteria(DocumentSplitCriteria.PAGE_BREAK);
saveOptions.setExportDocumentProperties(true);
saveOptions.setSaveFormat(SaveFormat.EPUB);
doc.save(dataDir + "Document.EpubConversion_out.epub", saveOptions);
But when I look at the EPUB, it has only two HTML files. Could you let me know if I am missing anything here?
You are using HtmlSaveOptions correctly. When you call setDocumentSplitCriteria(DocumentSplitCriteria.PAGE_BREAK), the document is split into parts only at explicit page breaks. To achieve your requirement, you need to insert an explicit page break at the end of each page.
You can insert page breaks into the document using the Aspose.Words API. Move the cursor to the last node of each page and insert a page break using the DocumentBuilder.InsertBreak method. You can get the last node of a page using the layout API of Aspose.Words.
Thanks Tahir. I tried to find sample code to look up the last node of a page in a Word document, but was unable to. Could you point me to a couple of examples of how to do that?
Thanks Tahir. I have tried the code, but it doesn’t meet my requirement: when I open the Word document in MS Word it shows only 23 pages, but the code added a total of 321 page breaks. Would it be possible to add a single page break for each physical page in the Word document?
Please ZIP and attach your input Word document and expected EPUB file here for testing. We will investigate the issue and provide you more information on it.
Just to add a few more things to what we discussed above. We have two requirements:
Convert a Word document (book) into EPUB
Convert a PDF document (book) into EPUB
Does Aspose.Words support both of these requirements? The API documentation lists PDF in the LoadFormat class and EPUB in the SaveFormat class. Can we use these features to meet the two requirements above?
For licensing and costing, it would be better if we could get both features in a single product. Can you advise?
Exception in thread "main" com.aspose.words.UnsupportedFileFormatException: Pdf format is not supported on this platform. Use .NET Standard or .NET 4.6.1 version of Aspose.Words for loading Pdf documents.
at com.aspose.words.zzZ34.zzLs(Unknown Source)
at com.aspose.words.Document.zzY(Unknown Source)
at com.aspose.words.Document.zzZ(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at com.aspose.words.Document.<init>(Unknown Source)
at word.sample.PdfSample.main(PdfSample.java:19)
The code I am using is as below:
String dataDir = "/home/nirmalap/workspace-word/wordSample1/";
FileInputStream fstream = new FileInputStream(dataDir + "Aspose.Words.lic");
License license = new License();
license.setLicense(fstream);
Document doc = new Document(dataDir + "1621_ladder.pdf");
PdfSaveOptions saveOptions = new PdfSaveOptions();
saveOptions.setDisplayDocTitle(true);
doc.save(dataDir + "Test File.Pdf",saveOptions);
saveOptions.setSaveFormat(SaveFormat.EPUB);
doc.save(dataDir + "Document.EpubConversion_out.epub", saveOptions);
Please use the latest version of Aspose.Words for .NET 20.9. If you still face the problem, please ZIP and attach your input document here for testing. We will investigate the issue and provide you more information on it.
I am using Aspose.Words for Java version 20.6 (aspose-words-20.6-jdk17.jar), not the .NET version, but I am getting the .NET-related error above. Please advise.
I have updated Aspose.Words for Java to version 20.9 (aspose-words-20.9-jdk17.jar) but am still getting the same error. The file is attached for you to investigate further: CVR_BERK3809_05_SE_BEP.pdf (120.3 KB)
Please accept my apologies for the inconvenience. Unfortunately, loading PDF documents is not available in Aspose.Words for Java. We have logged this feature request as WORDSJAVA-2366 in our issue tracking system. You will be notified via this forum thread once this feature is available.
Unfortunately, there is no ETA available for this feature at the moment. We will inform you via this forum thread once there is an update available on it.
Thanks Tahir. But now I am getting 680 HTML pages in the EPUB. Ideally it should be 83 HTML pages, which should equal the value returned by doc.getPageCount(). I have attached the Word document, the updated code, and the EPUB for you to investigate. It still doesn’t meet my requirement. Please advise.
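For anyone who wants to compare the number of HTML parts in the generated EPUB against doc.getPageCount() without unpacking the file by hand, here is a minimal sketch. It relies only on the fact that an EPUB is a ZIP archive and uses java.util.zip from the JDK; it does not call Aspose.Words at all. The class name EpubPartCounter and the list of extensions it counts are assumptions made for illustration, and the result may include navigation or cover pages if the archive stores them as HTML.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not part of Aspose.Words): counts HTML parts in an EPUB.
final class EpubPartCounter {

    // Returns the number of non-directory archive entries whose names end in
    // .html, .xhtml or .htm (case-insensitive).
    static int countHtmlParts(Path epub) throws IOException {
        int count = 0;
        try (ZipFile zip = new ZipFile(epub.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName().toLowerCase(Locale.ROOT);
                boolean isHtml = name.endsWith(".html")
                        || name.endsWith(".xhtml")
                        || name.endsWith(".htm");
                if (!entry.isDirectory() && isHtml) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java EpubPartCounter <file.epub>");
            return;
        }
        // Prints the number of HTML parts found in the given EPUB.
        System.out.println(countHtmlParts(Paths.get(args[0])));
    }
}
```

Running it against Document.EpubConversion_out.epub and comparing the printed number with the page count shown in MS Word gives a quick check of whether the split produced one HTML part per page.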