Issue splitting a multi-page Word document into HTML pages when saving as EPUB

@nirmalap

After further investigation, we have noticed that the page count in the EPUB file is incorrect. We are investigating this issue and will provide you with more information soon.

@nirmalap

We have logged two issues, WORDSNET-21149 and WORDSJAVA-2465, in our issue tracking system for your scenario. We will inform you via this forum thread once these issues are resolved. We apologize for the inconvenience.

Thanks Tahir. I am working on a POC to see how Aspose.Words for Java can be used to convert a Word document to EPUB. Based on this POC, my company will decide whether to purchase Aspose.Words for Java. I would like to know the ETA for the fix.

@nirmalap

We try our best to deal with every customer request in a timely fashion, but unfortunately we cannot guarantee a delivery date for every customer issue. Our developers work on issues on a first come, first served basis. We feel this is the fairest and most appropriate way to satisfy the needs of the majority of our customers.

Currently, your issues are pending analysis and are in the queue. Once we complete the analysis, we will be able to provide you with an estimate.

Could you please share your expected output EPUB file here for our reference? We will then fix your issues according to your requirement. Thanks for your cooperation.

Thanks Tahir. I understand. I don’t have a sample of the expected output EPUB to share with you, as I am trying to generate my first EPUB with Aspose.Words. Perhaps we could have a Teams call so that I can explain my requirement. In brief, my requirement is:

  • there is a component that takes a .doc or .docx file as input
  • the component’s output is an .epub file
  • the component uses Aspose.Words to create the .epub from the Word document
  • the .epub has some requirements to meet:
    • for each page of the Word document there must be a separate HTML file
    • other assets such as images, styles, scripts, and fonts must be extracted to their own files
    • a TOC must be generated with links to the HTML pages
    • a package.opf file must be generated with a link to toc.html
  • the issue is in the step “for each page of the Word document there must be a separate HTML file”
  • let’s take a very simple Word document with 6 pages, attached here (it shows “page 1 of 6” at the bottom when opened in MS Word)
  • I assume “page 1 of 6” in MS Word means the document has six pages
  • Does that mean it has 6 page breaks?
  • I want to split the Word document into 6 HTML pages, since it has 6 pages when opened in MS Word
  • so that the .epub has 6 .html files inside
  • it would be great if you could share sample code to achieve this

sample.zip (705.2 KB)

@nirmalap

Please use the following code example to get the desired output. We hope this helps.

Please get the PageSplitter code from the following link.
https://reference.aspose.com/words/java/com.aspose.words/Document#extractPages(int,int)

import com.aspose.words.*;
import java.nio.charset.StandardCharsets;

// MyDir is the folder that holds the input document.
Document doc = new Document(MyDir + "Bozarth_ch05_ed.doc");
doc.acceptAllRevisions();
System.out.println("Actual page count : " + doc.getPageCount());

// DocumentPageSplitter comes from the PageSplitter code linked above.
DocumentPageSplitter splitter = new DocumentPageSplitter(doc);
Document epub = splitter.getDocumentOfPageRange(1, doc.getPageCount());
epub.acceptAllRevisions();

HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setEncoding(StandardCharsets.UTF_8);
// Start a new HTML part at every section break.
saveOptions.setDocumentSplitCriteria(DocumentSplitCriteria.SECTION_BREAK);
saveOptions.setExportDocumentProperties(true);
saveOptions.setSaveFormat(SaveFormat.EPUB);
epub.save(MyDir + "20.9.epub", saveOptions);

Thanks Tahir. Let me try and get back.

@nirmalap

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.

@nirmalap

We would like to inform you that the issue you are facing is not actually a bug in Aspose.Words, so we have closed this issue (WORDSJAVA-2465) as ‘Not a Bug’.

Aspose.Words correctly splits pages when DocumentSplitCriteria is applied, and there are 86 HTML documents inside the produced EPUB publication. However, a whole page (one HTML document) does not fit in the window of an EPUB reader, so the reader splits the page further to fit its window. If you reduce the size of the reader window, the page count increases. The page numbers you see are therefore the EPUB reader’s internal numbering, not the number of HTML documents in the file.
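If it helps, the number of HTML documents can be checked independently of any reader, because an EPUB file is a ZIP container. The following is only a minimal sketch using the standard java.util.zip API; the class name EpubHtmlCounter and the method countHtmlEntries are illustrative names, not part of Aspose.Words. It counts every .html/.xhtml entry, so the result may include a navigation or TOC page if the package stores one as HTML.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Illustrative helper (hypothetical name), not part of Aspose.Words.
final class EpubHtmlCounter {

    /** Counts the .html/.xhtml entries stored in the EPUB (ZIP) container. */
    static int countHtmlEntries(Path epubPath) throws IOException {
        int count = 0;
        try (ZipFile zip = new ZipFile(epubPath.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName().toLowerCase(Locale.ROOT);
                boolean isHtml = name.endsWith(".html") || name.endsWith(".xhtml");
                if (!entry.isDirectory() && isHtml) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("Usage: java EpubHtmlCounter <path-to-epub>");
            return;
        }
        Path epub = Paths.get(args[0]);
        System.out.println("HTML documents in EPUB: " + countHtmlEntries(epub));
    }
}
```

Running it against the saved file (for example the "20.9.epub" output from the code above) prints the number of HTML parts in the package, which can then be compared with the page count reported by doc.getPageCount() rather than with the page numbers shown by a reader.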

Thanks Tahir.

@nirmalap

Please feel free to ask if you have any questions about Aspose.Words; we will be happy to help you.