Word document with frames is not converted properly to EPUB

Hi,


I have a PDF book that I need to convert to a valid EPUB 2 file, so I tried the following flow:
PDF ==> Word ==> EPUB 2.

I used convertpdfexe to convert the PDF file to a DOC file. The document looks fine, but when I looked closely I could see that the content is placed in frames. When I converted the same document to EPUB, the formatting was completely messed up: the CSS, images and formatting are all gone. I'm guessing this is because of the frames in the document.

When I use a document that doesn't have frames, it works fine. Please advise: is there any way to fix this, or is there another, direct way to convert the PDF file to EPUB?

Thanks
Thilak

Hi Thilak,


Thanks for your inquiry. You can achieve the desired result with a combination of Aspose.Pdf and Aspose.Words, i.e. convert the PDF to DOC using Aspose.Pdf and then convert the resulting DOC file to EPUB with Aspose.Words. However, please note that the PDF to DOC feature is supported in Aspose.Pdf for .NET but malfunctions in Aspose.Pdf for Java. We have already logged an investigation ticket, PDFNEWJAVA-33309, for this. I've also linked your request to the issue, and you will be notified via this forum thread as soon as it's resolved.

Furthermore, regarding DOC to EPUB 2, my colleague from Aspose.Words will update you soon.

Best Regards,

Hi, we are looking for a product that converts a PDF book to EPUB 2/EPUB 3. If Aspose can convert it to EPUB 3, that would be really great.


Right now I have tried using the Aspose.Pdf API to convert the same PDF to a Word document and then the Word document to EPUB 2:

PDF -> Word -> EPUB 2. The generated EPUB 2 file is not formatted properly because the Word document contains frames. PDF to Word is not producing a good Word document: each block is wrapped in an MS Word frame, and I think these frames are not handled by the EPUB converter when we use the same document with the Aspose Java API to generate the EPUB file.
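To check whether a converted DOCX actually relies on frames before sending it through the EPUB conversion, one rough option is to count frame markup inside the DOCX package. The following is a minimal sketch, assuming that framed paragraphs carry a w:framePr element and text boxes a w:txbxContent element in word/document.xml. The class name FrameCounter is hypothetical, and this is a plain-text heuristic using only the JDK, not an Aspose API.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: heuristic count of frames and text boxes in a DOCX.
// It scans only word/document.xml (headers/footers are ignored) with regexes,
// so it is a quick diagnostic, not a full XML parse.
public class FrameCounter {
    // Paragraphs laid out as frames carry <w:framePr .../> in their paragraph properties.
    private static final Pattern FRAME_PR = Pattern.compile("<w:framePr\\b");
    // Text box content is wrapped in <w:txbxContent>.
    private static final Pattern TEXT_BOX = Pattern.compile("<w:txbxContent\\b");

    static int count(Pattern pattern, String xml) {
        Matcher m = pattern.matcher(xml);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Returns {frameCount, textBoxCount} for the given .docx file.
    public static int[] countFrames(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("Not a DOCX package: word/document.xml is missing");
            }
            try (InputStream in = zip.getInputStream(entry)) {
                String xml = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                return new int[] { count(FRAME_PR, xml), count(TEXT_BOX, xml) };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] counts = countFrames(args[0]);
        System.out.println("frames: " + counts[0] + ", text boxes: " + counts[1]);
    }
}
```

Running it as "java FrameCounter in.docx" prints the two counts. A large number of frames suggests the document will not map cleanly to a flow format such as EPUB.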

Kindly help us fix this; if it works properly, we will buy the licence.




Hi Thilak,

Thanks for your inquiry. I will answer your query about Aspose.Words.

Could you please attach your Word document and the output EPUB document showing the undesired behaviour here for testing? I will investigate the issue on my side and provide you with more information.

Best regards,

Hi Awais,


This is the document that was converted with the PDF API. It has frames all over it. When I convert it to EPUB using the Aspose.Words API, it does not work as expected.

Please try converting the same DOCX file to EPUB; you will see the formatting issues in the HTML file inside the EPUB.

Please note: this DOCX file was generated by the PDF kit API. I could not upload the PDF file due to copyright issues.

Thanks
Thilak

Hi Thilak,


Thanks for your inquiry and sorry for the delayed response.

Using the latest version of Aspose.Words, i.e. 13.6.0, I managed to reproduce the formatting issues on my side. I have logged your problem in our bug tracking system as WORDSNET-8582. Your request has also been linked to this issue, and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Hi Thilak,


Thanks for being patient.

Regarding WORDSNET-8582, our development team has completed the analysis of this issue and concluded that they won't be able to fix the undesired behaviour you're observing in the output EPUB document. So we will most likely close this issue with a ‘Won’t Fix’ resolution. Your document cannot be converted correctly to any flow document format such as EPUB or HTML because it contains a large number of absolutely positioned frames.

However, this document looks good when it is converted to HTML Fixed format. As an alternative, we can offer converting the Word document to HTML with absolutely positioned elements (HtmlFixed), which produces the expected output. Here is how you can use the new HtmlFixedSaveOptions class to save in HtmlFixed format; perhaps the HTML Fixed output will meet your needs.

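using Aspose.Words;
using Aspose.Words.Saving;

// Load the DOCX produced by the PDF-to-Word conversion (MyDir is the working folder).
Document doc = new Document(MyDir + "in.docx");

// Save with absolutely positioned elements (HtmlFixed) instead of flow HTML/EPUB.
HtmlFixedSaveOptions option = new HtmlFixedSaveOptions();
option.SaveFormat = SaveFormat.HtmlFixed;
doc.Save(MyDir + "out.html", option);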


Please let me know if I can be of any further assistance.

Best regards,