Hi, I am trying to set up an app that extracts OneNote sections as individual Word documents, with any attachments embedded in them. So far I have used Aspose.Note to export each page to a separate HTML file and to extract the attachments into a subfolder per page. I then tried converting the exported HTML files to Word documents and embedding the extracted attachments. However, converting the exported HTML to Word loses some formatting and layout. I have also tried saving the pages as PDF and then converting to Word. The formatting is better, but there are now issues with text spacing. I also still need a way to embed the attachments back into the generated Word document for each section. Would you have any suggestions or snippets for the OneNote to Word conversion that retain both the formatting and the embedded attachments? Many thanks.
Since you are doing it in two steps:
- Extract OneNote sections to HTML via Aspose.Note for .NET
- Export to Word documents from exported HTML via Aspose.Words for .NET
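For reference, the two steps above can be sketched roughly as follows. This is only a minimal sketch, not the code from your sample application: the file names are placeholders, and it converts a whole section in one call rather than page by page.

```csharp
// Sketch only: "Section.one", "Section.html" and "Section.docx" are placeholder names.
// Both libraries define a Document class, so aliases keep them apart.
using NoteDocument = Aspose.Note.Document;
using WordsDocument = Aspose.Words.Document;

class TwoStepConversion
{
    static void Main()
    {
        // Step 1: load the OneNote section and export it to HTML with Aspose.Note.
        var section = new NoteDocument("Section.one");
        section.Save("Section.html", Aspose.Note.SaveFormat.Html);

        // Step 2: load the exported HTML with Aspose.Words and save it as DOCX.
        // The format is inferred from the ".docx" extension.
        var html = new WordsDocument("Section.html");
        html.Save("Section.docx");
    }
}
```

Comparing Section.html in a browser with Section.docx in MS Word should help show which of the two steps introduces the differences.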
Could you please specify which step has problems? It seems the second step loses the formatting; please correct us if we are wrong. We would also appreciate it if you could provide a sample application with sample input documents and output files (HTML, PDF, Word documents, etc.) so that we can reproduce the issue on our end. Could you also share some screenshots highlighting the problematic areas? We will evaluate your issue and assist you accordingly.
OneNoteConversion.zip (8.2 MB)
Hi Amjad,
Thanks for your quick reply. Please see the attached sample app together with the input and output documents. It's the second step, converting from the exported HTML or PDF to Word, which causes the loss of formatting and layout, as you can see in the final output folders WordOutputFromHTML and WordOutputFromPDF in the attached zip. Happy to work through the example with you in case anything is unclear. Many thanks, Kapil
Our colleagues from the Aspose.Words team will evaluate your issue soon.
@kapilmehtahmt Please note that Aspose.Words is designed primarily to work with MS Word documents. MS Word documents are flow documents, and their structure is very similar to the Aspose.Words Document Object Model. PDF documents, on the other hand, are fixed page format documents. When loading a PDF document, Aspose.Words converts the fixed page document structure into the flow Document Object Model. Unfortunately, such a conversion does not guarantee 100% fidelity.
I have logged this issue as WORDSNET-25449 in our defect database. We will check whether the behavior can be improved in this particular case.
With HTML, things are similar to PDF: it is also not a native format for Aspose.Words. In most cases Aspose.Words mimics MS Word behavior when loading HTML documents. However, the provided HTML document cannot be opened by MS Word 2019, so we cannot compare. Looking into the document's internal structure, I see there are a lot of floating DIV elements in the document. MS Word has no direct analog of DIV, so such elements are imported as paragraphs. This might cause the layout differences. I am afraid there is no way to preserve the original document layout after loading this kind of HTML document.
Hi Alexey and Amjad,
Thanks for your detailed explanation; all of that makes sense. Perhaps I could ask you about the best way to accomplish the task I have at hand using the Aspose .NET products. For a specific OneNote notebook, I need to extract each section, page and attachment, and then stitch the extracted pages and attachments back into a single document per section. This could be in Word, PDF or another format. I've accomplished the extraction, but composing the parts back into a single document whilst preserving the layout is proving to be a challenge, as you've explained above. Would you have any suggestions or an alternative approach in mind? Many thanks.
I am afraid there may not be a better way to handle this. Since the Aspose.Words team has already logged a ticket (WORDSNET-25449) for it, please allow us some time to evaluate your issue in detail.
By the way, did you try converting the OneNote file to PDF directly via Aspose.Note, to see whether it makes any difference?
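A direct conversion could look roughly like this. It is a minimal sketch with placeholder file names, assuming the whole section is saved as one PDF:

```csharp
// Sketch only: "Section.one" and "Section.pdf" are placeholder names.
using Aspose.Note;

class DirectPdfExport
{
    static void Main()
    {
        // Load the OneNote section and save it straight to PDF,
        // without going through HTML or Word.
        var section = new Document("Section.one");
        section.Save("Section.pdf", SaveFormat.Pdf);
    }
}
```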
Hi Amjad, the OneNote to PDF conversion with Aspose.Note creates a PDF that shows the attachments as icons rather than as embedded/clickable documents. Word was just an option I was trying out, as it supports embedded documents, but if there is a fixed page format that supports document embedding, that might be a better option to consider. I am trying to think of another format that might provide such support. Thanks
Hi, I would like to try out Aspose.PDF as an option, so that I can attach the extracted attachments directly to the exported PDFs instead of converting them to Word. However, it looks like I have exceeded the number of trial licence requests. Would it be possible to send me a trial licence for Aspose.PDF for .NET, please, so I can try this out and confirm which products need to be procured? It could be the Aspose.Total suite eventually. Thanks
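For clarity, the approach I have in mind is roughly the following. This is an untested sketch: the file and folder names are placeholders, and it assumes the section has already been exported to Section.pdf with Aspose.Note. The attachments would end up in the PDF's attachments panel rather than at their original position on the page.

```csharp
// Untested sketch: all file and folder names are placeholders.
using System.IO;
using NoteDocument = Aspose.Note.Document;
using AttachedFile = Aspose.Note.AttachedFile;
using PdfDocument = Aspose.Pdf.Document;
using FileSpecification = Aspose.Pdf.FileSpecification;

class AttachToPdf
{
    static void Main()
    {
        var section = new NoteDocument("Section.one");
        var pdf = new PdfDocument("Section.pdf"); // previously exported via Aspose.Note

        Directory.CreateDirectory("attachments");

        // Write each OneNote attachment to disk, then add it to the PDF
        // as an embedded file. Duplicate file names are not handled here.
        foreach (AttachedFile file in section.GetChildNodes<AttachedFile>())
        {
            string path = Path.Combine("attachments", file.FileName);
            File.WriteAllBytes(path, file.Bytes);
            pdf.EmbeddedFiles.Add(new FileSpecification(path, file.FileName));
        }

        pdf.Save("SectionWithAttachments.pdf");
    }
}
```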