We have a requirement to embed a PDF file in a DOCX file. To do this, we convert the PDF into images (one image per PDF page) using Aspose’s image device (JpegDevice), and then insert these images into the DOCX using the Open XML library. However, the PDF-to-image conversion is too slow: approximately 7.5 seconds for a 5 MB PDF file with 21 pages containing text and images. Performance degrades sharply as file size and content complexity increase.
Could you please tell us whether there is a better way to meet this requirement, or whether we are doing something wrong? Below is the sample code from our proof of concept (POC) for the PDF-to-image conversion.
Another approach we are considering is to merge two PDFs: the main DOCX is already being converted to PDF using the Aspose library, and we would merge the other PDF into that output. In Open XML, a content control block tells us where to insert or merge another file. Is there something similar in the Aspose library that lets us decide after which page the other PDF’s pages should be merged? Please refer to the sample code from our POC below, where we currently pass a static value in the variable “insertAfterPage”. Is there some kind of tagging in PDF (equivalent to an Open XML content control block) that would tell us where to merge the other PDF’s pages?
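To make the intent of “insertAfterPage” concrete, here is a minimal sketch in plain Java of the page order we expect after the merge. It does not call any Aspose API. The class PageSplice, the method mergedOrder, and the "main:"/"extra:" labels are hypothetical names used only for illustration, and we assume insertAfterPage is 1-based, with 0 meaning "before the first page".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models only the expected page order, not real PDF pages.
final class PageSplice {

    // Returns labels for the merged document: main pages 1..insertAfterPage,
    // then every page of the extra PDF, then the remaining main pages.
    static List<String> mergedOrder(int mainPageCount, int extraPageCount, int insertAfterPage) {
        if (mainPageCount < 0 || extraPageCount < 0) {
            throw new IllegalArgumentException("page counts must be non-negative");
        }
        // Assumption: 0 inserts before the first page, mainPageCount appends at the end.
        if (insertAfterPage < 0 || insertAfterPage > mainPageCount) {
            throw new IllegalArgumentException("insertAfterPage out of range");
        }

        List<String> order = new ArrayList<>(mainPageCount + extraPageCount);
        for (int p = 1; p <= insertAfterPage; p++) {
            order.add("main:" + p);
        }
        for (int p = 1; p <= extraPageCount; p++) {
            order.add("extra:" + p);
        }
        for (int p = insertAfterPage + 1; p <= mainPageCount; p++) {
            order.add("main:" + p);
        }
        return order;
    }
}
```

For example, with a 3-page main PDF, a 2-page PDF to merge, and insertAfterPage = 1, the expected order is main:1, extra:1, extra:2, main:2, main:3. What we are missing is a way to derive insertAfterPage from a marker in the document instead of hard-coding it.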
Please let us know if more information is required. Kindly note that we have already tried Aspose’s PDF to Word converter API, but it does not work for us because it does not preserve the formatting of complex PDF files.