Convert PDF to DOCX using Aspose.PDF for Java - Conversion is slow and the result is missing content

We are using the latest version of Aspose.PDF for Java. When we convert an existing PDF to DOCX, the resulting document is missing content (see the supplied files). Each page of the PDF has numbers that do not come across in the resulting document.


We also don't understand why the conversion process is so slow. Opening an existing PDF and saving it as a DOCX takes well over 30 seconds. This is NOT going to be an acceptable implementation for our customers, who expect to click a button and get the resulting document within a few seconds, and who expect ALL of the content to be converted.


********************START SOURCE CODE

// Uses Aspose.PDF for Java; GetAsposeLicense and session are specific to our application
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.Document;
import com.aspose.pdf.MemoryCleaner;

GetAsposeLicense sl = new GetAsposeLicense();
sl.setLicense(session, "PDF");
System.out.println("License is set");
// Backslashes in Windows paths must be escaped in Java string literals
Document doc = new Document("c:\\temp\\2015 D&O DemoCo User.pdf");
// Instantiate DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set the recognition mode to Flow
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Set output file format to DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Save resultant DOCX file
System.out.println("Start conversion");
doc.save("c:\\temp\\2015 D&O DemoCo User.docx", saveOptions);
System.out.println("End conversion");
doc.close();
MemoryCleaner.clear();


********************END SOURCE CODE
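The file paths in the snippet above are hard-coded Windows string literals, which is easy to get wrong because each backslash has to be escaped. As an optional sketch (not part of the original code), the output path can be derived from the input path with `java.nio.file`, so only one path is typed by hand and the two names stay in sync. The class and the helper `toDocxPath` are hypothetical names used for illustration only.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class DocxPathExample {
    // Hypothetical helper: replaces a trailing ".pdf" extension (any case) with ".docx",
    // keeping the output file in the same directory as the input file.
    static Path toDocxPath(Path pdfPath) {
        String name = pdfPath.getFileName().toString();
        String base = name.toLowerCase().endsWith(".pdf")
                ? name.substring(0, name.length() - 4)
                : name;
        return pdfPath.resolveSibling(base + ".docx");
    }

    public static void main(String[] args) {
        // Java accepts forward slashes in paths on Windows, so no escaping is needed here.
        Path input = Paths.get("c:/temp/2015 D&O DemoCo User.pdf");
        Path output = toDocxPath(input);
        System.out.println(input.getFileName() + " -> " + output.getFileName());
    }
}
```

The resulting paths can be passed to the conversion code as strings, for example `new Document(input.toString())` and `doc.save(output.toString(), saveOptions)`.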

Hi Paul,

Thanks for contacting support.

paul.calhoun:
We are using the latest version of Aspose.PDF for Java. When we convert an existing PDF to DOCX, the resulting document is missing content (see the supplied files). Each page of the PDF has numbers that do not come across in the resulting document.

I tested the scenario with your code snippet and could not reproduce the issue. The page numbers are present both in the output file generated on our end and in the file you shared. For reference, I have attached the output generated on our end.

paul.calhoun:
We also don't understand why the conversion process is so slow. Opening an existing PDF and saving it as a DOCX takes well over 30 seconds. This is NOT going to be an acceptable implementation for our customers, who expect to click a button and get the resulting document within a few seconds, and who expect ALL of the content to be converted.

I tested the scenario in the following environment: Windows 10 EN x64, 8 GB RAM, Eclipse Neon 4.6.2, JDK 1.8, Aspose.Pdf for Java 17.5. The code took only 14 seconds to execute. Below is the code snippet I used to measure the total execution time.

long start = System.currentTimeMillis();
// dataDir is the folder containing the test files (defined elsewhere)
Document doc = new Document(dataDir + "2015 D & O DemoCo User.pdf");
// Instantiate DocSaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set the recognition mode to Flow
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
// Set output file format to DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
// Save resultant DOCX file
System.out.println("Start conversion");
doc.save(dataDir + "2015 D & O DemoCo User2.docx", saveOptions);
System.out.println("End conversion");
doc.close();
System.out.println("Elapsed time: " + (System.currentTimeMillis() - start) / 1000 + " sec");
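The measurement above divides milliseconds by 1000 using `long` arithmetic, so it only reports whole seconds. A minimal sketch of a timing helper that keeps the fractional part is shown below; it uses `System.nanoTime()`, which is intended for measuring elapsed time. The class name `ConversionTimer` and the sleep used as a stand-in workload are assumptions for illustration; in practice the lambda would wrap the document load, save and close calls. Note that a single timed run also includes one-time JVM costs such as class loading, so repeated runs in the same process may be faster.

```java
public class ConversionTimer {
    // Runs the task once and returns the elapsed wall-clock time in seconds.
    // Dividing by a double keeps the fraction that integer division would drop.
    static double timeSeconds(Runnable task) {
        long start = System.nanoTime();
        task.run();
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload (hypothetical); replace with the actual conversion steps.
        double seconds = timeSeconds(() -> {
            try {
                Thread.sleep(250);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.printf("Elapsed time: %.2f sec%n", seconds);
    }
}
```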

We would appreciate it if you could share your environment details, so that we can test the scenario in that specific environment and address it accordingly. Also, if you find any other differences between the input and output files, please let us know.

Best Regards,

My apologies, I was not clear. It's NOT the page numbers but the numbers on the pages. If you look at page two of the original PDF I supplied, you will see a number one (1) next to the first input field, which contains the text Russell Maher.

It's also missing the "2" on page 3 and the "4" on page 5 and the "6" on page 7.

As far as processing time is concerned, it seems you are telling me that the best performance we will get is OVER 1 second per page (the file we supplied was 10 pages).

So if we have a 50-page PDF to convert, it will take well over one minute before the file is available for download.

14 seconds is an eternity in internet time. :)

Over a minute will never be an acceptable wait time.


Hi Paul,


Thanks for sharing details.

paul.calhoun:
I was not clear. It’s NOT the page numbers but the numbers on the pages. If you look at page two of the original PDF I supplied, you will see a number one (1) next to the first input field, which contains the text Russell Maher.

I was able to reproduce the issue in our environment and have logged it as PDFJAVA-36819 in our issue tracking system.

paul.calhoun:
So if we have a 50-page PDF to convert, it will take well over one minute before the file is available for download.

14 seconds is an eternity in internet time. :)

Over a minute will never be an acceptable wait time.

I tested the scenario with a 50-page PDF and found that the API took 53 seconds to convert it. I have therefore logged a performance-related issue as PDFJAVA-36820 in our issue tracking system.
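The per-page figures discussed in this thread can be checked with simple arithmetic. The sketch below assumes that conversion time grows linearly with page count, which is only an approximation: the 14-second run (10 pages) and the 53-second run (50 pages) used different files, so their per-page rates need not match. The class and method names are hypothetical.

```java
public class PageRateEstimate {
    // Average seconds per page from a single measured run.
    static double secondsPerPage(double seconds, int pages) {
        return seconds / pages;
    }

    // Linear extrapolation: assumes each page costs the same amount of time.
    static double estimateSeconds(double secondsPerPage, int pages) {
        return secondsPerPage * pages;
    }

    public static void main(String[] args) {
        double rate10 = secondsPerPage(14, 10); // 1.4 s/page from the 10-page run
        double rate50 = secondsPerPage(53, 50); // 1.06 s/page from the 50-page run
        System.out.printf("10-page run: %.2f s/page%n", rate10);
        System.out.printf("Linear estimate for 50 pages: %.0f s%n", estimateSeconds(rate10, 50));
        System.out.printf("50-page run: %.2f s/page%n", rate50);
    }
}
```

Extrapolating from the 10-page run gives about 70 seconds for 50 pages, while the measured 50-page run took 53 seconds, so the time per page varies from document to document.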

We will investigate both logged issues further and keep you updated on the status of their fixes. Please be patient and give us a little time. We are sorry for the inconvenience.


Best Regards,

I really NEED a status update on this OTHER than “please spare us some time”. We are trying to run our business with these tools and they continue to fail. I NEED to know whether this HAS been addressed, is BEING addressed, or whether, like our previous bug reports, it will be six months before there is any kind of fix or solution.

@paul.calhoun

Thanks for your inquiry.

I would like to let you know that investigation of the missing-content issue (PDFJAVA-36819) has started and the product team is working on a fix. As soon as they fix it, according to their development schedule, we will update you in this forum thread.

I am afraid that the performance-related issue (PDFJAVA-36820) has not been reviewed yet, as the relevant team has been busy resolving other issues in the queue. They will plan its investigation according to their development schedule, and we will let you know once we receive any updates. Please be patient.

We are sorry for this inconvenience.

I received an email from Aspose Sales today about our annual renewal. I am having a difficult time justifying the price of renewing software that contains FLAWS that keep it from being usable in production. If you cannot CONFIRM that we will have a resolution to this (and to the other issues we have posted about Aspose.PDF) within the next two weeks, then I don’t think it is unreasonable to request an extension on our license renewal until this IS resolved.

@paul.calhoun

Thanks for contacting support.

We apologize for the delay and the inconvenience you are facing. Please note that every issue is equally important to us; however, issues are resolved on a first-come, first-served basis. Other pending issues in the queue were reported before yours, and the product team has been doing its best to resolve them while also adding new features and enhancements to the API.

That said, we have recorded your recent concerns and shared them with the relevant team. As soon as we have significant updates on the progress of your issues, we will share them with you. Your patience and understanding are highly appreciated. Please give us a little time.

We are sorry for the inconvenience.

So it has been another MONTH and no update. I would like a status update, please, that DOES NOT INCLUDE ASKING ME TO BE PATIENT. I believe I have been patient enough. Just get me a status update so I can decide whether to wait for y’all or go with another product.

@paul.calhoun

Thanks for your inquiry.

We are checking the details of your issue with our product team and will get back to you shortly.

@paul.calhoun,

Thanks for your patience and sorry for the delayed response.

Please note that issues are resolved on a first-come, first-served basis, as we believe this is the fairest policy for all customers. Your issues were reported in June and are still pending review, as the team has been busy fixing and investigating other previously reported issues. As soon as we have definite updates on their resolution, we will let you know.

P.S. In the meantime, I have also recorded your concerns with the product team, and they will take them into account during issue prioritization.

Seriously. It has been THREE MONTHS since this was reported and identified as a bug. THREE MONTHS. For context, we run a SaaS business, and if one of our paying customers reported a bug that we provided NO update on for THREE MONTHS, we would no longer have any customers. I’m not sure how many developers you have or what your prioritization matrix is, but I would hope that a bug that keeps the product from doing what it is designed to do would take priority.

@paul.calhoun

Thanks for writing back, and sorry for the delayed response.

We apologize for the delay in resolving your issues and for the inconvenience you have been facing. We do realize the severity of these issues, but due to the large number of pending issues in the queue, yours have not been resolved yet.

However, we have raised your concerns and escalated the issues to the next level. As soon as we receive feedback from the relevant team, we will inform you. We greatly appreciate your understanding.

We are sorry for the inconvenience caused.

It has been another two weeks with NO status update. Please provide an update.

@paul.calhoun

Thanks for your inquiry.

Our product team has shared feedback on PDFJAVA-36819: it is expected to be fixed in version 17.10 of the API.

However, we are sorry that we do not yet have any definite update on the resolution progress of PDFJAVA-36820. We have already informed the relevant team of your concerns, and they will plan a fix for this issue according to their development schedule.

We greatly appreciate your patience and cooperation in this matter. Please give us a little time.

We are sorry for the inconvenience.

@paul.calhoun

Thanks for your patience.

We are pleased to inform you that the previously reported issue PDFJAVA-36819 has been resolved in the latest version, Aspose.Pdf for Java 17.10.

Please try the latest release, and if you face any issues, feel free to contact us.

The issue you found earlier (filed as PDFJAVA-36820) has been fixed in Aspose.PDF for Java 20.2.