CPU 100%

When converting a PDF to DOC/DOCX our CPU spikes to 100% and stays there.


We have tried with 3 different documents (attached) and the problem reproduces consistently.

As it stands we do not believe Aspose.Pdf is a commercially viable product, as it does not reliably convert PDF documents to Word format. Our only option is to use this forum to report this and await your feedback, or to source a different product.

This is the code we have used for the conversions:

Dim pdf As New Aspose.Pdf.Document(sUNCFilePath)
Dim pdfOptions As New Aspose.Pdf.HtmlSaveOptions
'pdfOptions.SaveFormat = Aspose.Pdf.SaveFormat.Html
pdfOptions.FixedLayout = True
pdfOptions.SplitIntoPages = False
pdfOptions.CompressSvgGraphicsIfAny = False
'pdfOptions.HtmlImageSavingInfo()
'pdfOptions.HtmlImageType = Aspose.Pdf.HtmlSaveOptions.HtmlImageType.Png
pdfOptions.SpecialFolderForAllImages = System.IO.Path.GetDirectoryName(sDestinationFile)
pdfOptions.SpecialFolderForSvgImages = System.IO.Path.GetDirectoryName(sDestinationFile)
pdfOptions.DocumentType = Aspose.Pdf.HtmlDocumentType.Html5
pdf.Save(sDestinationFile, pdfOptions) 'Aspose.Pdf.SaveFormat.Html)




Looking forward to your reply.


Hi Francisco,


Thanks for contacting support.

From your problem description, you are facing an issue converting PDF files to DOCX format, but the code snippet you shared performs PDF to HTML conversion. To test the scenario, I performed the PDF to DOCX conversion using the latest release, Aspose.Pdf for .NET 10.5.0, in a Visual Studio 2012 application targeting .NET Framework 4.0, running on Windows 7 (x64) with an Intel 3.4 GHz processor and 8 GB of RAM. In my tests, 2-curriculum-vitae-vincenzo-cavallo.pdf and Cloud Computing - Partner and Customer Introduction.pdf were converted to DOCX correctly, and I did not notice any intense memory or CPU utilization during conversion.

However, I have observed that converting CV_LUDOVIC_VALLAT_EN_2014_V3.pdf to DOCX format hangs and never completes on the same configuration. I have logged this in our issue tracking system as PDFNEWNET-38921. We will investigate the issue in detail and keep you updated on the status of a fix.

We apologize for the inconvenience.

Thank you for your reply.


You are correct, we had issues converting PDF to HTML.
Can you please perform the tests again and confirm the CPU usage?

Also can you please confirm if the images convert correctly into HTML, as this did not work for us either?

Thank You.

Hi Francisco,


Thanks for sharing the feedback.

I have also tested the PDF to HTML conversion scenario and have shared my findings in your other forum thread. Should you have any further queries, please feel free to contact us.

Hello,


We do not have an update on this thread.

Our PDF to DOC/DOCX conversion issue persists.
We experience 100% CPU utilisation and have to kill the process.
This affects all users in the system as the machine freezes.

This happens with various CVs.
Attached is one of these. We challenge you to resolve this and to parse this CV into a DOC/DOCX.

We do not need the images…we just want the TEXT, the WORDS.
That’s the only thing that we need please…

We have a commercial product which is being used by clients, and we are losing business as a result of not having a technical resolution to convert PDF to Word. Considering we are paying Aspose for a product and a service, are you going to limit your support to typing responses here, or are you going to address the underlying technical issue please?

Converting PDF to DOC/DOCX is one of the most critical features our customers need on a day-to-day basis. Each user has dozens of PDF files arriving into their inboxes which simply cannot be converted?!

Why would your PDF to DOC/DOCX converter simply hang/freeze and kill the CPU? Would it not be easier to deal with this gracefully, so that the machine would not freeze and we could instead notify the user that his CV was not successfully converted? Killing a CPU looks like bad programming methodology, do you not agree?

Thank You.
Francisco

Hi Francisco,


Thanks for your inquiry. I have tested the PDF to DOC scenario using the following code with Aspose.Pdf for .NET 10.8.0 on Windows 7 with 8 GB of RAM, but I was unable to reproduce the 100% CPU usage issue. My system's CPU usage remained between 20% and 55%. However, your input file is large, so the conversion took around 3 minutes. Aspose.Pdf does not use temporary files for processing, only memory. Please download and try the latest version; hopefully it will improve the situation.

// Path of input PDF document
String filePath = myDir + "CV_LUDOVIC_VALLAT_EN_2014_V3.pdf";
// Instantiate the Document object
Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePath);
// Create DocSaveOptions object
DocSaveOptions saveOptions = new DocSaveOptions();
// Set the recognition mode as Flow
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
// Set the Horizontal proximity as 2.5
//saveOptions.RelativeHorizontalProximity = 2.5f;
// Enable the value to recognize bullets during conversion process
saveOptions.RecognizeBullets = true;
// Save the resultant DOC file
document.Save(myDir + "CV_LUDOVIC_VALLAT_EN_2014_V3.doc", saveOptions);
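
Regarding the request to fail gracefully when a conversion hangs: until the underlying issue is fixed, one possible workaround is to run each conversion in a separate process and stop it after a time limit, so a stuck file can be reported to the user instead of freezing the machine. The following is only a minimal sketch of that idea, not an Aspose.Pdf feature. It assumes a hypothetical console program (called PdfToDoc.exe in the usage line) that wraps conversion code like the above and takes the input and output paths as arguments; ConversionRunner, TryConvert, and the five-minute limit are illustrative names and values, and only standard System.Diagnostics calls are used.

```csharp
using System;
using System.Diagnostics;

static class ConversionRunner
{
    // Runs a (hypothetical) converter executable and kills it if it runs longer
    // than timeoutMs. Returns true only if it exited by itself with exit code 0.
    public static bool TryConvert(string converterExe, string inputPdf,
                                  string outputDoc, int timeoutMs)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = converterExe,
            // Quote both paths so that file names containing spaces survive.
            Arguments = "\"" + inputPdf + "\" \"" + outputDoc + "\"",
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            try
            {
                // Lower the priority so a busy conversion is less likely
                // to starve other users on the same machine.
                process.PriorityClass = ProcessPriorityClass.BelowNormal;
            }
            catch (InvalidOperationException)
            {
                // The process has already exited; nothing to adjust.
            }

            if (!process.WaitForExit(timeoutMs))
            {
                // Still running after the time limit: treat it as hung.
                process.Kill();
                process.WaitForExit();
                return false;
            }

            return process.ExitCode == 0;
        }
    }
}
```

A call such as ConversionRunner.TryConvert("PdfToDoc.exe", pdfPath, docPath, 5 * 60 * 1000) would then return false for a file that does not finish within five minutes, at which point the application can notify the user that the CV could not be converted.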

Please feel free to contact us for any further assistance.


Best Regards,