Free Support Forum - aspose.com

Large Doc size when converted from PDF

Hi,


Using the following code, I have found that the resulting .doc file is approximately 10 times larger than the original .pdf (74Kb vs 777Kb). Is there any way to convert the PDF into a document without increasing the size to such an extent? Many thanks!

// Path of the input PDF document
String filePath = @"C://Source.pdf";
// Instantiate the Document object
Aspose.Pdf.Document document = new Aspose.Pdf.Document(filePath);
// Save the document to the output path
document.Save(@"C://Source.doc");

Hi Mike,


Thanks for your inquiry. Could you please share your sample document? We will test it at our end and share our findings here.

Best Regards,

Hi Tilal,


Thanks for your response and offer to help. It’s much appreciated. Attached is the original PDF (43Kb) as well as the converted Doc (963Kb).

Also the last line in my code sample should be

document.Save(@"C://Source.doc", Aspose.Pdf.SaveFormat.Doc);

not

document.Save(@"C://Source.doc");


Thanks again!

Hi Mike,


Thanks for sharing the resource files.

I have tested the scenario and I am able to reproduce the same problem. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-34718. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for the inconvenience.

Hi Nayyer,


Thanks for the feedback. Is it typical for the document to increase in size to this degree during a pdf-doc conversion? Or is there perhaps, something I can do prior to conversion that would reduce the size of the resultant .doc file?

Also, do you have some approximation on when your investigation will be complete? Would something like this typically take days, weeks, months?

Thanks again,
Mike

mike O:
Thanks for the feedback. Is it typical for the document to increase in size to this degree during a pdf-doc conversion? Or is there perhaps, something I can do prior to conversion that would reduce the size of the resultant .doc file?
Hi Mike,

The size of the resultant DOC file depends on the objects and contents present in the source PDF file, and on components such as fonts and text formatting used in the PDF document. In another scenario, I converted a 35KB PDF file to DOC format and the generated file was 37KB.

mike O:
Also, do you have some approximation on when your investigation will be complete? Would something like this typically take days, weeks, months?
The team already has a list of issues and features that they need to support, and all issues are fixed according to a set schedule. Furthermore, issues with the highest priority are fixed before issues with normal priority.

Until the development team has investigated this issue, I am afraid it is quite difficult to share a timeline for its resolution. The time needed to resolve any issue also depends on the complexity of the scenario, and in the current case the team will have to investigate why the size of the output file increases by such a large amount.

Please be patient and spare us a little time. Your patience and understanding are greatly appreciated in this regard. We are sorry for this delay and inconvenience.

Hi Nayyer,


Thanks again for your feedback. I just want to confirm that there is nothing I can do from a user standpoint to reduce the size of the resultant file. For example can I “clean” the file using Aspose.PDF (resulting in a less complicated pdf file) before I attempt to convert?

I understand that escalating this issue would take significant time and I want to make sure I cover all bases for any other possible solutions before I await feedback from development.

All the best,
Mike

Hi Mike,


We have a method named OptimizeResources() in the Document class which reduces the size of the resultant document: unused resources are removed, equal resources are joined into one object, and unused objects are removed. However, this method does not work when the output format is other than PDF. I have also communicated this point to the development team, and they will consider it during the resolution of this issue.
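
For reference, a minimal sketch of how OptimizeResources() is typically called when the output stays in PDF format. The file paths and the class name are placeholders chosen for illustration, and, as noted above, this is not expected to shrink a DOC output:

```csharp
using Aspose.Pdf;

class OptimizeResourcesSketch
{
    static void Main()
    {
        // Placeholder input path, used here only for illustration
        Document document = new Document(@"C:\Source.pdf");

        // Remove unused resources and objects, and merge equal resources
        document.OptimizeResources();

        // Save as PDF; the optimization applies to PDF output only
        document.Save(@"C:\Source_optimized.pdf");
    }
}
```

Comparing the sizes of Source.pdf and Source_optimized.pdf afterwards shows whether the source file carried any unused or duplicated resources in the first place.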

We are sorry for this inconvenience.

PS: We have a component named Aspose.Words which provides the ability to create as well as manipulate existing Word documents. I am not sure whether it supports reducing/optimizing the size of a Word document. I will ask a colleague from the respective team to share further details regarding this requirement.

Hi Mike,


I am a representative of the Aspose.Words team.

There is no direct way to reduce/optimize the size of a Word document. However, you can use the following code snippet as a workaround for your issue. First save the output Doc file (exported via Aspose.PDF) to Docx by using Aspose.Words, and then save it to the Doc file format.

I have attached the output Doc file with this post for your kind reference.

MemoryStream DocStream = new MemoryStream();

// MyDir is the folder that holds the input and output files
Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(MyDir + "Source.pdf");
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();
saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
pdf.Save(MyDir + @"PDF_out.doc", saveOptions);

// Load the output Doc file generated by Aspose.PDF (using Aspose.Words from here on)
Aspose.Words.Document doc = new Aspose.Words.Document(MyDir + @"PDF_out.doc");
// Save the document to Docx file format
doc.Save(MyDir + @"Wordout.docx", Aspose.Words.SaveFormat.Docx);
// Load the output Docx
doc = new Aspose.Words.Document(MyDir + @"Wordout.docx");
// Save the Docx file to Doc
doc.Save(MyDir + @"Finalout.doc");


The issues you have found earlier (filed as PDFNEWNET-34718) have been fixed in Aspose.Pdf for .NET 9.5.0.

