In-Memory Conversion from PDF to DOC Using MemoryStream Throws Out of Memory Exception



Hi, I am converting a 1.78 MB PDF file without saving it to disk, using a MemoryStream. I am using Aspose.Pdf version 11.6.0.0.

I get an OutOfMemoryException even after applying the optimization options.

Please find the attached PDF file and error screenshot.

My code is below.

The OutOfMemoryException is thrown on the line marked in red.

MemoryStream inStream = new MemoryStream(grwbkItmdetails.ELEMENT_IMAGE); // PDF file as a byte array
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(inStream);
MemoryStream WordDocStream = new MemoryStream();
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();
saveOptions.AddReturnToLineEnd = true;
saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
saveOptions.RelativeHorizontalProximity = 2.5f;
saveOptions.RecognizeBullets = true;
saveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;

pdfDocument.Optimize();
pdfDocument.OptimizeResources(new Aspose.Pdf.Document.OptimizationOptions()
{
    AllowReusePageContent = true,
    LinkDuplcateStreams = true,
    RemoveUnusedObjects = true,
    RemoveUnusedStreams = true,
    CompressImages = true,
    UnembedFonts = true,
    ImageQuality = 20
});

pdfDocument.Save(WordDocStream, saveOptions);

Hi Sandeep,


Thanks for your interest in Aspose. I have tested the scenario with your shared code and document using Aspose.Pdf for .NET 11.6.0 on Windows 7 64-bit with 8 GB RAM and could not reproduce the issue. We would appreciate it if you could share a sample console application along with your environment details. We will then investigate the issue further and update you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

Hi,


Please find attached a console application and 2 PDF files. What I am trying to do is convert the PDFs to their respective HTML and insert both into one output Word document.

But when I create the Word file, all the formatting is lost in the output. Converting the PDFs to HTML, however, works fine.

Please help me with code that does the following:

1.)

PDF->HTML
-------> DOCX file (merged HTMLs)
PDF->HTML

2.) Using only streams; no disk location will be given.

Please help me with the code.

Can anyone please tell me what the “RequestURL” string is in the following code? I found this code on the Internet for converting PDF to HTML to Word using only streams…



MemoryStream HTMLStreamFromPDF = new MemoryStream();
List<MemoryStream> ResourseStreamList = new List<MemoryStream>();
List<string> ResourceNameList = new List<string>();
MemoryStream CSSStream = new MemoryStream();
Aspose.Pdf.HtmlSaveOptions saveOptions = new Aspose.Pdf.HtmlSaveOptions();
CustomResourcesProcessingBind customResourcesProcessingBind = new CustomResourcesProcessingBind((_1) => CustomResourcesProcessing(ResourseStreamList,ResourceNameList, RequestURL, _1));
saveOptions.CustomResourceSavingStrategy = new Aspose.Pdf.HtmlSaveOptions.ResourceSavingStrategy(customResourcesProcessingBind);
CssUrlCreationCustomStrategyBind cssUrlCreationCustomStrategyBind = new CssUrlCreationCustomStrategyBind((_1) => CssUrlCreationCustomStrategy(RequestURL, _1));
saveOptions.CustomStrategyOfCssUrlCreation = new Aspose.Pdf.HtmlSaveOptions.CssUrlMakingStrategy(cssUrlCreationCustomStrategyBind);
CustomCssSavingProcessingBind customCssSavingProcessingBind = new CustomCssSavingProcessingBind((_1) => CustomCssSavingProcessing(CSSStream, _1));
saveOptions.CustomCssSavingStrategy = new Aspose.Pdf.HtmlSaveOptions.CssSavingStrategy(customCssSavingProcessingBind);
saveOptions.HtmlMarkupGenerationMode = Aspose.Pdf.HtmlSaveOptions.HtmlMarkupGenerationModes.WriteOnlyBodyContent;
PDFDocument.Save(HTMLStreamFromPDF, saveOptions);
    private delegate string CustomResourcesProcessingBind(Aspose.Pdf.SaveOptions.ResourceSavingInfo resourceSavingInfo);

    private static string CustomResourcesProcessing(List<MemoryStream> ResourseStreamList, List<string> ResourceNameList, string RequestURL, Aspose.Pdf.SaveOptions.ResourceSavingInfo resourceSavingInfo)
    {
        MemoryStream newResource = new MemoryStream();
        resourceSavingInfo.ContentStream.CopyTo(newResource);
        ResourceNameList.Add(resourceSavingInfo.SupposedFileName);
        ResourseStreamList.Add(newResource);

        string urlThatWillBeUsedInHtml = RequestURL +"/"+ Path.GetFileName(resourceSavingInfo.SupposedFileName);
        return urlThatWillBeUsedInHtml;
    }
    private delegate string CssUrlCreationCustomStrategyBind(Aspose.Pdf.HtmlSaveOptions.CssUrlRequestInfo requestInfo);

    private static string CssUrlCreationCustomStrategy(string RequestURL,Aspose.Pdf.HtmlSaveOptions.CssUrlRequestInfo requestInfo)
    {
        return RequestURL + "/css_style.css"; 
    }

    private delegate void CustomCssSavingProcessingBind(Aspose.Pdf.HtmlSaveOptions.CssSavingInfo resourceInfo);

    private static void CustomCssSavingProcessing(MemoryStream CSSStream, Aspose.Pdf.HtmlSaveOptions.CssSavingInfo resourceInfo)
    {
        resourceInfo.ContentStream.CopyTo(CSSStream);           
    }

Hi Sandeep,


Thanks for sharing the sample project.

We are working on replicating the issue using recently shared resources and will get back to you soon.

Hi,

As I understand it, your requirement is to convert multiple PDF documents into a single DOCX file. You can merge the PDF documents and render the resulting PDF document to a DOCX file as shown below. However, I am afraid the conversion is not working as expected: the resulting file has incorrect images and formatting, so we have logged the following tickets in our issue tracking system for further investigation and rectification. We will keep you updated on the progress of their resolution.

PDFNEWNET-40889: PDF to DOCX renders images incorrectly

PDFNEWNET-40890: PDF to DOCX text formatting issue

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(inputStream1);
Aspose.Pdf.Document pdfDocument1 = new Aspose.Pdf.Document(inputStream2); // stream of the second PDF (name assumed; the post as written reused inputStream1)

MemoryStream ms = new MemoryStream();

pdfDocument.Pages.Add(pdfDocument1.Pages);
pdfDocument.Save(ms);

pdfDocument = new Document(ms);

MemoryStream WordDocStream = new MemoryStream();

Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions
{
    Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow,
    RelativeHorizontalProximity = 2.5f,
    RecognizeBullets = true,
    Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX
};

pdfDocument.Optimize();
pdfDocument.OptimizeResources(new Aspose.Pdf.Document.OptimizationOptions
{
    AllowReusePageContent = true,
    LinkDuplcateStreams = true,
    RemoveUnusedObjects = true,
    RemoveUnusedStreams = true,
    CompressImages = true,
    UnembedFonts = true,
    ImageQuality = 20
});

pdfDocument.Save(WordDocStream, saveOptions);
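
One detail worth checking when a MemoryStream is handed from one step to the next, as with ms and WordDocStream above: after a write, the stream's Position sits at the end of the data, so a consumer that reads from the current position gets nothing. Whether Aspose.Pdf rewinds the stream itself is not confirmed in this thread, so the following is only a minimal sketch using System.IO. StreamRewindSketch and RoundTrip are hypothetical names, and the byte array stands in for whatever a Save call would write.

```csharp
using System;
using System.IO;

public static class StreamRewindSketch
{
    // Hypothetical helper: writes payload into a MemoryStream (standing in for a
    // library Save call), rewinds the stream, and reads the bytes back.
    public static byte[] RoundTrip(byte[] payload)
    {
        using (var ms = new MemoryStream())
        {
            ms.Write(payload, 0, payload.Length);

            // After the write, Position == Length, so a read here would return 0 bytes.
            ms.Position = 0; // rewind before passing the stream to a reader

            var buffer = new byte[payload.Length];
            int read = ms.Read(buffer, 0, buffer.Length);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }

    public static void Main()
    {
        byte[] result = RoundTrip(new byte[] { 1, 2, 3, 4 });
        Console.WriteLine(result.Length); // prints 4
    }
}
```

If the final result is needed as a byte array rather than a stream, MemoryStream.ToArray() returns the full contents regardless of the current Position.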

Furthermore, I have noticed that you are evaluating Aspose without a valid license. You may request a 30-day temporary license file, which will let you evaluate Aspose.Pdf without the evaluation limitations.

We are sorry for the inconvenience caused.

Best Regards,

Hi Sandeep,

vermasandeep:
Can anyone please tell me what the "RequestURL" string is in the following code? I found this code on the Internet for converting PDF to HTML to Word using only streams...

Thanks for your inquiry. It appears to be some user's custom code, in which the RequestURL string variable holds that user's base path.

However, if you want to convert PDF to HTML and save the resources in separate stream objects, please check the following documentation link.

- Convert PDF to HTML - Save HTML, CSS, Image and Font resources in different Stream object: http://www.aspose.com/docs/display/pdfnet/PDF+to+HTML+-+Save+HTML%2C+CSS%2C+Image%2C+and+Font+Resources+in+Stream+Object

And if you want to convert PDF to a single HTML stream with embedded resources, please refer to the second section of the following documentation.

- Convert PDF to HTML - save output in stream object: http://www.aspose.com/docs/display/pdfnet/PDF+to+HTML+-+Save+Output+to+a+Stream+Object

Please feel free to contact us for any further assistance.

Best Regards,
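
To make the role of RequestURL concrete, here is a minimal sketch, using only System.IO, of the string concatenation that the CustomResourcesProcessing and CssUrlCreationCustomStrategy callbacks in the question perform. RequestUrlSketch and its two methods are hypothetical helpers, not part of Aspose.Pdf, and the base URL and file name are made-up values for illustration.

```csharp
using System;
using System.IO;

public static class RequestUrlSketch
{
    // Mirrors CustomResourcesProcessing: base path + "/" + the resource's file name.
    public static string BuildResourceUrl(string requestUrl, string supposedFileName)
    {
        return requestUrl + "/" + Path.GetFileName(supposedFileName);
    }

    // Mirrors CssUrlCreationCustomStrategy: base path + a fixed CSS file name.
    public static string BuildCssUrl(string requestUrl)
    {
        return requestUrl + "/css_style.css";
    }

    public static void Main()
    {
        string requestUrl = "http://localhost/converted"; // assumed base path

        // prints "http://localhost/converted/img_01.png"
        Console.WriteLine(BuildResourceUrl(requestUrl, "output_files/img_01.png"));

        // prints "http://localhost/converted/css_style.css"
        Console.WriteLine(BuildCssUrl(requestUrl));
    }
}
```

In other words, RequestURL only decides which links end up in the generated HTML. The resources themselves are copied into the in-memory lists, so the value should point to wherever those streams will later be served from.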