Trouble converting PDF to JPEG

I am using VS 2013 with .NET 4.5 and need to convert PDF to JPEG. I get a message telling me to use the 4.6 or .NET Standard version of Aspose.Words to open a PDF file. If I add a reference to the netstandard2.0 Words DLL, I get this error: “Reference required to assembly ‘netstandard, Version=2.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51’ containing the implemented interface ‘System.Collections.Generic.IEnumerable`1’. Add one to your project.”
I already have Imports System.Collections.Generic. What’s missing?

@rdaviessci,

Starting from the 20.2 release, Aspose.Words for .NET supports loading and processing the PDF format in its .NET Standard variant (i.e. the DLLs in the netstandard2.0 folder). Starting from the 20.4 release, Aspose.Words also provides a .NET 4.6.1 DLL (in the net461 folder) that can read, load and process PDF documents. A project targeting .NET Framework 4.5 cannot use the netstandard2.0 DLL, because .NET Standard 2.0 libraries require at least .NET Framework 4.6.1; that is why the compiler asks for a reference to the netstandard assembly. I suggest that you create a new Visual Studio project, set its target framework to 4.6.1, and then install the latest (21.1) version of Aspose.Words for .NET from NuGet.

Thanks so much, Awais. I wasn’t aware that I could target a later framework than the one that came with the VS version. It works.
One more question: when I load an Aspose.Words.Document for the first time, it takes about 10 seconds:

Dim doc As Aspose.Words.Document = New Aspose.Words.Document(sFName)

Subsequent loads are faster. Can I speed up that first load?

@rdaviessci,

I think Aspose.Words needs to precache fonts and other resources for rendering or updating the page layout. This happens only the first time you convert to PDF, call the Document.UpdatePageLayout method, or read the Document.PageCount property. Can you please ZIP and upload the input Word document you are getting this problem with here for testing? We will then investigate the issue on our end and provide you with more information.
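If the delay really is this one-time initialization, a scheduled or long-running process could pay it once at startup instead of on the first real document. The following is a minimal sketch under that assumption; AsposeWarmUp and Run are hypothetical names, not part of Aspose.Words. It only uses Document, DocumentBuilder, UpdatePageLayout and PageCount, and fonts used only by your real documents may still be loaded later.

```csharp
using Aspose.Words;

// Hypothetical helper (not part of Aspose.Words): pays the one-time
// initialization cost up front, e.g. when a scheduled job starts.
public static class AsposeWarmUp
{
    // Builds a tiny in-memory document and forces a page layout pass,
    // which, per the explanation above, is when fonts and rendering
    // resources are first cached. Returns the page count so callers
    // can confirm that layout actually ran.
    public static int Run()
    {
        var doc = new Document();               // blank document, no file I/O
        var builder = new DocumentBuilder(doc);
        builder.Writeln("Warm-up");
        doc.UpdatePageLayout();                 // triggers the layout pass
        return doc.PageCount;
    }
}
```

Calling AsposeWarmUp.Run() once before the processing loop keeps the cost out of the timing of the first real document. It does not make the cost itself smaller.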

I was testing with the Correspondence.docx file included in my zip. It’s pretty simple, though it contains an image.
I have a different problem with the PDF in the zip: when I load it with Aspose.Words and then save it as a JPEG image under a different name using this command, the JPEG file is blank:
doc.Save(sFileSrvPath & sImgOutFName, Aspose.Words.SaveFormat.Jpeg)
Thanks for your help on both issues!
Correspondence.zip (2.3 MB)

@rdaviessci,

We have logged the following problem in our issue tracking system:

  • WORDSNET-21677: Blank Output produced when Converting a PDF with Images to JPEG Format

We will look further into the details of this problem and keep you updated on the status of the fix. We apologize for the inconvenience.

We were unable to observe any issue when converting Correspondence.docx to different formats using the latest (21.1) version of Aspose.Words for .NET on our end. Please elaborate on your inquiry by providing complete details along with screenshot(s). This will help us understand your scenario, and we will be in a better position to address your concerns.

I didn’t have a problem with the conversion of Correspondence.docx. The issue was about it taking 10 seconds for Aspose.Words to open the document for the first time.
That was my second question up above, to which you responded, "Can you please ZIP and upload the input Word document (you are getting this problem with) here for testing?"

@rdaviessci,

Loading “Correspondence.docx” into version 21.1 of Aspose.Words for the very first time took just 600 milliseconds, and saving it to PDF took another 1300 milliseconds on my end. Subsequent calls took an average of 425 and 950 milliseconds respectively (loading and saving). I tested this scenario on Windows 10 using .NET Framework 4.6.1.
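To compare first-run and subsequent-run timings on another machine in the same way, a small Stopwatch harness is enough. This is a sketch; LoadSaveTimer, Measure and Report are hypothetical names. It assumes the output format is inferred from the output file extension by Document.Save(string).

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

// Hypothetical timing helper for comparing first vs. subsequent runs.
public static class LoadSaveTimer
{
    // Loads inputPath and saves it to outputPath (format inferred from the
    // extension, e.g. ".pdf"). Returns the load time in milliseconds and
    // passes the save time back through saveMs.
    public static long Measure(string inputPath, string outputPath, out long saveMs)
    {
        Stopwatch sw = Stopwatch.StartNew();
        Document doc = new Document(inputPath);
        long loadMs = sw.ElapsedMilliseconds;

        sw.Restart();
        doc.Save(outputPath);
        saveMs = sw.ElapsedMilliseconds;
        return loadMs;
    }

    // Runs Measure several times in the same process and prints each result,
    // so the first (cold) run can be compared with the later (warm) runs.
    public static void Report(string inputPath, string outputPath, int runs)
    {
        for (int i = 1; i <= runs; i++)
        {
            long saveMs;
            long loadMs = Measure(inputPath, outputPath, out saveMs);
            Console.WriteLine("Run {0}: load {1} ms, save {2} ms", i, loadMs, saveMs);
        }
    }
}
```

For example, LoadSaveTimer.Report(@"C:\Temp\Correspondence.docx", @"C:\Temp\Correspondence.pdf", 3) prints one line per run (the paths are placeholders). A large gap between run 1 and the later runs points to one-time initialization rather than to the document itself.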

OK, well, I don’t know why it is running so slowly on my machine. I also have a 7 MB PDF file that takes 30+ seconds to open. My system memory is:
Installed Physical Memory (RAM): 32.0 GB.
I am using this in a scheduled automated process, so hopefully I can work around the delay.

Another question: I am processing a multipage PDF with Aspose.Words. When I try to use PageCount in the ImageSaveOptions, I get this error message:
‘PageCount’ is not a member of ‘Aspose.Cells.ImageSaveOptions’

I use Cells in another subroutine in the VB project. But in this subroutine I am using Words. It doesn’t let me say “Aspose.Words.ImageSaveOptions”. How do I tell it to use the Words ImageSaveOptions that has PageCount? Thanks!

image.png (5.3 KB)

@rdaviessci,

The PageCount and PageIndex properties are no longer part of the ImageSaveOptions class; the PageSet and PageRange classes have been added in their place. Sample usage is as follows:

C# example that converts all pages of a Word document to separate PNG image files:

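using Aspose.Words;
using Aspose.Words.Saving;

// Load the source document.
Document doc = new Document(@"C:\Temp\input.docx");
ImageSaveOptions imageSaveOptions = new ImageSaveOptions(SaveFormat.Png);
// Select every page; page indices are zero-based.
PageRange pageRange = new PageRange(0, doc.PageCount - 1);
imageSaveOptions.PageSet = new PageSet(pageRange);
// The callback assigns a separate file name to each rendered page.
imageSaveOptions.PageSavingCallback = new HandlePageSavingCallback();
doc.Save(@"C:\Temp\output.png", imageSaveOptions);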

And here is the definition of the class implementing the IPageSavingCallback interface:

// Gives each page its own output file: Page_0.png, Page_1.png, and so on.
internal class HandlePageSavingCallback : IPageSavingCallback
{
    public void PageSaving(PageSavingArgs args)
    {
        args.PageFileName = string.Format(@"C:\Temp\Page_{0}.png", args.PageIndex);
    }
}
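Regarding the clash with Aspose.Cells: in Aspose.Words, ImageSaveOptions, PageSet and PageRange live in the Aspose.Words.Saving namespace, which is why "Aspose.Words.ImageSaveOptions" does not resolve. Fully qualifying the Words types avoids any ambiguity with Aspose.Cells.ImageSaveOptions when both products are imported in the same file. In VB.NET the equivalent declaration would be Dim options As New Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Jpeg). Here is a minimal sketch that renders one zero-based page to JPEG; WordsImageExport and SavePageAsJpeg are hypothetical names.

```csharp
// No "using Aspose.Words.Saving;" here, so these names cannot clash with
// Aspose.Cells types imported elsewhere in the same file.
public static class WordsImageExport
{
    // Renders the page at pageIndex (zero-based) of a Words document to a JPEG file.
    public static void SavePageAsJpeg(string inputPath, string outputPath, int pageIndex)
    {
        Aspose.Words.Document doc = new Aspose.Words.Document(inputPath);

        // Fully qualified Words types: ImageSaveOptions lives in Aspose.Words.Saving.
        var options = new Aspose.Words.Saving.ImageSaveOptions(Aspose.Words.SaveFormat.Jpeg);
        options.PageSet = new Aspose.Words.Saving.PageSet(
            new Aspose.Words.Saving.PageRange(pageIndex, pageIndex));

        doc.Save(outputPath, options);
    }
}
```

Fully qualifying only the Words types keeps the existing Aspose.Cells code in the project untouched.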

3201079-santa-monica-city-net-fiber-2014-2.jpg (209.4 KB)
Thanks Awais, that did the trick.
Another issue: I am turning the pages of the attached PDF into images, but the layers(?) of the PDF are being misaligned in the conversion to .docx (and from there into .jpeg). Is there a way to prevent that?
3201079-santa-monica-city-net-fiber-2014-2.pdf (6.9 MB)

@rdaviessci,

We converted your “3201079-santa-monica-city-net-fiber-2014-2.pdf” to DOCX format and JPEG images and were unable to observe this problem (3201079-santa-monica-city-net-fiber-2014-2.jpg) on the first page on our end. Please provide your source code and the Aspose.Words-generated output file showing the undesired behavior here for our reference.

ProcessPDF-WordFile.zip (235.5 KB)
Here is the code in a .txt file, along with the Aspose-converted PDF saved as a .docx. Thank you.
PS: I shortened the Word file so it wasn’t too big to upload.

@rdaviessci,

Do the following two lines of code produce the correct output in DOCX format on your end?

Document doc = new Document("C:\\Temp\\3201079-santa-monica-city-net-fiber-2014-2.pdf");
doc.Save("C:\\Temp\\21.1.docx");

The source code you provided is 100+ lines long; please provide simplified source code containing only the Aspose.Words-related code and attach it here for our reference. It would also help if you could create a standalone, simplified console application (source code without compilation errors) that reproduces this problem on our end and attach it here for testing. Please do not include the Aspose.Words DLL files, to keep the file size down.
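A console application of that kind could be as small as the following sketch. ReproApp and its argument layout are illustrative assumptions. It assumes a .NET Framework 4.6.1 project with Aspose.Words installed from NuGet. Printing BuildVersionInfo.Version confirms which release actually ran. Printing Document.PageCount after loading helps spot pages lost during PDF import.

```csharp
using System;
using Aspose.Words;

// Minimal standalone repro (hypothetical names). Build as a console app
// targeting .NET Framework 4.6.1 with the Aspose.Words NuGet package.
public static class ReproApp
{
    // Usage: ReproApp.exe <input.pdf> <output.docx>
    public static int Main(string[] args)
    {
        if (args.Length < 2)
        {
            Console.WriteLine("Usage: ReproApp <input.pdf> <output.docx>");
            return 1;
        }

        // Report the library version so results can be compared across releases.
        Console.WriteLine("Aspose.Words " + BuildVersionInfo.Version);

        Document doc = new Document(args[0]);
        Console.WriteLine("PageCount after load: " + doc.PageCount);

        doc.Save(args[1]); // output format inferred from the file extension
        Console.WriteLine("Saved " + args[1]);
        return 0;
    }
}
```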

I am attaching a zip with a shortened VB.NET subroutine for saving the individual pages. The zip also contains the same PDF saved as DOCX using these commands (the font “layers” are scrambled as before):

    Dim doc As Aspose.Words.Document = New Aspose.Words.Document(sFName)
    doc.Save("C:\Oakland\TestUploads\MySubset\Aspose211.docx")

ProcessWordFileShort.zip (233.1 KB)

I am also attaching this separate PDF. The problem with it is that even though it has 20 pages, doc.PageCount returns 1 when I open it. Is that due to the landscape orientation? Thank you!

eandi.pdf (4.4 MB)

@rdaviessci,

I was able to observe this behavior, but only when using the older 20.11 version of Aspose.Words for .NET. The same code produces correct output with the latest 21.1 version of Aspose.Words for .NET on my end. Please make sure that you have successfully upgraded to the latest version.

Using the latest 21.1 version of Aspose.Words for .NET, I was able to reproduce this issue on my side and have logged it as WORDSNET-21719. Your thread has also been linked to this issue, and you will be notified here as soon as it is resolved. Sorry for the inconvenience.

Hi Awais,
I just wanted to let you know we are going to use an Adobe PDF conversion tool for the PDFs. It seems to have fewer issues with embedded photos, etc.
We are still using Aspose.Total for Word and Excel file imaging.
Thanks for your support!