Trouble converting PDF to JPEG

rdaviessci · January 21, 2021, 2:37am

I was testing with the Correspondence.docx file included in my zip. It’s pretty simple, though it contains an image.
I have a separate problem with the PDF in the zip: when I load it with Aspose.Words and then save it as a JPEG image under a different name, the JPEG file is blank. I am using this command:
doc.Save(sFileSrvPath & sImgOutFName, Aspose.Words.SaveFormat.Jpeg)
Thanks for your help on both issues!
Correspondence.zip (2.3 MB)

awais.hafeez · January 21, 2021, 5:17am

@rdaviessci,

We have logged the following problem in our issue tracking system:

WORDSNET-21677: Blank Output produced when Converting a PDF with Images to JPEG Format

We will further look into the details of this problem and will keep you updated on the status of correction. We apologize for your inconvenience.

We could not reproduce any issue when converting Correspondence.docx to different formats using the latest (21.1) version of Aspose.Words for .NET on our end. Please elaborate on your inquiry by providing complete details along with screenshot(s). This will help us understand your scenario and put us in a better position to address your concerns.

rdaviessci · January 21, 2021, 6:07pm

I didn’t have a problem with the conversion of Correspondence.docx. The issue was about it taking 10 seconds for Aspose.Words to open the document for the first time.
That was my second question above, to which you responded, "Can you please ZIP and upload the input Word document (you are getting this problem with) here for testing?"

awais.hafeez · January 22, 2021, 4:52am

@rdaviessci,

Loading “Correspondence.docx” into the 21.1 version of Aspose.Words for the very first time took just 600 milliseconds, and saving to PDF took another 1300 milliseconds on my end. Subsequent calls took an average of 425 and 950 milliseconds respectively (loading and saving). I tested this scenario on Windows 10 using .NET Framework 4.6.1.
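
For anyone who wants to compare against these numbers, here is a minimal sketch of how the load and save steps might be timed separately. The file paths are placeholders, not paths from this thread, and the second pass is there only because the first call in a process usually includes one-time .NET startup costs such as JIT compilation. Treat it as a rough measurement aid rather than a benchmark.

```csharp
using System;
using System.Diagnostics;
using Aspose.Words;

class LoadSaveTiming
{
    static void Main()
    {
        // Hypothetical paths; replace with your own files.
        const string inputPath = @"C:\Temp\Correspondence.docx";
        const string outputPath = @"C:\Temp\Correspondence.pdf";

        // Two passes: the first includes one-time startup costs,
        // the second is closer to the steady-state time.
        for (int pass = 1; pass <= 2; pass++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            Document doc = new Document(inputPath);
            long loadMs = sw.ElapsedMilliseconds;

            sw.Restart();
            doc.Save(outputPath, SaveFormat.Pdf);
            long saveMs = sw.ElapsedMilliseconds;

            Console.WriteLine($"Pass {pass}: load {loadMs} ms, save {saveMs} ms");
        }
    }
}
```

If the first pass is slow and the second is fast, the delay is probably per-process startup cost rather than something specific to the document.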

rdaviessci · January 22, 2021, 11:22pm

Ok, well I don’t know why it is running so slowly on my machine. I also have a 7MB PDF file which takes 30+ seconds to open. My system memory is:
Installed Physical Memory (RAM) 32.0 GB.
I am using this in a scheduled automated process. Hopefully I can work around the delay.

Another question: I am processing a multipage PDF with Aspose.Words. When I try to use the page count in the ImageSaveOptions I am getting this error msg:
‘PageCount’ is not a member of ‘Aspose.Cells.ImageSaveOptions’

I use Cells in another subroutine in the VB project. But in this subroutine I am using Words. It doesn’t let me say “Aspose.Words.ImageSaveOptions”. How do I tell it to use the Words ImageSaveOptions that has PageCount? Thanks!

image.png (5.3 KB)

awais.hafeez · January 23, 2021, 6:43am

@rdaviessci,

PageCount and PageIndex properties are no longer part of the ImageSaveOptions Class. We have now added PageSet and PageRange Classes. Sample usage is as follows:

C# example to convert all pages in Word Document to separate PNG image files:

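using Aspose.Words;
using Aspose.Words.Saving;
// Render every page of the document to its own PNG file.
Document doc = new Document(@"C:\Temp\input.docx");
ImageSaveOptions imageSaveOptions = new ImageSaveOptions(SaveFormat.Png);
// Page indices are zero-based, so the last page is PageCount - 1.
PageRange pageRange = new PageRange(0, doc.PageCount - 1);
imageSaveOptions.PageSet = new PageSet(pageRange);
// The callback below assigns a file name to each rendered page.
imageSaveOptions.PageSavingCallback = new HandlePageSavingCallback();
doc.Save(@"C:\Temp\output.png", imageSaveOptions);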

And here is the definition of the class implementing the IPageSavingCallback interface:

private class HandlePageSavingCallback : IPageSavingCallback
{
    public void PageSaving(PageSavingArgs args)
    {
        args.PageFileName = string.Format(@"C:\Temp\Page_{0}.png", args.PageIndex);
    }
}
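
On the name clash with Aspose.Cells: the Words version of ImageSaveOptions lives in the Aspose.Words.Saving namespace, which is why "Aspose.Words.ImageSaveOptions" is not accepted. A minimal sketch of one way to keep the two libraries apart is to use namespace aliases and qualify the types explicitly. The alias names and file paths below are illustrative assumptions; in VB.NET the fully qualified name Aspose.Words.Saving.ImageSaveOptions should serve the same purpose.

```csharp
// Aliases avoid importing either library unqualified, so ImageSaveOptions
// and SaveFormat never resolve to the Aspose.Cells types by accident.
using Words = Aspose.Words;
using WordsSaving = Aspose.Words.Saving;

class QualifiedSaveOptionsExample
{
    static void Main()
    {
        // Hypothetical input and output paths.
        var doc = new Words.Document(@"C:\Temp\input.pdf");

        // Explicitly the Aspose.Words options class, rendering page 0 only.
        var options = new WordsSaving.ImageSaveOptions(Words.SaveFormat.Jpeg)
        {
            PageSet = new WordsSaving.PageSet(0)
        };

        doc.Save(@"C:\Temp\page_0.jpeg", options);
    }
}
```

Aliases keep the subroutine that uses Cells untouched, since nothing changes about how that code resolves its own types.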

rdaviessci · January 25, 2021, 6:48pm

3201079-santa-monica-city-net-fiber-2014-2.jpg (209.4 KB)
Thanks Awais, that did the trick.
Another issue: I am turning the pages of the attached PDF into images, but the layers(?) of the PDF are being misaligned in the conversion to .docx (and from there into .jpeg). Is there a way to prevent that?
3201079-santa-monica-city-net-fiber-2014-2.pdf (6.9 MB)

awais.hafeez · January 26, 2021, 6:04am

@rdaviessci,

We have converted your “3201079-santa-monica-city-net-fiber-2014-2.pdf” to DOCX format and JPEG images and were unable to observe this problem (3201079-santa-monica-city-net-fiber-2014-2.jpg) on the first page on our end. Please provide source code and Aspose.Words generated output file showing the undesired behavior here for our reference.

rdaviessci · January 26, 2021, 11:40pm

ProcessPDF-WordFile.zip (235.5 KB)
Here is the code in a .txt file. And also the Aspose-converted PDF as a .docx. Thank you.
PS - I shortened the Word file so it wasn’t too big to be uploaded.

awais.hafeez · January 27, 2021, 7:57am

@rdaviessci,

Do the following two lines of code produce correct output in DOCX format on your end?

Document doc = new Document("C:\\Temp\\3201079-santa-monica-city-net-fiber-2014-2.pdf");
doc.Save("C:\\Temp\\21.1.docx");

You have provided source code comprising 100+ lines; please provide simplified source code (containing only the Aspose.Words-related code) and attach it here for our reference. It would be great if you could also create a standalone, simplified Console Application (source code without compilation errors) that helps us reproduce this problem on our end, and attach it here for testing. Please do not include the Aspose.Words DLL files, to reduce the file size.

rdaviessci · January 27, 2021, 10:49pm

I am attaching a zip with a shortened VB.net subroutine for saving the individual pages. Also in the Zip is the same PDF saved as DOCX using these commands (it scrambled the font “layers” as before) :

    Dim doc As Aspose.Words.Document = New Aspose.Words.Document(sFName)
    doc.Save("C:\Oakland\TestUploads\MySubset\Aspose211.docx")

ProcessWordFileShort.zip (233.1 KB)

I am also attaching this separate PDF. The problem with it is that even though it has 20 pages, when I open it doc.PageCount = 1. Is that due to the landscape orientation? Thank you!

eandi.pdf (4.4 MB)

awais.hafeez · January 28, 2021, 8:02am

@rdaviessci,

I was able to reproduce this behavior, but only when using the older 20.11 version of Aspose.Words for .NET. The same code produces correct output when using the latest 21.1 version of Aspose.Words for .NET on my end. Please make sure that you have successfully upgraded to the latest version.

While using the latest 21.1 version of Aspose.Words for .NET, I managed to reproduce this issue on my side. I have logged this issue with ID WORDSNET-21719. Your thread has also been linked to this issue and you will be notified here as soon as it is resolved. Sorry for the inconvenience.

awais.hafeez · January 29, 2021, 3:08am

5 posts were split to a new topic: Installing via NuGet the Aspose.Total for .NET Complete Package containing all .NET File Format APIs offered by Aspose

awais.hafeez · January 29, 2021, 2:45am

A post was split to a new topic: A PDF File cannot be Opened | Aspose.Words C# .NET | It might have Unsupported Format or be Corrupted | Invalid Operation Exception

awais.hafeez · January 29, 2021, 2:58am

A post was split to a new topic: Content Extraction is Restricted by File Permissions | Avoid Exception when Loading Adobe generated PDF with C# .NET Library

rdaviessci · February 5, 2021, 7:48pm

Hi Awais,
I just wanted to let you know we are going to use an Adobe PDF conversion tool for the PDFs. It seems to have fewer issues with embedded photos, etc.
We are still using Aspose.Total for Word and Excel file imaging.
Thanks for your support!

awais.hafeez · February 6, 2021, 8:12am

@rdaviessci,

Please upgrade to the latest 21.2 version of Aspose.Words for .NET as it contains fixes of both the linked issues.

Regarding WORDSNET-21677, we fixed a few issues that we found:

The original PDF’s page size is too large: 42x56 inches. Because Aspose.Words and MS Word only support up to 22 inches, we had to adjust the logic.
So, the resulting image after PDF to DOCX to JPEG conversion will be 1584x2112 pixels (while the original pictures from the PDF are 3000x4000 pixels).
Because the provided PDF contains only JPEG images, we switched image processing from PngEncoder to JpegEncoder, which reduced the conversion time from 12 seconds to 8 seconds.
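
The 1584x2112 figure can be checked with a little arithmetic. The sketch below is not Aspose.Words' internal logic (which is not shown in this thread); it simply assumes the page is scaled uniformly so that its longer side fits the 22-inch limit and is then rendered at 96 DPI. The 96 DPI value, the class name and the method name are assumptions, chosen because they reproduce the quoted numbers.

```csharp
using System;

public class PageSizeScaling
{
    // Scales a page uniformly so its longer side fits within maxInches
    // (never enlarging it), then converts the result to pixels at the given DPI.
    public static (int Width, int Height) ScaledPixels(
        double widthInches, double heightInches, double maxInches, double dpi)
    {
        double longerSide = Math.Max(widthInches, heightInches);
        double scale = Math.Min(1.0, maxInches / longerSide);
        int width = (int)Math.Round(widthInches * scale * dpi);
        int height = (int)Math.Round(heightInches * scale * dpi);
        return (width, height);
    }

    public static void Main()
    {
        // 42x56 inch page, 22 inch limit, assumed 96 DPI.
        var (width, height) = ScaledPixels(42, 56, 22, 96);
        Console.WriteLine($"{width}x{height}"); // prints 1584x2112
    }
}
```

In other words, 56 inches shrinks to 22 inches (2112 pixels) and 42 inches shrinks by the same factor to 16.5 inches (1584 pixels).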

We think that you want to convert both PDF pages, but your code only handles the first one. Please use ImageSaveOptions to achieve that goal:

var doc = new Document("gti title_resig.pdf");

var options = new ImageSaveOptions(SaveFormat.Jpeg) { PageSet = new PageSet(0) };
doc.Save("gti title_resig_0.jpeg", options);

options.PageSet = new PageSet(1);
doc.Save("gti title_resig_1.jpeg", options);
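
For documents with more than two pages, the same idea can be written as a loop. This is only a sketch that generalizes the two calls above, using Document.PageCount as in the earlier PNG example; the output file naming is an assumption.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class SaveEachPageAsJpeg
{
    static void Main()
    {
        var doc = new Document("gti title_resig.pdf");
        var options = new ImageSaveOptions(SaveFormat.Jpeg);

        // Page indices are zero-based; render one page per output file.
        for (int page = 0; page < doc.PageCount; page++)
        {
            options.PageSet = new PageSet(page);
            doc.Save($"gti title_resig_{page}.jpeg", options);
        }
    }
}
```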

Please let us know if we can be of any further assistance.

aspose.notifier · February 9, 2021, 4:53pm

The issues you have found earlier (filed as WORDSNET-21677) have been fixed in this Aspose.Words for .NET 21.2 update and this Aspose.Words for Java 21.2 update.

aspose.notifier · February 9, 2021, 4:54pm

The issues you have found earlier (filed as WORDSNET-21719) have been fixed in this Aspose.Words for .NET 21.2 update and this Aspose.Words for Java 21.2 update.

awais.hafeez · March 11, 2021, 8:15am

A post was split to a new topic: Extract Text from PDF File Line by Line and Save Data Values inside SQL Server Database C# .NET