Form Fields not filled in when converting PDF to JPG

jefferydronsella · September 4, 2014, 8:50am

When I convert a PDF to JPG, the form fields are not filled in. When I open the same PDF in Adobe Reader, the fields are filled in correctly (I have attached images of both).

Here is my code:

Using doc As New Aspose.Pdf.Document(fileName)
    Using memStream As New IO.MemoryStream()
        'create Resolution object (300 DPI)
        Dim resolution As New Aspose.Pdf.Devices.Resolution(300)

        'create JPEG device with specified attributes (Resolution, Quality)
        'Quality [0-100], 100 is Maximum
        Dim jpegDevice As New Aspose.Pdf.Devices.JpegDevice(resolution, 100)

        'convert a particular page (doc.Pages is 1-based) and save the image to stream
        jpegDevice.Process(doc.Pages(page), memStream)

        Return memStream.ToArray()
    End Using
End Using

codewarior · September 5, 2014, 4:15am

Hi Jeff,

Thanks for contacting support.

I have tested the scenario and observed that the input document is an XFA form; to convert it to an image format, it first needs to be converted to a standard AcroForm. However, even after that conversion I am still able to reproduce the problem: the field values are missing from the image file. For the sake of correction, I have logged it in our issue tracking system as PDFNEWNET-37448. We will investigate this issue in detail and keep you updated on the status of a correction.

We apologize for your inconvenience.

ChrisF · December 8, 2015, 6:36am

We appear to be having the same issue.

Has there been any progress on fixing it? This is a serious issue for us.

Do you have an estimate of when it will be fixed?

I have tried the latest version of Aspose.Pdf.

Thanks.

codewarior · December 9, 2015, 3:18am

Hi Chris,

Thanks for contacting support.

I am afraid the earlier reported issue is not yet resolved. However, if you are facing a similar problem, please share your source file, as the problem might be related to the structure and complexity of the input file you are using, and the behavior varies from file to file. We are sorry for the inconvenience.

Wymsical · May 3, 2021, 7:31pm

We recently ran into the same problem. Has the issue been addressed?
Our Aspose software version is 18.5.0.

asad.ali · May 3, 2021, 10:37pm

@Wymsical

Regretfully, the earlier logged issue is not yet resolved. However, could you please share your sample PDF document along with the sample code snippet that you are using? We will test the scenario using the latest version of the API and share our feedback with you accordingly.

Wymsical · May 4, 2021, 3:57am

Here is the code snippet:
if (string.IsNullOrEmpty(resultMimeType))
    resultMimeType = "image/png";

using (var memoryStream = new MemoryStream(pdfBytes))
using (var pdfDocument = new Document(memoryStream, true))
{
    var result = new List<PdfPageData>();

    // Convert XFA to a standard AcroForm, then flatten the form fields into the page content
    pdfDocument.Form.Type = Aspose.Pdf.Forms.FormType.Standard;
    pdfDocument.Form.Flatten();
    pdfDocument.Flatten();

    for (var pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
    {
        int boxWidth = (int)pdfDocument.Pages[pageCount].MediaBox.Width;
        int boxHeight = (int)pdfDocument.Pages[pageCount].MediaBox.Height;

        using (var imageStream = new MemoryStream())
        {
            // Create Resolution object
            var resolution = new Resolution(boxWidth, boxHeight);

            // Create a JPEG or PNG device depending on the requested MIME type
            ImageDevice pngDevice = null;
            string mimeType = null;
            switch (resultMimeType.ToUpperInvariant())
            {
                case "IMAGE/JPG":
                case "IMAGE/JPEG":
                    pngDevice = new JpegDevice(boxWidth, boxHeight, resolution);
                    mimeType = "IMAGE/JPG";
                    break;

                default:
                    pngDevice = new PngDevice(boxWidth, boxHeight, resolution);
                    mimeType = "IMAGE/PNG";
                    break;
            }

            // Convert a particular page and save the image to stream
            pngDevice.Process(pdfDocument.Pages[pageCount], imageStream);

            var bytes = imageStream.ToArray();
            if (width.HasValue && width.Value < pngDevice.Width)
            {
                bytes = ImageHelper.Resize(bytes, width.Value, MimeTypeHelper.GetImageFormat(mimeType));
            }

            result.Add(new PdfPageData()
            {
                MimeType = mimeType,
                PageData = bytes
            });
        }
    }

    return result;
}

Wymsical · May 4, 2021, 3:59am

And here is the document used: Joshua_v - Reichel Insulation Employment Application.pdf (4.3 MB)

Wymsical · May 4, 2021, 4:06am

And here is the document view AFTER going through Aspose. Notice the missing text on both pages; you can compare them with the original document.
Joshua_v_p1.JPG (48.1 KB)
Joshua_v_p2.JPG (62.4 KB)

asad.ali · May 4, 2021, 2:57pm

@Wymsical

We tested the scenario using Aspose.PDF for .NET 21.4 and the following code snippet, and did not notice any issue in the output images. For your reference, the output images are also attached:

using (var fs = new FileStream(dataDir + "Joshua_v - Reichel Insulation Employment Application.pdf", FileMode.Open))
using (Document pdfDocument = new Document(fs))
{
    foreach (Page page in pdfDocument.Pages)
    {
        using (FileStream imageStream = new FileStream(dataDir + "image" + page.Number + ".jpg", FileMode.Create))
        {
            // Render at 300 DPI with JPEG quality 100
            Resolution resolution = new Resolution(300);
            JpegDevice jpegDeviceLarge = new JpegDevice(resolution, 100);
            jpegDeviceLarge.RenderingOptions.InterpolationHighQuality = true;
            jpegDeviceLarge.RenderingOptions.UseNewImagingEngine = true;
            jpegDeviceLarge.Process(page, imageStream);
        }
    }
}

image2.jpg (1.2 MB)
image1.jpg (748.6 KB)

Would you kindly try the latest version and let us know if you face any other issues?

Wymsical · May 4, 2021, 3:03pm

I am unable to open the attached files. It says the file is private and only visible to the topic owner and staff members. Are you saying that our current version of Aspose (18.5.0) has this problem and we need to upgrade?

Wymsical · May 4, 2021, 3:05pm

Before we upgrade, is there a way for us to test the latest Aspose version, so we know for sure that the version is the culprit?

asad.ali · May 4, 2021, 8:51pm

@Wymsical

You can download the images from the links below:

Sure, you can test the latest version using a free 30-day temporary license.

Wymsical · May 6, 2021, 7:32am

After upgrading from 18.5.0 to 21.4.0, the issue still exists.

My ASP.NET Core web service runs on .NET Core 2.2.

How to reproduce the issue:
After starting the application or service, run the page-to-PNG conversion two or more times.
Starting with the second run, some form field values are lost.
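The reproduction compares the first run with later runs by output stream length. A stricter check is to compare the rendered bytes themselves. The following is only a minimal sketch in plain C#: `RenderDeterminismCheck`, `ProducesIdenticalOutput` and `Fingerprint` are hypothetical names (not Aspose.PDF APIs), and `render` is assumed to be any callback that renders a page, for example an adapted `SplitPdfToPagesV1` that returns the image bytes.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper, not part of Aspose.PDF: checks whether repeated
// renders of the same input produce byte-identical output.
public static class RenderDeterminismCheck
{
    // Returns the SHA-256 of the data as an uppercase hex string,
    // a short fingerprint that can be logged next to the stream length.
    public static string Fingerprint(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // Calls render() `runs` times and returns true only if every result
    // is byte-for-byte equal to the first one.
    public static bool ProducesIdenticalOutput(Func<byte[]> render, int runs = 2)
    {
        if (render == null) throw new ArgumentNullException(nameof(render));
        if (runs < 2) throw new ArgumentOutOfRangeException(nameof(runs), "At least two runs are needed to compare.");

        byte[] first = render();
        for (int i = 1; i < runs; i++)
        {
            if (!first.SequenceEqual(render()))
            {
                return false;
            }
        }
        return true;
    }
}
```

If the runs differ as in the logs below, `ProducesIdenticalOutput` should return false, and logging `Fingerprint(bytes)` per page makes it easy to see which pages change between runs.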

Here is my test code:

 // PdfHelper.SplitPdfToPagesV1(fileData, 780);
 // PdfHelper.SplitPdfToPagesV1(fileData, 780);

Result:

v1 input bytes: 4488515
v1 memory stream bytes: 4488515
v1 page: 1, Device Process stream length: 89256 // differs from the second run
v1 width: 780, page: 1, to png stream length: 89256
v1 page: 2, Device Process stream length: 111503
v1 width: 780, page: 2, to png stream length: 111503

v1 input bytes: 4488515
v1 memory stream bytes: 4488515
v1 page: 1, Device Process stream length: 78128
v1 width: 780, page: 1, to png stream length: 78128
v1 page: 2, Device Process stream length: 87056
v1 width: 780, page: 2, to png stream length: 87056

 // PdfHelper.SplitPdfToPagesV2(fileData, 780);
 // PdfHelper.SplitPdfToPagesV2(fileData, 780);

Result:

v2 input bytes: 4488515
v2 memory stream bytes: 4488515
v2 page: 1, Page ConvertPageToPNGMemoryStream stream length: 644774 // differs from the second run
v2 width: 780, page: 1, to png stream length: 118849
v2 page: 2, Page ConvertPageToPNGMemoryStream stream length: 694750
v2 width: 780, page: 2, to png stream length: 165614

v2 input bytes: 4488515
v2 memory stream bytes: 4488515
v2 page: 1, Page ConvertPageToPNGMemoryStream stream length: 573272
v2 width: 780, page: 1, to png stream length: 102409
v2 page: 2, Page ConvertPageToPNGMemoryStream stream length: 538528
v2 width: 780, page: 2, to png stream length: 133305

 PdfHelper.SplitPdfToPagesV3(fileData, 780);
 PdfHelper.SplitPdfToPagesV3(fileData, 780);

Result:

v3 input bytes: 4488515
v3 memory stream bytes: 4488515
v3 page: 1, Page AsByteArray stream length: 45319878
v3 width: 780, page: 1, to png stream length: 96054 // differs from the second run
v3 page: 2, Page AsByteArray stream length: 45319878
v3 width: 780, page: 2, to png stream length: 132969

v3 input bytes: 4488515
v3 memory stream bytes: 4488515
v3 page: 1, Page AsByteArray stream length: 45319878
v3 width: 780, page: 1, to png stream length: 82841
v3 page: 2, Page AsByteArray stream length: 45319878
v3 width: 780, page: 2, to png stream length: 106231

Detailed code:

    public static void SplitPdfToPagesV3(byte[] pdfBytes, int width)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"\nv3 input bytes: {pdfBytes.Length}");
        
        using (var memoryStream = new MemoryStream(pdfBytes))
        {
            Console.WriteLine($"v3 memory stream bytes: {memoryStream.Length}");
            using (var pdfDocument = new Document(memoryStream, true))
            {
                for (var pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
                {
                    int boxWidth = (int)pdfDocument.Pages[pageCount].MediaBox.Width;
                    int boxHeight = (int)pdfDocument.Pages[pageCount].MediaBox.Height;
                    var resolution = new Resolution(boxWidth, boxHeight);
                    using (var imageStream = new MemoryStream(pdfDocument.Pages[pageCount].AsByteArray(resolution)))
                    {
                        Console.WriteLine($"v3 page: {pageCount}, Page AsByteArray stream length: {imageStream.Length}");
                        using (var bmp = System.Drawing.Image.FromStream(imageStream))
                        {
                            Size thumbnailSize = GetThumbnailSize(bmp, width);
                            using (System.Drawing.Image thumbnailImage = bmp.GetThumbnailImage(thumbnailSize.Width, thumbnailSize.Height, null, IntPtr.Zero))
                            using (var toImageStream = new MemoryStream())
                            {
                                thumbnailImage.Save(toImageStream, System.Drawing.Imaging.ImageFormat.Png);
                                Console.WriteLine($"v3 width: {width}, page: {pageCount}, to png stream length: {toImageStream.Length}");
                            }
                        }
                    }
                }
            }
        }
        
        Console.ResetColor();
    }

    public static void SplitPdfToPagesV2(byte[] pdfBytes, int width)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"\nv2 input bytes: {pdfBytes.Length}");

        using (var memoryStream = new MemoryStream(pdfBytes))
        {
            Console.WriteLine($"v2 memory stream bytes: {memoryStream.Length}");
            using (var pdfDocument = new Document(memoryStream, true))
            {
                for (var pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
                {
                    using (var imageStream = pdfDocument.ConvertPageToPNGMemoryStream(pdfDocument.Pages[pageCount]))
                    {
                        Console.WriteLine($"v2 page: {pageCount}, Page ConvertPageToPNGMemoryStream stream length: {imageStream.Length}");
                        using (var pngImage = System.Drawing.Image.FromStream(imageStream))
                        {
                            Size thumbnailSize = GetThumbnailSize(pngImage, width);
                            using (System.Drawing.Image thumbnailImage = pngImage.GetThumbnailImage(thumbnailSize.Width, thumbnailSize.Height, null, IntPtr.Zero))
                            using (var toImageStream = new MemoryStream())
                            {
                                thumbnailImage.Save(toImageStream, System.Drawing.Imaging.ImageFormat.Png);
                                Console.WriteLine($"v2 width: {width}, page: {pageCount}, to png stream length: {toImageStream.Length}");
                            }
                        }
                    }
                }
            }
        }

        Console.ResetColor();
    }

    public static void SplitPdfToPagesV1(byte[] pdfBytes, int width)
    {
        Console.ForegroundColor = ConsoleColor.Red;
        Console.WriteLine($"\nv1 input bytes: {pdfBytes.Length}");

        using (var memoryStream = new MemoryStream(pdfBytes))
        {
            Console.WriteLine($"v1 memory stream bytes: {memoryStream.Length}");
            using (var pdfDocument = new Document(memoryStream, true))
            {
                for (var pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
                {
                    int boxWidth = (int)pdfDocument.Pages[pageCount].MediaBox.Width;
                    int boxHeight = (int)pdfDocument.Pages[pageCount].MediaBox.Height;
                    var resolution = new Resolution(boxWidth, boxHeight);
                    using (var imageStream = new MemoryStream())
                    {
                        ImageDevice pngDevice = new PngDevice(boxWidth, boxHeight, resolution);
                        pngDevice.RenderingOptions.InterpolationHighQuality = true;
                        pngDevice.RenderingOptions.UseNewImagingEngine = true;

                        pngDevice.Process(pdfDocument.Pages[pageCount], imageStream);
                        Console.WriteLine($"v1 page: {pageCount}, Device Process stream length: {imageStream.Length}");

                        var bytes = imageStream.ToArray();
                        Console.WriteLine($"v1 width: {width}, page: {pageCount}, to png stream length: {bytes.Length}");
                    }
                }
            }
        }

        Console.ResetColor();
    }

    public static Size GetThumbnailSize(System.Drawing.Image original, int width)
    {
        // Maximum size of any dimension.
        int maxPixels = width;

        // Width and height.
        int originalWidth = original.Width;
        int originalHeight = original.Height;

        // Compute best factor to scale entire image based on larger dimension.
        double factor;
        if (originalWidth > originalHeight)
        {
            factor = (double)maxPixels / originalWidth;
        }
        else
        {
            factor = (double)maxPixels / originalHeight;
        }

        // Return thumbnail size.
        return new Size((int)(originalWidth * factor), (int)(originalHeight * factor));
    }

asad.ali · May 6, 2021, 7:07pm

@Wymsical

We were able to replicate the issue at our end and have logged it as PDFNET-49877 in our issue tracking system. We will look further into its details and keep you posted on the status of its correction. Please be patient and spare us some time.

We apologize for the inconvenience.


For the time being, please try the code snippet from my May 4 reply, as that approach does not produce the problematic images.
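The workaround renders at a fixed 300 DPI, while the other snippets in this thread size the device from `MediaBox.Width`/`Height`, which are measured in PDF points (1 pt = 1/72 inch). As a rough sketch of how those units relate, assuming square pixels and nearest-pixel rounding, the pixel size at a given DPI is points / 72 × DPI. `PageSizeMath` and `PointsToPixels` are hypothetical names, not Aspose.PDF APIs.

```csharp
using System;

// Hypothetical helper, not an Aspose.PDF API: converts a page dimension in
// PDF points (1 pt = 1/72 inch, e.g. MediaBox.Width) to pixels at a given DPI.
public static class PageSizeMath
{
    public const double PointsPerInch = 72.0;

    public static int PointsToPixels(double points, int dpi)
    {
        if (points < 0) throw new ArgumentOutOfRangeException(nameof(points));
        if (dpi <= 0) throw new ArgumentOutOfRangeException(nameof(dpi));

        // inches = points / 72; pixels = inches * dpi, rounded to the nearest pixel
        return (int)Math.Round(points / PointsPerInch * dpi);
    }
}
```

For a US Letter page (612 × 792 pt), this gives 2550 × 3300 px at 300 DPI, whereas `new PngDevice(boxWidth, boxHeight, resolution)` asks for an image whose pixel size equals the page size in points (612 × 792 px).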