Some text is missing when converting from PDF to TIFF

Some values are missing when we try to convert from PDF to Tiff.
var document = new Document("document.pdf");

var tiffSettings = new TiffSettings();

var device = new TiffDevice(tiffSettings);

device.Process(document, "output.tiff");

examplePG2.pdf (52.0 KB)

@RpiresWexinc

When converting a PDF to TIFF and experiencing missing text, there are several factors to consider that may help resolve the issue.

  1. Check PDF Content: Ensure that the text in the PDF is not embedded in a way that makes it difficult to extract. Sometimes, text may be part of an image or not properly encoded, which can lead to missing content during conversion.

  2. Adjust TiffSettings: The settings used for the TIFF conversion can significantly affect the output, so it is worth experimenting with different compression types and color depths. CompressionType.CCITT4 and ColorDepth.Format8bpp are shown here only as a starting point; other values may give better results for your PDF. Note that CCITT Group 4 is a bilevel (black-and-white) compression scheme, so try another compression type if gray or colored text is affected.

  3. Resolution Settings: The resolution at which you convert the PDF can also affect the output quality. A higher resolution (e.g., 300 DPI) may help in capturing more details, including text.

  4. Skip Blank Pages: If your PDF contains blank pages, ensure that the setting to skip blank pages is correctly configured. This can prevent unnecessary processing and might help focus on the pages that contain content.

Here is a sample code snippet that demonstrates how to set up the conversion with these considerations:

var document = new Document("document.pdf");

var tiffSettings = new TiffSettings
{
    Compression = CompressionType.CCITT4,
    Depth = ColorDepth.Format8bpp,
    SkipBlankPages = true
};

var resolution = new Resolution(300);
var device = new TiffDevice(resolution, tiffSettings);

device.Process(document, "output.tiff");

By adjusting these parameters, you may be able to improve the conversion results and reduce the instances of missing text. If the problem persists, consider checking the original PDF for any issues or trying different PDF files to see if the issue is specific to certain documents.
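One way to compare settings is to write one TIFF per combination and inspect the results side by side. The following is only a sketch: it assumes the snippets in this thread use the Aspose.Pdf and Aspose.Pdf.Devices namespaces (the posts do not show their using directives), "document.pdf" is the placeholder name from the question, and the OutputName helper and the output file naming are hypothetical. Some combinations may be rejected by the library, so failures are reported rather than treated as fatal.

```csharp
using System;
using System.IO;
using Aspose.Pdf;          // assumed namespace, not shown in the thread
using Aspose.Pdf.Devices;  // assumed namespace, not shown in the thread

// Hypothetical helper: one output file name per settings combination.
static string OutputName(CompressionType compression, ColorDepth depth) =>
    $"output_{compression}_{depth}.tiff";

// "document.pdf" is the placeholder file name used in the question.
using var document = new Document("document.pdf");
var resolution = new Resolution(300);

foreach (CompressionType compression in Enum.GetValues(typeof(CompressionType)))
{
    foreach (ColorDepth depth in Enum.GetValues(typeof(ColorDepth)))
    {
        var settings = new TiffSettings { Compression = compression, Depth = depth };
        var device = new TiffDevice(resolution, settings);
        var outputPath = OutputName(compression, depth);
        try
        {
            device.Process(document, outputPath);
            Console.WriteLine($"{outputPath}: {new FileInfo(outputPath).Length} bytes");
        }
        catch (Exception ex)
        {
            // A combination the library does not accept is logged and skipped.
            Console.WriteLine($"{outputPath}: failed ({ex.Message})");
        }
    }
}
```

If the text is missing from every output file, the settings are unlikely to be the cause and the source PDF should be shared for investigation.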

Even with these settings, the converted file is still missing text.

@RpiresWexinc
Please attach the source PDF document(s) so that we can investigate the issue.

Done, the file is called examplePG2.pdf.

@RpiresWexinc

var document = new Document(dataDir + "examplePG2.pdf");
var tiffSettings = new TiffSettings();
var device = new TiffDevice(tiffSettings);
device.Process(document, dataDir + "examplePG2_out.tiff");

I got a valid result using the code you provided, with library version 25.01 in a .NET 6 project on Windows.
examplePG2_out.zip (77.1 KB)

Which library version and environment are you using?
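If it helps, the details can be printed with a few lines like the ones below. This is a rough sketch that assumes the Document class comes from the Aspose.Pdf namespace, which the snippets in this thread do not show; DescribeEnvironment is a hypothetical helper name. It reads the version from the loaded assembly and uses the standard .NET RuntimeInformation class for the runtime and operating system.

```csharp
using System;
using System.Runtime.InteropServices;
using Aspose.Pdf;  // assumed namespace, not shown in the thread

// Hypothetical helper: describes the assembly that defines the given type,
// plus the .NET runtime and operating system the process is running on.
static string DescribeEnvironment(Type libraryType)
{
    var assemblyName = libraryType.Assembly.GetName();
    return $"Library: {assemblyName.Name} {assemblyName.Version}{Environment.NewLine}" +
           $"Runtime: {RuntimeInformation.FrameworkDescription}{Environment.NewLine}" +
           $"OS: {RuntimeInformation.OSDescription}";
}

Console.WriteLine(DescribeEnvironment(typeof(Document)));
```

Posting this output together with the code makes it easier to reproduce the result on the same version.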

The result you sent is also missing values compared to the original PDF.
example3.png (127.6 KB)

@RpiresWexinc
Yes, that’s right.
Sorry for the oversight.
This is a bug in the library and I will create a task for the development team about it.

@RpiresWexinc
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59064

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Is there an estimated date or forecast for this fix?

@RpiresWexinc

Tasks are resolved in the order they are received, taking priorities into account.
The highest priority goes to tasks with paid support, followed by tasks from users who have purchased a license.
The time needed to resolve an issue also varies, so unfortunately it is not possible to give an ETA.