Some text is missing when converting from PDF to TIFF

Rpires123 · January 17, 2025, 12:43pm

Some values are missing when we try to convert from PDF to TIFF.
var document = new Document("document.pdf");
var tiffSettings = new TiffSettings();
var device = new TiffDevice(tiffSettings);
device.Process(document, "output.tiff");

Professionalize.Discourse · January 17, 2025, 12:43pm

When converting a PDF to TIFF and experiencing missing text, there are several factors to consider that may help resolve the issue.

Check PDF Content: Ensure that the text in the PDF is not embedded in a way that makes it difficult to extract. Sometimes, text may be part of an image or not properly encoded, which can lead to missing content during conversion.
Adjust TiffSettings: The settings used for the TIFF conversion can significantly impact the output. You might want to experiment with different compression types and color depths. Note that the two must match: CompressionType.CCITT4 is a bilevel (1-bit) codec and belongs with ColorDepth.Format1bpp, while CompressionType.LZW with ColorDepth.Format8bpp keeps 256 color levels and is usually the safer choice when fine text detail matters.
Resolution Settings: The resolution at which you convert the PDF can also affect the output quality. A higher resolution (e.g., 300 DPI) may help in capturing more details, including text.
Skip Blank Pages: If your PDF contains blank pages, check how SkipBlankPages is set. Setting it to true leaves blank pages out of the TIFF; it does not change how text is rendered on the remaining pages.

Here is a sample code snippet that demonstrates how to set up the conversion with these considerations:

var document = new Document("document.pdf");

var tiffSettings = new TiffSettings
{
    // CCITT4 is a bilevel (1-bit) codec, so LZW is paired with the 8bpp depth here
    Compression = CompressionType.LZW,
    Depth = ColorDepth.Format8bpp,
    SkipBlankPages = true
};

var resolution = new Resolution(300);
var device = new TiffDevice(resolution, tiffSettings);

device.Process(document, "output.tiff");

By adjusting these parameters, you may be able to improve the conversion results and reduce the instances of missing text. If the problem persists, consider checking the original PDF for any issues or trying different PDF files to see if the issue is specific to certain documents.
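The "Check PDF Content" suggestion can be made concrete with a small diagnostic. The following is only a sketch, not an official procedure: it assumes the Aspose.PDF for .NET package is referenced, and the class name PdfTextDiagnostics and its Report method are hypothetical names introduced for illustration. It prints how much text the library can extract and whether each font is embedded. An empty text layer would suggest image-only pages, and non-embedded fonts are one possible (unconfirmed) reason for glyphs going missing during rendering.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical helper, not part of the Aspose.PDF API.
public static class PdfTextDiagnostics
{
    public static void Report(string pdfPath)
    {
        using var document = new Document(pdfPath);

        // Extract the text layer of every page. A length of zero suggests
        // the visible "text" is actually stored as images.
        var absorber = new TextAbsorber();
        document.Pages.Accept(absorber);
        Console.WriteLine($"Extracted characters: {absorber.Text.Length}");

        // List the fonts the document uses and whether each one is embedded.
        foreach (Font font in document.FontUtilities.GetAllFonts())
        {
            Console.WriteLine($"{font.FontName}: embedded = {font.IsEmbedded}");
        }
    }
}
```

Calling PdfTextDiagnostics.Report("document.pdf") before the conversion shows whether the input has a real text layer at all. It does not prove where the fault lies if the TIFF output still lacks text.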

Rpires123 · January 17, 2025, 12:47pm

Even with these settings, the converted file is still missing text.

sergei.shibanov · January 17, 2025, 12:51pm

@Rpires123
Please attach the source PDF document(s) so that we can investigate the issue.

Rpires123 · January 17, 2025, 1:23pm

Done, the file is called examplePG2.pdf.

sergei.shibanov · January 17, 2025, 2:33pm

@Rpires123

// dataDir is the path to the folder that holds the test files
var document = new Document(dataDir + "examplePG2.pdf");
var tiffSettings = new TiffSettings();
var device = new TiffDevice(tiffSettings);
device.Process(document, dataDir + "examplePG2_out.tiff");

Using the code you provided, I got a valid result with library version 25.01 on Windows in a .NET 6 project.
examplePG2_out.zip (77.1 KB)

Which library version and environment are you using?
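One way to collect the details asked for here is a small helper that reports the library assembly version together with the runtime and operating system. This is a minimal sketch using only standard .NET reflection; the EnvironmentReport class and its Describe method are hypothetical names, not part of any library.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical helper for support requests; uses only standard .NET APIs.
public static class EnvironmentReport
{
    // Returns "<assembly name> <version> | <runtime> | <operating system>".
    public static string Describe(Assembly library)
    {
        AssemblyName name = library.GetName();
        return $"{name.Name} {name.Version} | " +
               $"{RuntimeInformation.FrameworkDescription} | " +
               $"{RuntimeInformation.OSDescription}";
    }
}

// Usage, assuming Aspose.PDF is referenced in the project:
// Console.WriteLine(EnvironmentReport.Describe(typeof(Aspose.Pdf.Document).Assembly));
```

Pasting the resulting line into a reply makes it easier to compare the reporter's environment with the one used to reproduce the issue.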

Rpires123 · January 17, 2025, 2:46pm

The result you sent is also missing values compared to the original PDF.
example3.png (127.6 KB)

sergei.shibanov · January 18, 2025, 7:40am

@Rpires123
Yes, that’s right.
Sorry for the oversight.
This is a bug in the library and I will create a task for the development team about it.

sergei.shibanov · January 18, 2025, 1:47pm

@Rpires123
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-59064

You can obtain Paid Support Services if you need support on a priority basis, along with direct access to our Paid Support management team.

Rpires123 · January 23, 2025, 2:22pm

Is there an estimated date or timeline for this fix?

sergei.shibanov · January 23, 2025, 2:25pm

@Rpires123

Tasks are resolved in the order they are received, taking priorities into account.
The highest priority goes to tasks with paid support, followed by tasks from users who have purchased a license.
The time needed to fix a problem also varies. Unfortunately, that means we cannot give an ETA.