Convert PDF to TIFF

Mike.Oakley · April 21, 2014, 8:24pm

We are trying to convert PDF to TIFF. If the PDF is just text and can be converted using CCITT4 compression the resulting TIFF is fine and easily read. However, if the PDF includes color images and text then the text in the resulting TIFF file cannot be easily read and the size of the file is large. I am using the following which is from another forum post.

// create PdfConverter object
Aspose.Pdf.Facades.PdfConverter pdfConverter = new Aspose.Pdf.Facades.PdfConverter();

// create Resolution object from the value returned by SetSaveOptionResolution()
Aspose.Pdf.Devices.Resolution resolution = new Aspose.Pdf.Devices.Resolution(SetSaveOptionResolution());

// specify the resolution value for PdfConverter object - default is 150
pdfConverter.Resolution = resolution;

// bind the source PDF file
pdfConverter.BindPdf(cnvFileData.CnvFileName);

// start the conversion process
pdfConverter.DoConvert();

// create TiffSettings object and set Compression
Aspose.Pdf.Devices.TiffSettings tiffSettings = new Aspose.Pdf.Devices.TiffSettings();

if (BlackAndWhite ||
    CheckOverrideCompression(fileExt) ||
    Compression.Equals("Group4FaxEncoding", StringComparison.CurrentCultureIgnoreCase))
    tiffSettings.Compression = Aspose.Pdf.Devices.CompressionType.CCITT4;
else
    tiffSettings.Compression = Aspose.Pdf.Devices.CompressionType.LZW;

retFileName = System.IO.Path.ChangeExtension(cnvFileData.CnvFileName, format.ToLower());

pdfConverter.SaveAsTIFF(retFileName, tiffSettings);

pdfConverter.Close();

This is being done in a console app or service. Are there settings that will give better results for color PDFs, in terms of both file size and legibility of text? Can we query attributes of the PDF in order to make a better decision on the compression and resolution? For instance, if the PDF is all text but saved as color, can we determine that it is all text and can be saved as CCITT4?

File to be converted is attached.

codewarior · April 22, 2014, 11:18pm

Mike.Oakley:
We are trying to convert PDF to TIFF. If the PDF is just text and can be converted using CCITT4 compression the resulting TIFF is fine and easily read. However, if the PDF includes color images and text then the text in the resulting TIFF file cannot be easily read and the size of the file is large. I am using the following which is from another forum post.

Hi Mike,

Thanks for contacting support.

I have tested the scenario using Aspose.Pdf for .NET 9.1.0 with the following code snippet (based on your original code), and I am unable to reproduce the issue. The text is properly readable in the resultant TIFF image. For your reference, I have also attached the TIFF generated on my end.

[C#]

// create PdfConverter object

Aspose.Pdf.Facades.PdfConverter pdfConverter = new Aspose.Pdf.Facades.PdfConverter();

// create Resolution object with 300 as an argument

Aspose.Pdf.Devices.Resolution resolution = new Aspose.Pdf.Devices.Resolution(300);

// specify the resolution value for PdfConverter object - default is 150

pdfConverter.Resolution = resolution;

// bind the source PDF file

pdfConverter.BindPdf("c:/pdftest/AASF.pdf");

// start the conversion process

pdfConverter.DoConvert();

// create TiffSettings object and set Compression

Aspose.Pdf.Devices.TiffSettings tiffSettings = new Aspose.Pdf.Devices.TiffSettings();

tiffSettings.Compression = Aspose.Pdf.Devices.CompressionType.CCITT4;

pdfConverter.SaveAsTIFF("c:/pdftest/AASF.tiff", tiffSettings);

pdfConverter.Close();

Mike.Oakley:
Can we query attributes of the PDF in order to make a better decision on the compression and resolution? For instance, if the PDF is all text but saved as color, can we determine that it is all text and can be saved as CCITT4?

For further details, you may consider visiting the following link: Find whether PDF file contains images or text only.
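The check described in that article can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the file name and the class name are hypothetical, and it assumes the PdfExtractor facade (BindPdf, ExtractText, GetText, ExtractImage, HasNextImage) is available in the version being used. Note that it only detects embedded images; colored text or colored vector graphics would not be caught by this test alone.

```csharp
using System;
using System.IO;
using Aspose.Pdf.Facades;

class PdfContentCheckSketch
{
    static void Main()
    {
        // Hypothetical input path, for illustration only.
        string inputPdf = "input.pdf";

        PdfExtractor extractor = new PdfExtractor();
        extractor.BindPdf(inputPdf);

        // Extract text into a memory stream; a non-empty stream
        // is taken to mean the PDF contains text.
        bool containsText;
        using (MemoryStream textStream = new MemoryStream())
        {
            extractor.ExtractText();
            extractor.GetText(textStream);
            containsText = textStream.Length > 0;
        }

        // Run image extraction and check whether at least one image was found.
        extractor.ExtractImage();
        bool containsImage = extractor.HasNextImage();

        extractor.Close();

        Console.WriteLine("Contains text:  " + containsText);
        Console.WriteLine("Contains image: " + containsImage);

        // Assumption for this sketch: a PDF without embedded images
        // is a candidate for CCITT4; otherwise keep a color-capable codec.
        string suggested = containsImage ? "LZW" : "CCITT4";
        Console.WriteLine("Suggested compression: " + suggested);
    }
}
```

The result could then feed the existing if/else that chooses between CompressionType.CCITT4 and CompressionType.LZW.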

In the event of any further query, please feel free to contact us. We are sorry for this inconvenience.

Mike.Oakley · April 23, 2014, 12:33pm

This is fine, but I must maintain color if the source PDF contains color, unless it is discovered that the source PDF is all text. One thing I have done is to set the color depth to 8bpp, which significantly reduces the file size with no noticeable effect on quality at a resolution of 300 with LZW compression.
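For readers who want to try the same combination, a minimal sketch of those settings might look like this. The paths and the class name are hypothetical, and it assumes TiffSettings exposes a Depth property that accepts ColorDepth.Format8bpp in the Aspose.Pdf version in use.

```csharp
using Aspose.Pdf.Devices;
using Aspose.Pdf.Facades;

class ColorTiffSketch
{
    static void Main()
    {
        // Hypothetical paths, for illustration only.
        string inputPdf = "input.pdf";
        string outputTiff = "output.tiff";

        PdfConverter pdfConverter = new PdfConverter();

        // Render at 300 DPI, as described in the post above.
        pdfConverter.Resolution = new Resolution(300);
        pdfConverter.BindPdf(inputPdf);
        pdfConverter.DoConvert();

        // LZW keeps color; 8bpp limits each pixel to a 256-color palette,
        // which is where the size reduction comes from.
        TiffSettings tiffSettings = new TiffSettings();
        tiffSettings.Compression = CompressionType.LZW;
        tiffSettings.Depth = ColorDepth.Format8bpp;

        pdfConverter.SaveAsTIFF(outputTiff, tiffSettings);
        pdfConverter.Close();
    }
}
```

Since 8bpp is a palette-based format, photographic images may show some banding, so it is worth checking a few representative files before adopting it everywhere.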

codewarior · April 24, 2014, 6:42am

Hi Mike,

In order to get a smaller TIFF image, you may either use a compression value from the CompressionType enumeration or choose a different color depth. Please note that when using 1bpp color depth, the resultant TIFF will be even smaller in size, but you might lose colors for images inside the TIFF image.

For further details, you may visit Convert PDF pages to TIFF Image.

Mike.Oakley · April 28, 2014, 4:39pm

Using the approach in the link that you provided (Find whether PDF file contains images or text only), can I also determine the color properties of the images and text found?

codewarior · April 29, 2014, 5:41am

Hi Mike,

Once you have determined that the PDF file contains text, you can get all TextFragments and read their formatting information, which includes color information. For further details, please visit Search and get Text from all pages using Regular Expression.
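As a rough illustration of reading that formatting information, the sketch below lists each text fragment together with its foreground color. The file name and the class name are hypothetical, and it uses a parameterless TextFragmentAbsorber to collect all text rather than a regular expression; it assumes TextState.ForegroundColor is populated in the version in use.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class TextColorSketch
{
    static void Main()
    {
        // Hypothetical input path, for illustration only.
        Document pdfDocument = new Document("input.pdf");

        // An absorber created without a search phrase collects all text fragments.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber();
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // ForegroundColor is the color the fragment is painted with.
            Console.WriteLine("Text: {0} | Foreground color: {1}",
                fragment.Text, fragment.TextState.ForegroundColor);
        }
    }
}
```

If every fragment reports black and the document has no embedded images, that would be one signal that CCITT4 is safe to use, although colored vector graphics would still need a separate check.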

In case the PDF file contains images, you can extract the images and get color properties of images. Following links can be useful for this purpose.