Text is missing while Converting PDF to Image using C#

timathon · January 17, 2022, 5:31pm

Hi there, we’re currently evaluating Aspose.PDF as a “PDF to Image” solution and we’re seeing missing text on certain pages. I initially suspected a missing font, since the document uses Helvetica and the font doesn’t appear to be embedded. Even with the missing font installed, though, the text is still missing. All of the missing text does appear to be set in Helvetica; it’s just not clear why that causes a problem. I’ve tried versions 19.10 and 21.11 with the same result.

I’ve attached a document that contains one of the pages that the issue occurs on. The document was saved with an evaluation version of an app so hopefully it’s not introducing any red herrings, but the problem persists, so I’m hoping it’s enough to diagnose the issue.

Test File: Test-Prod-File Trimmed.pdf (26.5 KB)

Code Snippet:

static string exportDirectory = @"C:\PDF_OUT\";
static string fontExtractionDirectory = @"C:\FONT_OUT\";

void Main()
{
	OpenFileDialog dialog = new OpenFileDialog();

	var result = dialog.ShowDialog();

	if (result.HasValue && result.Value)
	{
		Console.WriteLine($"Processing file {dialog.FileName}...");
		
		// Strip the extension safely; Replace() would also hit matching text elsewhere in the name
		string cleanFileName = Path.GetFileNameWithoutExtension(dialog.FileName);
		
		Console.WriteLine($"Clean filename {cleanFileName}");

		// Create Resolution object            
		Resolution resolution = new Resolution(300);
		JpegDevice jpegDevice = new JpegDevice(resolution);

		Document document = new Document(dialog.FileName);
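		// The handler receives the original font and its substitute (parameter names are illustrative)
		document.FontSubstitution += (oldFont, newFont) =>
		{
			Console.WriteLine($"Font '{oldFont.FontName}' was substituted with '{newFont.FontName}'");
		};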

		ConvertPDFtoImage(jpegDevice, "jpeg", document, cleanFileName);
	}
}

public static void ConvertPDFtoImage(ImageDevice imageDevice, string ext, Document pdfDocument, string fileName)
{
	for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
	{
		Console.WriteLine($"Exporting {fileName} page {pageCount}");
		
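		// Build the output path with Path.Combine so a missing trailing separator cannot break it
		string imagePath = Path.Combine(exportDirectory, $"{fileName}_{pageCount}.{ext}");

		// The using block disposes (and closes) the stream, so no explicit Close() is needed
		using (FileStream imageStream = new FileStream(imagePath, FileMode.Create))
		{
			// Convert a particular page and save the image to stream
			imageDevice.Process(pdfDocument.Pages[pageCount], imageStream);
		}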
	}
}

asad.ali · January 18, 2022, 12:21am

@timathon

We were able to replicate the issue in our environment using version 22.1 of the API and have logged it as PDFNET-51202 in our issue tracking system. We will look into it further and keep you posted on the status of the fix. Please allow us some time.

We are sorry for the inconvenience.

timathon · January 20, 2022, 7:02pm

Is it possible to get an estimate of when the ticket might be addressed? I just need something I can take back to my manager. Much appreciated!

asad.ali · January 21, 2022, 6:25pm

@timathon

The ticket was only recently logged in our issue tracking system, so we are afraid we cannot share a reliable ETA at the moment. It is pending initial review and will be investigated on a first-come, first-served basis, as per the policy of the free support model. We will inform you as soon as we have a definite update regarding its resolution. Please allow us some time.

We apologize for the inconvenience.