Hi there, we’re currently evaluating Aspose.PDF as a “PDF to Image” solution, and we’re seeing missing text on certain pages. I initially suspected a missing font, since the document uses Helvetica and it doesn’t appear to be embedded. Even with the missing font installed, though, the text is still missing. All of the missing text does appear to be Helvetica; it’s just not clear why there’s an issue. I’ve tried versions 19.10 and 21.11 with the same result.
I’ve attached a document containing one of the pages where the issue occurs. The document was saved with an evaluation version of an app, so hopefully that isn’t introducing any red herrings; the problem still reproduces with it, so I’m hoping it’s enough to diagnose the issue.
Test File: Test-Prod-File Trimmed.pdf (26.5 KB)
Code Snippet:
static string exportDirectory = @"C:\PDF_OUT\";
static string fontExtractionDirectory = @"C:\FONT_OUT\";

void Main()
{
    OpenFileDialog dialog = new OpenFileDialog();
    var result = dialog.ShowDialog();
    if (result.HasValue && result.Value)
    {
        Console.WriteLine($"Processing file {dialog.FileName}...");

        // Strip only the trailing extension; string.Replace would also remove
        // any earlier occurrence of the extension text inside the name
        string cleanFileName = Path.GetFileNameWithoutExtension(dialog.FileName);
        Console.WriteLine($"Clean filename {cleanFileName}");

        // Render at 300 DPI
        Resolution resolution = new Resolution(300);
        JpegDevice jpegDevice = new JpegDevice(resolution);

        Document document = new Document(dialog.FileName);

        // Log any font substitution reported during rendering
        document.FontSubstitution += (sender, args) =>
        {
            Console.WriteLine($"Missing font: {args.FontName}");
        };

        ConvertPDFtoImage(jpegDevice, "jpeg", document, cleanFileName);
    }
}

public static void ConvertPDFtoImage(ImageDevice imageDevice, string ext, Document pdfDocument, string fileName)
{
    // Pages is a 1-based collection
    for (int pageCount = 1; pageCount <= pdfDocument.Pages.Count; pageCount++)
    {
        Console.WriteLine($"Exporting {fileName} page {pageCount}");

        // The using block closes the stream, so no explicit Close() is needed
        using (FileStream imageStream = new FileStream($"{exportDirectory}{fileName}_{pageCount}.{ext}", FileMode.Create))
        {
            // Convert a particular page and save the image to stream
            imageDevice.Process(pdfDocument.Pages[pageCount], imageStream);
        }
    }
}