We use Aspose.Pdf 20.12.0.0 for .NET to convert files of different formats to images.
We ran into a problem with one PDF file: some of its symbols were missing from the resulting image file.
The example file is “edited.pdf”. It contains 3 diameter symbols.
First, the file is converted to PDF. The input stream can contain files of other formats, so all of them are converted to PDF first. This code is executed when we receive a PDF:
using (Aspose.Pdf.Document pdf = ParseFileToPdfTest(stream))
{
    // Save the intermediate PDF to disk so it can be inspected.
    pdf.Save("edited_ConvertedToPdf.pdf");
    ParsePdfToImageFilesTest(pdf);
}
private static Aspose.Pdf.Document ParseFileToPdfTest(Stream stream)
{
    Aspose.Pdf.Document pdf = null;
    try
    {
        pdf = new Aspose.Pdf.Document(stream);
    }
    catch (Exception ex)
    {
        throw new Exception("Error reading file", ex);
    }
    MemoryStream output = new MemoryStream();
    pdf.Save(output);
    pdf.Dispose();
    output.Position = 0;
    return new Aspose.Pdf.Document(output);
}
1111017173531 (1).pdf (281.2 KB)
edited.pdf (55.5 KB)
edited_ConvertedToImage.jpeg (58.4 KB)
edited_ConvertedToImage_Subset.jpeg (58.7 KB)
edited_ConvertedToPdf.pdf (55.5 KB)
I saved the resulting PDF to disk as “edited_ConvertedToPdf.pdf”. It still contains the diameter symbols.
Then this PDF file is converted to an image:
private static void ParsePdfToImageFilesTest(Aspose.Pdf.Document pdfDoc)
{
    List<byte[]> imagePages = new List<byte[]>();
    JpegDevice device = new JpegDevice(new Resolution(300));
    int previewPagesCount = pdfDoc.Pages.Count;
    for (int pageNumber = 0; pageNumber < previewPagesCount; pageNumber++)
    {
        using (MemoryStream imageStream = new MemoryStream())
        {
            device.Process(pdfDoc.Pages[pageNumber + 1], imageStream);
            File.WriteAllBytes($"imageFileName{pageNumber + 1}.jpeg", imageStream.ToArray());
        }
    }
}
The resulting image file is “edited_ConvertedToImage.jpeg”. The diameter symbols are missing from it.
We’ve found a workaround that makes the existing code work properly with this PDF file.
We modified the ParseFileToPdfTest method by adding this line before pdf.Save:
pdf.FontUtilities.SubsetFonts(FontSubsetStrategy.SubsetEmbeddedFontsOnly);
Now the resulting image file contains the diameter symbols (file “edited_ConvertedToImage_Subset.jpeg”).
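For clarity, the modified method can be sketched as follows. This is a minimal sketch, assuming the call is placed immediately before pdf.Save(output) and nothing else in the method changes. The wrapper class name SubsetWorkaroundSketch is hypothetical and exists only to make the snippet self-contained.

```csharp
using System;
using System.IO;
using Aspose.Pdf;

// Hypothetical wrapper class, used only to make the sketch self-contained.
internal static class SubsetWorkaroundSketch
{
    // Same steps as ParseFileToPdfTest, plus the font-subsetting workaround.
    internal static Document ParseFileToPdfTest(Stream stream)
    {
        Document pdf;
        try
        {
            pdf = new Document(stream);
        }
        catch (Exception ex)
        {
            throw new Exception("Error reading file", ex);
        }

        // Workaround (assumed placement): subset embedded fonts before saving.
        pdf.FontUtilities.SubsetFonts(FontSubsetStrategy.SubsetEmbeddedFontsOnly);

        MemoryStream output = new MemoryStream();
        pdf.Save(output);
        pdf.Dispose();

        // Rewind the stream so the new Document reads from the start.
        output.Position = 0;
        return new Document(output);
    }
}
```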
The problem is that this fix breaks the conversion of some PDF files in Aspose.Pdf.Drawing 24.12 and higher, which we also use (Aspose.Pdf.Drawing 25.9 and more recent, for example).
The example file is “1111017173531 (1).pdf”
The code for the PDF conversion is practically the same (function ParseFileToPdfTest). The resulting PDF is missing a lot of text. The converted file is “1111017173531 (1)_ConvertedToPdf_Subset.pdf”. It has evaluation mode marks, but in production with a license the result is the same: the text is missing. The subsequent conversion to an image is a bit different in this version, but that doesn’t matter, because the PDF is already missing text.
If I remove the line that fixes the diameter symbols problem in file “edited.pdf”:
pdf.FontUtilities.SubsetFonts(FontSubsetStrategy.SubsetEmbeddedFontsOnly);
Then the second file converts just fine, with all text present. The example file is “1111017173531 (1)_ConvertedToPdf.pdf”. But without that line, the file with the diameter symbols still converts to an image without them, even in recent versions of Aspose.Pdf.Drawing.
So we are in a situation where the workaround for one bug causes another bug. Could you please look into it?
We’d like to have a fix that works in both cases.
Thank you for your help.
1111017173531 (1)_ConvertedToPdf.pdf (337.3 KB)
1111017173531 (1)_ConvertedToPdf_Subset.pdf (508.3 KB)
edited_ConvertedToPdf_Subset.pdf (79.0 KB)