Hi,
We are using the Aspose.Pdf.Drawing library for .NET.
The Azure Function app is running in Azure Function Apps (Linux).
We are working with MemoryStreams and not with temp folders or files.
We would like to add a new function to extract (vector?) charts from a PDF to an array of images.
First we tried extracting images from the PDF using page.Resources.Images, but this does not return the charts as those are not images.
I have looked at the docs and tried a few things, but I’m not sure what the best approach is, or whether it is even possible to extract charts from a PDF as images.
I found Page.TrySaveVectorGraphics, but this seems to work only with a file path?
The goal is to have those charts extracted as images.
Extracting charts from a PDF is challenging because PDFs do not store “charts” as semantic objects. Instead, charts are typically composed of vector graphics such as paths, lines, text, and shapes. Aspose.PDF for .NET does not provide a direct method to identify or extract “charts” specifically, but it does support extracting vector content that may include such elements.
You can use the Page.TrySaveVectorGraphics method to extract the vector graphics on a PDF page (including lines, shapes, and text that may form a chart) into SVG format. However, this method requires a file path and does not support direct MemoryStream output in current versions. Since your Azure Function runs on Linux and avoids temp files, you can work around this limitation by using the /tmp directory, which is writable in Azure Functions:
- Write the SVG output to a temporary file in /tmp.
- Read the file into a MemoryStream.
- Delete the file immediately after reading to avoid accumulation.
Example:
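using System;
using System.IO;
using Aspose.Pdf;

using (var pdfDocument = new Document("input.pdf"))
{
    // Pages is a 1-based collection; take the first page
    var page = pdfDocument.Pages[1];
    // Unique file name so concurrent invocations do not overwrite each other
    var tempPath = Path.Combine("/tmp", Guid.NewGuid().ToString("N") + ".svg");
    try
    {
        bool success = page.TrySaveVectorGraphics(tempPath);
        if (success)
        {
            using (var fileStream = new FileStream(tempPath, FileMode.Open, FileAccess.Read))
            using (var memoryStream = new MemoryStream())
            {
                fileStream.CopyTo(memoryStream);
                // memoryStream now contains the SVG data
                memoryStream.Position = 0;
                // Process or return the stream
            }
        }
    }
    finally
    {
        File.Delete(tempPath); // Clean up even if extraction fails or throws
    }
}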
Using Aspose.Pdf.Drawing is beneficial in your environment because it relies on Aspose.Drawing instead of System.Drawing.Common, avoiding dependencies like libgdiplus that are problematic in Linux containers and Azure Functions.
Limitations:
- There is no built-in way to extract only “charts”: you extract all vector content per page, which may require post-processing to identify chart-like structures.
- TrySaveVectorGraphics does not support direct stream output, so a temporary file is currently required.
- Ensure your Azure Function has write permissions to /tmp.
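The temporary-file limitation can be isolated in one place so the rest of the function keeps working with streams only. The following is a minimal sketch, not part of Aspose.PDF: TempFileBridge and CaptureToMemory are hypothetical names, and it assumes only that the path-based writer returns true on success, as TrySaveVectorGraphics does in the example above.

```csharp
using System;
using System.IO;

public static class TempFileBridge
{
    // Hypothetical helper (not an Aspose API): runs a writer that can only
    // target a file path, then returns the written bytes as a MemoryStream.
    // Returns null when the writer reports failure or produces no file.
    public static MemoryStream CaptureToMemory(Func<string, bool> writeToPath, string extension)
    {
        // Path.GetTempPath() resolves to /tmp on Linux unless TMPDIR is set.
        // A GUID-based name avoids collisions between concurrent invocations.
        string tempPath = Path.Combine(
            Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        try
        {
            if (!writeToPath(tempPath) || !File.Exists(tempPath))
            {
                return null;
            }
            // The array-backed stream is independent of the file,
            // so the file can be deleted right away.
            return new MemoryStream(File.ReadAllBytes(tempPath));
        }
        finally
        {
            // File.Delete does nothing if the file was never created.
            File.Delete(tempPath);
        }
    }
}
```

With this in place, the call site would reduce to something like `var svg = TempFileBridge.CaptureToMemory(path => page.TrySaveVectorGraphics(path), ".svg");`, and a null result means no vector graphics were saved for that page.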
In summary:
It is possible to extract vector-based content (potentially including charts) as SVG using Page.TrySaveVectorGraphics in a Linux Azure Function environment. The use of Aspose.Pdf.Drawing ensures compatibility, but a temporary file in /tmp is needed due to lack of direct MemoryStream support.
If direct stream support becomes critical, consider monitoring Aspose.PDF updates or reaching out to Aspose support for feature requests.
@alexey.noskov, @amjad.sahi could you share your view on this topic?
After further investigation, our use case is to extract charts as images from the following types of PDFs:
- PDFs with image charts
- PDFs with vector charts
- PDFs with full-page images that contain both text and charts
Given this complex input, our idea is to rely on AI (Azure Document Intelligence) to extract the charts as images. Do you see an alternative?
Would you kindly share sample PDF documents for our reference? We will test the scenario in our environment and address it accordingly.
@asad.ali, please find here an example of a publicly available PDF that contains such vector charts: 2023,48312
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFNET-61534
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.