Identify location of SVG graphics in PDF and extract

Hi,
I’m trying to identify the location of SVG graphics in a PDF so that I can process the graphic and return any text from the graphic to the text stream at the location of the graphic.

I see on the forum that the recommendation is to use a GraphicsAbsorber for SVG, and with it I can loop through the recognized elements. Is there a way to (1) determine whether a graphic is SVG or not, and (2) save the graphic to a file or stream?

I see that “TrySaveVectorGraphics” will extract the images, but it doesn’t seem to expose exactly where each SVG occurred in the file.

Do you have any recommendations? Essentially, I’d like to open the PDF, find each instance of SVG, save to a file or stream, convert to PNG, run OCR on the PNG, return extracted text to the point where the original SVG image was found.

Thanks,
John
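
A minimal sketch of the last step in the workflow above (returning OCR text to the point where the graphic was found), assuming the position of each graphic and each text fragment is already known. It does not use the Aspose API: `Block` and `merge_in_reading_order` are hypothetical names introduced for illustration, and the coordinates are assumed to be in PDF points with the origin at the bottom-left of the page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A piece of text with the position it occupies on a page (hypothetical type)."""
    text: str
    page: int   # 1-based page number
    x: float    # left edge, in PDF points
    y: float    # top edge, measured upward from the bottom of the page


def merge_in_reading_order(text_blocks: List[Block], ocr_blocks: List[Block]) -> List[Block]:
    """Interleave OCR results with the ordinary text fragments.

    Blocks are ordered by page, then top to bottom (larger y first, because
    y is assumed to grow upward), then left to right.
    """
    return sorted(text_blocks + ocr_blocks, key=lambda b: (b.page, -b.y, b.x))


def to_text_stream(blocks: List[Block]) -> str:
    """Join the ordered blocks into a single text stream, one block per line."""
    return "\n".join(b.text for b in blocks)
```

Sorting on position alone is a simplification: multi-column layouts or rotated pages would need a more careful ordering rule.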

@jdunning

This looks like a custom requirement. Could you please share a sample PDF document for our reference? We need to log an investigation ticket for further analysis, and we will share the ticket ID with you.

Thanks; here is one example: https://api-kroll.kroll.com/-/media/kroll-images/pdfs/full-version-2022-annual-impact-report.pdf

@jdunning

Would you please confirm whether this thread was also started by your organization? We are asking because the same PDF document was shared there as well, with similar requirements.

@asad.ali , yes, we are in the same org.

Thanks,
John

@jdunning

Thanks for confirming. We had already shared a workaround in the other thread for dealing with this kind of PDF, but the results were not very good. The related issues were logged in our issue management system for rectification, and the tickets were attached to that forum thread.

For your requirements, an investigation ticket, PDFNET-57165, has been logged in our system. We will look into the details and let you know as soon as it is resolved. Please allow us some time.