Regarding Aspose.PDF Tagging

Hi Team, we are using the latest version of Aspose.PDF (25.10).

Requirement: We have to apply tags to a searchable PDF.

I am using the code below to add the tags, but it throws an exception: “Specified cast is not valid.”

Please find the sample PDF file and the tagging.xml file below.
TestScanned_Searchable.pdf (1.9 MB)

I am unable to upload the tagging.xml file, so please see the screenshot of it below.

image.png (93.7 KB)

        public void CreateTaggedPdfFromUnTaggedPdf()
        {
            string input = @"E:\TestFiles\TaggedPDFFile\TestScanned_Searchable.pdf";
            string output = @"E:\TestFiles\TaggedPDFFile\TestScanned_Searchable_Output.pdf";
            string normalizedTemp = @"E:\TestFiles\TaggedPDFFile\Temp_Normalized.pdf";
            string logFile = @"E:\TestFiles\TaggedPDFFile\TaggingLog.xml"; // or .txt

            // Load the existing PDF
            Document pdfDocument = new Document(input);

            foreach (var page in pdfDocument.Pages)
            {
                page.Flatten();
            }

            // (Optional but safer) Normalize the PDF first to reduce complexity
            pdfDocument.Convert(logFile, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete);
            pdfDocument.Save(normalizedTemp);

            // Reload the normalized document
            pdfDocument = new Document(normalizedTemp);

            // Prepare conversion options for tagging
            var options = new PdfFormatConversionOptions(
                PdfFormat.PDF_A_1A,   // or if you prefer PDF_UA_1
                ConvertErrorAction.Delete
            );

            var autoTagSettings = new AutoTaggingSettings
            {
                EnableAutoTagging = true,
                HeadingRecognitionStrategy = HeadingRecognitionStrategy.Auto
            };
            options.AutoTaggingSettings = autoTagSettings;

            //// Optional: hook progress events for debugging
            //pdfDocument.ConvertProgress += (sender, e) =>
            //{
            //    Console.WriteLine($"Progress: page {e.PageNumber}, message: {e.Message}");
            //};

            try
            {
                // This should create the tagged structure during conversion
                try
                {
                    pdfDocument.Convert(options);
                }
                catch (Aspose.Pdf.ConvertException ex)
                {
                    // Retry only for the q/Q operator issue; rethrow anything else
                    // so that other conversion failures are not silently swallowed
                    if (!ex.Message.Contains("q/Q"))
                        throw;

                    Console.WriteLine("Page issue detected. Flattening...");
                    foreach (var page in pdfDocument.Pages)
                        page.Flatten();

                    pdfDocument.Convert(options);
                }

                pdfDocument.Save(output);
                Console.WriteLine("✅ PDF tagged successfully!");
            }
            catch (InvalidCastException ex)
            {
                Console.WriteLine("⚠️ Auto-tagging failed: " + ex.Message);
                Console.WriteLine("Attempting fallback logical structure repair...");
            }
        }

@SwroopXerox

It seems you’re encountering an “Invalid Cast” exception while trying to apply tags to a PDF using Aspose.PDF. This issue can arise from various factors, including the structure of the PDF or the tagging XML file.

Here are a few steps you can take to troubleshoot and resolve the issue:

  1. Check the PDF Structure: Ensure that the PDF you are trying to tag is not overly complex or corrupted. You can try normalizing the PDF first, as you are already doing with pdfDocument.Convert(logFile, PdfFormat.PDF_A_1B, ConvertErrorAction.Delete).

  2. Validate the Tagging XML: Since you mentioned that you couldn’t upload the tagging XML file, ensure that it adheres to the expected structure for tagging. Any discrepancies in the XML can lead to casting issues.

  3. Simplify the Tagging Process: If the issue persists, consider simplifying the tagging process. You can try tagging a simpler PDF or a different XML file to see if the problem is specific to the current files.

  4. Error Handling: You already have a try-catch block for handling ConvertException. You might want to expand your error handling to log more details about the exception, such as its type, stack trace, and any inner exceptions, which can help in diagnosing the issue.

  5. Update Aspose.PDF: Ensure you are using the latest version of Aspose.PDF, as updates often include bug fixes and improvements.

Here’s a modified version of your code snippet with additional logging for better error handling:

try
{
    pdfDocument.Convert(options);
}
catch (Aspose.Pdf.ConvertException ex)
{
    Console.WriteLine($"Conversion error: {ex.Message}");
    // Additional logging can be added here
}
catch (InvalidCastException ex)
{
    Console.WriteLine("⚠️ Auto-tagging failed: " + ex.Message);
    // Log more details if needed
}
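
For the "log more details" comments in the catch blocks above, a minimal sketch of a helper that uses only standard .NET types may be useful. The class name ExceptionReport and the method name Describe are hypothetical and not part of Aspose.PDF. The helper walks the inner-exception chain and records the type, message, and stack trace at each level, which usually shows where inside the library the invalid cast originates.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of Aspose.PDF): formats an exception
// together with all of its inner exceptions for logging.
public static class ExceptionReport
{
    // Builds a multi-line report: one header line per exception in the
    // chain, followed by its stack trace when one is available.
    public static string Describe(Exception ex)
    {
        if (ex == null) throw new ArgumentNullException(nameof(ex));

        var report = new StringBuilder();
        int depth = 0;
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            report.AppendLine($"[{depth}] {current.GetType().FullName}: {current.Message}");

            // StackTrace is null for exceptions that were created but never thrown.
            if (!string.IsNullOrEmpty(current.StackTrace))
                report.AppendLine(current.StackTrace);

            depth++;
        }
        return report.ToString();
    }
}
```

Inside either catch block, Console.WriteLine(ExceptionReport.Describe(ex)); would then print the whole chain instead of only ex.Message. Attaching that output to a support request tends to make the failure easier to locate.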

If the problem continues, consider reaching out to the Aspose support forums with detailed information about your PDF and XML files for more targeted assistance.

Feel free to ask if you have further questions!

Hi Team, while debugging the code, execution immediately jumps to an InvalidCastException with the message: ‘Specified cast is not valid.’

@SwroopXerox

Would you please share a sample console application in .zip format that we can use to replicate the issue? We have tested using the latest version in our environment and could not replicate the issue.

Thank you for your reply. I am sharing a sample application along with the sample files. Please take a look.
TestApp.zip (2.4 MB)

@SwroopXerox

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFNET-60915

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.
