Invalid xref table

Hello,

I just downloaded version 17.9.0 of aspose.pdf with a temporary licence, and I’m trying to create PDF files from DOC, DOCX, RTF and TXT files and display them using PDF.js.

The generated PDFs are fine when viewed in a standard PDF viewer, but when displayed inside a web page with PDF.js, I get the following error message: Invalid XRef stream header

I just found the release notes for 17.9.0, which seem to confirm the problem with the XRef table: PDFNET-38505 Invalid xref table in resultant file

Is there any workaround that would let me get past these invalid XRef tables? Is there anywhere I can download an older version of aspose.pdf.dll, or maybe a beta version with a fix for this issue?

Regards,

Simon

@simonbesner

Thanks for contacting support.

The referenced issue, PDFNET-38505, was encountered during DOCX-PDF-PDFA conversion: the resultant file was correct, but the XREF contained several subsections, which were acceptable according to the PDF specification. The issue was resolved in Aspose.Pdf for .NET 17.9, which generates PDFs with a normal XREF.

Please note that an issue is sometimes specific to a particular document, and to investigate it we need that document. We would really appreciate it if you could share your input/output documents along with a sample code snippet and a screenshot of the error you are facing. We will test the scenario in our environment and address it accordingly.

Hello,

We’ve been experiencing the same problem with the invalid XRef table when converting PDF documents to PDF/A-1a and PDF/A-2a. This is inconvenient, since we upgraded our Aspose.pdf license to v17.9 specifically for this purpose.

The problem occurs with every single PDF document that we try to convert, without exception.

We’ve also tried Aspose.pdf v17.10 and v17.11, with the same result.

Furthermore, we’ve encountered another problem when converting to PDF/A: the resulting document contains a duplicate of one of the objects and has two references to this object in the xref table!

You can find the original and converted documents attached.

Please provide a solution asap.

Kind regards,
Eline

Output double object problem.pdf (147.2 KB)
input double object problem.pdf (147.7 KB)
Output invalid xref table.pdf (1.2 MB)
Input invalid Xref table.pdf (8.1 KB)

Code snippet:
public void testconversion()
{
    //upload file
    string path = @"C:\ProductDevelopment\wordtopdf.pdf";
    var file = File.ReadAllBytes(path);
    Stream stream = new MemoryStream(file);
    var doc = new PdfDocument(stream);
    doc.IsXrefGapsAllowed = false;

    using (var outputLogStream = new MemoryStream())
    {
        doc.OptimizeResources(new Document.OptimizationOptions
        {
            UnembedFonts = false,
        });
        doc.EmbedStandardFonts = true;

        PdfFormatConversionOptions options =
            new PdfFormatConversionOptions(outputLogStream, PdfFormat.PDF_A_2A, ConvertErrorAction.Delete);
        var conversionSuccess = doc.Convert(options);
    }

    using (var outputMemoryStream = new MemoryStream())
    {
        doc.Save(outputMemoryStream);
        byte[] output = outputMemoryStream.ToArray();
        File.WriteAllBytes(@"C:\ProductDevelopment\Repairedtestdocument3.pdf", output);
    }
}

@eline.vr

Thanks for contacting support.

We have tested the scenario in our environment by converting both of your documents to PDF/A-1a format and observed that Output invalid Xref table.pdf (494.7 KB) was fine and passed the compliance test. However, the other output document did not pass the compliance test, and we have logged this as PDFNET-43747 in our issue tracking system. We will look further into the details of the issue and keep you informed of its status.

Furthermore, would you please share some more details regarding the XREF error by providing a screenshot? We will test the scenario again in our environment and address it accordingly.

We are sorry for the inconvenience.

Dear Ali,

Thank you for your quick reply.

The problem with the “Output Invalid Xref table” is that the first Xref table contains subsections. If you digitally sign this document, the signature will be shown by Adobe as not being valid.

Please see the attached screenshots. I’ve also included the signed document.

Kind regards,
Eline

Output invalid xref table.PNG (45.0 KB)
Invalid Signature.PNG (38.8 KB)
Output invalid xref table - signed.pdf (1.7 MB)

@eline.vr

Thanks for sharing further information.

We would really appreciate it if you could also share the signature file and the code snippet you use to sign the PDF, so that we can log an investigation with all the details of the issue.

@eline.vr

Thanks for your patience.

We are pleased to inform you that the earlier reported issue PDFNET-43747 has been resolved in Aspose.PDF for .NET 18.1. Please download the latest version of the API, and if you need any further assistance, please feel free to let us know.

Dear Ali,

Today we tried upgrading to the latest Aspose version (18.2.0). When converting to PDF/A (1 or 2), the resulting xref table still contains subsections. For what follows, see the attached zip invalid_xref.zip (2.5 MB)

conversion.cs contains the conversion code used.
Input invalid Xref table.pdf is the document used as input.
Asposed.pdf is the resulting PDF after conversion, and asposed_xref contains the document’s xref table.

bug 1 (subsections in 1st xref):
As you can see, the xref table still contains subsections; this should not be the case. As stated in previous messages, this causes digital signatures to be invalid. The Adobe PDF Reference states: ‘For a file that has never been updated, the cross-reference section contains only one subsection, whose object numbering begins at 0.’ That definitely applies here, since it’s a new file.
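To check an output file for this without a PDF inspection tool, here is a minimal sketch in plain C# (it does not use the Aspose API). It assumes the file has a classic text xref table rather than an xref stream, and it only looks at the first `xref` keyword it finds. The names `XrefInspector` and `ReadSubsections` are hypothetical, chosen for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XrefInspector
{
    // A subsection header is "<first object number> <entry count>" alone on a line.
    // Entry lines ("0000000000 65535 f") have a third token, so they never match.
    private static readonly Regex SubsectionHeader = new Regex(@"^(\d+)\s+(\d+)\s*$");

    // Returns (first object number, entry count) for each subsection of the
    // first classic xref table in the text. Empty if no "xref" keyword is found.
    public static List<(int First, int Count)> ReadSubsections(string pdfText)
    {
        var result = new List<(int First, int Count)>();

        // Normalize CRLF and bare CR line endings so we can split on LF.
        string[] lines = pdfText.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        // "startxref" does not match because the whole trimmed line must equal "xref".
        int i = Array.FindIndex(lines, l => l.Trim() == "xref");
        if (i < 0) return result;
        i++;

        while (i < lines.Length)
        {
            Match m = SubsectionHeader.Match(lines[i]);
            if (!m.Success) break; // "trailer" (or anything else) ends the section

            int first = int.Parse(m.Groups[1].Value);
            int count = int.Parse(m.Groups[2].Value);
            result.Add((first, count));

            i += 1 + count; // skip the header line and its entries
        }
        return result;
    }
}
```

To run it on a file, read the bytes with `File.ReadAllBytes` and decode them with `Encoding.GetEncoding("ISO-8859-1")` so that binary content does not disturb the text lines. For a file that has never been updated, the quoted rule implies the returned list should hold exactly one item whose `First` is 0.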

bug 2 (double objects):
While testing I also ran into a different bug. When the boolean IsXrefGapsAllowed is set to false, the xref table makes no sense at all. (I wasn’t able to find any useful documentation about this boolean.) In the attached zip, Asposed2.pdf uses the same conversion as above, the only difference being that IsXrefGapsAllowed is set to false. In the provided asposed2_xref.txt you can see that the table starts counting from 0 and runs for 15 lines (thus counting up to line 14). Then a ‘new’ xref table starts inside the subsection, using ‘0000000000 65535 f’ (the first line in any xref table). There shouldn’t even be subsections.
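The pattern described above can be located with a small sketch like the following, again in plain C# and again assuming a classic text xref table. It lists every object number other than 0 whose entry reads `0000000000 65535 f`. The names `XrefFreeHeadCheck` and `FindRepeatedFreeHeads` are hypothetical. Treat the result as a list of places to look rather than proof of an error, since whether such an entry is wrong depends on the rest of the free list.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class XrefFreeHeadCheck
{
    // Subsection header: "<first object number> <entry count>" alone on a line.
    private static readonly Regex Header = new Regex(@"^(\d+)\s+(\d+)\s*$");

    // Returns the object numbers (other than 0) whose xref entry is
    // "0000000000 65535 f", i.e. looks like the entry normally seen for object 0.
    public static List<int> FindRepeatedFreeHeads(string pdfText)
    {
        var hits = new List<int>();
        string[] lines = pdfText.Replace("\r\n", "\n").Replace('\r', '\n').Split('\n');

        int i = Array.FindIndex(lines, l => l.Trim() == "xref");
        if (i < 0) return hits; // no classic xref table found
        i++;

        while (i < lines.Length)
        {
            Match m = Header.Match(lines[i]);
            if (!m.Success) break; // end of the xref section

            int first = int.Parse(m.Groups[1].Value);
            int count = int.Parse(m.Groups[2].Value);

            // Walk the entries of this subsection, tracking their object numbers.
            for (int k = 0; k < count && i + 1 + k < lines.Length; k++)
            {
                int objectNumber = first + k;
                if (objectNumber != 0 && lines[i + 1 + k].Trim() == "0000000000 65535 f")
                    hits.Add(objectNumber);
            }
            i += 1 + count;
        }
        return hits;
    }
}
```

For a table like the one described in asposed2_xref.txt, the first reported object number would show where the second ‘0000000000 65535 f’ line sits relative to the start of the table.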

Bug 2 described above is not an issue for us but I felt like pointing it out to you.

Bug 1 is still an issue for us; contrary to what was stated in the release notes, it persists.

Kind Regards

Kamiel
Product Expert Connective NV

@eline.vr

Thanks for contacting support.

We would really appreciate it if you could share screenshot(s) highlighting the XREF table errors, along with the source PDF from which Asposed2.pdf was generated. We will then log a ticket in our issue tracking system and update you.

The source document is the same for both conversions.
xref_screenshots.zip (49.5 KB)
I’d like to stress again that only ‘bug 1’ as stated above is an issue for us.

@eline.vr

Thank you for sharing requested data.

We have logged the following issues in our issue management system for further investigation and resolution.

PDFNET-44253: Presence of subsections
PDFNET-44254: Problem when IsXrefGapsAllowed property is set to false

We appreciate your reporting the second bug, as it will help improve our API. The issue IDs have been linked to this thread so that you will receive a notification as soon as these issues are resolved.

We are sorry for the inconvenience.

Any news regarding PDFNET-44253: Presence of subsections?

@eline.vr

Thanks for your inquiry.

I am afraid that the earlier logged issue is not yet resolved, owing to the large number of pending issues in the queue that were logged before it. We will definitely let you know once we make significant progress towards resolving it. Please be patient and allow us a little time.

We are sorry for the inconvenience.