Aspose.Words throws an exception for a document generated by Aspose.PDF

@asad.ali

The output DOCX is still invalid.
The output file cannot be loaded with Aspose.Words for .NET 24.3.

Tested operating systems: Windows 7 and Windows 10
.NET target platform: .NET 6.0
SDK used: Aspose.PDF for .NET 24.3

Testing code

void ConvertPDFtoWordDocAdvanced(string filename)
{
    var pdfFile = filename;
    var docFile = filename + ".docx";
    Document pdfDocument = new Document(pdfFile);
    DocSaveOptions saveOptions = new DocSaveOptions
    {
        Format = DocSaveOptions.DocFormat.DocX,
        // Set the recognition mode as Flow
        Mode = DocSaveOptions.RecognitionMode.Flow,
    };
    pdfDocument.Save(docFile, saveOptions);
}

It seems RecognitionMode.Textbox works well.

@kngstr

Can you please share the sample PDF with us and the error that you are facing in Aspose.Words for .NET while reading the output file?

@asad.ali
The file was uploaded earlier in this topic.

Aspose.Words for .NET 24.4 throws the same exception as before:
System.InvalidOperationException: "More than 63 cells per row is not supported for this file format."

System.InvalidOperationException
  HResult=0x80131509
  Message=More than 63 cells per row is not supported for this file format.
  Source=Aspose.Words
  StackTrace:
   at oC4.d(Row d)
   at Aspose.Words.CompositeNode.AcceptCore(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptChildren(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptCore(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptChildren(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptCore(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptChildren(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptCore(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptChildren(DocumentVisitor visitor)
   at Aspose.Words.CompositeNode.AcceptCore(DocumentVisitor visitor)
   at LC4.d(OCK d)
   at Aspose.Words.Document.d(Stream d, String v, SaveOptions c)
   at Program.<Main>$(String[] args) in C:\.NET\ConsoleApp.Words\ConsoleApp.Words\Program.cs:line 5

Testing code:

using Aspose.Words;

Document doc = new Document("工程项目施工的成本控制 80p-45.pdf.docx");
doc.Save("工程项目施工的成本控制 80p-45.pdf.docx");

@kngstr

We tested with the code snippet below and could not replicate the issue. Please check it and let us know if we missed anything.

Document pdfDocument = new Document(dataDir + @"工程项目施工的成本控制 80p-45.pdf");

DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
//saveOptions.RelativeHorizontalProximity = 2.5f;
//saveOptions.RecognizeBullets = true;
pdfDocument.Save(dataDir + "工程项目施工的成本控制 80p-45.docx", saveOptions);

Aspose.Words.Document wordDocument = new Aspose.Words.Document(dataDir + "工程项目施工的成本控制 80p-45.docx");
wordDocument.Save(dataDir + "finaldocument.docx");

@asad.ali

The code is OK.

In my environment, it throws that exception every time.
error.screenshot
vs.screenshot

This is the test project. It’s a console application.
ConsoleApp.Words.7z (129.7 KB)

@kngstr

In that case, the issue appears to be related to Aspose.Words. We are moving the thread to the respective forum, where you will be assisted accordingly.

@kngstr @asad.ali The problem is not in Aspose.Words. The DOCX document produced by Aspose.PDF is not valid. MS Word also cannot open the DOCX document produced by Aspose.PDF. The exception message clearly shows what the problem is: MS Word allows a maximum of 63 columns in a table. So I am moving the topic back to the Aspose.PDF forum for further handling.
Input PDF: 工程项目施工的成本控制 80p-45.pdf (344.8 KB)
DOCX produced by Aspose.PDF: tmp.docx (57.3 KB)

Code:

Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document(@"C:\Temp\in.pdf");

Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();
saveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;
saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;
pdfDocument.Save(@"C:\Temp\tmp.docx", saveOptions);

Aspose.Words.Document wordDocument = new Aspose.Words.Document(@"C:\Temp\tmp.docx");
wordDocument.Save(@"C:\Temp\out.docx");
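
To confirm the 63-cell problem without loading the file in Aspose.Words or MS Word, the DOCX can be inspected directly, since it is a ZIP package of XML parts. The following is only a rough sketch using standard .NET classes: `DocxRowCheck` and `FindOversizedRows` are hypothetical names, not part of any Aspose API. It assumes the main document part is stored at `word/document.xml` (the usual location, though not guaranteed), and it counts only `w:tc` elements that are direct children of a `w:tr` row, so cells wrapped in content controls and column spans are not accounted for.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of Aspose.PDF or Aspose.Words.
public static class DocxRowCheck
{
    // WordprocessingML main namespace used by the document part.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Limit quoted in the exception message in this thread.
    public const int MaxCellsPerRow = 63;

    // Returns the cell count of every table row that exceeds the limit.
    public static List<int> FindOversizedRows(XDocument documentXml)
    {
        return documentXml
            .Descendants(W + "tr")
            .Select(row => row.Elements(W + "tc").Count())
            .Where(cellCount => cellCount > MaxCellsPerRow)
            .ToList();
    }

    // Opens the DOCX as a ZIP package and checks its main document part.
    // Assumption: the part lives at word/document.xml.
    public static List<int> FindOversizedRows(string docxPath)
    {
        using ZipArchive package = ZipFile.OpenRead(docxPath);
        ZipArchiveEntry part = package.GetEntry("word/document.xml")
            ?? throw new InvalidDataException("word/document.xml not found.");
        using Stream stream = part.Open();
        return FindOversizedRows(XDocument.Load(stream));
    }
}
```

Calling `DocxRowCheck.FindOversizedRows(@"C:\Temp\tmp.docx")` on the file produced by Aspose.PDF should return a non-empty list if any row has more than 63 cells, which would be consistent with the exception reported above.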

@kngstr

We have logged this issue as PDFNET-57085 in our issue tracking system so that it can be corrected. We will look into its details and keep you posted on the status of the fix. Please be patient and give us some time.

We are sorry for the inconvenience.

@alexey.noskov

You are right. I had tested it with an older version of Aspose.Words, which did not throw any exception on my end. Also, the DOCX generated by Aspose.PDF opened in WPS without any error and was displayed correctly. Nevertheless, the details have been logged with the ticket for review.