Issues converting PDF to DOCX

I’m converting a PDF to DOCX with Aspose.PDF (current build 19.11) and am having a few issues with the resulting DOCX.

Brief_TEST BRIEF_431.pdf (71.4 KB)

When I open the resulting DOCX:

  • Word opens it in “Compatibility Mode”.
  • If Word opens it in Web Layout view, the content appears scattered all over the page.

See: Brief_TEST BRIEF_431.zip (33.4 KB)

The conversion code is very straightforward:

    Dim sResult As String = String.Empty
    Try
        GetPDFLicence()
        ' Using disposes the Document even if Save throws
        Using oPDF As New Aspose.Pdf.Document(filename:=sPDFFile)
            oPDF.Save(sDOCXFile, Aspose.Pdf.SaveFormat.DocX)
        End Using
    Catch ex As Exception
        ' Conversion failed; the exception is currently ignored
        Exit Sub
    End Try

Is there any way I can improve on this output?

OK, I think part of the problem is the same as, or similar to, the issue in this thread:

I note that it was raised in February 2018 and escalated to “Paid Support” in March 2018, and that no progress has been reported on it since.

We give our users the option to convert to Word so that they have an easy way to make minor changes to the document. For example, they might want to change the text in the table on the first page. Because the table is rendered as an image in the DOCX, they cannot add text on a second line, because the table “cell” no longer expands to fit its content.

@rozeboosje

Thank you for contacting support.

We have reproduced the issue during PDF to DOCX conversion, and a ticket with ID PDFNET-47376 has been logged in our issue management system for further investigation.

Regarding the referenced thread: tables are currently rendered as images, and PDFNET-44291 has not been implemented yet owing to the complications involved. We will let you know once it is supported.


Thanks Farhan,

If I might make a suggestion: I can imagine how difficult it is to render PDF tables with perfect accuracy in Word, so I understand why you chose to render them as images, with the text superimposed at the correct positions so that it appears to be “inside” the cells.

In some cases, though, it may matter more to the end user that the text ends up inside an actual Word table than that the document is reproduced with absolute visual accuracy. So perhaps this could be offered as an optional setting:

  • Default option: render with full visual accuracy, with the table rendered as an image.
  • New option: render as a real Word table, accepting that complicated borders, shading and so on may not be reproduced with perfect accuracy.

@rozeboosje

We appreciate your contribution; the suggestion has been recorded under the same ticket. We will certainly consider your concerns and will let you know once the feature is supported.


The issues you found earlier (filed as PDFNET-44291) have been fixed in Aspose.PDF for .NET 22.4.
