Converting PDF to Word with code gives different results from the online conversion

Hi Team,

We have observed a huge difference in results between converting PDF to Word online and converting with code. The Word file converted online retains the header and footer, and its fonts and alignment are good, whereas the Word file converted by code has missing fonts, and the alignment and header are not maintained. We have tried different combinations of settings and save methods, but nothing works like the online conversion. Kindly let us know which settings or conversion method the online tool uses.
using (MemoryStream inputStream = new MemoryStream())
{
    using (MemoryStream outputStream = new MemoryStream())
    {
        inputStream.Write(pdfBytes, 0, pdfBytes.Length);
        // Open the PDF document
        Document pdfDocument = new Document(inputStream);

        //pdfDocument.Save(@"c:\code\TestAspose.docx", SaveFormat.DocX);

        var docSaveOptions = new DocSaveOptions()
        {
            Format = DocSaveOptions.DocFormat.DocX,
            Mode = DocSaveOptions.RecognitionMode.Flow,
            RecognizeBullets = true,
            RelativeHorizontalProximity = 2.5f
        };
        // Save the output as DOCX
        pdfDocument.Save(outputStream, docSaveOptions);
        wordBytes = outputStream.ToArray();
    }
}

@I30256

Could you please ZIP and attach your input PDF and the problematic output DOCX here for testing? We will investigate the issue and provide you with more information on it.

TestForAspose.zip (183.8 KB)
The files are attached. FYI, we do have a valid license.
TestForAspose.pdf – pdf file

TestForAspose.docx – Converted online

TestForAspose_NoOptions.docx – Saved using code without any DocSaveOptions

TestForAspose_Flow.docx – Saved using code with Flow option

@I30256

The online PDF to Word converter uses Aspose.Words for .NET. The fonts in the output DOCX are incorrect because the fonts used in your PDF are not installed on the system.

Using Aspose.PDF for .NET, there is no issue with the fonts and text alignment in the DOCX. However, the PDF header is not exported as the header of the Word document. We have logged this problem in our issue tracking system as PDFNET-51473. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

Thanks for logging this problem in the issue tracker. It's not only the header and footer; the whole flow of the document is different. Please try switching to Web Layout in Word in both documents (online and code) to see the difference. The PDF uses simple fonts (Times and Calibri), and they are not converted properly in Word.

You mentioned that the online converter uses Aspose.Words for .NET. How does that work? Do you load the PDF into an Aspose.Words document? Could you please share a snippet for this? Thanks.

@I30256

The input PDF uses a different font for the body text. Please check the attached image.
image.png (358.0 KB)

Could you please ZIP and attach your expected output Word document here for our reference? We will investigate this issue further and provide you with more information on it.

Please read the following article about converting PDF to DOCX file format using Aspose.Words.
Convert PDF to Other Document Formats
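
As a rough illustration of the approach asked about above, here is a minimal sketch of loading a PDF directly into an Aspose.Words document and saving it as DOCX. It assumes an Aspose.Words for .NET version that supports loading PDF, and the class and method names (PdfToDocxSketch, ConvertPdfToDocx) are hypothetical, chosen only for this example. It is not necessarily what the online converter does internally, and the output will still depend on the fonts available on the machine.

```csharp
using System.IO;
using Aspose.Words;

public static class PdfToDocxSketch
{
    // Hypothetical helper: takes PDF bytes, returns DOCX bytes.
    public static byte[] ConvertPdfToDocx(byte[] pdfBytes)
    {
        using (var inputStream = new MemoryStream(pdfBytes))
        using (var outputStream = new MemoryStream())
        {
            // Aspose.Words detects the input format from the stream
            // (assumes a version that can load PDF).
            var doc = new Document(inputStream);

            // Save as DOCX using the default save options.
            doc.Save(outputStream, SaveFormat.Docx);
            return outputStream.ToArray();
        }
    }
}
```

Usage would mirror the original code: wordBytes = PdfToDocxSketch.ConvertPdfToDocx(pdfBytes);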