When converting a PDF to a Word DocX file, almost all of the spaces in the text are removed

I am attempting to convert a PDF to a DocX file in order to manipulate the text and then convert that DocX file back to PDF. Before writing the code for that manipulation, I wanted to confirm that I could take a PDF, convert it to DocX, and then convert it back into a PDF without any issues. For some reason, when I convert the PDF to a DocX, almost all of the spaces in the text are removed. Other than that, everything else seems to work correctly and the DocX->PDF converted document looks great.
Here is the code I am using to do the conversions:

    Aspose.Pdf.License pdfLicense = new Aspose.Pdf.License();
    Aspose.Words.License wordLicense = new Aspose.Words.License();
    public Aspose.Pdf.Document PDFDocument { get; set; }
    public Aspose.Words.Document WordDocument { get; set; }
    public string DocumentTitle { get; set; }

    public void DoWork()
    {
        var dir = Directory.GetCurrentDirectory() + "/";
        var licenseFile = dir + "Aspose.Total.lic";
        pdfLicense.SetLicense(licenseFile);
        wordLicense.SetLicense(licenseFile);

        PDFDocument = new Aspose.Pdf.Document(dir + "Equipping-Families.pdf");

        using (var wordStream = new MemoryStream())
        {
            var pdfOptions = new DocSaveOptions()
            {
                Format = DocSaveOptions.DocFormat.Doc,
                Mode = DocSaveOptions.RecognitionMode.Flow,
                RecognizeBullets = true
            };
            PDFDocument.Save(wordStream, pdfOptions);
            WordDocument = new Aspose.Words.Document(wordStream);
        }

        var options = new Aspose.Words.Saving.PdfSaveOptions();
        options.SaveFormat = Aspose.Words.SaveFormat.Pdf;
        options.ExportDocumentStructure = true;
        WordDocument.Save(dir + "Equipping-Families-Doc-To-PDF-Convert" + ".pdf", options);
        WordDocument.Save(dir + "Equipping-Families-PDF-To-Doc-Convert" + ".docx");
    }

And these are the files I’m using/getting from the above code:
Original PDF: Equipping-Families.pdf (235.5 KB)
DocX File after being converted from the original PDF: Equipping-Families-PDF-To-Doc-Convert.docx (77.2 KB)
PDF File after being converted from the Converted DocX: Equipping-Families-Doc-To-PDF-Convert.pdf (307.7 KB)

I feel like I must be missing something obvious either in how the original PDF is constructed or in my DocX Save Options. What’s really odd is that the spaces in all but the date text on the last page are preserved correctly in the DocX and the converted PDF files.

@gandalfwg

Could you please make sure that you are using the latest version of the API, i.e. 21.4, along with the code snippet below to convert PDF to DOCX? We tested the scenario in our environment and did not notice any issue in the output DOCX file.

    Document pdfDocument = new Document(dataDir + @"Equipping-Families.pdf");
    DocSaveOptions saveOptions = new DocSaveOptions();
    saveOptions.Format = DocSaveOptions.DocFormat.DocX;
    saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
    saveOptions.RelativeHorizontalProximity = 2.5f;
    saveOptions.RecognizeBullets = true;
    pdfDocument.Save(dataDir + @"output_flow.docx", saveOptions);

output_flow.docx (206.8 KB)

The license I have would only let me upgrade to PDF 11.9.0 and Words 16.7.0 but that did the trick! But now I’m having an issue going from the DocX to PDF. For some reason most of the text isn’t coming over when saving the Words Document to PDF.
Here’s the code now:

    Aspose.Pdf.License pdfLicense = new Aspose.Pdf.License();
    Aspose.Words.License wordLicense = new Aspose.Words.License();
    public Aspose.Pdf.Document PDFDocument { get; set; }
    public Aspose.Words.Document WordDocument { get; set; }
    public string DocumentTitle { get; set; }

    public void DoWork()
    {
        var dir = Directory.GetCurrentDirectory() + "/";
        var licenseFile = dir + "Aspose.Total.lic";
        pdfLicense.SetLicense(licenseFile);
        wordLicense.SetLicense(licenseFile);

        PDFDocument = new Aspose.Pdf.Document(dir + "Equipping-Families.pdf");

        using (var wordStream = new MemoryStream())
        {
            DocSaveOptions saveOptions = new DocSaveOptions();
            saveOptions.Format = DocSaveOptions.DocFormat.DocX;
            saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
            saveOptions.RelativeHorizontalProximity = 2.5f;
            saveOptions.RecognizeBullets = true;
            PDFDocument.Save(wordStream, saveOptions);
            WordDocument = new Aspose.Words.Document(wordStream);
        }

        var options = new Aspose.Words.Saving.PdfSaveOptions();
        options.SaveFormat = Aspose.Words.SaveFormat.Pdf;
        options.ExportDocumentStructure = true;
        WordDocument.Save(dir + "Equipping-Families-Doc-To-PDF-Convert" + ".pdf", options);
        WordDocument.Save(dir + "Equipping-Families-PDF-To-Doc-Convert" + ".docx");
    }

DocX File after being converted: Equipping-Families-PDF-To-Doc-Convert.docx (3.0 MB)
PDF file after being converted from the above DocX file: Equipping-Families-Doc-To-PDF-Convert.pdf (395.5 KB)
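
A side note on the stream hand-off in the code above: after `PDFDocument.Save(wordStream, saveOptions)`, the `MemoryStream` position sits at the end of the data that was just written. Whether `Aspose.Words.Document(Stream)` rewinds the stream internally is not confirmed anywhere in this thread, so treat it as an assumption that this is relevant here. It may still be worth ruling out by setting `wordStream.Position = 0` before constructing the Words document. The sketch below uses only plain .NET (no Aspose calls) to show the general behavior; `StreamRewindSketch` and `BytesReadable` are hypothetical names used for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamRewindSketch
{
    // Hypothetical helper: writes the payload into a MemoryStream, optionally
    // rewinds it, and returns how many bytes a subsequent reader can see.
    public static int BytesReadable(byte[] payload, bool rewind)
    {
        using (var stream = new MemoryStream())
        {
            // After the write, Position is at the end of the written data.
            stream.Write(payload, 0, payload.Length);

            if (rewind)
            {
                // Move back to the start so a reader sees the whole payload.
                stream.Position = 0;
            }

            var buffer = new byte[payload.Length];
            return stream.Read(buffer, 0, buffer.Length);
        }
    }

    public static void Main()
    {
        var payload = Encoding.UTF8.GetBytes("converted document bytes");

        // Without a rewind, the reader starts at the end of the stream.
        Console.WriteLine(BytesReadable(payload, rewind: false));

        // With a rewind, the reader gets every byte that was written.
        Console.WriteLine(BytesReadable(payload, rewind: true));
    }
}
```

If the rewind makes no difference to the DocX-to-PDF result, the stream hand-off can be ruled out and the cause is more likely in the conversion itself.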

This is of secondary concern, but the Word doc has ballooned in size to 3 MB (over 10 times the size of the PDF, let alone the old 77.2 KB Word doc). While I like the results, I would love to be able to get that down a bit. I tried using the original Words DLL that I have with the new PDF DLL (so, Words 15.6.0 and PDF 11.9.0), which fixed the file size issue but not the mostly-blank DocX-to-PDF issue.

I was able to fix the issue using Words 15.6.0 and PDF 11.9.0 by just converting to a Doc file rather than DocX. I hope this helps you figure out what is going on with the DocX file type in this situation. The only remaining issue is the formatting/look of the newly converted PDF. Here are my files:
New Doc-post Convert: Equipping-Families-PDF-To-Doc-Convert.docx (783.5 KB) - You’ll have to rename it to a .doc file after downloading it because your system won’t let me upload .doc files.
New PDF-post Convert: Equipping-Families-Doc-To-PDF-Convert.pdf (415.2 KB)

@gandalfwg

Please note that support is provided on the basis of the latest version. We also always recommend using the latest version of the APIs, as they contain more fixes and enhancements. Many methods and classes in older versions have become obsolete and been discontinued, and we neither maintain them nor provide support for them.

Please try the latest versions of both APIs and, if you still face an issue, let us know. Furthermore, please post Aspose.Words-related issues in the respective category, where you will be assisted accordingly.