When PDF is converted to DOCX, text is placed in sym markup instead of regular text

memoq · September 30, 2016, 8:22am

Dear Aspose team,

I am converting a PDF file to DOCX (both files are attached in a ZIP archive). My problem is that in some cases the content of bullet lists is not represented in the XML markup as normal text runs, but each character is converted into a w:sym element.

Can you tell me what makes this text special, and what prevents it from simply being converted into w:r/w:t elements?

Thanks in advance,

Best regards,

Gergely Vándor
0028699

tilal.ahmad · October 3, 2016, 11:28am

Hi Gergely,

Thanks for your inquiry. I have tested the conversion using Aspose.Pdf for .NET 12.0.0 with the following code snippet and did not see any sym markup in the resulting DOCX file. Please download the latest version of Aspose.Pdf for .NET, try it with the following code, and share the results.

using Aspose.Pdf;

// Open the source PDF document
Document pdfDocument = new Document(@"orig.pdf");

// Create a DocSaveOptions object to control the conversion
Aspose.Pdf.DocSaveOptions saveOptions = new Aspose.Pdf.DocSaveOptions();

// Save in DOCX format
saveOptions.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;

// Set the recognition mode to Flow
saveOptions.Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow;

// Set the relative horizontal proximity to 2.5
saveOptions.RelativeHorizontalProximity = 2.5f;

// Enable bullet recognition during the conversion
saveOptions.RecognizeBullets = true;

// Save the resulting DOCX file
pdfDocument.Save(@"saveOptionsOutput_out_.docx", saveOptions);
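To compare the output of different versions, it may help to check the DOCX directly for symbol markup. The sketch below is not part of Aspose.Pdf: the class SymInspector and the method ListSymElements are hypothetical names. It opens the DOCX as a ZIP package, reads word/document.xml, and lists every w:sym element together with its w:font and w:char attributes, which show the font and character code each symbol refers to. It assumes the main document part sits at the usual path word/document.xml. On .NET Framework, ZipFile also needs a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not an Aspose API): lists w:sym elements in a DOCX
// so the output of two conversions can be compared.
public static class SymInspector
{
    // WordprocessingML main namespace, bound to the "w:" prefix in DOCX files.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns one "font:char" entry per w:sym element, e.g. "Symbol:F0B7".
    // Assumes the main part is stored at word/document.xml.
    public static List<string> ListSymElements(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                return new List<string>(); // no main document part found

            using (Stream stream = entry.Open())
            {
                XDocument doc = XDocument.Load(stream);
                // w:font and w:char are namespaced attributes on w:sym.
                return doc.Descendants(W + "sym")
                    .Select(s => (string)s.Attribute(W + "font") + ":" +
                                 (string)s.Attribute(W + "char"))
                    .ToList();
            }
        }
    }
}
```

For example, SymInspector.ListSymElements(@"saveOptionsOutput_out_.docx").Count gives the number of w:sym elements in the converted file; zero means all text ended up in ordinary runs.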

Best Regards,