Word wrapping is turned into paragraph breaks after converting PDF to DOC(X)

Hello,

This is the second of two issues that are stopping us from renewing our subscription. If it doesn’t get resolved, we will probably not renew.

Our software takes PDF files and converts them to DOC(X) for further processing. Wherever there is word wrapping (a long sentence spanning at least two lines) in the PDF, the word wrap becomes a paragraph break in the DOC(X) file. This is an issue for us because we need to process the text further, and for that the sentences must remain "intact."

I’m attaching a PDF. This was originally created in MS Word, and then saved to PDF. This was just a long sentence I typed, no manual paragraph break was introduced. After saving this back to DOC(X) using Aspose, a paragraph break is introduced.

The version we use in production is 6.8.0.0, but I believe we have checked newer versions and they had the same issue, which we couldn’t resolve. If I’m wrong and a newer version of Aspose.PDF resolves this, please let me know.

This is our code:

/// <summary>
/// Converts input pdf file to docx and returns the path of it.
/// </summary>
/// <param name="inputFilePath">Path of the input file.</param>
/// <returns>Path of the converted docx file.</returns>
public string ConvertToDocx(string inputFilePath)
{
    string filePathDocx = Path.Combine(workingDirectory, "temp.docx");

    // if the file is already docx, copy it and return
    // this can happen in case of monolingual review
    if (Path.GetExtension(inputFilePath).ToLowerInvariant() == ".docx")
    {
        File.Copy(inputFilePath, filePathDocx);
        return filePathDocx;
    }

    // Set licenses for Aspose.Pdf and Aspose.Words assemblies
    AsposeLicenseInit.AssurePdfLicenseSet();
    AsposeLicenseInit.AssureWordsLicenseSet();

    this.filePathPdf = inputFilePath;

    string tempFolder;
    using (var tempFolderScope = TempPathHelper.GetTempFolderForScope(out tempFolder))
    {
        Directory.CreateDirectory(tempFolder);
        string filePathDoc = Path.Combine(tempFolder, "temp.doc");

        // open the source PDF document
        Document pdfDocument = new Document(inputFilePath);
        DocSaveOptions saveOptions = new DocSaveOptions();
        // Setting conversion options
        saveOptions.RecognizeBullets = RecognizeBullets;
        saveOptions.Mode = (ConversionType == PdfConversionType.TextFlow)
            ? DocSaveOptions.RecognitionMode.Flow
            : DocSaveOptions.RecognitionMode.Textbox;
        if (ProximityIsSet) saveOptions.RelativeHorizontalProximity = RelativeHorizontalProximity;

        // 1. Conversion to .doc
        pdfDocument.Save(filePathDoc, saveOptions);

        // 2. Conversion to .docx
        Aspose.Words.Document asposeDocument = new Aspose.Words.Document(filePathDoc);
        asposeDocument.Save(filePathDocx);
        return filePathDocx;
    }
}

Best regards,
Gergely

Hi Gergely,

Thanks for your inquiry. While testing the scenario with your shared document, we managed to reproduce the paragraph break issue with the latest version, Aspose.Pdf for .NET 9.6.0. We have logged a ticket, PDFNEWNET-37577, in our issue tracking system for further investigation and resolution. We will notify you as soon as it is resolved.

We are sorry for the inconvenience caused.

Best Regards,

Hi Aspose folks,

Are there any plans to fix this eventually?

Best regards,
Gergely

Hi Gergely,


Thanks for your inquiry. I am afraid the issue you reported is still not resolved. Our product team is currently busy resolving issues that were reported earlier in the queue. However, we have increased the issue's priority and asked the team to share an ETA as soon as possible. We will notify you as soon as we make significant progress towards a resolution.


We are sorry for the inconvenience caused.


Best Regards,

Hello,

Any update? This is (still) preventing us from renewing our subscription. We will renew if this and another bug are fixed. The other bug is this one:
https://forum.aspose.com/t/83136

BR,
Gergely

Hi Gergely,


Thanks for your inquiry. Our product team has started working on issue PDFNEWNET-37577 and, if everything goes as planned, the fix will be available in Aspose.Pdf for .NET 11.3.0, which will be released in February 2016. We will keep you updated on the issue resolution.

Regarding the other issue, please follow it in its own thread so that the developer handling it can provide you with updates.

Thanks for your patience and cooperation.

Best Regards,

Hi Gergely,


Thanks for your patience.

We are pleased to share that the issue reported earlier is resolved in the latest release, Aspose.Pdf for .NET 11.7.0. To address this requirement, we have introduced a new option, AddReturnToLineEnd. Set it to false to prevent hard line breaks from being added inside paragraphs.

[C#]

var saveOptions = new DocSaveOptions
{
    Format = DocSaveOptions.DocFormat.Doc,
    Mode = DocSaveOptions.RecognitionMode.Textbox,
    RecognizeBullets = true,
    AddReturnToLineEnd = false
};
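
As a rough illustration of how the AddReturnToLineEnd option might slot into a PDF-to-DOCX conversion like the one posted at the top of this thread, here is a minimal sketch. It assumes Aspose.Pdf for .NET 11.7.0 or later. The class name PdfToDocxSketch and the method ConvertPdfToDocx are hypothetical, and writing DOCX directly (Format = DocX) with Flow mode is an assumption made for the example rather than something confirmed in this thread, so please test it against your own documents.

```csharp
using Aspose.Pdf;

// Hypothetical helper for illustration only; it is not part of Aspose.Pdf.
public static class PdfToDocxSketch
{
    public static void ConvertPdfToDocx(string inputPdfPath, string outputDocxPath)
    {
        // Open the source PDF document and release it when done.
        using (var pdfDocument = new Document(inputPdfPath))
        {
            var saveOptions = new DocSaveOptions
            {
                // Assumption: save straight to DOCX instead of going through .doc.
                Format = DocSaveOptions.DocFormat.DocX,
                // Assumption: Flow mode, as in the TextFlow branch of the original code.
                Mode = DocSaveOptions.RecognitionMode.Flow,
                // Option introduced in 11.7.0: false avoids hard line breaks inside paragraphs.
                AddReturnToLineEnd = false
            };

            pdfDocument.Save(outputDocxPath, saveOptions);
        }
    }
}
```

If the intermediate .doc step and the Aspose.Words re-save from the original code are still required, the same AddReturnToLineEnd = false setting can be added to the existing saveOptions object instead.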