Hello,
This is the second of two issues that are stopping us from renewing our subscription. If this doesn’t get resolved, we will probably never renew our subscription.
Our software takes PDF files and converts them to DOC(X) for further processing. Wherever there is word wrapping (a long sentence spanning at least two lines) in the PDF, the word wrapping becomes a paragraph break in the DOC(X) file. This is an issue for us because we need to process the text further, and for that the sentences must remain "intact."
I’m attaching a PDF. It was originally created in MS Word and then saved to PDF. It contains just one long sentence I typed; no manual paragraph break was introduced. After saving this back to DOC(X) using Aspose, a paragraph break is introduced.
The version we use in production is 6.8.0.0, but I believe we have checked newer versions that had the same issue, and we couldn’t resolve it. But if I’m wrong and a newer version of Aspose.PDF resolves this, please let me know.
This is our code:
/// <summary>
/// Converts input pdf file to docx and returns the path of it.
/// </summary>
/// <param name="inputFilePath"></param>
/// <returns></returns>
public string ConvertToDocx(string inputFilePath)
{
    string filePathDocx = Path.Combine(workingDirectory, "temp.docx");
    // if the file is already docx, copy it and return
    // this can happen in case of monolingual review
    if (Path.GetExtension(inputFilePath).ToLowerInvariant() == ".docx")
    {
        File.Copy(inputFilePath, filePathDocx);
        return filePathDocx;
    }
    //Set licenses for Aspose.Pdf and Aspose.Doc assemblies
    AsposeLicenseInit.AssurePdfLicenseSet();
    AsposeLicenseInit.AssureWordsLicenseSet();
    this.filePathPdf = inputFilePath;
    string tempFolder;
    using (var tempFoler = TempPathHelper.GetTempFolderForScope(out tempFolder))
    {
        Directory.CreateDirectory(tempFolder);
        string filePathDoc = Path.Combine(tempFolder, "temp.doc");
        //open the source PDF document
        Document pdfDocument = new Document(inputFilePath);
        DocSaveOptions saveOptions = new DocSaveOptions();
        //Setting conversion options
        saveOptions.RecognizeBullets = RecognizeBullets;
        saveOptions.Mode = (ConversionType == PdfConversionType.TextFlow)
            ? DocSaveOptions.RecognitionMode.Flow
            : DocSaveOptions.RecognitionMode.Textbox;
        if (ProximityIsSet) saveOptions.RelativeHorizontalProximity = RelativeHorizontalProximity;
        //1. Conversion to .doc
        pdfDocument.Save(filePathDoc, saveOptions);
        //2. Conversion to .docx
        Aspose.Words.Document asposeDocument = new Aspose.Words.Document(filePathDoc);
        asposeDocument.Save(filePathDocx);
        return filePathDocx;
    }
}
Best regards,
Gergely
Hi Gergely,
Thanks for your inquiry. While testing the scenario with your shared document, we managed to reproduce the paragraph break issue with the latest version, Aspose.Pdf for .NET 9.6.0. We have logged a ticket, PDFNEWNET-37577, in our issue tracking system for further investigation and resolution. We will notify you as soon as it is resolved.
We are sorry for the inconvenience caused.
Best Regards,
Hi Aspose folks,
Are there any plans to fix this eventually?
Best regards,
Gergely
Hi Gergely,
We are sorry for the inconvenience caused.
Best Regards,
Hello,
Any update? This is (still) preventing us from renewing our subscription. We will renew if this and another bug are fixed. The other bug is this one:
https://forum.aspose.com/t/83136
BR,
Gergely
Hi Gergely,
var saveOptions = new DocSaveOptions
{
    Format = DocSaveOptions.DocFormat.Doc,
    Mode = DocSaveOptions.RecognitionMode.Textbox,
    RecognizeBullets = true,
    AddReturnToLineEnd = false
};