Aspose.Words PDF conversion adding carriage return line feed after each line

We are using Aspose.Words to convert .docx documents to PDF. We noticed that the converted PDF has a CR LF after each line, even though this is not the case in the .docx file. For example, if the .docx file contains 5 long paragraphs, each wrapped over multiple lines, the converted PDF has a CR LF after each line, even though only 5 CR LF should be present.

Is there an option in Aspose.Words to preserve the original .docx document structure?

Hi
Thanks for your request. Could you please attach your document here for testing? We will check the issue and provide you more information.
Best regards,

Thank you for your quick response. Here are two example documents.

Hi Ville,
Thank you for the additional information. As far as I can see, the output PDF looks fine. Do you mean the paragraph breaks that are added after each line when you copy and paste content from the PDF? If so, it is not a bug; it is a characteristic of the PDF format. Unlike an MS Word document, a PDF document is a fixed-page format. Each line of text, and even each word or letter, can be a separate text object, and each text object is absolutely positioned on the page. When you copy content from a PDF document, the reader simply detects which text objects are on different lines and inserts a line break between them.
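
To illustrate the point, here is a minimal Java sketch of how a reader might rebuild text when you copy it. It is not Aspose.Words code and not the code of any real PDF reader; `PdfCopySketch`, `TextObject`, `copyText`, and the 0.5 point baseline tolerance are hypothetical names and values chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class PdfCopySketch {

    /** Hypothetical stand-in for one absolutely positioned text object on a page. */
    public static final class TextObject {
        public final String text;
        public final double x; // horizontal position in points
        public final double y; // baseline position in points; larger means higher on the page

        public TextObject(String text, double x, double y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /** Assumed tolerance: objects whose baselines differ by less than this share a line. */
    private static final double BASELINE_TOLERANCE = 0.5;

    /**
     * Rebuilds plain text from positioned objects. The only information available
     * is position, so every change of baseline becomes a CR LF, whether or not
     * the original document had a paragraph break there.
     */
    public static String copyText(List<TextObject> objects) {
        List<TextObject> sorted = new ArrayList<>(objects);
        // Reading order: top to bottom, then left to right.
        sorted.sort(Comparator
                .comparingDouble((TextObject t) -> -t.y)
                .thenComparingDouble(t -> t.x));

        StringBuilder out = new StringBuilder();
        Double currentBaseline = null;
        for (TextObject t : sorted) {
            if (currentBaseline != null) {
                boolean newLine = Math.abs(t.y - currentBaseline) > BASELINE_TOLERANCE;
                out.append(newLine ? "\r\n" : " ");
            }
            out.append(t.text);
            currentBaseline = t.y;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // One paragraph wrapped over two lines, followed by a second paragraph.
        List<TextObject> page = Arrays.asList(
                new TextObject("The quick", 72, 700),
                new TextObject("brown", 130, 700),
                new TextObject("fox jumps.", 72, 686),
                new TextObject("Second paragraph.", 72, 660));
        System.out.println(copyText(page));
    }
}
```

In this sketch the wrapped line and the real paragraph break both come out as CR LF, because the page carries positions only and nothing that marks where a paragraph ends.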
Best regards,

Hi Alexey,

Yes, I meant the “paragraph breaks that are added after each line when you copy paste content from PDF”. We need to copy and paste content from the PDF into a multiline textbox in our Windows Forms application and process it based on user input, and for that we need to know the original paragraph breaks.

I have added another example PDF, which was made with the MS Word 2007 Save as PDF function. As you can see, there are no line breaks after each line when you copy the content from this PDF, so the original document structure is still present. I know that we also got this kind of result with Aspose.Words about a year ago, when we were using the evaluation version. We only recently discovered this new behaviour while testing the copy and paste functionality.

Hi Ville,
Thank you for the additional information. I have logged the issue in our defect database. We will let you know once it is resolved.
In the meantime, maybe in your case you can extract the content directly from the MS Word documents using Aspose.Words.
Best regards,

Thank you for logging this issue in your defect database. Unfortunately, extracting content directly from the MS Word documents, as you suggested, is not an option for us. We’ll be waiting for your fix and hope to hear from you again soon.

Hi Ville,
We will let you know once the issue is resolved or we have more information.
Best regards,

Hello, Ville.

The described issue occurs because MS Word uses a special PDF feature, “Logical Structure”, which is not supported by Aspose.Words yet.
Unfortunately, we have no plans to implement it in the nearest releases, but we will probably return to this question later this year.

Best regards,

Hi,

Have there been any plans to implement the “Logical Structure” feature in Aspose.Words since June? Our customer is anxiously waiting to be able to use the feature we have promised to deliver to them.

Hello, Ville.

Thanks for your request. Unfortunately, we still have no plans to implement it in the nearest releases. There are a lot of other important issues we have to work on, and we do not have anyone free to implement this right now. Thanks for your understanding and patience.
Best regards,

The issues you have found earlier (filed as WORDSNET-4887) have been fixed in this .NET update and this Java update.

