Superfluous line break characters after PDF conversion


#1

Hi,

After converting any DOCX or TXT file containing plain text to PDF with the latest Aspose.Words for .NET 17.8:
new Document(path).Save(stream, SaveFormat.Pdf);
each line of text in the output PDF ends with a line break.
At the same time, if I save the same file as PDF from the Microsoft Office Word UI, there are no additional line breaks.

For example:
Original text, or text copied from a PDF generated by Microsoft Office Word:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec accumsan finibus elit, quis facilisis augue aliquet vitae. Nunc sit amet tempor est, quis congue ante. In eleifend nunc ac lorem varius, ut fermentum ligula fringilla. Integer turpis ligula, facilisis non viverra id, vulputate quis nisi. Cras sollicitudin tristique nisl vel pharetra. Nullam nunc magna, fermentum sed facilisis vel, imperdiet sed est. In hac habitasse platea dictumst. Ut egestas molestie nisl a ultricies. Proin sit amet dolor ac orci aliquet fermentum tempor eget nulla.

Text copied from the PDF generated by Aspose.Words:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec accumsan finibus elit, quis facilisis augue\r\n
aliquet vitae. Nunc sit amet tempor est, quis congue ante. In eleifend nunc ac lorem varius, ut fermentum\r\n
ligula fringilla. Integer turpis ligula, facilisis non viverra id, vulputate quis nisi. Cras sollicitudin tristique nisl\r\n
vel pharetra. Nullam nunc magna, fermentum sed facilisis vel, imperdiet sed est. In hac habitasse platea\r\n
dictumst. Ut egestas molestie nisl a ultricies. Proin sit amet dolor ac orci aliquet fermentum tempor eget\r\n
nulla.

Thanks.


#2

@WorkZone,

Thanks for your inquiry. Please ZIP and attach your input Word document here for testing. Please also share the steps to reproduce this issue at our end. We will investigate the issue on our side and provide you more information.


#3

Hi,

As I wrote, it can easily be reproduced with any DOCX or TXT file containing plain text, but you can use the attached Test.zip (9.0 KB).

Steps to reproduce:

  1. Open Test.docx in Microsoft Office Word.
  2. Save the file "As PDF".
  3. Open the PDF from the previous step in Acrobat Reader and copy the text from it.
  4. Paste the text, for example, into Microsoft Office Word and turn on the display of formatting marks: there are no line breaks after each line.
  5. Run the code: new Document("Test.docx").Save("Test.pdf", SaveFormat.Pdf);
  6. Open Test.pdf in Acrobat Reader and copy the text from it.
  7. Paste the text, for example, into Microsoft Office Word and turn on the display of formatting marks: there is a line break after each line.

Thanks.


#4

@WorkZone,

Thanks for sharing the details. Please use the PdfSaveOptions.ExportDocumentStructure property, which determines whether the document structure is exported to the PDF. Please use the following code example to get the desired output. Hope this helps you.

using Aspose.Words;
using Aspose.Words.Saving;

// MyDir is the folder that contains Test.docx
Document doc = new Document(MyDir + "Test.docx");
PdfSaveOptions options = new PdfSaveOptions();
// Export the logical structure (tagged PDF) along with the page content
options.ExportDocumentStructure = true;
doc.Save(MyDir + "17.9.pdf", options);

#5

Thank you,

Your solution works as expected.


#6

Hi again,

In our application, after we use the code above to convert several Word documents, we need to concatenate them into a single PDF. We use the following code to concatenate the pages:
new Document("Test.docx").Save("Test.pdf", SaveFormat.Pdf);
var pdfMain = new Aspose.Pdf.Document();
var pdf = new Aspose.Pdf.Document("Test.pdf"); // Fine, no line-breaks
pdfMain.Pages.Add(pdf.Pages);
pdfMain.Save("Main.pdf"); // Fail, again we see line-breaks!

Could you, please, help us?

Thanks.


#7

@WorkZone,

Could you please share some details about the issue you are facing when concatenating PDF files? If possible, please share the input PDF files so that we can test the scenario in our environment. We are sorry for the inconvenience.


#8

Hi,

Steps To Reproduce:

  1. Create a new Microsoft Office Word document and add any long paragraph of text to the page, or use the attached Files.zip (50.2 KB).
  2. Run the following code:
    new Aspose.Words.Document("Test.docx").Save("Test.pdf", Aspose.Words.Saving.SaveFormat.Pdf);
    var pdfMain = new Aspose.Pdf.Document();
    var pdf = new Aspose.Pdf.Document("Test.pdf");
    pdfMain.Pages.Add(pdf.Pages);
    pdfMain.Save("Main.pdf");
  3. Open Main.pdf in Acrobat Reader and copy the text from it.
  4. Paste the text, for example, into Microsoft Office Word and turn on the display of formatting marks: there is a line break after each line.

Best regards.


#9

@WorkZone,
We managed to replicate the problem of line breaks after adding pages of the source PDF to another new PDF document. It has been logged under the ticket ID PDFNET-43335 in our bug tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates. In reference to the Aspose.Words API part, please use the ExportDocumentStructure property as recommended by Tahir in an earlier reply:

[C#]

using Aspose.Words;
using Aspose.Words.Saving;

Document doc = new Document(MyDir + "Test.docx");
PdfSaveOptions options = new PdfSaveOptions();
options.ExportDocumentStructure = true;
doc.Save(MyDir + "17.9.pdf", options);

#10

Hi there

Could you please give us an update on this issue? When do you plan to fix the issue?


#11

@WorkZone,

The linked ticket ID PDFNET-43335 is not resolved yet. We will investigate it according to our development schedule and notify you once it is fixed. We also recommend that clients post their critical issues (or ticket IDs) in the paid support forum. Please refer to this link: Aspose support options


#12

Hi Support

Some time has passed now. Can you give us a status update on this issue? Can we expect it to be solved in the near future, or…?


#13

@WorkZone

The Pages.Add() method copies only the page contents; it does not copy the logical structure. Please use PdfFileEditor, which implements a more complex concatenation algorithm, and set its CopyLogicalStructure option to true:

using System.IO;
using Aspose.Pdf.Facades;

// Create the 1st document (empty)
var pdfMain = new Aspose.Pdf.Document();
using (MemoryStream data1 = new MemoryStream())
using (FileStream fs = new FileStream("Test.pdf", FileMode.Open, FileAccess.Read))
using (FileStream output = new FileStream("Main01.pdf", FileMode.Create, FileAccess.ReadWrite))
{
    pdfMain.Save(data1);
    data1.Position = 0; // rewind so the editor reads the saved PDF from the start
    PdfFileEditor fe = new PdfFileEditor();
    // set CopyLogicalStructure option
    fe.CopyLogicalStructure = true;
    // concatenate documents
    fe.Concatenate(new Stream[] { data1, fs }, output);
}

The earlier logged ticket is now closed. Please try the code snippet above with the latest version of the API (Aspose.PDF for .NET 18.10), and in case of any issue, please feel free to contact us.
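Putting the two replies together, the workflow from the question (convert several Word documents, then merge them into one PDF) might look roughly like this. It is a sketch, not an official Aspose sample: it assumes both Aspose.Words and Aspose.PDF for .NET are referenced, the input file names are hypothetical, and it relies only on the members used in this thread (PdfSaveOptions.ExportDocumentStructure, PdfFileEditor.CopyLogicalStructure and PdfFileEditor.Concatenate with streams). Unlike the reply above, it does not create an empty first document, since the converted PDFs can be passed to the editor directly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Aspose.Pdf.Facades;      // PdfFileEditor (Aspose.PDF)
using Aspose.Words.Saving;     // PdfSaveOptions (Aspose.Words)

class MergeTaggedPdfs
{
    static void Main()
    {
        // Hypothetical input files; replace with your own documents.
        string[] wordFiles = { "Test1.docx", "Test2.docx" };
        var pdfFiles = new List<string>();

        // Step 1 (Aspose.Words): save each document as a PDF with its logical structure.
        foreach (string wordFile in wordFiles)
        {
            var doc = new Aspose.Words.Document(wordFile);
            var options = new PdfSaveOptions { ExportDocumentStructure = true };
            string pdfFile = Path.ChangeExtension(wordFile, ".pdf");
            doc.Save(pdfFile, options);
            pdfFiles.Add(pdfFile);
        }

        // Step 2 (Aspose.PDF): concatenate, asking the editor to copy the logical structure.
        var inputs = new List<Stream>();
        try
        {
            foreach (string pdfFile in pdfFiles)
                inputs.Add(new FileStream(pdfFile, FileMode.Open, FileAccess.Read));

            using (var output = new FileStream("Main.pdf", FileMode.Create, FileAccess.ReadWrite))
            {
                var editor = new PdfFileEditor();
                editor.CopyLogicalStructure = true;
                if (!editor.Concatenate(inputs.ToArray(), output))
                    throw new InvalidOperationException("PdfFileEditor.Concatenate reported failure.");
            }
        }
        finally
        {
            // Release the input files whether or not concatenation succeeded.
            foreach (Stream s in inputs)
                s.Dispose();
        }
    }
}
```

The result can be checked with the same manual test used earlier in the thread: open Main.pdf in Acrobat Reader, copy the text and look for line breaks after each line.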