Incorrect layout with non-printable characters

Hi,

I have a problem with a document that contains some strange invisible/non-printable characters. Possibly it's the document author's error (a copy-paste result). But because Word lays out the document differently than Aspose.Words does, it's much harder to find and correct them in a large document.
Tested with Aspose.Words 11.5.

[Test]
[Explicit]
[Ignore]
public void Converts_Correct_Layout_Unprintable_characters()
{
    string inputFile = @"TestData\GeneratePdf\Unprintable.doc";
    string outputFile = @"Unprintable.pdf";

    Document doc = new Document(inputFile);
    doc.Save(outputFile, SaveFormat.Pdf);

    var pdfDocument = new Aspose.Pdf.Document(outputFile);

    Assert.AreEqual(1, pdfDocument.Pages.Count, "Incorrect layout. Wrong number of pages");
}

Regards
Jacek Bator

Hi,

Thanks for your inquiry.

You can turn the selected formatting marks off by following the steps below in MS Word 2010:

  1. Click the File tab.
  2. Under Help, click Options.
  3. Click Display. Under "Always show these formatting marks on the screen", clear the check boxes for any formatting marks that you do not want to show in your documents at all times.

I hope this will help.

Best Regards,

Hi,

The system I'm working on is an automatic document processing system. I'm using Aspose.Words to extract page numbers and content for some paragraphs. When the MS Word layout and the Aspose.Words layout are not in sync, the links that the system produces lead to incorrect pages.

So the question is: are you able to fix it (make it behave the same as MS Word), or is there anything I can do on the Aspose.Words side to stay in sync with the MS Word layout?

Best Regards
Jacek Bator

Hi Jacek,

Thanks for the additional information. We are working on your query and will get back to you as soon as possible.

Best Regards,

Hi Jacek,

Thanks for your patience.

I managed to reproduce this issue on my side and have logged it in our bug tracking system. The issue ID is WORDSNET-6586. Your request has been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Moreover, I have found that these invisible/non-printable characters are actually '\r' (ControlChar.Cr) characters. As a temporary workaround, please try using the following code snippet:

Document doc = new Document(@"C:\test\unprintable.doc");
Node[] chars = doc.GetChildNodes(NodeType.Run, true).ToArray();
foreach (Run r in chars)
{
    // Replace the text of any run that contains a '\r' with a single space.
    if (r.Text.Contains(ControlChar.Cr))
    {
        r.Text = " ";
    }
}
doc.Save(@"C:\test\out.pdf");
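
Note that the snippet above overwrites the whole text of every affected run with a single space. If some of those runs also carry visible text, a more conservative variant might replace only the '\r' characters and report which runs were touched, which can also help locate them in a large document. The following is a minimal sketch under that assumption, not a verified fix: the class name CrCleanup and the file paths are placeholders, and it assumes the project references Aspose.Words.

```csharp
using System;
using Aspose.Words;

class CrCleanup
{
    static void Main()
    {
        // Placeholder paths; adjust to your environment.
        Document doc = new Document(@"C:\test\unprintable.doc");

        int affected = 0;

        // Copy the runs into an array first, as in the snippet above.
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true).ToArray())
        {
            if (!run.Text.Contains(ControlChar.Cr))
                continue;

            affected++;

            // Print the run with the '\r' made visible, to help find it in the source document.
            Console.WriteLine("Run with embedded CR: \"{0}\"",
                run.Text.Replace(ControlChar.Cr, "\\r"));

            // Replace only the '\r' characters and keep any other text in the run.
            run.Text = run.Text.Replace(ControlChar.Cr, " ");
        }

        Console.WriteLine("Runs changed: {0}", affected);
        doc.Save(@"C:\test\out.pdf");
    }
}
```

Whether the resulting layout then matches MS Word would still need to be checked against the actual document.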

I hope this helps.

Best Regards,

The issues you found earlier (filed as WORDSNET-6586) have been fixed in the Aspose.Words for .NET 18.5 update and the Aspose.Words for Java 18.5 update.
