Documents with mixed (RTL and LTR) text

Hi,
We are using Aspose.Words for .NET and have run into issues with documents containing mixed (RTL and LTR) text. We open source DOCX documents and convert them to PDF with the following code:

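using Aspose.Words;

// contentStream: the source DOCX stream; outputStream: the target PDF stream (both supplied by the caller)
Document wordDocument = new Document(contentStream);

// Get all comments (isDeep = true searches the whole document tree)
var comments = wordDocument.GetChildNodes(NodeType.Comment, true);
// Remove all comments; NodeCollection.Clear() also removes the nodes from the document
comments.Clear();

// Take the culture for field updates from the field code rather than the current thread
wordDocument.FieldOptions.FieldUpdateCultureSource = Aspose.Words.Fields.FieldUpdateCultureSource.FieldCode;

// Render without hidden text, paragraph marks, or any revision markup
wordDocument.LayoutOptions.IsShowHiddenText = false;
wordDocument.LayoutOptions.IsShowParagraphMarks = false;
wordDocument.LayoutOptions.RevisionOptions.ShowOriginalRevision = false;
wordDocument.LayoutOptions.RevisionOptions.ShowRevisionBalloons = false;
wordDocument.LayoutOptions.RevisionOptions.ShowRevisionBars = false;
wordDocument.LayoutOptions.RevisionOptions.ShowRevisionMarks = false;

wordDocument.UpdateFields();
wordDocument.Save(outputStream, SaveFormat.Pdf);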

If you open the resulting PDF file in Adobe Reader and search for the text inside the first pair of parentheses (both words together, the Hebrew one and the Greek one), the search does not find it. But if you save the original document as PDF from Microsoft Word, searching for the text in parentheses works.

Please see the original 1_sentence.docx, 1_sentence.docx.pdf (produced by Aspose.Words), and 1_sentence.pdf (produced by MS Word) in the attached ZIP.

Thanks,
Roman

DocsWithIssuesInAspose.zip (256.8 KB)


Sorry, I forgot to specify the version. We are using Aspose.Words 17.7.

Thanks,
Roman

@cap.aspose

Thanks for your inquiry. We have converted your shared document using Aspose.Words for .NET 17.9 and were unable to reproduce the reported text search issue. We would appreciate it if you could share more details along with screenshots. We will then investigate the issue further and guide you accordingly.
1_sentence_179.pdf (32.5 KB)

image.png (51.7 KB)

Hi Tilal,

As I wrote above, you need to search for both words at once, not just one. Acrobat Reader finds each of them separately, but not both together. To reproduce the issue, do the following:

  1. Open the “1_sentence.docx” file in Microsoft Word.

  2. Select two words, as in the screenshot below, and copy them to the clipboard.
    CopyToCliboardInWord.png (16.9 KB)

  3. Open the “1_sentence_179.pdf” file you attached above in Acrobat Reader.

  4. Press Ctrl+F, paste the clipboard contents into the search field, and press Enter or the Next button. The result is shown in the screenshot below.
    SearchInAsposeProducedPDF.png (45.2 KB)

But if you do the same search in “1_sentence.pdf” from my ZIP (created by Microsoft Word), the search works correctly. The result is shown in the screenshot below.
SearchInMSWordProducedPDF.png (38.0 KB)

Thanks,
Roman

Hi,

We found a second issue with the conversion of the “1_sentence.docx” file to PDF by Aspose.Words. Please open both the “1_sentence.pdf” file from my ZIP (created by MS Word) and the “1_sentence_179.pdf” file you attached in Acrobat Reader. Both files look the same.

However, if we copy the whole content of the “1_sentence.pdf” file (Ctrl+A, then Ctrl+C in Reader) and paste it into translate.google.com (after first selecting Hebrew as the source language), the words appear in the correct order, and clicking the Translate button gives a correct translation.

If we do the same with the content of the “1_sentence_179.pdf” file, then after pasting it into translate.google.com we see that some words in the first sentence are out of place. Naturally, the translation of this sentence turns out to be incorrect.

It looks like the words are stored out of order in the PDF, while Reader still renders them in the correct positions. Perhaps these two issues are related, since the document contains mixed (LTR and RTL) content.

The correct order of words in the document is critical for us, because our product searches for phrases in PDF files, and if the words are stored in a different order within the sentence, our search cannot find the phrase.

Thanks,
Roman
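Roman's hypothesis is that the Aspose.Words-produced PDF stores the words in a different order from their logical (reading) order, even though Reader draws them in the right places. The following is a minimal sketch, under that assumption, of why a phrase search over the extracted text would then fail. It is a deliberately simplified model, not the Unicode Bidirectional Algorithm, and it does not describe what Aspose.Words or Acrobat Reader actually do internally; `Word`, `MixedDirectionDemo`, `VisualLeftToRight`, and `ContainsPhrase` are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: a word tagged with its writing direction.
public sealed record Word(string Text, bool IsRtl);

public static class MixedDirectionDemo
{
    // Simplified model (NOT the full Unicode Bidirectional Algorithm): returns the
    // words of a line with an RTL base direction in the left-to-right order in which
    // they appear on screen. Direction runs are placed right to left, words inside an
    // RTL run are reversed, words inside an LTR run keep their order. Spaces,
    // punctuation, and digits are ignored.
    public static List<string> VisualLeftToRight(IReadOnlyList<Word> logical)
    {
        // Group consecutive words with the same direction into runs.
        var runs = new List<List<Word>>();
        foreach (var word in logical)
        {
            if (runs.Count == 0 || runs[runs.Count - 1][0].IsRtl != word.IsRtl)
                runs.Add(new List<Word>());
            runs[runs.Count - 1].Add(word);
        }

        var result = new List<string>();
        // In an RTL line the first logical run is rightmost, so emit runs from last to first.
        for (int i = runs.Count - 1; i >= 0; i--)
        {
            IEnumerable<Word> run = runs[i][0].IsRtl ? Enumerable.Reverse(runs[i]) : runs[i];
            result.AddRange(run.Select(w => w.Text));
        }
        return result;
    }

    // True if the phrase occurs as a contiguous sequence of words in the text,
    // which is what a phrase search over extracted PDF text relies on.
    public static bool ContainsPhrase(IReadOnlyList<string> textWords, IReadOnlyList<string> phraseWords)
    {
        for (int i = 0; i + phraseWords.Count <= textWords.Count; i++)
        {
            if (textWords.Skip(i).Take(phraseWords.Count).SequenceEqual(phraseWords, StringComparer.Ordinal))
                return true;
        }
        return false;
    }
}
```

For example, with the logical order `heb1 heb2 grk1 heb3` (three Hebrew words and one Greek word, placeholders rather than the actual text of 1_sentence.docx), `VisualLeftToRight` returns `heb3 grk1 heb2 heb1`. `ContainsPhrase` finds the phrase `heb2 grk1` in the logical sequence but not in the visual one, which matches the symptom described above: each word is found on its own, but the two-word phrase is not.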

@cap.aspose

Thanks for providing additional information. We have tested the scenario following your directions and reproduced the reported issue. We have logged a ticket WORDSNET-15879 in our issue tracking system for further investigation and rectification. We will notify you as soon as it is resolved.


@cap.aspose

Thanks for sharing your findings. We have shared it with the product team, they will consider it during issue investigation.

A post was split to a new topic: Issue with mixed (RTL and LTR) text

Hi,
I downloaded the latest version of Aspose.Words (19.1), and the issue still exists in this version.
Are there any updates on this issue?

Thanks,
Roman

@cap.aspose

Thanks for your inquiry. We regret to inform you that the fix for this issue has been postponed (no estimate is available at the moment). We will inform you via this thread as soon as this issue is resolved. We apologize for the inconvenience.

Hi, is there any update on WORDSNET-15879, please?

@cap.aspose

Unfortunately, there is no update available on this issue. We have converted the shared PDF file using the latest version of Aspose.Words for .NET, 21.4, and attached the result to this post. Could you please check whether it is your desired output? 21.4.pdf (39.7 KB)