Hebrew text- BIDI options

milancutlac · August 2, 2010, 7:54am

Hi support,

Are there any options for PDF output that make Hebrew text render correctly?
At the moment, the Hebrew words appear in reverse order, and the letters within each word are reversed too.

Regards,
Milan

codewarior · August 2, 2010, 2:56pm

Hello Milan,

Could you please share the code snippet so that we can test the scenario on our end? We apologize for the inconvenience.

milancutlac · August 3, 2010, 3:43am

Hi Nayyer,

The problem is that Hebrew text requires special handling for display: the order of the words must be reversed, the order of the letters within each word must be reversed as well, and the text must be aligned right-to-left. In the code below, I read a .txt file that contains some Hebrew and English text. The PDF output looks like the source, but it should be displayed differently. Please also take a look at the screenshots:

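import java.io.BufferedReader;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
// The Aspose.Pdf classes Pdf, Section and Text must be imported as well.

Pdf pdf = new Pdf();
Section sec1 = pdf.getSections().add();
String str = "";
try
{
    // The backslash is escaped so that "\t" is not read as a tab character
    FileInputStream fstream = new FileInputStream("D:\\test.txt");
    DataInputStream in = new DataInputStream(fstream);
    // Decode the file as UTF-8 so that the Hebrew characters survive
    BufferedReader br = new BufferedReader(new InputStreamReader(in, "UTF-8"));
    String strLine;
    while ((strLine = br.readLine()) != null)
    {
        str += strLine;
    }
    in.close();
}
catch (Exception e)
{
    System.err.println("Error: " + e.getMessage());
}
Text text = new Text(sec1, str); //$NON-NLS-1$
text.getTextInfo().setFontName("Arial Unicode MS");
text.getTextInfo().setIsUnicode(true);
sec1.getParagraphs().add(text);
pdf.save(new FileOutputStream(new File("C:/Temp/Bidi.pdf"))); //$NON-NLS-1$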

word.jpg shows the desired output. Note that the English words are unmodified there and must remain so. pdf.jpg shows the output of the current code. Note the differences between word.jpg and pdf.jpg.

Regards,
Milan
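
For readers following along: the display requirement described in this post is what the Unicode Bidirectional Algorithm covers. The following is a minimal sketch, using only the JDK class java.text.Bidi and not any Aspose.Pdf API, of how one might check whether a line read from a file such as test.txt contains right-to-left text and where its directional runs lie. The class and method names (BidiRunInspector, needsBidi, describeRuns) are hypothetical and chosen only for illustration.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aspose.Pdf: inspects the directional runs of one line.
final class BidiRunInspector {

    /** Returns true if the line contains characters that need bidirectional handling. */
    static boolean needsBidi(String line) {
        char[] chars = line.toCharArray();
        return Bidi.requiresBidi(chars, 0, chars.length);
    }

    /** Describes each directional run as DIRECTION[start,limit) in logical order. */
    static List<String> describeRuns(String line) {
        List<String> out = new ArrayList<>();
        // The base direction is taken from the first strong character, defaulting to LTR.
        Bidi bidi = new Bidi(line, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        for (int i = 0; i < bidi.getRunCount(); i++) {
            // An odd embedding level marks a right-to-left run.
            String dir = (bidi.getRunLevel(i) & 1) == 1 ? "RTL" : "LTR";
            out.add(dir + "[" + bidi.getRunStart(i) + "," + bidi.getRunLimit(i) + ")");
        }
        return out;
    }
}
```

A line for which needsBidi returns false can be written to the PDF unchanged; only lines that contain right-to-left runs need any reordering.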

codewarior · August 3, 2010, 1:57pm

Hi Milan,

Thanks for sharing the code snippet and the source documents.

I have tested the scenario and was able to reproduce the problem. I have logged it in our issue tracking system as PDFJAVA-19004. We will investigate the issue in detail and keep you updated on the status of the fix.

We apologize for the inconvenience.

Gao · December 20, 2010, 11:51pm

Hi Nayyer,

Any updates on this issue PDFJAVA-19004?

Regards,
- Gao

codewarior · December 22, 2010, 1:59am

Hello Gao,

Thanks for using our products.

I am pleased to inform you that the reported issue has been resolved, and the hotfix will be included in the upcoming release. As soon as the new version is published, we will post an update on its availability in this forum thread. Please bear with us in the meantime.

We appreciate your cooperation and understanding, and we are sorry for the delay and inconvenience.

aspose.notifier · February 20, 2011, 3:21pm

The issues you have found earlier (filed as 19004) have been fixed in this update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

garkler · April 28, 2011, 3:01am

Hey,

Is this update available for the Aspose .NET DLLs too?

Thanks,

Garkler

codewarior · April 28, 2011, 4:25am

Hello Garkler,

Thanks for using our products.

The hotfix link above is for Aspose.Pdf for Java. Aspose.Pdf for .NET 5.2.0 can also render Hebrew and other non-English text into a PDF document. However, the latest release has a minor issue: numbers or English text embedded in the Unicode text are reversed as well when the content is placed in the PDF. For example, פה עברתי שורה ואכתוב באנגלית text1 text2 ואחזור לעברית is displayed as ואחזור לעברית txet 1txet פה עברתי שורה ואכתוב באנגלית in the resulting PDF. The problem has been logged in our issue tracking system as PDFNET-25909, and our development team is working on a fix. Once the solution becomes available, we will share the information with you.

For testing, I used the following code snippet with Aspose.Pdf for .NET 5.2.0. For your reference, I have also attached the resulting PDF to this post. We are sorry for the inconvenience.

[C#]

//Instantiate Pdf object by calling its empty constructor
Aspose.Pdf.Pdf pdf1 = new Aspose.Pdf.Pdf();
//Create a new section in the Pdf object
Aspose.Pdf.Section sec1 = pdf1.Sections.Add();

//Create a new text paragraph and pass the text to its constructor as argument
Aspose.Pdf.Text SampleText = new Aspose.Pdf.Text("פה עברתי שורה ואכתוב באנגלית text1 text2 ואחזור לעברית");
// specify the font information for text object
SampleText.TextInfo.FontName = "Arial Unicode MS";
// specify that the text object contains unicode contents
SampleText.TextInfo.IsUnicode = true;
// add the text object to paragraphs collection of section
sec1.Paragraphs.Add(SampleText);
// specify that PDF document contains RTL text
pdf1.IsRightToLeft = true;
// save the resultant PDF
pdf1.Save(@"d:/pdftest/HebrewText_RTL.pdf");
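
The behaviour expected here (Hebrew runs reversed for display, embedded English words and digits left readable) can be sketched with the JDK class java.text.Bidi. This is only an illustration of the idea under simplifying assumptions, not how Aspose.Pdf implements its fix: the class and method names (VisualOrderSketch, toVisual) are hypothetical, the input is treated as a single line, and mirrored characters such as brackets and combining marks are not handled.

```java
import java.text.Bidi;

// Hypothetical sketch, not Aspose.Pdf code: logical-to-visual reordering of one line.
final class VisualOrderSketch {

    /** Converts a line from logical (typing) order to visual (left-to-right display) order. */
    static String toVisual(String logical) {
        // The base direction is taken from the first strong character, defaulting to RTL.
        Bidi bidi = new Bidi(logical, Bidi.DIRECTION_DEFAULT_RIGHT_TO_LEFT);
        int runCount = bidi.getRunCount();
        byte[] levels = new byte[runCount];
        String[] runs = new String[runCount];
        for (int i = 0; i < runCount; i++) {
            levels[i] = (byte) bidi.getRunLevel(i);
            String run = logical.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
            // Odd level = right-to-left run: reverse its characters.
            // Even level (English text, digits) keeps its character order.
            runs[i] = (levels[i] & 1) == 1
                    ? new StringBuilder(run).reverse().toString()
                    : run;
        }
        // Rearrange whole runs into visual order according to their embedding levels.
        Bidi.reorderVisually(levels, 0, runs, 0, runCount);
        return String.join("", runs);
    }

    public static void main(String[] args) {
        // Two Hebrew letters, an English word, two more Hebrew letters (logical order).
        System.out.println(toVisual("\u05D0\u05D1 abc \u05D2\u05D3"));
    }
}
```

In this sketch the Hebrew runs change position and character order, while the English run "abc" is moved as a unit and its letters stay in reading order, which is the difference between the reported output and the expected one.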

aspose.notifier · May 11, 2011, 4:37pm

The issues you have found earlier (filed as 25909) have been fixed in this update.


drormu · October 24, 2018, 5:52am

Hi,
Is there any solution for text extraction from PDF (.NET)?

Farhan.Raza · October 24, 2018, 1:17pm

@drormu

Thank you for contacting support.

Please always create a separate topic for each separate inquiry; you can include a link to any thread that appears related to your query. In the meantime, you can extract text from a PDF document as explained in Extract Text from PDF.

We hope this will be helpful. Please feel free to contact us if you need any further assistance and we will be more than happy to help you further.