Hebrew text fails to extract on Alpine 3.8


#1

Hi,

Extracting Hebrew text fails on Alpine 3.8 with an ArgumentOutOfRangeException ("StartIndex cannot be less than zero"). The same code works on Windows, so the cause may be a missing dependency, but we were unable to identify which one.

Code snippet:

    Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfFilePath);
    var txtFile = Path.Combine(arguments.TargetDirectory, string.Format("{0}.txt", Guid.NewGuid()));
    var device = new Aspose.Pdf.Devices.TextDevice(Encoding.UTF8);

    using (FileStream fs = new FileStream(txtFile, FileMode.OpenOrCreate))
    {
        foreach (Aspose.Pdf.Page page in pdf.Pages)
        {
            device.Process(page, fs);
        }

        fs.Close();
    }

Stack trace:

2019-05-14 17:42:09.1446|ERROR|*.UnhandledExceptionLogger|System.ArgumentOutOfRangeException: StartIndex cannot be less than zero.
Parameter name: startIndex
   at System.String.Substring(Int32 startIndex, Int32 length)
   at #=zgp9CpQXG767U8n66dAdbA$A3NFtKnzk3ZZI1Hpp5jq2Cr9BFQg==.#=zgIBZTcw9IMv_(Stream #=zMCASuu0=, Encoding #=zryCDvQg=)
   at #=zAVT_PM5qsUpKqNbRjkZoSvwlXO1CkB79slQ0LXg$CMde.#=zgIBZTcw9IMv_()
   at #=z7GS$1utcjEvFgZ_8ESrPzsPObFTuC0RfM1KwPaGM53EXmO2lZ5vsU1g=..ctor(List`1 #=zcK4kXajD7omD, Rectangle #=zJBkfw3Y=, TextExtractionOptions #=z7whGLa0=)
   at #=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p.#=zDL8SjUoeELn0(TextExtractionOptions #=z7whGLa0=)
   at Aspose.Pdf.Text.TextAbsorber.#=zBgENoDqn6YhZ(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
   at Aspose.Pdf.Text.TextAbsorber.#=zKE5lKIBPsTzy(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
   at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
   at Aspose.Pdf.Devices.TextDevice.Process(Page page, Stream output)
   at *.Converter.ProcessPdfDocument(String pdfFilePath, AsposeArguments arguments) in */Converter.cs:line x
   at *.Converter.Process(AsposeArguments arguments) in */Converter.cs:line x
   at *.Program.Main(String[] args) in */Program.cs:line x
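
The trace ends in TextDevice.Process(Page page, Stream output), so one way to narrow the problem down is to catch the exception per page and record which pages fail, rather than letting the first failure abort the run. The following is only a minimal sketch and not part of the original report: PageIsolation and FindFailures are hypothetical names, and the helper is deliberately generic so that it compiles without the Aspose.PDF package.

```csharp
using System;
using System.Collections.Generic;

public static class PageIsolation
{
    // Hypothetical helper: runs processItem on every item and records the
    // 1-based position of each item that throws, together with the exception,
    // instead of stopping at the first failure.
    public static List<KeyValuePair<int, Exception>> FindFailures<T>(
        IEnumerable<T> items, Action<T> processItem)
    {
        var failures = new List<KeyValuePair<int, Exception>>();
        int position = 0;

        foreach (T item in items)
        {
            position++;
            try
            {
                processItem(item);
            }
            catch (Exception ex)
            {
                // Keep going so that all failing positions are collected.
                failures.Add(new KeyValuePair<int, Exception>(position, ex));
            }
        }

        return failures;
    }
}
```

With the snippet from the report, this could be called as PageIsolation.FindFailures(pdf.Pages, page => device.Process(page, fs)), assuming pdf.Pages enumerates as Aspose.Pdf.Page. The returned positions would then show whether every page fails or only some of them.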

As we could only reproduce this on our Docker images, here are the details of the environment:
Base image: microsoft/dotnet:2.2.2-runtime-alpine3.8
Apk packages:
* libgdiplus-5.6.1-r0 x86_64
* msttcorefonts-installer-3.6-r2 x86_64
* fontconfig-2.12.6-r1 x86_64

We’re currently running the Aspose.PDF 19.2 NuGet package, but in our tests we also reproduced the issue with Aspose.PDF 19.4.

Example PDF: Practico - Basic Biblical Hebrew.pdf (4.8 MB)

Kind regards,
Koen


#2

@kobellem

Thanks for contacting support.

We have logged an investigation ticket as PDFNET-46398 in our issue tracking system. We will look further into the details of the issue and keep you posted on the status of its resolution. Please be patient and allow us some time.

We are sorry for the inconvenience.


#3

Hi, any update on the status of this ticket? Many thanks!


#4

@Stylelabs

The previously logged ticket is still pending analysis and has not yet been resolved. We will let you know as soon as we make progress toward its resolution. Please allow us some time.

We are sorry for the inconvenience.


#5

Hello @asad.ali, can you please share a status update on this ticket? Thank you.


#6

@Stylelabs

We regret to share that the previously logged ticket has not yet been resolved due to other pending issues in the queue. As the ticket was logged under the free support model, it has low priority and will be resolved on a first-come, first-served basis. We will notify you as soon as the ticket is resolved. Please allow us some time.

We are sorry for the inconvenience.