Free Support Forum - aspose.com

Hebrew text fails to extract on alpine 3.8


#1

Hi,

Extracting Hebrew text fails on Alpine 3.8 with a negative-index error (an ArgumentOutOfRangeException thrown from String.Substring). The same code works on Windows, so the problem may be caused by missing dependencies in the Alpine image, but we have not been able to identify which ones.

Code snippet:

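    using System;
    using System.IO;
    using System.Text;

    // Open the PDF and write the text of every page into a new UTF-8 .txt file.
    using (var pdf = new Aspose.Pdf.Document(pdfFilePath))
    {
        var txtFile = Path.Combine(arguments.TargetDirectory, string.Format("{0}.txt", Guid.NewGuid()));
        var device = new Aspose.Pdf.Devices.TextDevice(Encoding.UTF8);

        // The using block disposes (and closes) the stream, so no explicit fs.Close() is needed.
        using (var fs = new FileStream(txtFile, FileMode.Create))
        {
            foreach (Aspose.Pdf.Page page in pdf.Pages)
            {
                // On Alpine, the exception below is thrown from this call.
                device.Process(page, fs);
            }
        }
    }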
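To narrow the failure down to specific pages of the attached file, each page can be processed on its own and the pages that throw recorded. The sketch below is only an illustration: FindFailingPages is a hypothetical helper (not part of Aspose.PDF), and the demo uses a stand-in delegate in place of the real device.Process(page, fs) call so that it runs without the Aspose package. With Aspose, the delegate would be something like n => device.Process(pdf.Pages[n], fs), since pdf.Pages is indexed from 1.

```csharp
using System;
using System.Collections.Generic;

public static class PageDiagnostics
{
    // Hypothetical helper: calls processPage for pages 1..pageCount and returns the
    // numbers of the pages that threw ArgumentOutOfRangeException (the exception in
    // the stack trace). Any other exception type is not caught and propagates.
    public static List<int> FindFailingPages(int pageCount, Action<int> processPage)
    {
        var failing = new List<int>();
        for (int n = 1; n <= pageCount; n++)
        {
            try
            {
                processPage(n);
            }
            catch (ArgumentOutOfRangeException ex)
            {
                Console.Error.WriteLine($"Page {n}: {ex.Message}");
                failing.Add(n);
            }
        }
        return failing;
    }

    public static void Main()
    {
        // Stand-in for device.Process(pdf.Pages[n], fs); page 3 fails on purpose here.
        Action<int> fakeProcess = n =>
        {
            if (n == 3) throw new ArgumentOutOfRangeException("startIndex");
        };

        List<int> failing = FindFailingPages(5, fakeProcess);
        Console.WriteLine(string.Join(",", failing)); // prints 3
    }
}
```

Knowing whether every page with Hebrew text fails, or only a few, would make the sample file easier to reduce.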

Stack trace:

2019-05-14 17:42:09.1446|ERROR|*.UnhandledExceptionLogger|System.ArgumentOutOfRangeException: StartIndex cannot be less than zero.
Parameter name: startIndex
   at System.String.Substring(Int32 startIndex, Int32 length)
   at #=zgp9CpQXG767U8n66dAdbA$A3NFtKnzk3ZZI1Hpp5jq2Cr9BFQg==.#=zgIBZTcw9IMv_(Stream #=zMCASuu0=, Encoding #=zryCDvQg=)
   at #=zAVT_PM5qsUpKqNbRjkZoSvwlXO1CkB79slQ0LXg$CMde.#=zgIBZTcw9IMv_()
   at #=z7GS$1utcjEvFgZ_8ESrPzsPObFTuC0RfM1KwPaGM53EXmO2lZ5vsU1g=..ctor(List`1 #=zcK4kXajD7omD, Rectangle #=zJBkfw3Y=, TextExtractionOptions #=z7whGLa0=)
   at #=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p.#=zDL8SjUoeELn0(TextExtractionOptions #=z7whGLa0=)
   at Aspose.Pdf.Text.TextAbsorber.#=zBgENoDqn6YhZ(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
   at Aspose.Pdf.Text.TextAbsorber.#=zKE5lKIBPsTzy(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
   at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
   at Aspose.Pdf.Devices.TextDevice.Process(Page page, Stream output)
   at *.Converter.ProcessPdfDocument(String pdfFilePath, AsposeArguments arguments) in */Converter.cs:line x
   at *.Converter.Process(AsposeArguments arguments) in */Converter.cs:line x
   at *.Program.Main(String[] args) in */Program.cs:line x

We can only reproduce this in our Docker images, so here are the details of that environment:
Base image: microsoft/dotnet:2.2.2-runtime-alpine3.8
APK packages:
* libgdiplus-5.6.1-r0 x86_64
* msttcorefonts-installer-3.6-r2 x86_64
* fontconfig-2.12.6-r1 x86_64
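Since the same code works on Windows, one way to look for the missing dependency is to print the same runtime details on both machines and compare them. The sketch below reflects an assumption about what is worth comparing, not a confirmed cause: EnvironmentReport is a hypothetical name, and whether globalization settings affect Aspose.PDF's Hebrew handling is not established in this thread. On Linux, .NET Core takes culture data from the ICU library; when DOTNET_SYSTEM_GLOBALIZATION_INVARIANT is set to true (or 1), it runs in globalization-invariant mode instead.

```csharp
using System;
using System.Globalization;
using System.Runtime.InteropServices;
using System.Text;

public static class EnvironmentReport
{
    // Hypothetical helper: collects runtime details that may differ between the
    // working Windows machine and the Alpine container.
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.AppendLine($"OS: {RuntimeInformation.OSDescription}");
        sb.AppendLine($"Runtime: {RuntimeInformation.FrameworkDescription}");
        sb.AppendLine($"Current culture: '{CultureInfo.CurrentCulture.Name}'");

        // Only the environment variable is checked here; invariant mode can also be
        // enabled through the System.Globalization.Invariant runtimeconfig setting.
        string invariant = Environment.GetEnvironmentVariable("DOTNET_SYSTEM_GLOBALIZATION_INVARIANT");
        sb.AppendLine($"DOTNET_SYSTEM_GLOBALIZATION_INVARIANT: {invariant ?? "(not set)"}");
        return sb.ToString();
    }

    public static void Main() => Console.Write(Build());
}
```

Running this once on Windows and once inside the container gives a side-by-side list of differences to check.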

We’re currently using the Aspose.PDF 19.2 NuGet package, and in our tests we also reproduced the issue with Aspose.PDF 19.4.

Example PDF: Practico - Basic Biblical Hebrew.pdf (4.8 MB)

Kind regards,
Koen


#2

@kobellem

Thanks for contacting support.

We have logged an investigation ticket, PDFNET-46398, in our issue tracking system. We will look into the details of the issue and keep you posted on the status of the fix. Please be patient and give us a little time.

We are sorry for the inconvenience.