Hi,
Extracting Hebrew text fails on Alpine 3.8 with an ArgumentOutOfRangeException (negative start index). The same code works on Windows, so the failure might be caused by a missing dependency, but we were unable to identify which one.
Code snippet:
Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(pdfFilePath);
var txtFile = Path.Combine(arguments.TargetDirectory, string.Format("{0}.txt", Guid.NewGuid()));
var device = new Aspose.Pdf.Devices.TextDevice(Encoding.UTF8);
using (FileStream fs = new FileStream(txtFile, FileMode.OpenOrCreate))
{
    foreach (Aspose.Pdf.Page page in pdf.Pages)
    {
        device.Process(page, fs);
    }
    fs.Close();
}
Stack trace:
2019-05-14 17:42:09.1446|ERROR|*.UnhandledExceptionLogger|System.ArgumentOutOfRangeException: StartIndex cannot be less than zero.
Parameter name: startIndex
at System.String.Substring(Int32 startIndex, Int32 length)
at #=zgp9CpQXG767U8n66dAdbA$A3NFtKnzk3ZZI1Hpp5jq2Cr9BFQg==.#=zgIBZTcw9IMv_(Stream #=zMCASuu0=, Encoding #=zryCDvQg=)
at #=zAVT_PM5qsUpKqNbRjkZoSvwlXO1CkB79slQ0LXg$CMde.#=zgIBZTcw9IMv_()
at #=z7GS$1utcjEvFgZ_8ESrPzsPObFTuC0RfM1KwPaGM53EXmO2lZ5vsU1g=..ctor(List`1 #=zcK4kXajD7omD, Rectangle #=zJBkfw3Y=, TextExtractionOptions #=z7whGLa0=)
at #=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p.#=zDL8SjUoeELn0(TextExtractionOptions #=z7whGLa0=)
at Aspose.Pdf.Text.TextAbsorber.#=zBgENoDqn6YhZ(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
at Aspose.Pdf.Text.TextAbsorber.#=zKE5lKIBPsTzy(#=zIiLJFlmpyWhFpRxTGDlVzmf0tTbnx3lDjqdIkaCMa2Q950ujGqxNj2$hDz_p #=zDIzP1xR6d9Y5PfUKcA==, Boolean #=z6pE$l4g=)
at Aspose.Pdf.Text.TextAbsorber.Visit(Page page)
at Aspose.Pdf.Devices.TextDevice.Process(Page page, Stream output)
at *.Converter.ProcessPdfDocument(String pdfFilePath, AsposeArguments arguments) in */Converter.cs:line x
at *.Converter.Process(AsposeArguments arguments) in */Converter.cs:line x
at *.Program.Main(String[] args) in */Program.cs:line x
As we could only reproduce this on our Docker images, here are the details of the environment:
Base image: microsoft/dotnet:2.2.2-runtime-alpine3.8
Apk packages:
* libgdiplus-5.6.1-r0 x86_64
* msttcorefonts-installer-3.6-r2 x86_64
* fontconfig-2.12.6-r1 x86_64
We’re currently running the Aspose.PDF 19.2 NuGet package, but in our tests we also reproduced the error with Aspose.PDF 19.4.
Example PDF: Practico - Basic Biblical Hebrew.pdf (4.8 MB)
Kind regards,
Koen