Aspose SplitToPages creates corrupted PDFs on Linux

I’m using Aspose to split a PDF into pages (one PDF per page). My method is a copy of your example code located at Split PDF pages|Aspose.PDF for .NET (C#)

// Create PdfFileEditor object
var pdfEditor = new PdfFileEditor();
var fileNumber = 1;
// Split to pages
var outBuffer = pdfEditor.SplitToPages(fileInfo.FullName);
// Save individual page files
foreach (var aStream in outBuffer)
{
    var outStream = new FileStream(Path.Combine(fileInfo.Directory.FullName, "File_" + fileNumber.ToString() + "_out.pdf"), FileMode.Create);
    aStream.WriteTo(outStream);
    outStream.Close();
    fileNumber++;
}
return fileNumber;

When run under Windows (on my developer machine) it creates PDF files without any problems. But when I run it in a Docker container (Debian based), the created files are corrupted. I’m attaching one such example (both the corrupt and the non-corrupt file). When I compare the binaries of the two files, I see that the negative sign characters are encoded differently on Windows and Linux. This seems to happen only when the character is in front of integers (and not when it is in a text string), which sounds like a “CultureInfo” problem (here is a discussion outlining this known issue: long.TryParse return false for negative numbers in ubuntu · Issue #24478 · dotnet/runtime · GitHub).
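To illustrate the kind of culture effect described above, here is a minimal sketch. It is not Aspose's code and says nothing about how the library writes numbers internally; it only shows that standard .NET integer formatting takes the negative sign from the culture. The helper name FormatWithSign is hypothetical, and the sign is set by hand to U+2212 (MINUS SIGN) so the result does not depend on the machine's locale data.

```csharp
using System;
using System.Globalization;

static class NegativeSignDemo
{
    // Hypothetical helper: formats an integer with a culture whose
    // negative sign has been overridden (assumption, for illustration only).
    public static string FormatWithSign(int value, string negativeSign)
    {
        // Clone() returns a writable copy of the read-only invariant culture.
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NegativeSign = negativeSign;
        return value.ToString(culture);
    }

    static void Main()
    {
        string ascii = FormatWithSign(-42, "-");        // ASCII hyphen-minus, U+002D
        string unicode = FormatWithSign(-42, "\u2212"); // Unicode minus sign, U+2212

        // Print the code point of the first character of each result.
        Console.WriteLine($"ASCII sign:   U+{(int)ascii[0]:X4}");
        Console.WriteLine($"Unicode sign: U+{(int)unicode[0]:X4}");
    }
}
```

A PDF reader expects the ASCII hyphen-minus in front of numbers, so if any number is written with culture-sensitive formatting under a culture that uses U+2212, the resulting bytes would differ in the way the binary diff shows.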

I wonder if you are able to reproduce this issue on Linux (or have any ideas as to how this can be fixed/worked around)?

Note1: When the corrupt file is opened with Adobe Reader we get an error saying “There was a problem reading this document (110).” Other PDF readers behave differently (some do not report any error at all).

Note2: I get the same results when I try the following code snippet:

var pdfEditor = new PdfFileEditor();
var fileNameTemplate = Path.Combine(fileInfo.Directory.FullName, $"File_%NUM%_out.pdf");
pdfEditor.SplitToPages(fileInfo.FullName, fileNameTemplate);

Attachments:
adobe-reader-error.png (4.6 KB)
binary-diff.png (14.4 KB)
File_1_out-corrupt.pdf (472.1 KB)
File_1_out-not-corrupt.pdf (472.1 KB)
LoremIpsum10Page-source-file.pdf (512.5 KB)

@eic

Please make sure that the fonts used in your PDF file are installed on the machine where you are splitting the PDF. Please check the Dockerfile from here:

How to run Aspose.PDF in Docker

If you still face the issue, please try the latest version of Aspose.PDF for .NET 22.5. Hope this helps you.

I’m already using the latest version of the library and I believe the fonts are already installed. Are you sure this is related to fonts?

Changing the current culture to Invariant seems to help with this problem, which is why I’m not sure this is related to fonts.

Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

@eic

We have tested the scenario using the latest version of Aspose.PDF for .NET 22.5 and could not reproduce the issue you reported. Please check the attached images for details.
image.png (100.0 KB)
image.png (255.7 KB)

@tahir.manzoor Thank you for your time.

Have you tested this under Linux (it’s not reproducible on Windows)? If so, do you know which locale is set at the OS level?

I have tested that if I change the CurrentCulture to InvariantCulture, the problem disappears:

    var originalCulture = Thread.CurrentThread.CurrentCulture;
    Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;

    // Do Aspose stuff

    Thread.CurrentThread.CurrentCulture = originalCulture;
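A slightly more defensive variant of the same workaround, as a sketch: wrapping the switch in try/finally restores the original culture even if the split throws. The names CultureScope and RunInvariant are hypothetical, and the lambda in Main is only a stand-in for the actual Aspose call.

```csharp
using System;
using System.Globalization;
using System.Threading;

static class CultureScope
{
    // Hypothetical helper: runs 'work' with the invariant culture on the
    // current thread and always restores the previous culture afterwards.
    public static T RunInvariant<T>(Func<T> work)
    {
        var originalCulture = Thread.CurrentThread.CurrentCulture;
        Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
        try
        {
            return work();
        }
        finally
        {
            Thread.CurrentThread.CurrentCulture = originalCulture;
        }
    }

    static void Main()
    {
        // Stand-in for the Aspose work; it reports the culture it ran under.
        // An empty name means the invariant culture was active.
        string nameInside = RunInvariant(() => CultureInfo.CurrentCulture.Name);
        Console.WriteLine($"Culture inside scope: '{nameInside}'");
        Console.WriteLine($"Culture after scope:  '{CultureInfo.CurrentCulture.Name}'");
    }
}
```

Note that this only changes the culture of the calling thread, so it assumes the library does its number formatting on that same thread.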

So, may I suggest a change in your test? Could you change the CurrentCulture to nb-NO before doing the split operation and test again? Something like…

Thread.CurrentThread.CurrentCulture = new CultureInfo("nb-NO");

I basically have a workaround that is good enough for my case but I suspect this may be a bug in the Aspose library and if that’s the case you may want to address it…

@eic

We have tested the scenario using the above line of code and have not faced any issue.

Could you please create a simple Visual Studio project with a Docker implementation and share it here for further testing? We will investigate the issue and provide you with more information on it.

@tahir.manzoor I’m not sure I will find time for that any time soon. But, if you want me to run some commands on the existing Docker container (to gather more info) please let me know. locale returns the following output:

locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=nb_NO.UTF-8
LANGUAGE=nb:no
LC_CTYPE="nb_NO.UTF-8"
LC_NUMERIC="nb_NO.UTF-8"
LC_TIME="nb_NO.UTF-8"
LC_COLLATE="nb_NO.UTF-8"
LC_MONETARY="nb_NO.UTF-8"
LC_MESSAGES="nb_NO.UTF-8"
LC_PAPER="nb_NO.UTF-8"
LC_NAME="nb_NO.UTF-8"
LC_ADDRESS="nb_NO.UTF-8"
LC_TELEPHONE="nb_NO.UTF-8"
LC_MEASUREMENT="nb_NO.UTF-8"
LC_IDENTIFICATION="nb_NO.UTF-8"
LC_ALL=nb_NO.UTF-8

And we have the following in our Dockerfile:

RUN locale-gen nb_NO.UTF-8
ENV LANG=nb_NO.UTF-8 \
    LANGUAGE=nb:no \
    LC_ALL=nb_NO.UTF-8
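In case it helps with reproducing this, a small diagnostic sketch that could be run inside the container to see which culture .NET picks up from these settings and which character it uses as the negative sign. The names CultureDiagnostics and Describe are hypothetical; the properties used are standard .NET globalization APIs.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class CultureDiagnostics
{
    // Hypothetical helper: returns the culture name together with the
    // Unicode code points of its negative sign.
    public static string Describe(CultureInfo culture)
    {
        string name = culture.Name.Length == 0 ? "(invariant)" : culture.Name;
        string sign = culture.NumberFormat.NegativeSign;
        string codePoints = string.Join(" ", sign.Select(c => $"U+{(int)c:X4}"));
        return $"{name}: NegativeSign={codePoints}";
    }

    static void Main()
    {
        // The first line depends on the OS locale and the globalization
        // data available in the container; the second is always the same.
        Console.WriteLine(Describe(CultureInfo.CurrentCulture));
        Console.WriteLine(Describe(CultureInfo.InvariantCulture));
    }
}
```

If the first line reports anything other than U+002D, that would be consistent with the difference seen in the binary diff.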

@eic

Please check the attached docker application that we used to reproduce the issue. If your application is different, please modify it and share it back for further testing.
DockerAspose.zip (2.7 KB)

Thanks for your cooperation.

I’m not a Docker expert by any stretch of the imagination, but your Docker image does not seem to be Linux based. The problem is only reproducible under Linux (in my case, I use a Debian base image) with a specific locale.

@eic

The PDF file was generated by an old version of Aspose.PDF. Please use the latest version of Aspose.PDF for .NET 22.5, generate the output PDF file, and share it here for further investigation.