Word text extraction in a Docker container doesn't work

I use Aspose.Words version 21.3 to extract text from a Word document. On Windows, CentOS, and Ubuntu, Aspose finds 4 pages; in a Docker environment, Aspose finds 5 pages.
The Word file: JS3-T000-PRO-CAL-MUS-000-99999-00.zip (44.4 KB)
Below is a single-line command that anyone with Docker installed can run (at least on Linux: I haven’t tried on Windows):

docker run --rm -v aspose-test.zip:/tmp/aspose-test.zip openjdk:11 bash -c "mkdir /tmp/aspose-test && cd /tmp/aspose-test && unzip …/aspose-test.zip && java -Xms1G -classpath '*' com.idox.test.aspose.numberPage JS3-T000-PRO-CAL-MUS-000-99999-00.docx"

The aspose-test.zip file includes only the Java code implementation and the Word file.
aspose-test.zip (48.0 KB)

The full set of dependencies is too big to attach, so you have to add the Aspose.Words dependencies to the same folder as aspose-idox.jar.

See the folder image:
image.png (55.1 KB)

Just for your information, here’s an explanation of the command above:

  • docker run: Start up and run a new Docker container
  • --rm: Remove this Docker container when it’s finished running
  • -v aspose-test.zip:/tmp/aspose-test.zip: Mount the zip file into the Docker container, i.e. make it available inside the Docker container. The value to the left of the colon is the path on the host, so it may need to change when you run this command, depending on where you put the zip file. The value to the right of the colon is the path in the Docker container where this zip file will appear.
  • openjdk:11: The Docker image (from Docker Hub) to base this container on. This is the standard Open JDK 11 image. I’ve tried with Java 8 as well and got the same results.
  • bash -c: Tell the Docker container that what it should do when it starts is start a bash shell and run the command that follows this. This command is as follows:
    • mkdir /tmp/aspose-test && cd /tmp/aspose-test && unzip …/aspose-test.zip: Create a temporary folder and extract the mounted zip file into it.
    • java -Xms1G -classpath "*" com.idox.test.aspose.numberPage JS3-T000-PRO-CAL-MUS-000-99999-00.docx: This is the actual Java command line that runs the Aspose test.

@fabien.levalois The difference in page count most likely occurs because Aspose.Words cannot find the required fonts in the Docker container. You can mount a folder with fonts and specify it in the font settings. See the following article for more information:

For example, you can use an option like this to mount a folder with fonts when running Docker:

--mount type=bind,source=C:\Windows\Fonts,target=/root/.fonts
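
Before rerunning the page count, it can help to confirm that the mounted folder really contains font files inside the container. The following is a minimal sketch using only the Java standard library, assuming the fonts were mounted to /root/.fonts as in the option above. The class name FontFolderCheck, the method listFontFiles, and the list of extensions are illustrative choices and are not part of Aspose.Words.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FontFolderCheck {

    // Returns the font files (.ttf, .otf, .ttc) found under the folder, including subfolders.
    // Returns an empty list if the folder does not exist or is not a directory.
    static List<Path> listFontFiles(Path folder) throws IOException {
        if (!Files.isDirectory(folder)) {
            return Collections.emptyList();
        }
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                        return name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".ttc");
                    })
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default matches the mount target used in the Docker option above.
        Path folder = Paths.get(args.length > 0 ? args[0] : "/root/.fonts");
        List<Path> fonts = listFontFiles(folder);
        System.out.println(fonts.size() + " font file(s) found in " + folder);
        fonts.forEach(System.out::println);
    }
}
```

If this reports zero font files inside the container, the mount is probably not pointing where expected, which would be consistent with the page count difference described above.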

Thanks for your response, it was very helpful. I now get the correct number of pages.
One more question: is it possible to specify a font directory to be used by the Aspose process?

Thanks

@fabien.levalois,

Please copy the required font files into a separate folder and try running the following code with the latest 21.3 version of Aspose.Words for Java:

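import com.aspose.words.Document;
import com.aspose.words.FolderFontSource;
import com.aspose.words.FontSettings;
import com.aspose.words.FontSourceBase;

public class CustomFontFolderExample {
    public static void main(String[] args) throws Exception {
        // Placeholder: base directory containing the CustomFonts folder (must end with a separator).
        // If no argument is given, CustomFonts/ is resolved against the working directory.
        String myDir = args.length > 0 ? args[0] : "";

        Document doc = new Document("word.docx");

        // Keep the default font sources and add the custom folder to them.
        FontSettings fontSettings = new FontSettings();
        addFontFolder(fontSettings, myDir + "CustomFonts/");
        doc.setFontSettings(fontSettings);

        doc.save("output.pdf");
    }

    // Appends the folder (scanned recursively) to the existing font sources.
    private static void addFontFolder(FontSettings fontSettings, String folder) {
        FontSourceBase[] fontSourceBases = fontSettings.getFontsSources();
        FontSourceBase[] newFontSourceBases = new FontSourceBase[fontSourceBases.length + 1];
        System.arraycopy(fontSourceBases, 0, newFontSourceBases, 0, fontSourceBases.length);
        newFontSourceBases[newFontSourceBases.length - 1] = new FolderFontSource(folder, true);
        fontSettings.setFontsSources(newFontSourceBases);
    }
}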
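
The folder path passed to the font settings is built by string concatenation, so it depends on the base directory ending with a separator. As a hedged alternative, the sketch below resolves the folder with java.nio.file and fails early if it does not exist. It uses only the Java standard library; FontFolderPath and resolveFontFolder are hypothetical names chosen for illustration, and the returned string is what one would pass on as the font folder.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class FontFolderPath {

    // Resolves folderName against baseDir, with or without a trailing separator on baseDir,
    // and checks that the result is an existing directory.
    static String resolveFontFolder(String baseDir, String folderName) {
        Path folder = Paths.get(baseDir).resolve(folderName).toAbsolutePath().normalize();
        if (!Files.isDirectory(folder)) {
            throw new IllegalArgumentException("Font folder not found: " + folder);
        }
        return folder.toString();
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java FontFolderPath <baseDir>");
            return;
        }
        // Assumption: the fonts live in a CustomFonts folder under the given base directory.
        System.out.println(resolveFontFolder(args[0], "CustomFonts"));
    }
}
```

Failing early here makes a missing or mistyped folder visible immediately, rather than showing up later as an unexpected page count.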

Please also check the following article: