Aspose.Words Giving Incorrect .doc Page Count

Hi, I have been using an Ubuntu 20.04 image for Aspose, and when processing .doc files from streams I consistently get a smaller page count than the document actually has. When I switch to my Debian 10 image I get the right number, but I had to use some workarounds to get past this error:

Process terminated. Couldn't find a valid ICU package installed on the system.

What is the recommended image for using Aspose products?

@acn Most likely the problem occurs because the fonts used in your input document are not available on the machine where the document is processed. Fonts are required to build the document layout and calculate the page count. If Aspose.Words cannot find a font used in the document, it substitutes another font. Because the substitute has different font metrics, the layout can differ, and the page count can come out wrong as a result. You can implement IWarningCallback to get notified when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/

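To avoid the ICU error, install the ICU package:

# Install ICU package (yum applies to RHEL/CentOS-based images).
RUN yum -y install icu
# On Debian/Ubuntu-based images, use apt-get instead, for example:
# RUN apt-get update && apt-get install -y libicu-dev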

There is no recommended image. In any case, the fonts used in the document must be available to build the document layout properly and return the correct page count.

How can I make sure I install the fonts?

@acn In the .NET, Java and C++ versions of Aspose.Words you can use IWarningCallback to get notifications about font substitution. Unfortunately, there is currently no way to use the callback in Python.

You can use the following Python code to list the fonts available in the configured font sources:

import aspose.words as aw

# Default font settings
font_settings = aw.fonts.FontSettings.default_instance

# Print the type of each font source, then the fonts it provides.
for fsb in font_settings.get_fonts_sources():
    print(fsb.type)
    for pfi in fsb.get_available_fonts():
        print(pfi.full_font_name)
    print("================================================")
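If you need to compare two machines, one possible approach is to save the font names printed by the loop above on each machine and diff the two lists. The following is only a minimal sketch in plain Python: `missing_fonts` and the sample lists are hypothetical and not part of Aspose.Words.

```python
def missing_fonts(reference_names, target_names):
    """Return names found in reference_names but not in target_names.

    The comparison ignores case and surrounding whitespace.
    """
    target = {name.strip().casefold() for name in target_names}
    seen = set()
    missing = []
    for name in reference_names:
        key = name.strip().casefold()
        # Skip blanks, fonts the target already has, and duplicates.
        if key and key not in target and key not in seen:
            seen.add(key)
            missing.append(name.strip())
    return sorted(missing, key=str.casefold)


# Illustrative data only: replace with the names printed on each machine.
debian_fonts = ["DejaVu Sans", "Droid Sans Fallback", "Noto Mono"]
ubuntu_fonts = ["DejaVu Sans", "Liberation Serif"]

print(missing_fonts(debian_fonts, ubuntu_fonts))
```

The result is the list of fonts you would need to make available on the second machine.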

To clarify, I have a Debian cloud workspace that processes the document just fine and gives the correct page count. I have another cloud workspace on the same platform, running Ubuntu, that gives the incorrect page count. If one machine has fonts that the other does not, how can I add the missing fonts to the Ubuntu machine?

To provide some further information, this happens only with .doc files, not .docx, and the fonts that are on one machine but not the other are Droid Sans Fallback and Noto Mono.

@acn You can put the required fonts into any accessible folder and use this folder as a font source. Please see our documentation to learn where Aspose.Words looks for fonts and how to specify the fonts location:
https://docs.aspose.com/words/python-net/specifying-truetype-fonts-location/
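As a rough sketch of that idea, the code below appends a folder to the existing font sources instead of replacing them. It assumes the `FolderFontSource` class and the `set_fonts_sources` method behave as described in the documentation linked above, so please verify it against your Aspose.Words version. The helper names `find_font_files` and `register_font_folder` are hypothetical.

```python
from pathlib import Path

FONT_EXTENSIONS = {".ttf", ".otf", ".ttc"}


def find_font_files(folder):
    """Return the font files found under folder, including subfolders."""
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in FONT_EXTENSIONS
    )


def register_font_folder(folder):
    """Add folder to the default font sources (hypothetical helper)."""
    # Imported here so the file check above works without Aspose.Words.
    import aspose.words as aw

    if not find_font_files(folder):
        raise FileNotFoundError(f"No font files found under {folder}")

    settings = aw.fonts.FontSettings.default_instance
    # Keep the existing sources (for example, system fonts) and add the folder.
    sources = list(settings.get_fonts_sources())
    sources.append(aw.fonts.FolderFontSource(str(folder), True))
    settings.set_fonts_sources(sources)
```

You would call `register_font_folder` with your own folder path before loading the document, so that the layout is built with the added fonts.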

Thank you so much for your help. Another question: I ran the font list to see what was available on my machine, and sure enough there was an inconsistency between local and platform. I fixed that and the page count was correct.

I then wanted to ensure I had enough font coverage for other potential docs my program may receive. When looking, I realized one of the docs I was smoke testing with uses a font that is not listed on the printed list from the code you gave. However, it converts just fine, and retains the same font. Why is this?

@acn Probably, the font is embedded into the source document. In this case Aspose.Words uses this font for document rendering.

How do I know when that’s the case and when it’s not?

@acn You can use the following code to check whether the document has embedded fonts and which of the fonts are embedded:

import aspose.words as aw

doc = aw.Document("C:\\Temp\\in.docx")

# Whether TrueType fonts are embedded in the document when it is saved.
print(doc.font_infos.embed_true_type_fonts)

font_styles = [
    aw.fonts.EmbeddedFontStyle.REGULAR,
    aw.fonts.EmbeddedFontStyle.BOLD,
    aw.fonts.EmbeddedFontStyle.ITALIC,
    aw.fonts.EmbeddedFontStyle.BOLD_ITALIC,
]

# Report every font and style that has embedded OpenType data.
for fi in doc.font_infos:
    for style in font_styles:
        if fi.get_embedded_font(aw.fonts.EmbeddedFontFormat.OPEN_TYPE, style) is not None:
            print("Font '" + str(fi.name) + "' " + str(style) + " is embedded.")