LayoutCollector.GetStartPageIndex returns wrong page number when run in a Linux container deployment if footnotes are present in the document

Hi,

The code below returns the correct page number when tested locally (on a Windows machine), but the same code returns a wrong page number for some search text in the Word document when it is deployed to a Linux container (microservices deployed in a cert environment). I noticed that the wrong page number occurs when footnotes or endnotes are present in the document.

Code - string pagenumber = _collector.GetStartPageIndex(currentNode).ToString();

Could you please let me know if there is a workaround for this issue, or whether we need to use a different Aspose method/API to find the page number of search text that works in all scenarios?

Regards,
Chetan

@KCSR The problem might occur because fonts used in your document are not available in the environment where you process the document.
As you may know, MS Word documents are flow documents and do not contain any information about document layout. Consumer applications, like MS Word or Open Office, build the document layout on the fly. Aspose.Words uses its own layout engine to build the document layout while rendering the document to fixed page formats (PDF, XPS, Image etc.). The same layout engine is used for providing document layout information via the LayoutCollector and LayoutEnumerator classes.
To build a proper document layout, the fonts used in the original document are required. If Aspose.Words cannot find the fonts used in the document, the fonts are substituted. This might lead to layout differences (an incorrect page number returned by LayoutCollector), since the substitution fonts might have different font metrics. You can implement IWarningCallback to get a notification when font substitution is performed.

Thanks @alexey.noskov for the reply.

The fonts used in the document are “Arial” and “Times New Roman”, which are among the most common ones. I think these fonts will also be available in the environment in which we process the document.

After reading your comment I did some investigation on the test document, and below are my findings:

  1. I checked the fonts of the test document. It uses mixed fonts: most of the text is in “Arial”, while in the footnotes some parts (i.e., the footnote number) are in Times New Roman and the rest of the footnote text is in Arial. This might be because whoever added the footnotes copied the text from the same document and pasted it into the footnote section, so it was pasted with the same formatting. When I process this document, I see wrong page numbers.

  2. I removed all the footnotes from the test document, processed the document, and found that the page numbers came out correctly. This suggests that there is some issue with how the footnotes were added.

  3. I then manually added footnotes on all pages of the test document (which had 4 pages). When I click “Insert Footnote” on the “References” tab in Word, it inserts the footnote number in “Times New Roman”, which I think is the default behavior of Word. I then manually typed the footnote text so that the entire footnote is in the same font, “Times New Roman”. I saved this document, processed it, and found that the page numbers were correct.

From the above observations, it looks like the issue is with how the footnotes are added to the document. Please let me know your opinion.

Regards,
Chetan

@KCSR Thank you for the additional information. Though “Arial” and “Times New Roman” are the most common fonts, they are not available in Linux by default. Have you tried implementing IWarningCallback to check whether fonts are substituted during document processing? I am sure that if the document is rendered differently in different environments, the problem is in the fonts, not in the document structure.

Thanks Alexey. I probably did not understand the substitution callback correctly. Could you please let me know how and where the substitution notification is recorded? I have the code below:

ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
{
    Node currentNode = e.MatchNode;
    #region Font Substitution Callback
    Aspose.Words.Font userDocFont = ((Aspose.Words.Inline)currentNode).Font;
    //FontInfoCollection fontInfo = builder.Document.FontInfos;
    LogDiagnosticMessage("User Document Font - " + userDocFont);


    FontSubstitutionWarningCollector callback = new FontSubstitutionWarningCollector();
    currentNode.Document.WarningCallback = callback;

    // Store the current collection of font sources, which will be the default font source for every document
    // for which we do not specify a different font source.
    FontSourceBase[] originalFontSources = FontSettings.DefaultInstance.GetFontsSources();

    // For testing purposes, we will set Aspose.Words to look for fonts only in a folder that does not exist.
    FontSettings.DefaultInstance.SetFontsFolder("", false);

    // When rendering the document, there will be no place to find the "Times New Roman" font.
    // This will cause a font substitution warning, which our callback will detect.
    //userDocument.Save("C:\\" + "SubstitutionWarning.pdf");

    FontSettings.DefaultInstance.SetFontsSources(originalFontSources);

    #endregion Font Substitution Callback

    //Find Page number of match 
    string pagenumber = _collector.GetStartPageIndex(currentNode).ToString();
    _pagenumberList = pagenumber;
    return ReplaceAction.Stop;
}

You mentioned in your previous reply: "The same layout engine is used for providing document layout information via LayoutCollector and LayoutEnumerator classes. To build proper document layout the fonts used in the original document are required."

The current node’s font can be different from the original document’s font (I think this will be the font at the last cursor position), right?

How do we set the font for LayoutCollector, since it is forming the flow document, right?

I think I am missing something. Could you please help me out?

Also, the above exercise is to check whether the issue is caused by fonts being unavailable in the Linux environment. Could you please let me know if there is a way to install the fonts used in the test document in the Linux environment where we process the document? Or is there any other way to solve this issue?

@KCSR You can use IWarningCallback as shown in the following code:

Document doc = new Document("in.docx");
// Set warning callback
doc.WarningCallback = new WarningCallback();

// Execute your code here
// ....
private class WarningCallback : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
            Console.WriteLine(info.Description);
    }
}

Such an implementation will print font substitution information to the console; you can change it to write the result to a file.

Content in the document can be formatted with different fonts.

There is no way to set a font for LayoutCollector. The layout engine builds the document layout using the information from the document.

Please see the following articles to learn how to install fonts in Linux or specify location of fonts:
https://docs.aspose.com/words/net/installing-truetype-fonts-on-linux/
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/

Thanks Alexey, I added the IWarningCallback and I can see the font substitution warnings. Fonts such as Arial, Times New Roman, and Calibri are being substituted with the Fanwood font.

I then installed the fonts using the commands below in the Dockerfile:
FROM alpine:latest
RUN apk --no-cache add msttcorefonts-installer fontconfig && \
    update-ms-fonts && \
    fc-cache -f

It looks like these fonts (screenshot below) got installed; I did not see any errors related to the above commands while deploying. Is there any way to confirm that they are really installed?

image.png (73.8 KB)

When I test my application after installing the fonts, I still see the same font substitution warnings.

Regards,
Chetan

@KCSR Fonts can be installed in different folders. Please see our documentation to learn where Aspose.Words looks for fonts in Linux:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/#where-asposewords-looks-for-truetype-fonts-on-linux

You can also put the required fonts into any accessible folder and use this folder as font source:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/#loading-fonts-from-folder
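
Before pointing Aspose.Words at a folder, it can help to confirm from inside the container that the folder exists, is readable by the application's user, and actually contains font files. The following is a minimal sketch that uses only System.IO; the FontFolderProbe class name and the list of file extensions are illustrative assumptions and are not part of Aspose.Words.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not an Aspose.Words type.
public static class FontFolderProbe
{
    // Font file extensions to look for (an assumption; adjust as needed).
    private static readonly string[] FontExtensions = { ".ttf", ".ttc", ".otf" };

    // Returns the font files found under the folder (recursively),
    // or an empty list if the folder is missing or cannot be read.
    public static IReadOnlyList<string> ListFontFiles(string folder)
    {
        if (!Directory.Exists(folder))
        {
            Console.WriteLine($"Folder not found: {folder}");
            return Array.Empty<string>();
        }

        try
        {
            return Directory
                .EnumerateFiles(folder, "*", SearchOption.AllDirectories)
                .Where(f => FontExtensions.Contains(Path.GetExtension(f).ToLowerInvariant()))
                .OrderBy(f => f, StringComparer.Ordinal)
                .ToList();
        }
        catch (UnauthorizedAccessException ex)
        {
            // Typical symptom of a permissions problem inside a container.
            Console.WriteLine($"Cannot read {folder}: {ex.Message}");
            return Array.Empty<string>();
        }
    }
}
```

Logging the result of FontFolderProbe.ListFontFiles for the folder you intend to use as a font source shows quickly whether the problem is a missing folder, missing files, or permissions, before any Aspose.Words code runs.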

How many hearts do you want? You are amazing!! Finally, I got this working!!

I copied all the available fonts from my Windows machine to a folder in my application, and this folder is deployed to the Linux container. Then I used the code below to configure FontSettings, and this solved my issue. :)

FontSettings fontSettings = new FontSettings();

// Note that this setting overrides any default font sources. Only this folder will be searched for
// fonts when rendering or embedding fonts. To add an extra font source while keeping the system font sources,
// use FontSettings.GetFontsSources and FontSettings.SetFontsSources instead.
fontSettings.SetFontsFolder(@"C:\MyFonts\", false);

// Set font settings
doc.FontSettings = fontSettings;

Regards,
Chetan

@KCSR It is great that you managed to resolve the problem. On my side, for testing, I just mount the Windows fonts folder into the Docker container and use it as a font source:

docker run --mount type=bind,source=C:\Temp,target=/temp --mount type=bind,source=C:\Windows\Fonts,target=/winfonts --rm awtest from Docker
doc.FontSettings = new FontSettings();
doc.FontSettings.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"/winfonts", true) });

This might be useful for testing, to avoid deploying a large number of fonts to the Docker image.

Hi Alexey,
When I tried to run this command in my Dockerfile, I got this error: failed to solve with frontend dockerfile.v0: failed to create LLB definition: dockerfile parse error line 13: unknown instruction: DOCKER

Regards,
Chetan

@KCSR Please note that I run this command on Windows from the command line (it is not a Dockerfile instruction), after building the test image using the following command:

docker build -t awtest .

Hi @alexey.noskov - I hope you remember this discussion. With this approach we will still be embedding/copying the Windows fonts into the Linux container, right?

@KCSR No, with the approach described above the fonts are not copied into the Docker container. The folder with fonts is mounted to the container, so it has access to it, but physically the fonts are still located on the host machine.

@alexey.noskov - Unfortunately, for security reasons I won’t be able to use “C:\Windows\Fonts” …
Instead, I came up with one more approach: installing the MS fonts provided by an open source package.
MS Fonts(Alpine Linux packages) -

Could you please let me know if I can install these fonts in the Linux container through the Dockerfile and set FontSettings to use the installed font folder?

I tried using these commands in the Dockerfile, but this doesn’t seem to be working:

FROM alpine:latest
WORKDIR /app
RUN mkdir -p /usr/share/fonts/truetype/
RUN apk --no-cache add msttcorefonts-installer fontconfig && update-ms-fonts && fc-cache -f /usr/share/fonts/truetype/

Any help is much appreciated. Thanks

@KCSR I am afraid the msttcorefonts-installer package does not include all MS fonts; it includes only the basic fonts. Unfortunately, there is no single package that contains all MS fonts. You can implement IWarningCallback to check which fonts are missing when rendering the document.
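
One way to see which fonts are missing is to collect the substitution warnings while the layout is built and log them afterwards. The sketch below assumes a hypothetical MissingFontCollector class and an input file named in.docx; the exact warning text depends on the Aspose.Words version.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper: gathers distinct font substitution warnings.
public class MissingFontCollector : IWarningCallback
{
    public HashSet<string> Descriptions { get; } = new HashSet<string>();

    public void Warning(WarningInfo info)
    {
        if (info.WarningType == WarningType.FontSubstitution)
            Descriptions.Add(info.Description);
    }
}

public static class Program
{
    public static void Main()
    {
        Document doc = new Document("in.docx"); // assumed input file
        MissingFontCollector collector = new MissingFontCollector();
        doc.WarningCallback = collector;

        // Fonts are resolved when the layout is built, so the warnings
        // are raised here rather than when the callback is assigned.
        doc.UpdatePageLayout();

        foreach (string description in collector.Descriptions)
            Console.WriteLine(description);
    }
}
```

Collecting the descriptions in a set keeps the log short when the same font is substituted many times in one document.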

@alexey.noskov - Can I print all the fonts that are available in the font sources folder, to check whether the fonts are actually being installed in that folder in the Linux container?

 // Set warning callback
 userDocument.WarningCallback = new WarningCallback();

 FontSettings FontSettings = new FontSettings();

 // Note that this setting will override any default font sources that are being searched by default. Now only these folders will be searched for
 // fonts when rendering or embedding fonts. To add an extra font source while keeping the system font sources, use both FontSettings.GetFontsSources and
 // FontSettings.SetFontsSources instead.
 LogDiagnosticMessage("Setting Font Source - Start");
 FontSettings.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"/usr/share/fonts/truetype/msttcorefonts", true) });
 LogDiagnosticMessage("Setting Font Source - Complete");

 // Set font settings
 userDocument.FontSettings = FontSettings;

When I install the Alpine package in my local container, I see the fonts below getting installed -

But when the changes are deployed to my remote Linux container, I see font substitution happening even after specifying the font source folder.

@KCSR First of all, you can specify a warning callback on the font source using the FontSourceBase.WarningCallback property to check whether there are issues loading fonts from the specified font source.

You can use the following code to get the list of fonts available in the specified font sources:

/// <summary>
/// Prints the fonts available in the specified font settings.
/// </summary>
public static void PrintAvaialbleFonts(FontSettings fs)
{
    foreach (FontSourceBase fsb in fs.GetFontsSources())
    {
        Console.WriteLine(fsb.Type);
        foreach (PhysicalFontInfo pfi in fsb.GetAvailableFonts())
        {
            Console.WriteLine(pfi.FullFontName);
        }
        Console.WriteLine("================================================");
    }
}

Thanks @alexey.noskov, I added the code to print the available fonts, and it looks like there are no fonts inside that folder.
Below are logs -

This is my code. Could you please let me know if there is something wrong with how I set the font source folder, or do you think the fonts are not getting installed in that folder?

// Set warning callback
userDocument.WarningCallback = new WarningCallback();

FontSettings FontSettings = new FontSettings();

// Note that this setting will override any default font sources that are being searched by default. Now only these folders will be searched for
// fonts when rendering or embedding fonts. To add an extra font source while keeping the system font sources, use both FontSettings.GetFontsSources and
// FontSettings.SetFontsSources instead.
LogDiagnosticMessage("Setting Font Source - Start");
FontSettings.SetFontsSources(new FontSourceBase[] { new SystemFontSource(), new FolderFontSource(@"/usr/share/fonts/truetype/msttcorefonts", true) });
LogDiagnosticMessage("Setting Font Source - Complete");

//Print available fonts
PrintAvaialbleFonts(FontSettings);

// Set font settings
userDocument.FontSettings = FontSettings;

// Set font source warning callback
FontSourceBase source = FontSettings.GetFontsSources()[0];
FontSourceWarningCollector callback = new FontSourceWarningCollector();
source.WarningCallback = callback;

public static void PrintAvaialbleFonts(FontSettings fs)
{
    LogDiagnosticMessage("PrintAvaialbleFonts Started");

    foreach (FontSourceBase fsb in fs.GetFontsSources())
    {
        LogDiagnosticMessage("Font Source Base Type : " + fsb.Type);
        foreach (PhysicalFontInfo pfi in fsb.GetAvailableFonts())
        {
            LogDiagnosticMessage("Font Full Name : " + pfi.FullFontName);
        }
    }

    LogDiagnosticMessage("PrintAvaialbleFonts Completed");
}

@KCSR Please try specifying a warning callback on the font source using the FontSourceBase.WarningCallback property to check whether there are issues loading fonts from the specified font source. Set the callback before setting the font source in the font settings. Most likely Aspose.Words does not have access to the specified folder and cannot read fonts from there.
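
The ordering described above could be sketched roughly as follows. FontSourceWarningLogger is a hypothetical class name and in.docx is an assumed input file; the folder path is the one used earlier in this thread.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fonts;

// Hypothetical callback that logs every warning raised by a font source.
public class FontSourceWarningLogger : IWarningCallback
{
    public void Warning(WarningInfo info)
    {
        Console.WriteLine($"{info.WarningType}: {info.Description}");
    }
}

public static class Program
{
    public static void Main()
    {
        FolderFontSource folderSource =
            new FolderFontSource("/usr/share/fonts/truetype/msttcorefonts", true);

        // Attach the callback first, so problems reading the folder are reported.
        folderSource.WarningCallback = new FontSourceWarningLogger();

        FontSettings fontSettings = new FontSettings();
        fontSettings.SetFontsSources(
            new FontSourceBase[] { new SystemFontSource(), folderSource });

        // Enumerating the fonts makes the source scan the folder.
        foreach (PhysicalFontInfo font in folderSource.GetAvailableFonts())
            Console.WriteLine(font.FullFontName);

        Document doc = new Document("in.docx"); // assumed input file
        doc.FontSettings = fontSettings;
    }
}
```

Note that in the code posted earlier the callback was attached to GetFontsSources()[0], which in that array is the SystemFontSource, so warnings from the folder source would not have been captured there.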