Some Characters are Rendered as Boxes | DOCX to PDF Conversion using .NET

I have created a Word document by copying text from different sources. When converting from Word to PDF, I found that the conversion fails for certain characters, though it works fine for others. Can you please suggest a possible solution for this issue?

image.png (2.1 KB)

@saurabhmauryabu

Please note that Aspose.Words requires TrueType fonts when rendering a document to fixed-page formats (JPEG, PNG, PDF or XPS). You need to install the fonts that are used in your document on the machine where you are converting documents to PDF. Please refer to the following articles:

Using TrueType Fonts
Manipulating and Substitution TrueType Fonts

If you still face the problem, please ZIP and attach your input Word document along with the problematic and expected output PDFs here for testing. We will investigate the issue and provide you with more information on it.

Thanks @tahir.manzoor. I tried to use TrueType fonts by downloading the fonts used in the Word file and placing them in a directory. I then wrote the code shared below to convert Word to PDF. However, it distorts the style of the document. Can you please suggest why? I am also sending a sample Word and PDF file for reference.

Sample_Doc.docx (18.1 KB)
Output.pdf (66.4 KB)

image.png (40.5 KB)

public static void ConvertToPDFUsingAspose(string inputFile, string outputPath)
{
    // Load the source Word document.
    Document originalDoc = new Document(inputFile);

    // Provide PDFSaveOption compliance to PDF17
    // or just convert without SaveOptions
    PdfSaveOptions pso = new PdfSaveOptions();
    pso.Compliance = PdfCompliance.Pdf17;
    //font changes

    FontSettings fontSettings = new FontSettings();

    fontSettings.SetFontsFolder(@"D:\Temp\Fonts\", false);
    originalDoc.FontSettings = fontSettings;

    //font changes end
    originalDoc.Save(outputPath + "\\Output.pdf", pso);
}

@saurabhmauryabu

You are facing this issue because of missing fonts. Please put your custom fonts in the D:\Temp\Fonts\ folder and use the following code example to avoid the issue.

The following code example implements the IWarningCallback interface. It notifies you about missing fonts while rendering the document to PDF.

Document document = new Document("input.docx");
document.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;

FontSourceBase[] originalFontSources = FontSettings.DefaultInstance.GetFontsSources();
// Create a font source from a folder that contains fonts.
FolderFontSource folderFontSource = new FolderFontSource(@"D:\Temp\Fonts\", true);

// Apply a new array of font sources that contains the original font sources, as well as our custom fonts.
FontSourceBase[] updatedFontSources = { originalFontSources[0], folderFontSource };
FontSettings.DefaultInstance.SetFontsSources(updatedFontSources);

// Re-read the font sources so they can be inspected before we render the document to PDF.
updatedFontSources = FontSettings.DefaultInstance.GetFontsSources();
document.WarningCallback = new HandleDocumentWarnings();
document.Save("output.pdf");

public class HandleDocumentWarnings : IWarningCallback
{
    /// <summary>
    /// Our callback only needs to implement the "Warning" method. This method is called whenever there is a
    /// potential issue during document processing. The callback can be set to listen for warnings generated during document
    /// load and/or document save.
    /// </summary>
    public void Warning(WarningInfo info)
    {
        // We are only interested in fonts being substituted.
        if (info.WarningType == WarningType.FontSubstitution)
        {
            Console.WriteLine(info.WarningType + " :: " + info.Description.ToString());
        }
    }
}
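
A note on the font sources: the code above carries over only originalFontSources[0], and the earlier question code calls SetFontsFolder, which, as far as the Aspose.Words documentation describes it, replaces the default font sources instead of adding to them. That may explain why the document style was distorted once the custom folder was set. The following is a minimal sketch, not the official approach, that keeps every existing source and appends the custom folder. FontSourceSetup and CreateWithExtraFolder are hypothetical names chosen for illustration, and the sketch assumes the Aspose.Words NuGet package is referenced.

```csharp
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Fonts;

public static class FontSourceSetup
{
    // Hypothetical helper: builds a per-document FontSettings that keeps
    // all default (system) font sources and adds one extra folder.
    public static FontSettings CreateWithExtraFolder(string fontsFolder)
    {
        FontSettings settings = new FontSettings();

        // Start from whatever sources Aspose.Words uses by default.
        var sources = new List<FontSourceBase>(settings.GetFontsSources());

        // Append the custom folder; 'true' scans subfolders as well.
        sources.Add(new FolderFontSource(fontsFolder, true));

        settings.SetFontsSources(sources.ToArray());
        return settings;
    }
}
```

It could be used as `document.FontSettings = FontSourceSetup.CreateWithExtraFolder(@"D:\Temp\Fonts\");` before calling Save, which scopes the change to one document instead of FontSettings.DefaultInstance.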

Thanks Tahir for the prompt reply. The code example you shared is really helpful. I had to remove ‘document.LayoutOptions.TextShaperFactory = HarfBuzzTextShaperFactory.Instance;’ because HarfBuzzTextShaperFactory was not recognized. After removing it, I ran the program and got the missing fonts. Two queries:

  • The shared screenshot points out fonts that I can see in MS Word. Does Aspose consider only the fonts installed in the system font directory by default?
  • Is there a way to get the document text along with the warning, so that the user can find that text in the document and change its font easily?

image.png (5.3 KB)

@saurabhmauryabu

You can install Aspose.Words.Shaping.HarfBuzz 21.8 to avoid this issue.

The fonts are not installed on your system. You can check the installed fonts on the system using the code example shared in the following article.
Getting a List of Available Fonts

You can get the missing-font notifications as suggested in my previous post. However, if you want to get the document text that uses the missing fonts, we suggest the following solution.

  1. Get the list of missing fonts by implementing the IWarningCallback interface.
  2. Iterate over the Run nodes of the document and check whether the font of each Run node is in the list.
  3. You can get the font of a Run node using the Run.Font.Name property.

Hello Tahir, thank you so much for your response. I am able to get the installed fonts by using ‘Getting a List of Available Fonts’.

I have used ‘Run’ while creating a Word document, as below:

Run run = new Run(doc, "hello");
run.Font.Bold = false;

However, this is a different case: I am not creating the Word document myself. Different users create Word documents in Microsoft Word, paste in text from different sources, and send them to the application to be converted to PDF. The issue is that certain text in the document is not converted correctly because of font problems. As per your earlier suggestion, I am able to get the missing font families using IWarningCallback, but I am not sure how to follow the steps you listed.

Can you share any sample code for reference? I just want the text of the document whose font causes the issue, so that users can find that text directly in the document and correct it.

Thanks again for your help.

@saurabhmauryabu

Could you please share why you need the text of the document for which fonts are missing? You can simply implement the IWarningCallback interface and you will get the warning notifications as shown below.

FontSubstitution :: Font ‘Helvetica’ has not been found. Using ‘Arial’ font instead. Reason: table substitution.
FontSubstitution :: Font ‘Iskoola Pota’ has not been found. Using ‘Nirmala UI’ font instead. Reason: font info substitution.
FontSubstitution :: Font ‘Mangal’ has not been found. Using ‘Courier New’ font instead. Reason: font info substitution.
FontSubstitution :: Font ‘Arial Unicode MS’ has not been found. Using ‘Yu Gothic’ font instead. Reason: alternative name from document.

If you still want to get the missing font text, please let us know. We will then write the code example and share it with you.

Hello Tahir,

Let me share the requirement for clarity.

Basically, we collect data from different sources and create the Word document manually. The content in the document could be in different languages. When the document is finished, we upload it to the application, which converts it to PDF format.

With your help, using IWarningCallback, I am able to get the fonts that are not supported in the Word document. However, since a document may run to 10-15 pages, it will be tough for users to find which text’s font is causing the problem. If we can get the text along with the font that is causing the problem, it will be really helpful for the end user to search for that text in the document, correct it, and upload the document again.

@saurabhmauryabu

We are writing the code example for your case and will get back to you soon.

@saurabhmauryabu

Please use the following code example to get the desired output. Hope this helps you.

var document = new Document(MyDir + "Sample_Doc.docx");
GetDocumentWarnings missingfonts = new GetDocumentWarnings();
document.WarningCallback = missingfonts;
// Render to a throwaway stream so that the font substitution warnings are raised.
document.Save(new MemoryStream(), SaveFormat.Pdf);

foreach (Run run in document.GetChildNodes(NodeType.Run, true))
{
    if (missingfonts.warnings.Contains(run.Font.Name))
        Console.WriteLine(run.Font.Name + "- " + run.Text);
}

public class GetDocumentWarnings : IWarningCallback
{
    public List<String> warnings;
    public GetDocumentWarnings()
    {
        warnings = new List<String>();
    }

    public void Warning(WarningInfo info)
    {
        // We are only interested in fonts being substituted.
        if (info.WarningType == WarningType.FontSubstitution)
        {
            warnings.Add(info.Description.Split(new Char[] { '\'' })[1]);
        }
    }
}
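
The callback above takes the font name from the warning text by splitting on the apostrophe, which breaks if the description has a different shape. As a slightly more defensive variant, the sketch below matches the "Font '...' has not been found" pattern and collects the names into a set so that each font appears once. The message format is an assumption taken from the sample warnings in this thread, not a documented contract, and FontWarningParser and its methods are hypothetical names. It uses only the .NET base class library.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FontWarningParser
{
    // Assumed message shape (from the warnings quoted in this thread):
    //   Font 'Helvetica' has not been found. Using 'Arial' font instead. Reason: ...
    // Both straight (') and typographic (U+2018/U+2019) quotes are accepted.
    private static readonly Regex MissingFontPattern = new Regex(
        @"^Font\s+['\u2018](?<name>[^'\u2019]+)['\u2019]\s+has not been found");

    // Returns the missing font name, or null when the text does not match.
    public static string ExtractMissingFont(string description)
    {
        if (string.IsNullOrEmpty(description))
            return null;

        Match match = MissingFontPattern.Match(description);
        return match.Success ? match.Groups["name"].Value : null;
    }

    // Collects distinct missing font names, ignoring case differences.
    public static HashSet<string> CollectMissingFonts(IEnumerable<string> descriptions)
    {
        var fonts = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string description in descriptions)
        {
            string name = ExtractMissingFont(description);
            if (name != null)
                fonts.Add(name);
        }
        return fonts;
    }
}
```

Inside the Warning method, `FontWarningParser.ExtractMissingFont(info.Description)` could replace the Split call, and a HashSet also makes the later `Contains(run.Font.Name)` check case-insensitive.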

Thanks Tahir. The code you shared has helped, and we are able to extract the text where there is a font mismatch. Please find the attached screenshot: characters that have the font issue appear as the ‘?’ character. Can you please suggest the possible cause? Also, is it possible to get the line number in the file where the font is causing the issue, so that the user can go to that specific line to fix it?
image.jpg (83.3 KB)

@saurabhmauryabu

The output issue is not related to Aspose.Words; rather, it is related to Console.WriteLine. You may use the Console.OutputEncoding property to avoid this. You can also save the output text to a .txt file as shown below. Hope this helps you.

StringBuilder sb = new StringBuilder();

var document = new Document(MyDir + "Sample_Doc.docx");
GetDocumentWarnings missingfonts = new GetDocumentWarnings();
document.WarningCallback = missingfonts;
document.Save(new MemoryStream(), SaveFormat.Pdf);

foreach (Run run in document.GetChildNodes(NodeType.Run, true))
{
    if (missingfonts.warnings.Contains(run.Font.Name))
        sb.Append(run.Font.Name + "- " + run.Text).Append(Environment.NewLine);
}

File.WriteAllText(MyDir + "missing fonts.txt", sb.ToString());
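
For the Console.OutputEncoding route mentioned above, a minimal sketch follows. It switches the console to UTF-8 before writing, so non-Latin run text is not replaced with ‘?’ by the default code page. ConsoleUnicode and WriteLineUtf8 are hypothetical names used only for illustration. The console font must still contain the glyphs, so some scripts may show as boxes even with the right encoding.

```csharp
using System;
using System.Text;

public static class ConsoleUnicode
{
    // Hypothetical helper: writes a line after switching console output to UTF-8.
    public static void WriteLineUtf8(string text)
    {
        // Without this, characters outside the console's code page are
        // typically written as '?'.
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(text);
    }
}
```

In the loop above, `ConsoleUnicode.WriteLineUtf8(run.Font.Name + "- " + run.Text);` could stand in for the StringBuilder call when console output is preferred over a file.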

You can use the LayoutCollector.GetStartPageIndex method to get the page number of the text.

StringBuilder sb = new StringBuilder();

var document = new Document(MyDir + "Sample_Doc.docx");
GetDocumentWarnings missingfonts = new GetDocumentWarnings();
document.WarningCallback = missingfonts;
document.Save(new MemoryStream(), SaveFormat.Pdf);

// The collector maps document nodes to the pages they are laid out on.
LayoutCollector collector = new LayoutCollector(document);
foreach (Run run in document.GetChildNodes(NodeType.Run, true))
{
    if (missingfonts.warnings.Contains(run.Font.Name))
        sb.Append("Page number : " + collector.GetStartPageIndex(run) + " - " + run.Font.Name + "- " + run.Text).Append(Environment.NewLine);
}

Unfortunately, there is no direct API to get the line number of text. However, you can find the page number and line number of text using the DocumentLayoutHelper class. You can build your own logic on top of this class to achieve your requirement.

https://github.com/aspose-words/Aspose.Words-for-.NET

Thank you so much Tahir. The solution you offered worked, and we are able to get the desired result.

Hello @tahir.manzoor,

Need your help again on this issue. We have been using the code snippet below and are getting an infinite loop error (see the attached image.png (9.2 KB)) at the statement ‘collector.GetStartPageIndex(run)’. Can you please suggest what could be the reason behind the infinite loop and how we can fix it?

foreach (var item in lstFontFamily)
{
    foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
    {
        string fontName = run.Font.Name;
        if (item.FontName == fontName && !getLstFontFamily.Any(x => x.Description == item.Description))
        {
            FonFamilyList fontWarningMsg = new FonFamilyList();
            fontWarningMsg.Description = item.Description;
            fontWarningMsg.PageNumber = collector.GetStartPageIndex(run);
            fontWarningMsg.WordDocText = run.Text;
            fontWarningMsg.FontName = item.FontName;
            getLstFontFamily.Add(fontWarningMsg);
            break;
        }
    }
}

Thanks,
Saurabh

@saurabhmauryabu

We suggest you upgrade to the latest version of Aspose.Words for .NET, 21.10, and let us know how it goes on your side. Hope this helps you.

If you still face the problem, please attach the following resources here for testing:

  • Your input Word document.
  • Please create a standalone console application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing.

As soon as you get these pieces of information ready, we will start investigation into your issue and provide you more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Thanks @tahir.manzoor. As per your suggestion, we tried the new version of the DLL and the issue got resolved.

We are currently using version 16.1. Do you have any migration guidelines for moving from 16.1 to the latest version, 21.10?

regards,
Saurabh

@saurabhmauryabu

You can find the API changes in the release notes of Aspose.Words for .NET here:
https://docs.aspose.com/words/net/aspose-words-for-net/

If you face any issue, please share the details of the API that causes the issue at your end. We will then share the correct APIs with you.