What Fonts Are Required for Hindi Language Support?

Hi, we’re using Aspose to convert PDF files into .docx format for translation, and translation works fine for all languages except Hindi. It looks like we don’t have all the fonts that Aspose needs to support it.
Attached is the Hindi version, where you can see the font issue: Accounting_Adjustment_Impact_Assessment_20170919.Hindi.pptx.pdf (35.4 KB)

Original version:
image.png (378.5 KB)
in Russian:
image.jpg (402.4 KB)
in French:
image.jpg (418.5 KB)
In Hindi:
image.png (184.8 KB)

Could you please let us know which fonts are missing for Hindi?

Thanks in advance

@DaryaDel

The shared PDF file looks like it was generated with Aspose.PDF for .NET. However, you mentioned that you are converting the PDF into DOCX. Would you please share the sample code snippet that you are using to convert the file into DOCX? Also, please confirm whether you want to generate a readable .docx file from the PDF that you have already shared. We will then assist you accordingly.

Hi, thank you for your reply. Yes, we convert this PDF into DOCX and here are the code samples:

using System.IO;
using Deloitte.Cortex.ConverterService.Models;
using Deloitte.Cortex.ConverterService.ViewModels.ConvertFileToWord.Requests;

namespace Deloitte.Cortex.ConverterService.Services.ToWordConverters.PdfToWordConverter
{
    public interface IPdfToWordConverterService
    {
        TempFile Convert(TempFile srcStream, WordConversionSetting settings);
    }
}


using System;
using System.IO;
using Aspose.Pdf;
using Deloitte.Cortex.ConverterService.Models;
using Deloitte.Cortex.ConverterService.ViewModels.ConvertFileToWord.Requests;
using Microsoft.Extensions.Logging;

namespace Deloitte.Cortex.ConverterService.Services.ToWordConverters.PdfToWordConverter
{
    internal class PdfToWordConverterService : IPdfToWordConverterService
    {
        private readonly ILogger<PdfToWordConverterService> _logger;

        public PdfToWordConverterService(ILogger<PdfToWordConverterService> logger)
        {
            _logger = logger;
        }

        public TempFile Convert(TempFile srcStream, WordConversionSetting settings)
        {
            if (srcStream == null || srcStream.FileInfo.Length == 0)
                throw new ArgumentNullException(nameof(srcStream), "File should be passed");

            // Save using save options
            var saveOptions = new DocSaveOptions
            {
                AddReturnToLineEnd = settings?.AddReturnToLineEnd ?? true,
                BatchSize = settings?.BatchSize ?? 100,
                ImageResolutionX = settings?.ImageResolutionX ?? 300,
                ImageResolutionY = settings?.ImageResolutionY ?? 300,
                MaxDistanceBetweenTextLines = settings?.MaxDistanceBetweenTextLines ?? 0.25F,
                RecognizeBullets = settings?.RecognizeBullets ?? false,
                Format = settings?.Format ?? DocSaveOptions.DocFormat.DocX,
                Mode = settings?.RecognitionMode ?? DocSaveOptions.RecognitionMode.Textbox,
                RelativeHorizontalProximity = settings?.RelativeHorizontalProximity ?? 0,
            };

            TempFile tempFile = new TempFile(_logger);

            using var pdfDocument = new Document(srcStream.FilePath);
            // Pass the configured options; a fresh DocSaveOptions here would ignore the settings above
            pdfDocument.Save(tempFile.FilePath, saveOptions);

            return tempFile;
        }
    }
}

using System;
using System.IO;
using System.Threading.Tasks;
using Deloitte.Cortex.ConverterService.Utils;
using Deloitte.Cortex.Shared.ExceptionHandling;
using Microsoft.Extensions.Logging;

namespace Deloitte.Cortex.ConverterService.Models
{
    public class TempFile : IDisposable, IAsyncDisposable
    {
        private readonly ILogger _logger;
        private FileStream _stream;
        private readonly string _filePath;
        private bool _disposed;

        public TempFile(ILogger logger) : this(Path.Combine(Path.GetTempPath(), Path.GetTempFileName()), null, logger)
        {
        }

        public TempFile(string filePath, ILogger logger) : this(filePath, null, logger)
        {
        }

        public TempFile(string filePath, string fileName, ILogger logger)
        {
            _logger = logger;
            _filePath = filePath;
            if (!File.Exists(_filePath))
            {
                // File.Create returns an open FileStream; dispose it so the handle is released
                File.Create(_filePath).Dispose();
            }
            FileName = fileName ?? FileInfo.Name;
            _logger?.LogInformation($"File:{FileInfo.Name}; {SizeUtils.SizeSuffix(FileInfo.Length)} - init.");
        }

        ~TempFile()
        {
            Dispose();
        }

        public string FilePath => _filePath;

        public string FileName { get; }

        public FileInfo FileInfo => new FileInfo(_filePath);

        public void CloseFileStream()
        {
            _stream?.Dispose();
            _stream = null;
        }

        public FileStream FileStream
        {
            get
            {
                lock (this)
                {
                    CloseFileStream();

                    _logger?.LogInformation($"FileStream:{FileInfo.Name}; {SizeUtils.SizeSuffix(FileInfo.Length)}");

                    if (_disposed)
                        throw new ObjectDisposedException(_filePath);

                    if (!FileInfo.Exists)
                        throw new NotFoundException(_filePath);

                    _stream = new FileStream(_filePath, FileMode.Open, FileAccess.ReadWrite);
                    return _stream;
                }
            }
        }

        public ValueTask DisposeAsync()
        {
            return new ValueTask(Task.Run(Dispose));
        }

        public void Dispose()
        {
            if (_disposed)
                return;

            try
            {
                lock (this)
                {
                    _stream?.Dispose();
                    _stream = null;
                    if (File.Exists(FilePath))
                    {
                        _logger?.LogInformation($"File:{FileInfo.Name}; {SizeUtils.SizeSuffix(FileInfo.Length)} - disposed.");
                        File.Delete(FilePath);
                    }
                }

            }
            catch (Exception ex)
            {
                _logger?.LogError(ex, "TempFile Dispose failed");
            }

            _disposed = true;
        }
    }
}

Thank you

@DaryaDel

The PDF that you shared contains blocks instead of readable text. Could you please confirm whether you want to repair it and generate a readable .docx file? Otherwise, please share a valid PDF file with correct content so that we can test the scenario and share our feedback with you.

Sure, here is the original document

@DaryaDel

The document that you shared is a PPTX, and it does not contain any Hindi characters. Please share a PDF that contains Hindi characters. We will try to convert it into DOCX using Aspose.PDF and share our feedback with you.

OK, in that case please use the file that I provided before. I provided the original PDF in English and its Hindi version after translation (if you look at that file, you will see why we think some Hindi characters are missing: this is what we see instead of Hindi characters). Please also review some of our code samples below:

Background:
netcoreapp3.1
Aspose.Slides for .NET 21.2.0
image: mcr.microsoft.com/dotnet/core/aspnet:3.1.13-focal

Code:

private void Test()
{
    const string staticFilesFonts = "StaticFiles/Fonts";
    string fullPathStaticFilesFonts = GetFullPathToFontFolder(staticFilesFonts);

    Aspose.Slides.FontsLoader.ClearCache();
    Aspose.Slides.FontsLoader.LoadExternalFonts(new[] { fullPathStaticFilesFonts, staticFilesFonts });

    using var presentation = new Aspose.Slides.Presentation("myfile.pptx");
    presentation.Save("myfile.pdf", Aspose.Slides.Export.SaveFormat.Pdf);
}

private static string GetFullPathToFontFolder(string path)
{
    string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
    string assemblyPath = Path.GetDirectoryName(assemblyFilePath);
    string fullPathStaticFilesFonts = Path.Combine(assemblyPath, path);

    return fullPathStaticFilesFonts;
}

Fonts: here

@DaryaDel

Your original inquiry was about PDF to DOCX conversion, and the PDF you shared contained blocks instead of text. Please note that the source PDF file must contain valid text in order to generate a readable DOCX file. Therefore, we asked you to share a sample PDF document that contains proper, readable Hindi text so that we could try to convert it to DOCX and see whether there was any issue.

Furthermore, it seems that you are generating the PDF from PPTX files. However, the PPTX file you shared did not contain any Hindi text either. If you are preparing a PDF document by translating the content, the Arial Unicode MS font should be enough to render Hindi text correctly. Please check the sample code snippet below, which we used to generate a PDF document containing Hindi text:

Document pdfDocument = new Document();
Page page = pdfDocument.Pages.Add();
// using text fragment
var textFragment = new TextFragment("नमस्ते दुनिया");
textFragment.Position = new Position(596, 579);
page.Paragraphs.Add(textFragment);
pdfDocument.Save(dataDir + "outputusingtextfragment.pdf");

outputusingtextfragment.pdf (83.6 KB)

Also, the code snippet that you shared seems fine and does not require any change. Please make sure that the Arial Unicode MS font is installed on your system. If the issue still persists, please share a sample source file (which has Hindi text) and an expected output file that you want to create using Aspose.PDF. We will then assist you accordingly.

Sorry for the confusion here. The problem occurs when converting PPTX into PDF.

@DaryaDel

We are moving your inquiry to Aspose.Slides forum where you will be assisted shortly. Meanwhile, please share a PPTX file which has Hindi Text so that the conversion to PDF can be tested in our environment as well.

Hi there.
Thank you.

We were using pptx ‘myfile.pptx’ (100.3 KB).

Let me remind you of the right piece of code that we used:

private void Test()
{
    const string staticFilesFonts = "StaticFiles/Fonts";
    string fullPathStaticFilesFonts = GetFullPathToFontFolder(staticFilesFonts);

    Aspose.Slides.FontsLoader.ClearCache();
    Aspose.Slides.FontsLoader.LoadExternalFonts(new[] { fullPathStaticFilesFonts, staticFilesFonts });

    using var presentation = new Aspose.Slides.Presentation("myfile.pptx");
    presentation.Save("myfile.pdf", Aspose.Slides.Export.SaveFormat.Pdf);
}

private static string GetFullPathToFontFolder(string path)
{
    string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
    string assemblyPath = Path.GetDirectoryName(assemblyFilePath);
    string fullPathStaticFilesFonts = Path.Combine(assemblyPath, path);

    return fullPathStaticFilesFonts;
}

@Stan.Stan,
Welcome to our community!

  1. Please specify the OS version on which the code is running.
  2. Check the availability of fonts in your external font folders (try to read a font file, for example).
  3. Check messages about font substitutions as below:
var loadOptions = new LoadOptions();
loadOptions.WarningCallback = new FontWarningHandler();

using (var presentation = new Presentation(pptxPath, loadOptions))
{
    presentation.Save(pdfPath, SaveFormat.Pdf);
}

More details: Getting Warning Callbacks for Fonts Substitution
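
For step 2, a quick shell check inside the container can confirm that the font folder exists and that its files are readable and non-empty. This is only a sketch: check_font_folder is a hypothetical helper name, and the check says nothing about whether a font actually contains Hindi (Devanagari) glyphs.

```shell
# check_font_folder DIR (hypothetical helper): list font files in DIR and
# report whether each one is readable and non-empty.
check_font_folder() {
  dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing folder: $dir"
    return 1
  fi
  count=0
  # Match common font extensions in either case; the scan is not recursive.
  for f in "$dir"/*.[tT][tT][fF] "$dir"/*.[oO][tT][fF] "$dir"/*.[tT][tT][cC]; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    if [ -r "$f" ] && [ -s "$f" ]; then
      echo "ok: $f"
      count=$((count + 1))
    else
      echo "unreadable or empty: $f"
    fi
  done
  echo "readable font files: $count"
  # Succeed only if at least one usable font file was found.
  [ "$count" -gt 0 ]
}
```

It would be called with the absolute path of the external font folder, i.e. the directory that the application passes to FontsLoader.LoadExternalFonts.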

  1. We wrote before:

Also:

$ cat /etc/os-release

NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  2. I’ve just checked: the fonts are available.
  3. Updated code:
private void Test()
{
    const string staticFilesFonts = "StaticFiles/Fonts";
    string fullPathStaticFilesFonts = GetFullPathToFontFolder(staticFilesFonts);

    Aspose.Slides.FontsLoader.ClearCache();
    Aspose.Slides.FontsLoader.LoadExternalFonts(new[] { fullPathStaticFilesFonts, staticFilesFonts });

    Aspose.Slides.LoadOptions loadOptions = CreateLoadOptions();

    using var presentation = new Aspose.Slides.Presentation("myfile.pptx", loadOptions);
    presentation.Save("myfile.pdf", Aspose.Slides.Export.SaveFormat.Pdf);
}

private static string GetFullPathToFontFolder(string path)
{
    string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
    string assemblyPath = Path.GetDirectoryName(assemblyFilePath);
    string fullPathStaticFilesFonts = Path.Combine(assemblyPath, path);

    return fullPathStaticFilesFonts;
}

private Aspose.Slides.LoadOptions CreateLoadOptions()
{
    var loadOptions = new Aspose.Slides.LoadOptions
    {
        WarningCallback = new AsposeSlidesFontsWarningsHandler(_logger)
    };

    return loadOptions;
}
internal sealed class AsposeSlidesFontsWarningsHandler : IWarningCallback
{
    private readonly ILogger _logger;

    public AsposeSlidesFontsWarningsHandler(
        ILogger logger
    )
    {
        _logger = logger;
    }

    public ReturnAction Warning(IWarningInfo warning)
    {
        _logger.LogWarning($"AsposeSlidesFontsWarningsHandler, WarningType: {warning.WarningType}");
        _logger.LogWarning($"AsposeSlidesFontsWarningsHandler, Description: {warning.Description}");

        return ReturnAction.Continue;
    }
}  

@Stan.Stan,

Please share warning messages that appear in your warning handler.

Sure.
t-log.zip (1.7 KB)

@Stan.Stan,
As you can see in the logs, Aspose.Slides didn’t find the Arial font in your environment. Please try to install Arial and other fonts as below:

RUN echo "ttf-mscorefonts-installer msttcorefonts/accepted-mscorefonts-eula select true" | debconf-set-selections
RUN apt-get install -y --no-install-recommends fontconfig ttf-mscorefonts-installer
RUN fc-cache -f -v
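
After the install, it may help to ask fontconfig which fonts it can see before re-running the conversion. A minimal sketch, assuming fontconfig's fc-list is on the PATH; list_fonts_for_lang is a hypothetical helper name. It only reports what fontconfig knows about, and whether Aspose.Slides resolves fonts from the same set is an assumption to verify against the warning log.

```shell
# list_fonts_for_lang LANG (hypothetical helper): print the families of
# installed fonts that declare coverage for the given language code.
list_fonts_for_lang() {
  lang="$1"
  if ! command -v fc-list >/dev/null 2>&1; then
    echo "fc-list not found; install fontconfig first" >&2
    return 2
  fi
  # ":lang=xx" is a fontconfig pattern; "family" limits output to family names.
  fc-list ":lang=$lang" family | sort -u
}
```

Running `list_fonts_for_lang hi` lists families that declare Hindi coverage; an empty result suggests that no font visible to fontconfig covers Hindi. In the same spirit, `fc-list | grep -i arial` shows whether Arial was registered.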

apt-get install -y --no-install-recommends fontconfig ttf-mscorefonts-installer

Reading package lists… Done
Building dependency tree
Reading state information… Done
fontconfig is already the newest version (2.13.1-2ubuntu3).
ttf-mscorefonts-installer is already the newest version (3.7ubuntu6).
0 upgraded, 0 newly installed, 0 to remove and 4 not upgraded.

fc-cache -f -v

/usr/share/fonts: caching, new cache contents: 0 fonts, 6 dirs
/usr/share/fonts/X11: caching, new cache contents: 0 fonts, 2 dirs
/usr/share/fonts/X11/encodings: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/X11/encodings/large: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/X11/util: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cMap: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/cmap/adobe-cns1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-gb1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-japan2: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/cmap/adobe-korea1: caching, new cache contents: 0 fonts, 0 dirs
/usr/share/fonts/opentype: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/opentype/urw-base35: caching, new cache contents: 35 fonts, 0 dirs
/usr/share/fonts/truetype: caching, new cache contents: 0 fonts, 5 dirs
/usr/share/fonts/truetype/dejavu: caching, new cache contents: 6 fonts, 0 dirs
/usr/share/fonts/truetype/droid: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/truetype/liberation: caching, new cache contents: 16 fonts, 0 dirs
/usr/share/fonts/truetype/msttcorefonts: caching, new cache contents: 60 fonts, 0 dirs
/usr/share/fonts/truetype/noto: caching, new cache contents: 1 fonts, 0 dirs
/usr/share/fonts/type1: caching, new cache contents: 0 fonts, 1 dirs
/usr/share/fonts/type1/urw-base35: caching, new cache contents: 35 fonts, 0 dirs
/usr/local/share/fonts: caching, new cache contents: 0 fonts, 0 dirs
/root/.local/share/fonts: skipping, no such directory
/root/.fonts: skipping, no such directory
/usr/share/fonts/X11: skipping, looped directory detected
/usr/share/fonts/cMap: skipping, looped directory detected
/usr/share/fonts/cmap: skipping, looped directory detected
/usr/share/fonts/opentype: skipping, looped directory detected
/usr/share/fonts/truetype: skipping, looped directory detected
/usr/share/fonts/type1: skipping, looped directory detected
/usr/share/fonts/X11/encodings: skipping, looped directory detected
/usr/share/fonts/X11/util: skipping, looped directory detected
/usr/share/fonts/cmap/adobe-cns1: skipping, looped directory detected
/usr/share/fonts/cmap/adobe-gb1: skipping, looped directory detected
/usr/share/fonts/cmap/adobe-japan1: skipping, looped directory detected
/usr/share/fonts/cmap/adobe-japan2: skipping, looped directory detected
/usr/share/fonts/cmap/adobe-korea1: skipping, looped directory detected
/usr/share/fonts/opentype/urw-base35: skipping, looped directory detected
/usr/share/fonts/truetype/dejavu: skipping, looped directory detected
/usr/share/fonts/truetype/droid: skipping, looped directory detected
/usr/share/fonts/truetype/liberation: skipping, looped directory detected
/usr/share/fonts/truetype/msttcorefonts: skipping, looped directory detected
/usr/share/fonts/truetype/noto: skipping, looped directory detected
/usr/share/fonts/type1/urw-base35: skipping, looped directory detected
/usr/share/fonts/X11/encodings/large: skipping, looped directory detected
/var/cache/fontconfig: cleaning cache directory
/root/.cache/fontconfig: not cleaning non-existent cache directory
/root/.fontconfig: not cleaning non-existent cache directory
fc-cache: succeeded

Nothing has changed.

Also, note that the Arial font is in the folder.
image.png (58.9 KB)

We initialize fonts the same way for the other Aspose assemblies we use, such as Aspose.Words, Aspose.Cells, and Aspose.Pdf, and we don’t experience this issue with them.
The real code of font initialization for all Aspose assemblies that we use is here:

public class AsposeFontInitializer
{
    private const string StaticFilesFonts = "StaticFiles/Fonts";
    private string _fullPathStaticFilesFonts;

    public void Initialize()
    {
        SetFullPathToFontFolder();

        InitializeWords();
        InitializeCells();
        InitializePdf();
        InitializeSlides();
    }

    private void SetFullPathToFontFolder()
    {
        string assemblyFilePath = Assembly.GetExecutingAssembly().Location;
        string assemblyPath = Path.GetDirectoryName(assemblyFilePath);
        _fullPathStaticFilesFonts = Path.Combine(assemblyPath, StaticFilesFonts);
    }

    private static void InitializeWords()
    {
        var fontSources = new Aspose.Words.Fonts.FontSourceBase[]
        {
            new Aspose.Words.Fonts.SystemFontSource(1),
            new Aspose.Words.Fonts.FolderFontSource(StaticFilesFonts, true, 0),
        };

        Aspose.Words.Fonts.FontSettings.DefaultInstance.SetFontsSources(fontSources);
    }

    private static void InitializeCells()
    {
        var fontSources = new Aspose.Cells.FontSourceBase[]
        {
            new Aspose.Cells.FolderFontSource(StaticFilesFonts, true),
        };

        Aspose.Cells.FontConfigs.SetFontSources(fontSources);
    }

    private static void InitializePdf()
    {
        var fontSources = new Aspose.Pdf.Text.FontSource[]
        {
            new Aspose.Pdf.Text.SystemFontSource(),
            new Aspose.Pdf.Text.FolderFontSource(StaticFilesFonts),
        };

        foreach (Aspose.Pdf.Text.FontSource fontSource in fontSources)
            Aspose.Pdf.Text.FontRepository.Sources.Add(fontSource);
    }

    private void InitializeSlides()
    {
        Aspose.Slides.FontsLoader.ClearCache();
        Aspose.Slides.FontsLoader.LoadExternalFonts(new[] { _fullPathStaticFilesFonts, StaticFilesFonts });
    }
}

Also we used the following commands:
RUN apt update
RUN apt install -y fontconfig
RUN apt install -y libgdiplus
RUN apt install -y --no-install-recommends libc6-dev
RUN echo "yes" | apt install ttf-mscorefonts-installer -y

@Stan.Stan,
Please check your docker configuration for Aspose.Slides: How to Run Aspose.Slides in Docker.