Arabic text in SVG is rendered reversed (LTR instead of RTL) when the SVG is added to a PDF on Linux

Hello,

This is loosely related to the thread here: Font Issues with content containing Arabic and Chinese when saving to PDF on Linux

The font issues from the original thread have all been resolved, and I thank you for your help with that. However, it was brought to my attention that when we add an SVG containing Arabic to a PDF in a Linux environment, the Arabic word order is reversed. The rest of the text in the PDF keeps the correct order. PDFs generated in a Windows environment do not show this behavior and work as expected, as do DOCX files generated on both Linux and Windows.

Here is a DOCX file generated in the Linux environment:
Process Narrative - الاوامر الثابتة (1) (4).docx (26.4 KB)

Here is the INCORRECT PDF generated on Linux, showing the changed Arabic word order:
Process Narrative - الاوامر الثابتة (1) (14).pdf (95.0 KB)

For comparison, here is a CORRECT PDF generated on Windows, showing the correct order:
WINDOWS_Process Narrative - الاوامر الثابتة (1).pdf (146.2 KB)

For example, notice that in the diagram SVG, الاوامر اصدار appears instead of اصدار الاوامر (roughly "issuing orders"), among other word-order changes. The text under 'Shape List' and elsewhere in the PDF keeps the correct order; only the SVG is affected.

Here is the content of the SVG before it is added to the document:
svgAsString.pdf (94.0 KB)

And finally, here is the basic flow of our code (the same as in the other thread, with the fallback and substitution settings and HarfBuzz added as recommended):

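import com.aspose.words.*;
import com.aspose.words.shaping.harfbuzz.HarfBuzzTextShaperFactory;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.lang3.SystemUtils;

// someHtmlGoesHere / someOtherHtmlGoesHere hold our HTML content (omitted here).
byte[] downloadPDF(String svg) throws Exception
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);

    if (SystemUtils.IS_OS_LINUX)
    {
        // Linux only: fontconfig substitution plus our own substitution table and fallback rules.
        FontSettings fontSettings = new FontSettings();
        fontSettings.getSubstitutionSettings().getFontConfigSubstitution().setEnabled(true);
        fontSettings.getSubstitutionSettings().getTableSubstitution().load(
            this.getClass().getResourceAsStream("/substitutionSettings.xml")
        );
        // No leading "/": resolved relative to this class's package, unlike the file above.
        fontSettings.getFallbackSettings().load(
            this.getClass().getResourceAsStream("fallbackSettings.xml")
        );
        doc.setFontSettings(fontSettings);
    }
    builder.insertHtml(someHtmlGoesHere, true);
    builder.insertBreak(BreakType.PARAGRAPH_BREAK);

    // The raw SVG markup is inserted as image bytes; this is where the Arabic gets reversed.
    builder.insertImage(svg.getBytes(StandardCharsets.UTF_8));

    builder.insertBreak(BreakType.PARAGRAPH_BREAK);
    builder.insertHtml(someOtherHtmlGoesHere, true);

    // HarfBuzz shaping for complex scripts such as Arabic; applied when the layout is built on save.
    doc.getLayoutOptions().setTextShaperFactory(HarfBuzzTextShaperFactory.getInstance());

    // createSaveOptions is a static factory on SaveOptions.
    SaveOptions saveOptions = SaveOptions.createSaveOptions(SaveFormat.PDF);
    // SaveFormat.DOCX for the Word version

    doc.save(baos, saveOptions);
    return baos.toByteArray();
}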

I would appreciate any assistance or advice you can offer regarding this issue. Please let me know if you need any more information.

Best regards,
Avery Norris


@averyscottnorris I managed to reproduce your issue on my side and have logged it as WORDSNET-24659 in our defect tracking system. We will keep you informed and let you know once it is resolved.


Just wanted to check in and see if there have been any updates here. Is it possible that this will be fixed in the next few months? We are looking to replace our use of Apache FOP's PdfTranscoder with Aspose.Words, but some people would like to see this issue fixed before we make the transition.

@averyscottnorris We have finished analyzing the issue. The problem occurs because the Arabic text inside the SVG contains words separated by spaces, for example "منطقه فروع". Open Sans has no glyphs for Arabic letters, but it does have a glyph for the space character. So for the Arabic letters, Open Sans is substituted with another font (Amiri on the developer's machine), while the space stays in Open Sans. Because of that, space-separated Arabic text is split into three pieces (word, space, word) that are rendered, and reversed, separately, which is why the word order comes out wrong.
Unfortunately, the issue is not yet scheduled for development, so no estimates are available yet. We will keep you updated and let you know once the issue is resolved or we have more information for you.
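To make the explanation above concrete, here is a simplified Java model of what happens when font fallback splits a line into runs. It is a sketch, not Aspose's actual layout code: `primaryFontCovers` is a hypothetical stand-in for "Open Sans has a glyph for this character" (Basic Latin only, so the space is covered but Arabic letters are not), and the JDK's `java.text.Bidi` is used to compare resolving bidi levels over the whole line against treating each run as an independent piece. With whole-line resolution, the first logical word اصدار lands at the right edge, as expected; with independent runs it lands at the left, which reads right-to-left as الاوامر اصدار, the symptom from the first post.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

public class FontRunBidiDemo {

    // "اصدار" and "الاوامر" as Unicode escapes, so the source file encoding does not matter.
    static final String ISSUING = "\u0627\u0635\u062F\u0627\u0631";
    static final String ORDERS = "\u0627\u0644\u0627\u0648\u0627\u0645\u0631";

    // HYPOTHETICAL coverage check standing in for "Open Sans has a glyph for c":
    // Basic Latin only, so U+0020 SPACE is covered but Arabic letters are not.
    static boolean primaryFontCovers(char c) {
        return c < 0x80;
    }

    // Split text into maximal runs whose characters all resolve to the same font
    // (the primary font, or a fallback such as Amiri).
    static List<String> splitByFont(String text) {
        List<String> runs = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean atEnd = i == text.length();
            if (atEnd || primaryFontCovers(text.charAt(i)) != primaryFontCovers(text.charAt(start))) {
                runs.add(text.substring(start, i));
                start = i;
            }
        }
        return runs;
    }

    // Expected behaviour: resolve bidi levels over the whole line, then
    // reorder the font runs visually (left to right) using those levels.
    static List<String> visualOrderWholeLine(String text, List<String> runs) {
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        byte[] levels = new byte[runs.size()];
        int offset = 0;
        for (int r = 0; r < runs.size(); r++) {
            levels[r] = (byte) bidi.getLevelAt(offset);
            offset += runs.get(r).length();
        }
        Object[] visual = runs.toArray();
        Bidi.reorderVisually(levels, 0, visual, 0, visual.length);
        List<String> result = new ArrayList<>();
        for (Object run : visual) {
            result.add((String) run);
        }
        return result;
    }

    // Faulty model: each run is handled on its own (each Arabic word is still
    // shaped right to left internally), but the runs are placed left to right in
    // logical order, so the line reads with its words swapped.
    static List<String> visualOrderPerRun(List<String> runs) {
        return new ArrayList<>(runs);
    }

    public static void main(String[] args) {
        String text = ISSUING + " " + ORDERS;
        List<String> runs = splitByFont(text);
        System.out.println("number of runs: " + runs.size());
        System.out.println("whole line, visual left to right: " + visualOrderWholeLine(text, runs));
        System.out.println("per run, visual left to right:    " + visualOrderPerRun(runs));
    }
}
```

The takeaway of the model is that splitting text into runs by font is harmless on its own; the word order only breaks if the runs are not reordered afterwards using bidi levels computed for the whole line.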


The issue you found earlier (filed as WORDSNET-24659) has been fixed in the Aspose.Words for Java 23.3 update.
