Free Support Forum - aspose.com

Converting Word to PDF including UTF-8 signs is not working correctly

Hello Aspose,

I am trying to convert a Word file (.docx) into a PDF file (.pdf) using Aspose.Words.
The Word document includes checkboxes represented by the UTF-8 signs ☒ (true/checked) and ☐ (false/unchecked). When the program runs, the signs are translated and the generated PDF shows the “checkbox image”. This works perfectly fine when running the program on a Windows OS, but not on a Linux OS.
Is there anything I can do to replace the signs on a Linux OS, or do I have to use different signs?

Thanks in advance

@KHd,

Please ZIP and upload your input Word document and Aspose.Words generated output PDF file showing the undesired behavior here for testing. We will then investigate the issue on our end and provide you more information.

Hello @awais.hafeez,

I found the solution. A Windows OS automatically renders the signs in a font that includes the checkbox signs, for instance “Segoe UI Symbol”. In a Linux environment this does not happen automatically, so I check whether the sign to set is a checkbox and, if so, I set the font manually.
Like this:
// 9746 is ☒ (U+2612) and 9744 is ☐ (U+2610); the hash code of a one-character string equals its character value.
if (text.hashCode() == 9746 || text.hashCode() == 9744) {
    Font font = run.getFont();
    font.setName("Segoe UI Symbol");
}
run.setText(text);
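
The hashCode comparison above works because the hash of a one-character string is that character's value, but other, longer strings can in principle hash to the same number. A minimal sketch of a more explicit check, comparing the code point directly, could look like this. The class and method names (CheckboxSigns, isCheckboxSign) are hypothetical and not part of Aspose.Words.

```java
// Hypothetical helper; names are illustrative and not part of Aspose.Words.
final class CheckboxSigns {
    static final int BALLOT_BOX = 0x2610;        // ☐, decimal 9744
    static final int BALLOT_BOX_WITH_X = 0x2612; // ☒, decimal 9746

    /** Returns true only if text consists of exactly one checkbox code point. */
    static boolean isCheckboxSign(String text) {
        if (text == null || text.codePointCount(0, text.length()) != 1) {
            return false;
        }
        int codePoint = text.codePointAt(0);
        return codePoint == BALLOT_BOX || codePoint == BALLOT_BOX_WITH_X;
    }

    public static void main(String[] args) {
        System.out.println(isCheckboxSign("\u2612")); // checked box
        System.out.println(isCheckboxSign("true"));   // ordinary text
    }
}
```

In the snippet above, the condition would then read if (CheckboxSigns.isCheckboxSign(text)) { ... }.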

Is there a way to check whether the font currently set on the run (run.getFont()) includes the text I want to set? I am using different fonts, so in some cases I don't even need to replace the font with Segoe UI Symbol because the font already includes the sign.

Thanks in advance

@KHd,

You can receive notifications of missing fonts and font substitution when rendering a Word document to PDF with Aspose.Words for Java. Please check the following article:

How to Receive Notification of Missing Fonts and Font Substitution during Rendering

@awais.hafeez,

Thanks for your fast reply.
I checked the article you recommended, and maybe I just did not get it right, but I cannot find any way to check whether a font supports particular symbols.
I am setting runs and want to know whether there is a way to be sure that the sign I want to set is included in a specific font and, if not, to get an error/exception/warning.
Here is an example:
The word “true” is a placeholder for this sign: ☒.
My program receives a Word document and should convert it into a PDF. Placeholders should be replaced.
Let's say: if you find the word “true”, replace it with this sign. The problem is that a user can write “true” in any font. Because some fonts do not include this sign, the output PDF shows nothing.
I hope I explained the problem well enough for you to understand.
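
Outside of Aspose.Words, the JDK itself offers java.awt.Font.canDisplay(int) for asking whether a font has a glyph for a code point. The following is only a sketch under the assumption that the fonts visible to the JVM are the same ones used for rendering, which is not guaranteed for Aspose.Words font sources. The class and method names (GlyphCheck, firstUnsupportedIndex, fontSupports) are hypothetical.

```java
import java.awt.Font;
import java.util.function.IntPredicate;

// Hypothetical helper; names are illustrative and not part of Aspose.Words.
final class GlyphCheck {

    /** Returns the char index of the first code point rejected by canDisplay, or -1 if none. */
    static int firstUnsupportedIndex(String text, IntPredicate canDisplay) {
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            if (!canDisplay.test(codePoint)) {
                return i;
            }
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    /** Asks the JVM whether the named font is installed and has glyphs for all of text. */
    static boolean fontSupports(String fontName, String text) {
        Font font = new Font(fontName, Font.PLAIN, 12);
        // If the JVM cannot find the font it falls back to a default font,
        // so a family-name mismatch is treated as "not supported" here.
        if (!font.getFamily().equalsIgnoreCase(fontName)) {
            return false;
        }
        return firstUnsupportedIndex(text, font::canDisplay) == -1;
    }

    public static void main(String[] args) {
        String checked = "\u2612";
        for (String name : new String[] {"Segoe UI Symbol", "DejaVu Sans", "Arial"}) {
            System.out.println(name + " supports checkbox: " + fontSupports(name, checked));
        }
    }
}
```

The glyph test is passed in as a predicate so that the scanning logic can be reused with a different glyph source if the JVM's font list turns out not to match the renderer's.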

@KHd,

I am afraid, this does not seem possible.

However, you can assign a ‘known font’ to the placeholder text (i.e. ‘true’) before saving to PDF. Please check the following code:

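import com.aspose.words.*;
import java.awt.Color;
import java.util.ArrayList;

Document doc = new Document("D:\\Temp\\input.doc");

// Search backward and let the callback restyle every match of the placeholder.
FindReplaceOptions opts = new FindReplaceOptions();
opts.setDirection(FindReplaceDirection.BACKWARD);
opts.setReplacingCallback(new ReplacingCallback());

// The replacement string is ignored because the callback returns ReplaceAction.SKIP.
doc.getRange().replace("true", "", opts);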

doc.save("D:\\temp\\awjava-18.8.doc");
/////////////////////////////////////////////////
static class ReplacingCallback implements IReplacingCallback {

    public int replacing(ReplacingArgs e) throws Exception {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.getMatchNode();

        // The first (and may be the only) run can contain text before the match,
        // in this case it is necessary to split the run.
        if (e.getMatchOffset() > 0)
            currentNode = splitRun((Run) currentNode, e.getMatchOffset());

        ArrayList<Run> runs = new ArrayList<>();

        // Find all runs that contain parts of the match string.
        int remainingLength = e.getMatch().group().length();
        while ((remainingLength > 0) && (currentNode != null) && (currentNode.getText().length() <= remainingLength)) {
            runs.add((Run) currentNode);
            remainingLength = remainingLength - currentNode.getText().length();

            // Select the next Run node.
            // Have to loop because there could be other nodes such as BookmarkStart etc.
            do {
                currentNode = currentNode.getNextSibling();
            } while ((currentNode != null) && (currentNode.getNodeType() != NodeType.RUN));
        }

        // Split the last run that contains the match if there is any text left.
        if ((currentNode != null) && (remainingLength > 0)) {
            splitRun((Run) currentNode, remainingLength);
            runs.add((Run) currentNode);
        }

        // Change the font of all runs in the sequence.
        for (Run run : runs) {
            run.getFont().setName("Arial");
            run.getFont().setColor(Color.GREEN);
        }

        // Signal to the replace engine to do nothing because we have already done everything we wanted.
        return ReplaceAction.SKIP;
    }

    /**
     * Splits text of the specified run into two runs. Inserts the new run just
     * after the specified run.
     */
    private Run splitRun(Run run, int position) throws Exception {
        Run afterRun = (Run) run.deepClone(true);
        afterRun.setText(run.getText().substring(position));
        run.setText(run.getText().substring(0, position));
        run.getParentNode().insertAfter(afterRun, run);
        return afterRun;
    }
}

@awais.hafeez
Thanks for your answer. I tried your suggestion, but sadly it is not working as expected.
This is the code snippet I use for replacing the checkbox.

// Set the checkbox string and the Segoe UI Symbol font on the new run.
// pCheckboxToReplace: the checkbox sign (either String.valueOf((char) 9746) or String.valueOf((char) 9744))
checkboxRun = (Run) pRun.deepClone(true);
checkboxRun.setText(pCheckboxToReplace);
checkboxRun.getFont().setName("Segoe UI Symbol");
// Remove any text styling from the checkbox.
checkboxRun.getFont().setBold(false);
checkboxRun.getFont().setItalic(false);
checkboxRun.getFont().setUnderline(Underline.NONE);
paragraph.insertAfter(checkboxRun, (firstRun != null ? firstRun : pRun));

Running the code on a Windows OS, I get the attached WindowsTest.pdf, and the document properties show that the font Segoe UI Symbol is included.

When running the code on a Linux OS, the output PDF LinuxTest.pdf does not include the Segoe UI Symbol font, and the checkboxes are therefore shown as an unknown sign. Using “setEmbedFullFonts” does not make any difference.
What does make a difference is adding any text in Segoe UI Symbol to the Word document (WithSegoeUISymbol.docx); then everything works as it should. Output PDF: LinuxWithSegoeUiSymbol.pdf

The other Word document, which is used on both systems: NoSegoeUISymbol.docx

Is it possible to get a result like LinuxWithSegoeUiSymbol.pdf while using NoSegoeUISymbol.docx?

For your information: one string that is replaced by a checkbox is next to the text “Datennutzungserklärung vorliegend” on the first page of both Word documents.

Attachments:
Documents.zip (308.8 KB)
LinuxTest.pdf (84.1 KB)
LinuxWithSegoeUiSymbol.pdf (68.9 KB)
WindowsTest.pdf (74.6 KB)

Sorry the upload went kind of wrong. Hope you can read the text anyway.

@KHd,

In addition to PDF, please also save the final output to DOCX format and share the output Word file here for further testing.

Please also create a standalone runnable simple Java application (source code without compilation errors) that helps us to reproduce your problem on our end and attach it here for testing.

Thanks for your cooperation.

@awais.hafeez

Please find attached the Java application “AsposeTestApplication”.
The input parameter is the .docx file you want to convert to PDF (java -jar WordToPdf.jar NoSegoeUISymbol.docx).
In the ZIP folder you will find a short version of the Word file that I already attached in my last post.
I will also attach the output files (OutputDocuments.zip (111.3 KB)) that I generated with the same Java application “AsposeTestApplication”.

I hope this will help.
Thanks in advance

I cannot upload my ZIP folder; its size is around 63 MB.

@KHd,

The problem occurs because of a missing font, i.e. ‘Segoe UI Symbol’. There is another option that you can try, the FontSettings.setFontSubstitutes method:

FontSettings fs = new FontSettings();
fs.setFontSubstitutes("Segoe UI Symbol", new String[] { "Arial" });

Document doc = new Document("D:\\temp\\input.docx");
doc.setFontSettings(fs);
doc.save("D:\\temp\\awjava-18.9.pdf");

@awais.hafeez,

Thank you very much for your answer.
Instead of Segoe UI Symbol, I am now using the DejaVu Sans font. This font works on Windows as well as on Linux.
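
For anyone who prefers to keep Segoe UI Symbol where it exists and only fall back to DejaVu Sans elsewhere, the choice could be made at runtime from an ordered preference list. This is just a sketch: the class and method names (FontFallback, firstAvailable, installedFamilies) are hypothetical, and it assumes the fonts reported by the JVM match the ones available to the renderer.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; names are illustrative and not part of Aspose.Words.
final class FontFallback {

    /** Returns the first preferred font name that is installed (case-insensitive match). */
    static Optional<String> firstAvailable(List<String> preferred, Set<String> installed) {
        for (String name : preferred) {
            if (installed.stream().anyMatch(name::equalsIgnoreCase)) {
                return Optional.of(name);
            }
        }
        return Optional.empty();
    }

    /** Font family names the JVM reports for this machine. */
    static Set<String> installedFamilies() {
        String[] names = GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames();
        return new TreeSet<>(Arrays.asList(names));
    }

    public static void main(String[] args) {
        List<String> preferred = Arrays.asList("Segoe UI Symbol", "DejaVu Sans");
        System.out.println(firstAvailable(preferred, installedFamilies()).orElse("(no symbol font found)"));
    }
}
```

The resulting name could then be passed to run.getFont().setName(...) in place of a hard-coded font.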

@KHd,

It is great that you were able to find what you were looking for. Please let us know any time you have any further queries.