Save to PDF is breaking the "bullet" character, shows up as rectangle

Our web application supports saving a document both to Word and PDF, using Aspose.Words for Java. For Word, we have no problem. But, for PDF, it is rendering the “disc” bullets used for an unordered list in a way that they appear in the PDF document as a rectangle instead of a solid circle. The call we are using is Document.save() while providing the PDF options as the second arg.

I have done much searching on this issue and understand it requires the host system to have TrueType fonts that contain the bullet character (in this case U+2022, the default “disc” bullet). The system that generates the PDF needs the font available to insert into the PDF because it must be viewable standalone without the fonts installed, unlike a Word document that will use the fonts on the system where they are being viewed.

However, the installed fonts look fine as far as I can tell. Our code defaults to attempting to use Calibri which is not installed by default. So, on macOS it’ll substitute Arial Unicode MS. And, on our Docker containers running CentOS it’ll substitute DejaVuSans. All systems have these fonts, and I confirm these fonts also know about U+2022 bullet character. Additionally, I made use of the IWarningCallback during the save() call to print out any font-related warnings. When substitutions happen, it does print them out. But, I see no other font-related information that looks suspect. I also tried setting it to several other fonts, such as Courier New and Arial… while this changes the font, it doesn’t fix the bullets.

If I nest the unordered lists, the bullets under the top level (2+ levels deep) look fine. For example, the hollow circle bullet renders with no problem. It is only the top-level bullet, U+2022, the one you get if you specify in HTML list-style = “disc” that has the issue. If I override that and force it to use any other (like “square”) it works fine even for the top level list.

And, some systems still work fine, while others don’t. We’re not sure what the difference is, but I have a hypothesis the issue may be activated by small differences in how the Docker images are being generated. There are small JVM and Docker version differences between those that have the issue and those that don’t. But, the Docker images have no font differences.

I analyzed the generated PDFs using Acrobat Pro. When it works, I can see the bullets in the PDF are U+2022. When it doesn’t work, I can see the PDF generation that happened on our Docker containers saved it as U+2023, the “triangle bullet”. But, these fonts all render the triangle bullet as a rectangle, which I guess is “correct” as far as rendering them goes, and thus the issue is actually that this bullet somehow got saved as U+2023.

Now, on macOS it is even stranger. Instead of saving the bullet as U+2023, it saves it as these 3 bytes:
EF BF BF
which is apparently the UTF-8 encoding of U+FFFF, a Unicode “noncharacter”.
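For anyone wanting to double-check that byte sequence, here is a minimal sketch using only the JDK. The class and method names are made up for illustration; it decodes the three bytes as UTF-8 and prints the resulting code point, next to the bytes for the expected bullet for comparison.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Aspose.Words: decodes raw UTF-8 bytes
// copied out of a PDF inspector and reports the first code point.
final class NoncharacterCheck {

    /** Decodes UTF-8 bytes and returns the first code point as "U+XXXX". */
    static String firstCodePoint(byte[] utf8) {
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        return String.format("U+%04X", decoded.codePointAt(0));
    }

    public static void main(String[] args) {
        byte[] seenOnMac = {(byte) 0xEF, (byte) 0xBF, (byte) 0xBF};
        byte[] expectedBullet = {(byte) 0xE2, (byte) 0x80, (byte) 0xA2};

        System.out.println(firstCodePoint(seenOnMac));      // prints U+FFFF
        System.out.println(firstCodePoint(expectedBullet)); // prints U+2022
    }
}
```

U+FFFF is reserved as a noncharacter, so no font is expected to carry a glyph for it, which would be consistent with the rectangle seen in the viewer.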

Here’s what the affected part of the PDF looks like:

Here’s Acrobat compare feature showing how a PDF that works compares with one that doesn’t.

This small Java program will reproduce the issue on macOS, but not on our Docker containers that have the issue (no idea why). Our real code is of course more complex than this, but I think it is a fair representation of how we are using your API:

Main.7z (594 Bytes)

And we can also reproduce this using DocumentBuilder.insertHtml() to add UL/LI elements to a Document; I made use of “builder.getListFormat().setList()” in that example to rule out possible problems unique to insertHtml. In that code, if I change ListTemplate.BULLET_DEFAULT to any other bullet, it starts working.

Thanks!

@chuckw For rendering bullets Symbol font is used. Most likely the problem is caused by a known peculiarity. Windows “Symbol” font is a symbolic font (like “Webdings”, “Wingdings”, etc.) which uses Unicode PUA. MacOS or Linux “Symbol” font on the other hand is a proper Unicode font (for example Greek characters are in the U+0370…U+03FF Greek and Coptic block). So these fonts are incompatible and Mac/Linux “Symbol” font cannot be used instead of Windows “Symbol” without additional actions. In your particular case it is required to change the bullet codepoint from PUA U+F0B7 (or U+00B7 which also can be used in MS Word for symbolic fonts) to the U+2022 in the document.
Please try using the following code as a workaround:

Document doc = new Document("/tmp/BulletPoint.docx");

for (com.aspose.words.List lst : doc.getLists())
{
    for (com.aspose.words.ListLevel level : lst.getListLevels())
    {
        if (level.getFont().getName().equals("Symbol") && level.getNumberFormat().equals("\uF0B7"))
        {
            level.setNumberFormat("\u2022");
        }
    }
}

doc.save("/tmp/out.pdf");
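The snippet above only matches U+F0B7, while the explanation also mentions U+00B7 as a code point Word may use with symbolic fonts. A possible variation, sketched here as plain Java so it can be tried without the library, pulls the decision into a small helper that covers both. The class name `SymbolBulletFix` and the method `remap` are hypothetical, and whether any particular template actually uses U+00B7 is an assumption that would need checking.

```java
// Hypothetical helper, not part of Aspose.Words: decides which bullet string
// a list level should end up with, given its font name and current format.
final class SymbolBulletFix {

    private static final String PUA_BULLET = "\uF0B7";     // Windows Symbol bullet (PUA)
    private static final String MIDDLE_DOT = "\u00B7";     // alternative mentioned in the thread
    private static final String UNICODE_BULLET = "\u2022"; // standard "disc" bullet

    /** Returns U+2022 for Symbol-font bullets, otherwise the format unchanged. */
    static String remap(String fontName, String numberFormat) {
        boolean isSymbolFont = "Symbol".equalsIgnoreCase(fontName);
        boolean isSymbolBullet = PUA_BULLET.equals(numberFormat)
                || MIDDLE_DOT.equals(numberFormat);
        return (isSymbolFont && isSymbolBullet) ? UNICODE_BULLET : numberFormat;
    }
}
```

Inside the inner loop of the workaround it could then be called as `level.setNumberFormat(SymbolBulletFix.remap(level.getFont().getName(), level.getNumberFormat()));`, using the same getters and setter as the original snippet.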

Thanks for your quick response; I’ll try that out and get back to you. I had been wondering how the Symbol font factors into this. Is there any need to try to get a Symbol font installed on the systems running this code or will it work without it? Our Docker containers only have the DejaVuSans fonts installed, nothing else at all, although they somehow used to be able to render the bullets without this Symbol font. And, my macOS laptop has something called Symbol.ttf installed, not by me, was just there… I don’t know if this is the same as the Symbol font that would be on Windows. It is located at /System/Library/Fonts/Symbol.ttf.

Thanks!

@chuckw Yes, the Symbol font is required to properly render symbols, including bullets. And no, the Symbol font on macOS is not the same as the Symbol font on Windows, as described above, though they have the same name and purpose.

Thanks, I had some success but not complete success using that code snippet to correct the bullets, on macOS. (I did not yet try a Docker container as that’ll take a lot more effort to deploy a debug build, get Symbol font out there, etc.) I saw the code snippet would correct some of the bullets depending on where they were located in the Word template document we start with. But, not all the bullets are corrected.

I should give a bit of background about that… We start with a Word mail merge template, feed that into Aspose and have it do the mail merge where we insert our data into the mail merge fields. There’s a mail merge handler callback that basically calls DocumentBuilder.insertHtml() to insert the content we care about into it (and this may have bullets in it). That content is very simple in my example, just a UL with LI elements under it, no formatting.

To make this simpler to analyze by removing variables, here’s what I did. I deleted nearly everything from our template, including the actual mail merge fields; the only thing I have in it is a list I typed into it. Then, I commented out the part of our code that invokes the mail merge. So, it should basically take the template as-is and save it back out again as PDF with no substantial changes. This output PDF still has a problem with the bullets not being corrected. Then, I made a new Word file from scratch (chose the “Blank Document” option and went from there), typed the bullet list into it, and used that as the template – this one actually corrects the bullets if I have the code snippet in place.

The two templates look the same to me, but clearly something subtle differs about them. We can’t use it as a solution to just make new templates from scratch. Customers copy and modify these, so we have no way to go update all templates out in the wild. The document that doesn’t work properly reaches that setNumberFormat() call 9 times, so it is finding stuff, although I’m not sure where 6 of the 9 times are from since there’s only 3 bullets in it. And, the document that does work properly reaches setNumberFormat() call 3 times.

Are you able to tell what may be going on if I attach the sample files? Here are the 2 templates:

This template works:
TemplateFromScratch.zip (6.2 KB)

This template does not work:
TemplateSimplified.zip (9.3 KB)

If I generate Word docs using Aspose, I get these (bullets are good in both):

FromScratch_Generated.doc.zip (3.0 KB)

Simplified_Generated.doc.zip (5.4 KB)

And here are the PDFs generated from those same templates, with Simplified.pdf not having good bullets:

FromScratch.pdf (21.4 KB)

Simplified.pdf (22.1 KB)

Thanks!

@chuckw Could you please attach your Symbol font here, so we test with the same fonts?

The code provided above does not loop through the list items in the document; it loops through the list definitions. There might be list definitions that are not used in the document’s body, which is why setNumberFormat() is called more than 3 times.

The workaround is not required when the target format is MS Word document. It should be used only if the target format requires document rendering, i.e. PDF, XPS, Image, HtmlFixed etc. And if the Symbol font is a proper Unicode font.

Actually, the simplest solution of the problem is deploying Symbol font from Windows to your Docker image. Then no workaround is required to properly render bullets.
symbol.zip (46.2 KB)

I can put in a check to see if we’re saving to Word and then skip the workaround, thanks for that tip.
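One way such a check might look, as a rough sketch: decide from the output file extension whether the target is a rendered format. The class name, method name, and extension list below are assumptions for illustration only. Real code would more likely key off whatever save format or save options object it already passes to save(), and HtmlFixed is left out because its extension is ambiguous.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Aspose.Words: guesses from the output
// file name whether the bullet remapping workaround is worth applying.
final class RenderTargets {

    // Assumed list of rendered (fixed-layout) targets; extend as needed.
    private static final Set<String> RENDERED = new HashSet<>(
            Arrays.asList("pdf", "xps", "png", "jpg", "jpeg", "tiff", "bmp"));

    /** True when the output extension denotes a rendered format. */
    static boolean needsBulletWorkaround(String outputPath) {
        int dot = outputPath.lastIndexOf('.');
        if (dot < 0 || dot == outputPath.length() - 1) {
            return false; // no extension, so nothing to decide on
        }
        String ext = outputPath.substring(dot + 1).toLowerCase(Locale.ROOT);
        return RENDERED.contains(ext);
    }
}
```

With this, `needsBulletWorkaround("/tmp/out.pdf")` is true and `needsBulletWorkaround("/tmp/out.docx")` is false. Per the reply above, the workaround also only matters when the Symbol font that gets picked up is the Unicode (macOS/Linux) variant, which this sketch does not try to detect.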

And yes, I’ll see if I can get that symbol.zip onto one of the containers and see if that helps without a workaround.

Here’s the Symbol font for macOS that I’m using:

Symbol.ttf.zip (48.6 KB)

Thanks!

@chuckw Thank you for additional information. I have tested conversion of both your documents and the provided font using the following code:

// Specify font sources, which will be used upon rendering documents.
FontSettings.getDefaultInstance().setFontsSources(new FontSourceBase[] {
        new SystemFontSource(), // Default system font source.
        new FolderFontSource("/tmp/fonts", true) // folder contains Symbol font you have provided
});
        
Document doc = new Document("/tmp/in.doc");
        
for (com.aspose.words.List lst : doc.getLists())
{
    for (com.aspose.words.ListLevel level : lst.getListLevels())
    {
        if (level.getFont().getName().equals("Symbol") && level.getNumberFormat().equals("\uF0B7"))
        {
            level.setNumberFormat("\u2022");
        }
    }
}
    
doc.save("/tmp/out.pdf");

Thanks for trying those. Did your tests show that both templates had the correct bullets when run through Aspose?

I used various combinations of setFontsSources() to study this issue further, on macOS using the built-in Symbol font as well as the one you provided to see how they differ. I’m puzzled by what I’m seeing. I’ll summarize as best I can since I did a lot of combinations of testing:

If I remove all the font sources, so it doesn’t have access to any fonts at all not even system fonts, then all templates ran through Aspose show the correct bullets. They show the correct bullets with or without the workaround (but the workaround did actually reach those setNumberFormat calls). This is the only scenario where both templates show correct bullets. Of course, this is not a solution, just a data point, since we actually need to support various fonts.

If I only add this to the font sources, so that it uses the built-in Symbol.ttf and not the one you provided:
new FolderFontSource("/System/Library/Fonts", true)
And if the workaround code is in place, then the Simplified template does not work, but FromScratch does work. If I remove the workaround code, then neither of them works (FromScratch stops working).

If I only load exactly one font, the symbol.ttf you provided (this dir just has that one file):
new FolderFontSource("/Users/myusername/fonts", true)
And if the workaround code is in place, then the Simplified template works. But FromScratch no longer works. (And all the characters other than bullets look like Greek letters too, so we can’t have that. But I’m only doing this to analyze the bullets.) If I remove the workaround code, nothing changes: Simplified still works, and FromScratch still does not work.

Then, I tried two sources:
new FolderFontSource("/Users/myusername/fonts", true),
new FolderFontSource("/System/Library/Fonts", true)
And if the workaround code is in place, the Simplified template does not work, and FromScratch does work. If I remove the workaround code, then neither of them works.

I don’t know what differs between these templates that would result in one of the templates successfully pulling a bullet from one symbol font, and the other template successfully pulling a bullet from the other symbol font. They look like the same bullet to me.

I haven’t had a chance to try uploading the font to the Docker container yet, hope to get to that today.

Would you have any additional suggestions?

Progress! I managed to get the symbol.ttf file that you had attached here earlier (which I understand is a Windows-specific font) over to our Docker containers and restarted them. Now the bullets are correct when the Aspose functionality is invoked from there. And just to be clear, this is without any of that workaround code. We can probably live without a macOS fix if none can be figured out, since that’s really only used for development environments and we’ll just deal with it.

We’ll still have to do more testing as we have more than one way we deploy our software.

What might you suggest for dealing with getting one lone symbol.ttf file onto a Docker image? Sure, I could have the Dockerfile run a command to copy this lone file over to /usr/share/fonts, but that seems kind of hacky and I’m not sure I can get everyone to agree to go along with something odd like that. I’m expecting there’s some kind of package you’d install with a “yum” command, etc. Would there be a more official way of doing this?

@chuckw

Yes, both documents have correctly rendered bullets. Here are the output documents produced on my side:

  1. The fonts folder contains your Symbol and Calibri fonts. The workaround is applied:
    TemplateFromScratch.pdf (21.1 KB)
    TemplateSimplified.pdf (21.7 KB)

  2. The fonts folder contains your Symbol and Calibri fonts. The workaround is not applied:
    TemplateFromScratch_without_workaround.pdf (21.3 KB)
    TemplateSimplified_without_workaround.pdf (21.9 KB)
    In this case the problem is reproducible due to the MacOS Symbol font peculiarity described above.

  3. The fonts folder contains my Symbol and Calibri fonts. The workaround is not applied:
    TemplateSimplified_ms_symbol_without_workaround.pdf (24.0 KB)
    TemplateFromScratch_ms_symbol_without_workaround.pdf (23.3 KB)

So I cannot reproduce the original problem on my side.

Regarding deploying the Symbol font, please see our documentation to learn how to install fonts on Linux:
https://docs.aspose.com/words/java/install-truetype-fonts-on-linux/

But I am afraid, there is no package that includes all MS fonts. So the easiest way is to copy the required fonts into your Docker container.

Thanks, that helps a lot. I’ll have to discuss with the rest of us here how we are going to handle Linux with regard to containers.

It looks like the Aspose font sources feature doesn’t require a font officially installed, just has to be a file in a directory somewhere. Thus, our app we deploy in the container might be able to handle keeping track of an additional font directory on its own if it internally contains this font file and deploys it somewhere then sets the font source to it (plus system). If I stored a font file in a jar as a Java resource, such that it could be read with getResourceAsStream(), is that something that could be set as a source in Aspose? Or, does it really need a file in a file system? (Putting a file like this in the file system is more complex, although something we could do if it is the only way.)

Thanks!

@chuckw Yes, you are absolutely right, you can store the fonts anywhere, it is not required to install the fonts in the system.
You can store the font in JAR. You can use StreamFontSource or MemoryFontSource to get fonts from your JAR file and provide them to Aspose.Words.
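As a rough illustration of the JAR approach, the JDK-only part is reading the classpath resource into a byte array. The class name `FontResourceLoader` and the resource path in the usage note are hypothetical. Handing the bytes to `MemoryFontSource` is shown only as a comment, and the exact constructor should be confirmed against the Aspose.Words API reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Aspose.Words: reads a font that was
// packaged as a Java resource fully into memory.
final class FontResourceLoader {

    /** Reads the stream to the end, closes it, and returns its bytes. */
    static byte[] readAll(InputStream in) {
        if (in == null) {
            // getResourceAsStream() returns null when the resource is missing.
            throw new IllegalArgumentException("font resource not found");
        }
        try (InputStream src = in;
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = src.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Assumed usage with Aspose.Words (not compiled here):
    //   byte[] font = FontResourceLoader.readAll(
    //           MyApp.class.getResourceAsStream("/fonts/some-font.ttf"));
    //   FontSettings.getDefaultInstance().setFontsSources(new FontSourceBase[] {
    //           new SystemFontSource(), new MemoryFontSource(font) });
}
```

Keeping `SystemFontSource` in the array mirrors the "plus system" setup described in the question, so the bundled font adds to the system fonts rather than replacing them.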


Oh nice! That’s much better, I’ll see what I can do with those two *FontSource classes.


It looks like Microsoft doesn’t allow redistribution of the symbol.ttf font, according to this:
https://learn.microsoft.com/en-us/typography/fonts/font-faq

Is there a way to get this working without requiring this font? We do have some deployments where somehow the bullet is generated correctly even though only DejaVuSans font is there, not sure what the difference is between a working system and one that has the issue. The only differences we could find were minor changes in the JVM versions and Docker versions used in these images.

And no, we can’t just require our customers to download and deploy the font themselves.

Thanks for any help!

@chuckw You can try using free Google Noto fonts as described here:
https://docs.aspose.com/words/java/manipulate-and-substitute-truetype-fonts/#predefined-font-fallback-settings-for-google-noto-fonts

In case of using Noto fonts, Noto Sans Symbols font is used as a substitution for Symbol. But in this case the above suggested workaround is required.

You can check the output PDF and see what fonts are used in the output document.

Unfortunately, without providing proper fonts it is impossible to properly render the document. Each particular font cannot contain all possible glyphs.