And the result is a .docx file, but with ‘default’ fonts. I’d like it to have all the fonts that the .pdf file had embedded. Attaching the input file (I was not able to attach the .docx document, as it’s not authorized):
To generate output with the desired fonts, those fonts must be installed on the system so that the API can access them during generation. Fonts embedded in the PDF document cannot be used for this purpose. However, we have logged an investigation ticket, PDFJAVA-39909, in our issue tracking system to analyze your requirements further. We will check the feasibility of the requested feature and let you know as soon as the ticket is resolved. Please be patient and spare us some time.
@asad.ali
Thanks for your response. I have a question then: if I have those fonts somewhere on the classpath, can I ‘force’ Aspose to use them as if they were installed system fonts? Maybe something similar to the FontSubstitution mechanism?
You can use the setLocalFontPaths() method to set the path to fonts installed in a location other than the default folder. Also, would you please share the sample output document that you created on your side?
The problem seems to be that the PDF document has those fonts embedded (checked using pdfDocument.getFontUtilities().getAllFonts()):
FONT NAME:Tinos-Regular
FONT NAME:Tinos-Bold
FONT NAME:ShadowsIntoLightTwo-Regular
FONT NAME:Lora-Regular
FONT NAME:JustMeAgainDownHere
FONT NAME:Lora-Regular
and I do have those in /usr/share/fonts (we use those fonts to substitute other fonts during PDF->PNG conversion, and that works).
But when I list the fonts in the converted docx document (using wordDocument.getFontInfos()), I see:
FONT INFO NAME: Times New Roman
FONT INFO NAME: Symbol
FONT INFO NAME: Arial
FONT INFO NAME: Calibri
FONT INFO NAME: Cambria Math
FONT INFO NAME: Tinos
FONT INFO NAME: IFFGTO+Tinos-Regular
FONT INFO NAME: UJNKSF+Lora Regular
FONT INFO NAME: Just Me Again Down Here
FONT INFO NAME: JRGWIJ+Shadows Into Light Two
Those names don’t match. Is it possible that this is the cause?
We have investigated further and found that this is not a bug in the API. Word documents have a default style that includes these system fonts:
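The mismatch between the two listings above can be made explicit. In a PDF, a subset-embedded font is named with a tag of six uppercase letters followed by a plus sign (for example IFFGTO+), and the DOCX listing also mixes spaced and unspaced spellings. The following is a minimal Java sketch of a name comparison, assuming a simple heuristic that is not part of Aspose: FontNames and its methods are hypothetical helper names, and dropping a trailing "Regular" is an assumption about how these particular fonts are named.

```java
import java.util.Locale;

final class FontNames {
    private FontNames() {}

    /** Removes a PDF subset tag such as "IFFGTO+" (six uppercase letters and a plus sign). */
    static String stripSubsetTag(String name) {
        return name.replaceFirst("^[A-Z]{6}\\+", "");
    }

    /**
     * Heuristic normalization (an assumption, not an Aspose rule): drop the subset tag,
     * lowercase, keep only letters and digits, and drop a trailing "regular" style word.
     */
    static String normalize(String name) {
        String s = stripSubsetTag(name)
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]", "");
        if (s.endsWith("regular") && s.length() > "regular".length()) {
            s = s.substring(0, s.length() - "regular".length());
        }
        return s;
    }

    /** True when both names reduce to the same normalized form. */
    static boolean sameFont(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        // Names taken from the two listings in this thread.
        String[][] pairs = {
            {"Tinos-Regular", "IFFGTO+Tinos-Regular"},
            {"Lora-Regular", "UJNKSF+Lora Regular"},
            {"JustMeAgainDownHere", "Just Me Again Down Here"},
            {"ShadowsIntoLightTwo-Regular", "JRGWIJ+Shadows Into Light Two"},
        };
        for (String[] pair : pairs) {
            System.out.println(pair[0] + " ~ " + pair[1] + " : " + sameFont(pair[0], pair[1]));
        }
    }
}
```

A comparison like this only shows that the names refer to the same faces. It does not tell whether the library resolves a prefixed name such as IFFGTO+Tinos-Regular to an installed font.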
Times New Roman
Symbol
Arial
Calibri
Cambria Math
You could verify it by converting an empty document:
Document pdfDocument = new Document();
pdfDocument.getPages().add();
pdfDocument.save(output, SaveFormat.DocX);
These fonts are normally not used for text in the output document, but they may be used for the Aspose watermark in evaluation mode and in situations where some fonts are not found.
@asad.ali
Could you please walk me through the process of converting PDF to DOCX using Aspose? I have attached the PDF file: inventoryChecklist.pdf (40.9 KB)
When I use your online converter (Convert Files Online - Word, PDF, HTML, JPG And Many More), the result opens correctly on my MacBook, with proper fonts, etc. However, when I use the code from my original post, the output .docx file has fonts missing and replaced by incorrect ones:
The online app you are using for conversion is built on Aspose.Words for .NET, which uses a different conversion engine for PDF to DOCX. We used the code snippet below with Aspose.PDF for Java 21.2 and obtained the attached DOCX file. Could you please open it on your Mac and let us know if you notice any issue?
Document doc = new Document(dataDir + "inventoryChecklist.pdf");
DocSaveOptions saveOption = new DocSaveOptions();
saveOption.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOption.setFormat(DocSaveOptions.DocFormat.DocX);
saveOption.setRecognizeBullets(true);
doc.save(dataDir + "Sample_21.2.docx", saveOption);
Thanks. The attached .docx file looks good, and its fonts match those from the PDF. However, when I convert the file using the exact snippet you provided, it still looks different on my machine. Might this be happening because of some missing system fonts? In other words, is it possible that the same code would produce different results depending on which fonts are installed on the machine?
Btw, I’ve tried this on both 21.1 (with the license) and 21.2 (without a license).
Yes, this is possible, as the API uses system fonts and chooses suitable fonts while producing the output document. This is why we recommend installing all essential Microsoft fonts on the system where the API is being used. Please try placing all MS Core fonts on your system and convert the document again. Feel free to let us know if the issue still persists.
When I list all the fonts installed on the OS, using the following piece of code:
GraphicsEnvironment ge = GraphicsEnvironment.getLocalGraphicsEnvironment();
String[] families = ge.getAvailableFontFamilyNames();
for (String family : families) {
    System.out.println(family);
}
I get the list:
**Cambria**
Caveat
Dancing Script
DejaVu Sans
DejaVu Sans Mono
DejaVu Serif
Dialog
DialogInput
Just Me Again Down Here
Liberation Mono
Liberation Sans
Liberation Serif
Lora
Monospaced
MS Gothic
Open Sans
SansSerif
Serif
Shadows Into Light Two
StandardSymL
**Times New Roman**
Tinos
and as we can see, both Times New Roman and Cambria are there, however, when I open the converted docx on Mac, I can see that those fonts appear missing:
If you convert the PDF to DOCX in a Windows environment and open the resulting DOCX on a Mac, do you see the missing fonts issue in the file?
Or are you performing the conversion on a Mac and seeing the issue only in the file generated there? If so, did you try opening that file in a Windows environment?
.Arabic UI Display Black
.ArabicUIText
.Helvetica Neue DeskInterface
.SF Compact Display
.SF Compact Rounded
.SF Compact Text
.SF NS Display Condensed
.SF NS Text
.SF NS Text Condensed
Al Bayan
Al Nile
Al Tarikh
American Typewriter
Andale Mono
Apple Braille
Apple Chancery
Apple Color Emoji
Apple LiGothic
Apple LiSung
Apple SD Gothic Neo
Apple Symbols
AppleGothic
AppleMyungjo
Arial
Arial Black
Arial Hebrew
Arial Hebrew Scholar
Arial Narrow
Arial Rounded MT Bold
Arial Unicode MS
Athelas
Avenir
Avenir Book
Avenir Next
Avenir Next Condensed
Ayuthaya
Baghdad
Bangla MN
Bangla Sangam MN
Baoli SC
Baoli TC
Baskerville
Beirut
BiauKai
Big Caslon
Bodoni 72
Bodoni 72 Oldstyle
Bodoni 72 Smallcaps
Bodoni Ornaments
Bradley Hand
Brush Script MT
Chalkboard
Chalkboard SE
Chalkduster
Charter
Cochin
Comic Sans MS
Copperplate
Corsiva Hebrew
Courier
Courier New
Damascus
DecoType Naskh
Devanagari MT
Devanagari Sangam MN
Dialog
DialogInput
Didot
DIN Alternate
DIN Condensed
Diwan Kufi
Diwan Thuluth
Euphemia UCAS
Farah
Farisi
Futura
GB18030 Bitmap
Geeza Pro
Geneva
Georgia
Gill Sans
Gujarati MT
Gujarati Sangam MN
GungSeo
Gurmukhi MN
Gurmukhi MT
Gurmukhi Sangam MN
Hannotate SC
Hannotate TC
HanziPen SC
HanziPen TC
HeadLineA
Hei
Heiti SC
Heiti TC
Helvetica
Helvetica Neue
Herculanum
Hiragino Kaku Gothic Pro
Hiragino Kaku Gothic ProN
Hiragino Kaku Gothic Std
Hiragino Kaku Gothic StdN
Hiragino Maru Gothic Pro
Hiragino Maru Gothic ProN
Hiragino Mincho Pro
Hiragino Mincho ProN
Hiragino Sans
Hiragino Sans CNS
Hiragino Sans GB
Hiragino Sans GB W3
Hiragino Sans W0
Hiragino Sans W1
Hiragino Sans W2
Hiragino Sans W3
Hiragino Sans W4
Hiragino Sans W5
Hiragino Sans W6
Hiragino Sans W7
Hiragino Sans W8
Hiragino Sans W9
Hoefler Text
Impact
InaiMathi
Iowan Old Style
ITF Devanagari
ITF Devanagari Marathi
Kai
Kailasa
Kaiti SC
Kaiti TC
Kannada MN
Kannada Sangam MN
Kefa
Khmer MN
Khmer Sangam MN
Klee
Kohinoor Bangla
Kohinoor Devanagari
Kohinoor Telugu
Kokonor
Krungthep
KufiStandardGK
Lantinghei SC
Lantinghei TC
Lao MN
Lao Sangam MN
Liberation Mono
Liberation Sans
Liberation Serif
Libian SC
Libian TC
LiHei Pro
LingWai SC
LingWai TC
LiSong Pro
Lucida Grande
Luminari
Malayalam MN
Malayalam Sangam MN
Marion
Marker Felt
Menlo
Microsoft Sans Serif
Mishafi
Mishafi Gold
Monaco
Monospaced
MS Gothic
Mshtakan
Muna
Myanmar MN
Myanmar Sangam MN
Nadeem
Nanum Brush Script
Nanum Gothic
Nanum Myeongjo
Nanum Pen Script
New Peninim MT
Noteworthy
Noto Nastaliq Urdu
Optima
Oriya MN
Oriya Sangam MN
Osaka
Palatino
Papyrus
PCMyungjo
Phosphate
PilGi
PingFang HK
PingFang SC
PingFang TC
Plantagenet Cherokee
PT Mono
PT Sans
PT Sans Caption
PT Sans Narrow
PT Serif
PT Serif Caption
Raanana
Rockwell
Sana
SansSerif
Sathu
Savoye LET
Seravek
Serif
Shree Devanagari 714
SignPainter
Silom
Sinhala MN
Sinhala Sangam MN
Skia
Snell Roundhand
Songti SC
Songti TC
StandardSymL
STFangsong
STHeiti
STIXGeneral
STIXIntegralsD
STIXIntegralsSm
STIXIntegralsUp
STIXIntegralsUpD
STIXIntegralsUpSm
STIXNonUnicode
STIXSizeFiveSym
STIXSizeFourSym
STIXSizeOneSym
STIXSizeThreeSym
STIXSizeTwoSym
STIXVariants
STKaiti
STSong
Sukhumvit Set
Superclarendon
Symbol
System Font
Tahoma
Tamil MN
Tamil Sangam MN
Telugu MN
Telugu Sangam MN
Thonburi
Times
Times New Roman
Toppan Bunkyu Gothic
Toppan Bunkyu Midashi Gothic
Toppan Bunkyu Midashi Mincho
Toppan Bunkyu Mincho
Trattatello
Trebuchet MS
Tsukushi A Round Gothic
Tsukushi B Round Gothic
Verdana
Waseem
Wawati SC
Wawati TC
Webdings
Weibei SC
Weibei TC
Wingdings
Wingdings 2
Wingdings 3
Xingkai SC
Xingkai TC
Yuanti SC
Yuanti TC
YuGothic
YuKyokasho
YuKyokasho Yoko
YuMincho
YuMincho +36p Kana
Yuppy SC
Yuppy TC
Zapf Dingbats
Zapfino
البيان
التاريخ
النيل
بغداد
بيروت
جيزة
دمشق
ديوان ثلث
ديوان كوفي
صنعاء
فارسي
فرح
منى
مِصحفي
مِصحفي ذهبي
نديم
نسخ
وسيم
in which everything works correctly.
‘cloud’ env, which is basically a Docker container with an Ubuntu distro running underneath, with fonts from my previous post - here it does not work, it complains about missing fonts (even if they are there, as I can see in the response from getAvailableFontFamilyNames())
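To compare the two environments more directly, the required families can be checked against what the JVM reports. Below is a minimal sketch using only the JDK. FontAvailability and missingFamilies are hypothetical names, and the list of required families is an assumption taken from the fonts mentioned in this thread.

```java
import java.awt.GraphicsEnvironment;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class FontAvailability {
    private FontAvailability() {}

    /** Returns the required families that are absent from the available list (case-insensitive). */
    static List<String> missingFamilies(List<String> required, String[] available) {
        Set<String> installed = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        installed.addAll(Arrays.asList(available));
        List<String> missing = new ArrayList<>();
        for (String family : required) {
            if (!installed.contains(family)) {
                missing.add(family);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Families the JVM can see on this machine.
        String[] available = GraphicsEnvironment
                .getLocalGraphicsEnvironment()
                .getAvailableFontFamilyNames();
        // Assumed list: the families discussed in this thread.
        List<String> required = Arrays.asList(
                "Times New Roman", "Cambria", "Tinos", "Lora",
                "Just Me Again Down Here", "Shadows Into Light Two");
        System.out.println("Missing families: " + missingFamilies(required, available));
    }
}
```

This reports only what the JVM sees through AWT. As the replies in this thread indicate, the library scans its own font folders, so an empty result here does not guarantee that the conversion finds the same fonts.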
These fonts should be placed in the “/usr/share/fonts/truetype/msttcorefonts” directory, as Aspose.PDF scans this folder on Linux-like operating systems. Furthermore, as shared earlier, you can use the setLocalFontPaths() method to set the path to the fonts so that the API can find them during conversion. To check where the API is searching for fonts, you can use the getLocalFontPaths() method as well. Please let us know if this information does not help resolve your issue, and we will assist you further.
When I use getLocalFontPaths() the result I get is:
/System/Library/Fonts for Mac
and:
/usr/share/fonts and /usr/local/share/fonts
And all the fonts are copied there during Docker image build.
I have also tried copying all of them into “/usr/share/fonts/truetype/msttcorefonts” and setting this folder using setLocalFontPaths(), but it didn’t solve the problem.
We will further investigate the reason behind the issue that you are facing. Would you kindly share the sample Docker file with us in .zip format so that we can set up a similar environment to investigate the issue? We will log another ticket in our issue tracking system and share the ID with you.
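One way to double-check the container is to list the font files that actually exist and are readable under the folders mentioned above. This is a minimal sketch using only the JDK. FontFolderScan and listFontFiles are hypothetical names, and treating .ttf, .otf and .ttc as the relevant extensions is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class FontFolderScan {
    private FontFolderScan() {}

    /** Recursively lists font files under root; returns an empty list if root is not a directory. */
    static List<Path> listFontFiles(Path root) throws IOException {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList();
        }
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                    .filter(FontFolderScan::hasFontExtension)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Assumed set of font file extensions; the comparison ignores case. */
    private static boolean hasFontExtension(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith(".ttf") || name.endsWith(".otf") || name.endsWith(".ttc");
    }

    public static void main(String[] args) throws IOException {
        // Folders named in this thread.
        String[] roots = {
            "/usr/share/fonts",
            "/usr/local/share/fonts",
            "/usr/share/fonts/truetype/msttcorefonts",
        };
        for (String root : roots) {
            System.out.println(root + ":");
            for (Path font : listFontFiles(Paths.get(root))) {
                // Files copied during an image build may not be readable by the runtime user.
                System.out.println("  " + font + " readable=" + Files.isReadable(font));
            }
        }
    }
}
```

Running this inside the container as the same user that runs the conversion shows whether the files are present and readable at run time, which is a separate question from whether the library picks them up.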
We tested the scenario in our environment (Linux Ubuntu and CentOS) but we could not replicate the same issue that you are facing. However, we have logged an investigation ticket as PDFJAVA-40251 in our issue tracking system for the sake of further analysis. We will look into details of it and keep you posted with the status of its rectification. Please be patient and spare us some time.
We are sorry for the inconvenience.