Hello,
If we convert the attached input Word file to PDF using a German hyphenation dictionary, the output PDF differs from the PDF exported directly from Microsoft Word.
How can we ensure that the output PDF matches the Word export, including the hyphenation?
We use the following code:
Hyphenation registration:
if (!Hyphenation.IsDictionaryRegistered(languageCode))
    Hyphenation.RegisterDictionary(languageCode, hyphenationDictionaryFile);
Conversion after registration:
var loadOptions = new Words.Loading.LoadOptions
{
    Encoding = EncodingHelper.DetectTextFileEncoding(InputFile),
};
var document = new Document(InputFile, loadOptions)
{
    LayoutOptions =
    {
        CommentDisplayMode = CommentDisplayMode.Hide
    }
};
var saveOptions = new Aspose.Words.Saving.PdfSaveOptions();
document.Save(outputFile, saveOptions);
Kind Regards,
Andy
AsposePdfConvert.pdf (60,0 KB)
Input.docx (16,5 KB)
WordPdfExport.pdf (46,4 KB)
@AStelzner
To ensure that the PDF output generated using Aspose.Words matches the hyphenation of the original Word document, you need to follow a few key steps. It appears that you are already registering the hyphenation dictionary correctly, but there are some additional considerations to ensure consistency with Microsoft Word’s output.
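- Register the correct hyphenation dictionary: Make sure the language code you register matches the language applied to the text in the document (for example, "de-DE" for Germany or "de-CH" for Switzerland); otherwise the dictionary may not be used for that text. You can register the dictionary as follows: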
if (!Hyphenation.IsDictionaryRegistered("de-CH"))
{
    Hyphenation.RegisterDictionary("de-CH", hyphenationDictionaryFile);
}
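- Check the load options: You are using EncodingHelper.DetectTextFileEncoding(InputFile) to set LoadOptions.Encoding. Note that this option is only used for text-based formats such as HTML or TXT when the file does not specify its own encoding; it has no effect when loading a DOCX file and does not influence hyphenation.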
- Set layout options: Hiding comments with CommentDisplayMode is fine. Also check whether the Word export relies on other layout settings that need to be replicated, since any difference in how content is laid out also changes where lines break and therefore where words are hyphenated.
- Save the document with PDF options: You can specify additional options in PdfSaveOptions if needed, such as TextCompression. Note that TextCompression only controls how text in the PDF is compressed; it affects file size, not layout or hyphenation.
- Testing and comparison: After saving the document, compare the output PDF with the one generated by Microsoft Word. If there are discrepancies, identify the words that are hyphenated differently and check whether the hyphenation dictionary contains patterns for them.
Here is a sample code snippet that incorporates these suggestions:
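var loadOptions = new Words.Loading.LoadOptions
{
    Encoding = EncodingHelper.DetectTextFileEncoding(InputFile),
};
var document = new Document(InputFile, loadOptions);
// Keep comments hidden, as in the original code.
document.LayoutOptions.CommentDisplayMode = CommentDisplayMode.Hide;
// Register the dictionary before saving; hyphenation is applied when the document is laid out for PDF rendering.
if (!Hyphenation.IsDictionaryRegistered("de-CH"))
{
    Hyphenation.RegisterDictionary("de-CH", hyphenationDictionaryFile);
}
var saveOptions = new Aspose.Words.Saving.PdfSaveOptions
{
    // Set additional options if necessary
};
document.Save(outputFile, saveOptions);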
By following these steps, you should be able to achieve a PDF output that closely matches the hyphenation and layout of the original Word document. If you continue to experience differences, consider reviewing the hyphenation rules in the dictionary you are using or testing with different versions of the dictionary.
If you have any further questions or need additional assistance, feel free to ask!
@AStelzner There are several issues in the output PDF.
- Arial font is used in the output, but the source document specifies the Aptos font. This is a cloud font that MS Word downloads on demand. You can download these fonts here:
https://www.microsoft.com/en-us/download/details.aspx?id=106087
If Aspose.Words cannot find a font used in the document, the font is substituted. This can lead to font mismatches and layout differences due to different font metrics. You can implement IWarningCallback to get notifications when font substitution is performed.
Please see our documentation to learn where Aspose.Words looks for fonts:
https://docs.aspose.com/words/net/specifying-truetype-fonts-location/
- To get the output closer to MS Word, you need to enable OpenType features. The Aspose.Words.Shaping.HarfBuzz package provides support for OpenType features in Aspose.Words using the HarfBuzz text shaping engine. To enable them, add a reference to the Aspose.Words.Shaping.HarfBuzz plugin and use the following code to convert your document:
using Aspose.Words;
using Aspose.Words.Fonts;

// "C:\Temp\fonts" contains Aptos fonts.
FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[]
{
    new SystemFontSource(),
    new FolderFontSource(@"C:\Temp\fonts", true)
});
// Register hyphenation dictionaries.
Hyphenation.RegisterDictionary("de-DE", @"C:\Temp\hyph_de_DE.dic");
Hyphenation.RegisterDictionary("en-US", @"C:\Temp\hyph_en_US.dic");
Document doc = new Document(@"C:\Temp\in.docx");
// Enable text shaping (requires the Aspose.Words.Shaping.HarfBuzz package).
doc.LayoutOptions.TextShaperFactory = Aspose.Words.Shaping.HarfBuzz.HarfBuzzTextShaperFactory.Instance;
doc.Save(@"C:\Temp\out.pdf");
Here is the produced output: out.pdf (32.0 KB)
The result is much closer to the MS Word output. Note, however, that MS Word may use different hyphenation dictionaries than the ones you register, which can also cause differences.
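If the output still differs, one way to narrow down the cause is to log font substitutions and missing hyphenation dictionaries during conversion. The following is a minimal sketch, assuming Aspose.Words for .NET: it combines IWarningCallback (filtered to WarningType.FontSubstitution) with Hyphenation.Callback, which Aspose.Words calls when it needs a dictionary for a language that has not been registered. The class names FontSubstitutionLogger and HyphenationRequestLogger are hypothetical, and the C:\Temp paths are placeholders.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Fonts;

// Hypothetical helper: collects font substitution warnings raised while the document is laid out.
class FontSubstitutionLogger : IWarningCallback
{
    public List<string> Messages { get; } = new List<string>();

    public void Warning(WarningInfo info)
    {
        // Ignore all other warning types.
        if (info.WarningType == WarningType.FontSubstitution)
            Messages.Add(info.Description);
    }
}

// Hypothetical helper: records each language for which no hyphenation dictionary is registered.
class HyphenationRequestLogger : IHyphenationCallback
{
    public HashSet<string> RequestedLanguages { get; } = new HashSet<string>();

    public void RequestDictionary(string language)
    {
        // Only record the request; no dictionary is registered here.
        RequestedLanguages.Add(language);
    }
}

static class Program
{
    static void Main()
    {
        // Placeholder paths; adjust to your environment.
        FontSettings.DefaultInstance.SetFontsSources(new FontSourceBase[]
        {
            new SystemFontSource(),
            new FolderFontSource(@"C:\Temp\fonts", true)
        });
        Hyphenation.RegisterDictionary("de-DE", @"C:\Temp\hyph_de_DE.dic");

        var hyphenationLogger = new HyphenationRequestLogger();
        Hyphenation.Callback = hyphenationLogger;

        var doc = new Document(@"C:\Temp\in.docx");
        var fontLogger = new FontSubstitutionLogger();
        doc.WarningCallback = fontLogger;

        // Layout (and therefore font substitution and hyphenation) happens during Save.
        doc.Save(@"C:\Temp\out.pdf");

        foreach (string message in fontLogger.Messages)
            Console.WriteLine("Font substitution: " + message);
        foreach (string language in hyphenationLogger.RequestedLanguages)
            Console.WriteLine("No hyphenation dictionary registered for: " + language);
    }
}
```

If the hyphenation log lists a language code such as de-CH while only de-DE is registered, that mismatch is a likely source of the remaining hyphenation differences, and registering a dictionary under that code would be the next thing to try.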
Hi,
Thanks for the tips.
Unfortunately, Aspose.Words.Shaping.HarfBuzz is not available; I’m using the latest Aspose.Words version, 25.5.0.
Kind Regards,
Andy
Wait, I missed the hint about the plugin… I’ll try it!
@AStelzner Yes, you need to install the Aspose.Words.Shaping.HarfBuzz package in addition to Aspose.Words.