Hi,
I am evaluating Aspose.Words and one of the types of conversions we need is to take HTML documents (that are really page fragments) and convert them to PDF. They are badly constructed and it’s impressive that the parsing is anywhere near correct but it differs very slightly from any modern browser (and from MS-Word).
The right aligned business address should be in Frutiger or Arial font (as specified in the HTML’s inline style), but it renders in a Serif (default?) font. Everything else seems to render accurately.
This is the (quite straightforward) C#:
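using System.IO;
using Aspose.Words;

// HTMLout holds the HTML fragment to convert (see the sample below).
MemoryStream outputStream = new MemoryStream();
Document HTMLdoc = new Document();
DocumentBuilder docBuilder = new DocumentBuilder(HTMLdoc);
docBuilder.InsertHtml(HTMLout);              // parse the fragment into the blank document
HTMLdoc.Save(outputStream, SaveFormat.Pdf);  // render the document to PDF in memory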
Here HTMLout is a string of HTML like this sample. I have anonymised and indented it (the original has no line breaks) but otherwise left it unchanged:
A new version of the document will be sent should new information become available
A new version of the document will be sent should new information become available
General Letter 2 March 2012
Customer Address Line 1 Customer Address Line 2 Customer Address Line 2 Town/City Post Code
|
Division A Business Group - Customer Care
Our Address Line 1 Our Address Line 2 Our Address Line 3 Town/City Post Code
Telephone: 012 3456 7890 Fax: 012 3456
|
MRS DUMMY Customer, Customer Address Line 1, Customer Address Line 2, Customer Address Line 2, Town/City. Post Code
DOB: 01/01/1900, Customer Number: 1234567
Copies to:
Partner Name
Partner Address Line 1
Partner Address Line 2
Partner Address Line 3
Town/City
Post Code
Yes, that is a complete example of the HTML I have to work with!
Ideally I would like the HTML to parse perfectly as it stands but if there is a small edit I can apply to the HTML to make it parse correctly then that would be a possible work-around.
Many thanks for any help you can offer,
Kevin
Hi Kevin,
Thanks for your inquiry. Please note that Aspose.Words mimics the same behavior as MS Word does. If you convert the shared HTML to PDF by using MS Word, you will get the same output as generated by Aspose.Words.
I have tested the scenario with the latest version, Aspose.Words for .NET 13.12.0, and could not reproduce the issue. Please use that version. I have attached the output PDF and DOCX files to this post for your reference.
Thanks for your quick response. As you have provided a DOCX document, I am not sure your test is comparing like with like. I have attached my equivalents of the examples you provided; they demonstrate the difference.
Ignoring the only other differences (Aspose’s watermark and successful image retrieval) you can see the problem. Aspose.Words does not apply the specified font when comparing its PDF conversion with using Word Interop to Save As PDF. This is the functionally equivalent comparison.
If you wish to try to reproduce the problem, the Interop dll (Microsoft.Office.Interop.Word) I am using is version 14.0.0.0.
Thanks.
P.S. As I am evaluating Aspose.Words, you can be sure that I am using Aspose.Words for .NET version 13.12.0, downloaded in the last couple of days.
P.P.S. Here’s the original HTML fragment directly opened in Word 2007 and saved into DOCX format. It correctly formats the font in the right-aligned address. A similar manual save as PDF also correctly formats the font.
Hi Kevin,
Thanks for your inquiry.
Kevin Hamer:
Please let me know how quickly you can issue a bug fix.
As I am evaluating whether to use the product or not (which obviously includes evaluating support), please also explain how Tahir’s sentence I quoted above can be true.
Please accept my apologies for your inconvenience.
Aspose.Words takes the first value from the font-family list (style="FONT-SIZE: 9pt; FONT-FAMILY: frutiger-bold,arial") when rendering the PDF file. In your case that is 'frutiger-bold'. If Aspose.Words does not find the first font, the default font is used in the PDF.
If Aspose.Words does not find the first font (Frutiger), the second font (Arial) should be used instead. This was a missing feature in Aspose.Words. I have verified that this feature is available in the latest version of Aspose.Words.
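Until the fallback behaves as expected, one possible stop-gap is the kind of small HTML edit asked about earlier in this thread: rewrite each inline font-family list so that it names only the first font known to be available before passing the HTML to InsertHtml. The following is a minimal sketch, not an official Aspose recommendation. It uses only standard .NET classes; the class name FontFamilyWorkaround, the method PreferAvailableFonts and the availableFonts set are hypothetical names introduced for illustration, and the caller is assumed to supply the list of installed fonts.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class FontFamilyWorkaround
{
    // Matches an inline CSS declaration such as "FONT-FAMILY: frutiger-bold,arial".
    // Assumption: font names in the list are not wrapped in quotes; lists that
    // use quoted names are simply left unchanged.
    private static readonly Regex FontFamilyDeclaration = new Regex(
        @"(font-family\s*:\s*)([^;""'>]+)",
        RegexOptions.IgnoreCase);

    // Replaces every font-family list with the first name found in availableFonts.
    // A list that contains no available font is left exactly as it was.
    public static string PreferAvailableFonts(string html, ISet<string> availableFonts)
    {
        return FontFamilyDeclaration.Replace(html, match =>
        {
            string firstAvailable = match.Groups[2].Value
                .Split(',')
                .Select(name => name.Trim())
                .FirstOrDefault(name => availableFonts.Contains(name));

            return firstAvailable == null
                ? match.Value
                : match.Groups[1].Value + firstAvailable;
        });
    }
}
```

With a case-insensitive set such as new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Arial" }, the declaration FONT-FAMILY: frutiger-bold,arial becomes FONT-FAMILY: arial, and the rewritten string can then be passed to docBuilder.InsertHtml in place of HTMLout.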
Kevin Hamer:
The right aligned business address should be in Frutiger or Arial font (as specified in the HTML’s inline style), but it renders in a Serif (default?) font. Everything else seems to render accurately.
I have tested the scenario again and have managed to reproduce the same issue at my side. I have logged this problem in our issue tracking system as WORDSNET-9607. I have linked this forum thread to the same issue and you will be notified via this forum thread once this issue is resolved.
We apologize for your inconvenience.
The issues you have found earlier (filed as WORDSNET-9607) have been fixed in this .NET update and this Java update.