
Free Support Forum - aspose.com

Word to html and then to pdf issue

Hi, I have an issue with the HTML-to-PDF process.
If I save a Word file to PDF directly, the PDF formatting is perfect.
But if I save the same Word file to HTML and then convert the HTML to PDF, the PDF does not look the same as in the first case.
For various reasons I must use the second method: I need to process the HTML first, and then convert it to PDF.
The attachment is a demo project (Aspose.Words 10.1 + VS2005). Any suggestions?

Thanks.

using System;
using System.IO;
using System.Windows.Forms;
using Aspose.Words;

private void button1_Click(object sender, EventArgs e)
{
    word2html2pdf("demo.doc");
}

private void button2_Click(object sender, EventArgs e)
{
    word2pdf("demo.doc");
}


/// <summary>
/// Word to PDF directly: the resulting PDF is perfect.
/// </summary>
private void word2pdf(string docFile)
{
    // File name without extension, used to build the output name.
    string filename = Path.GetFileNameWithoutExtension(docFile);

    // Load the document passed in by the caller.
    Document doc = new Document(docFile);
    doc.Save(filename + "(word2pdf).pdf");
    MessageBox.Show("OK");
}

/// <summary>
/// Convert Word to HTML, process the HTML, and then convert the HTML to DOC and PDF.
/// </summary>
private void word2html2pdf(string docFile)
{
    string filename = Path.GetFileNameWithoutExtension(docFile);

    string htmlFile = filename + "(word2html).html";
    string newDocFile = filename + "(html2word).doc";
    string newPDFFile = filename + "(html2pdf).pdf";

    // Step 1: Word -> HTML.
    Document doc = new Document(docFile);
    doc.Save(htmlFile, SaveFormat.Html);

    // ...process the HTML file here...

    // Step 2: load the processed HTML back into Aspose.Words.
    LoadOptions loadOptions = new LoadOptions();
    loadOptions.BaseUri = Application.StartupPath; // base folder for resolving relative paths (e.g. images)
    loadOptions.LoadFormat = LoadFormat.Html;
    using (Stream stream = File.OpenRead(htmlFile))
    {
        doc = new Document(stream, loadOptions);
    }

    doc.Save(newDocFile);  // the DOC layout differs from the original document
    doc.Save(newPDFFile);  // some content on the right side is not shown
    MessageBox.Show("OK");
}

Hi there,

Thanks for your inquiry.

I'm afraid that when you convert to HTML and then back to another format, the fidelity of the document will most likely not be preserved. HTML is a very different kind of format from a Word document, and it's not always possible to lay out and preserve every element of a document when converting to HTML.

I would suggest simply converting directly from DOC to PDF, and separately from DOC to HTML. Could you please clarify why you need to convert to HTML first? Perhaps we can suggest another way of achieving what you need.
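For reference, a minimal sketch of the direct route, assuming the same `demo.doc` as in the original post: the document is loaded once and saved twice, with the output format picked from the file extension. The output file names are only illustrative.

```csharp
using Aspose.Words;

class DirectConversion
{
    static void Main()
    {
        // Load the original Word document once.
        Document doc = new Document("demo.doc");

        // Save straight to PDF; the format is taken from the ".pdf" extension,
        // so the layout never passes through HTML.
        doc.Save("demo.pdf");

        // Save a separate HTML copy for display in the web page.
        doc.Save("demo.html");
    }
}
```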

Thanks,

Thanks for your reply. Yes, DOC to PDF works very well, but I need HTML to PDF. After I convert the Word document to HTML and show it in a web page, I add many dynamic HTML elements with jQuery (in some cases it is hard to do this on the Word document directly). So the Word file is only a template, and the final HTML may be very different from the original Word document. I think it is better to convert the new HTML to PDF.

I tried Aspose.Pdf, but it sometimes throws errors, and the Aspose.Words output format is better.

Hi

Thank you for the additional information. I think you can try using the code suggested here to adjust table widths when converting from HTML to PDF:

http://www.aspose.com/documentation/.net-components/aspose.words-for-.net/howto-autofit-a-table-to-page-width.html
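A minimal sketch of a helper along those lines, not the article's exact code: it scales every row so that its total cell width equals the usable page width (page width minus left and right margins) of the section containing the table. The name `FitTableToPageWidth` matches the call used later in this thread, but this implementation is an assumption. It also assumes a release where the table classes live in `Aspose.Words.Tables` (as in current releases); depending on the version, autofit settings or preferred widths may override `CellFormat.Width`, so treat it as a starting point.

```csharp
using Aspose.Words;
using Aspose.Words.Tables;

static class TableHelper
{
    // Hypothetical helper: proportionally resizes the cells of each row so the
    // row fills the width between the page margins of the table's section.
    public static void FitTableToPageWidth(Table table)
    {
        Section section = (Section)table.GetAncestor(NodeType.Section);
        PageSetup ps = section.PageSetup;
        double available = ps.PageWidth - ps.LeftMargin - ps.RightMargin;

        foreach (Row row in table.Rows)
        {
            // Current total width of this row, in points.
            double rowWidth = 0;
            foreach (Cell cell in row.Cells)
                rowWidth += cell.CellFormat.Width;

            if (rowWidth <= 0)
                continue; // nothing to scale

            // Stretch or shrink every cell by the same factor.
            double factor = available / rowWidth;
            foreach (Cell cell in row.Cells)
                cell.CellFormat.Width *= factor;
        }
    }
}
```

Scaling each row independently keeps the relative column proportions while avoiding rows that overflow the right margin.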

However, I absolutely agree with Adam: it would be better to extract the additional data from your HTML and put it into the Word document, then convert the Word document to PDF.

Best regards,

Hi there,

Thanks for this additional information.

In addition to Alexey's suggestion, please try the code below together with the modified template attached to this post to see if the output can be improved a little further. Instead of using a negative indent on the tables, the template uses smaller margins, and the page setup information is exported to HTML. This allows the margins to be imported when converting to PDF, so the tables are aligned a bit better.

using System.IO;
using Aspose.Words;
using Aspose.Words.Saving;
using Aspose.Words.Tables;

// Export page setup (page size and margins) along with the HTML.
Document doc = new Document("demo.doc");
HtmlSaveOptions htmlOptions = new HtmlSaveOptions();
htmlOptions.ExportPageSetup = true;

MemoryStream htmlOutput = new MemoryStream();
doc.Save(htmlOutput, htmlOptions);

// Rewind the stream so the HTML is read from the start.
htmlOutput.Position = 0;
LoadOptions loadOptions = new LoadOptions();
loadOptions.LoadFormat = LoadFormat.Html;
Document doc2 = new Document(htmlOutput, loadOptions);

// FitTableToPageWidth is the helper from the article linked above.
FitTableToPageWidth((Table)doc2.GetChild(NodeType.Table, 0, true));

doc2.Save("Output.pdf");

The PDF output for the template looks nice apart from one of the rows in the first table. You can try to fix this row by making the table or the page margins smaller. I'm not sure how it will look once your extra content is added, though.

As Alexey has stated, the best way to produce a nice-looking document is to translate the tasks that you perform on the HTML with jQuery into tasks done directly on the Document once it's loaded in Aspose.Words, e.g. using DocumentBuilder.
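A minimal sketch of that approach, assuming the template contains a bookmark where the dynamic content should go. The bookmark name `DynamicContent`, the inserted text and the output file name are hypothetical; `MoveToBookmark` returns `false` if the bookmark does not exist.

```csharp
using Aspose.Words;

class FillTemplate
{
    static void Main()
    {
        // The Word file acts as the template.
        Document doc = new Document("demo.doc");
        DocumentBuilder builder = new DocumentBuilder(doc);

        // "DynamicContent" is a hypothetical bookmark placed in the template.
        if (builder.MoveToBookmark("DynamicContent"))
        {
            // Plain text that would otherwise be added to the HTML with jQuery.
            builder.Writeln("Dynamic text added on the server");

            // Small HTML fragments can still be inserted where that is easier.
            builder.InsertHtml("<p><b>Dynamic HTML fragment</b></p>");
        }

        // Convert straight to PDF, without an HTML round trip.
        doc.Save("demo(filled).pdf");
    }
}
```

Because the content is added to the Document model itself, the PDF is rendered from the same layout that gives the good direct DOC-to-PDF result.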

If you have any further queries, please feel free to ask.

Thanks,

Could you confirm the issue below?
I want to insert a check box or another input-type element into a document. Must I use InsertTextInput, InsertCheckBox or InsertComboBox?

//ok
builder.InsertCheckBox("cb1", true, 12);
builder.Write("ABC ");
builder.InsertCheckBox("cb2", false, 12);
builder.Write("123");

Can I do it like this instead?
//no effect
builder.InsertHtml("ABC  123");

I thought it could insert any HTML fragment, but it seems that only tags such as div, p and b are supported, and input tags can't be inserted directly. Is that correct? If so, does that mean I can't add a radio button to a document?

Thank you for your help.

Hi

Thanks for your request. I cannot reproduce this problem on my side. I used the following code:

DocumentBuilder builder = new DocumentBuilder();

builder.InsertHtml("ABC  123");

builder.Document.Save(@"Test001\out.doc");

I attached the output document. As you can see, the form fields are there.
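As an alternative to InsertHtml, the same kinds of form fields can be created explicitly with DocumentBuilder. A minimal sketch; the field names, items and output file name are illustrative, and it assumes a release where `TextFormFieldType` lives in `Aspose.Words.Fields` (as in current releases).

```csharp
using Aspose.Words;
using Aspose.Words.Fields;

class FormFieldsDemo
{
    static void Main()
    {
        DocumentBuilder builder = new DocumentBuilder();

        // Check box form fields: name, checked state, size in points.
        builder.InsertCheckBox("cb1", true, 12);
        builder.Write("ABC ");
        builder.InsertCheckBox("cb2", false, 12);
        builder.Writeln("123");

        // Text input form field: name, type, format, default text, max length (0 = unlimited).
        builder.InsertTextInput("txt1", TextFormFieldType.Regular, "", "Enter text", 0);
        builder.Writeln();

        // Drop-down (combo box) form field: name, items, selected index.
        builder.InsertComboBox("combo1", new string[] { "One", "Two", "Three" }, 0);

        builder.Document.Save("FormFields.doc");
    }
}
```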

Best regards,

Thanks. I fixed the bug in my code.