How to resize HTML to fit a Word document

Hi,

I am currently converting HTML pages to Word documents for a client, and I noticed that when I use the following code:

var doc = new Document(url);
doc.Save(docPath);

It loads the URL, decides the content is too wide, and cuts off the right 20% of it, moving that part to the page below (which makes the Word document fairly unreadable).

I understand that Web pages are dynamic, but most of the pages the client is hitting have fairly consistent widths.

My questions are:
1. Can I shrink the HTML by 30% before converting it to Word? If so, how?
2. Is there a way for the Word document to “autodetect” the width and make the content fit better?
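For the first question, one rough way to "shrink" a page before conversion is to scale the fixed pixel widths in the HTML itself. The sketch below is a hypothetical helper (`HtmlWidthScaler.ScaleWidths` is not part of Aspose.Words) that multiplies every numeric `width` attribute by a factor such as 0.7 and leaves percentage widths alone. It assumes the page layout is driven by `width` attributes on tables, cells, and images; widths set through CSS (for example `style="width:900px"`) are not touched, and a regex is no substitute for a real HTML parser.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper (not an Aspose API): scales numeric width attributes in an HTML string.
public static class HtmlWidthScaler
{
    // Matches width=900, width="900" and width='900px' (the "px" suffix is left in place),
    // but not width="100%" and not attributes that merely end in "width", such as data-width.
    private static readonly Regex WidthAttribute = new Regex(
        @"(?<prefix>(?<![\w-])width\s*=\s*[""']?)(?<value>\d+)(?=[""'\s>]|px)",
        RegexOptions.IgnoreCase);

    public static string ScaleWidths(string html, double factor)
    {
        return WidthAttribute.Replace(html, m =>
        {
            int original = int.Parse(m.Groups["value"].Value, CultureInfo.InvariantCulture);
            int scaled = (int)Math.Round(original * factor);
            // Keep everything before the number; the closing quote or "px" is not consumed by the match.
            return m.Groups["prefix"].Value + scaled.ToString(CultureInfo.InvariantCulture);
        });
    }
}
```

For example, `HtmlWidthScaler.ScaleWidths(html, 0.7)` turns `<table width="900">` into `<table width="630">`. The scaled HTML would then have to be saved to a file or stream and loaded with `new Document(...)` instead of passing the URL directly, so relative image paths in the page may no longer resolve. Whether the result fits better depends on how the particular page specifies its widths.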

Any help would be appreciated!

Mark.

Hi Mark,

Thanks for your request. Could you please attach your input HTML document here for testing? We will check the issue and provide you with more information.

Best regards,

Hi Alexey, thanks for your quick reply. Here is a sample URL that I am converting:
http://acis.doxtek.com/worksheet.htm

Thanks,

Mark.

Hi Mark,

Thank you for the additional information. I can see that your HTML document contains tables, so you can try the code provided here to auto-fit the tables to the page width:

https://docs.aspose.com/words/net/working-with-tables/

Hope this helps.

Best regards,

Thanks, Alexey. I implemented it, but it made the problem worse: instead of spilling over 6 pages, the HTML now spills over 12.

It seems to apply the resize after the document has already been split into multiple pages, but I am not sure.

Hi

Thank you for the additional information. As another option, you can change the page orientation of your document:

Document doc = new Document(@"Test001\in.html");
foreach (Section section in doc.Sections)
{
    section.PageSetup.Orientation = Orientation.Landscape;
}
doc.Save(@"Test001\out.doc");

With landscape orientation the output document looks better. Still, note that Aspose.Words was designed to work with MS Word documents, and HTML documents are quite different, so the fidelity of conversion from HTML to Word, and vice versa, is not always perfect.
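As a rough illustration of why landscape helps, the arithmetic below compares the usable text width of a portrait page and a landscape page. It assumes US Letter paper (8.5 by 11 in) and 1-inch left and right margins; the actual page size and margins of the converted document may differ. Widths are expressed in points (72 per inch), the unit Aspose.Words' `PageSetup` uses for page width and margins. `PageWidthMath` is a hypothetical helper for this illustration only.

```csharp
// Hypothetical helper for the page-width arithmetic (inputs in inches, outputs in points).
public static class PageWidthMath
{
    public const double PointsPerInch = 72.0;

    // Width left for content after subtracting equal left and right margins.
    public static double UsableWidthPt(double pageWidthIn, double sideMarginIn)
    {
        return (pageWidthIn - 2 * sideMarginIn) * PointsPerInch;
    }

    // Total content width implied when only `visibleFraction` of it fits in `usableWidthPt`.
    public static double ContentWidthPt(double usableWidthPt, double visibleFraction)
    {
        return usableWidthPt / visibleFraction;
    }
}
```

Under these assumptions a portrait page leaves (8.5 - 2) * 72 = 468 pt and a landscape page (11 - 2) * 72 = 648 pt, about 38% more. If roughly 20% of the content was being pushed off the portrait page, the content is about 468 / 0.8 = 585 pt wide, which would fit within 648 pt.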

Best regards,

Alexey,

I tried another option, converting HTML directly to PDF with the code listed on your site:

HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
// Set the web request timeout
request.Timeout = 10000; // 10 seconds
string html;
// Get the response and read its body using the Windows default code page 1252
// (requires the System.Net, System.IO and System.Text namespaces)
using (HttpWebResponse localWebResponse = (HttpWebResponse)request.GetResponse())
using (StreamReader localResponseStream = new StreamReader(localWebResponse.GetResponseStream(), Encoding.GetEncoding(1252)))
{
    html = localResponseStream.ReadToEnd();
}
// Instantiate a Pdf object
Aspose.Pdf.Generator.Pdf pdf = new Aspose.Pdf.Generator.Pdf();
// Add a section to the PDF document's sections collection
Aspose.Pdf.Generator.Section section = pdf.Sections.Add();
// Create a text paragraph containing the HTML text
Aspose.Pdf.Generator.Text text2 = new Aspose.Pdf.Generator.Text(section, html);
// Render the HTML tags with their own formatting
text2.IsHtmlTagSupported = true;
// Add the text object containing the HTML content to the section
section.Paragraphs.Add(text2);
// Specify the location that serves as the image database
pdf.HtmlInfo.ImgUrl = Path.GetDirectoryName(outputFileName);
// Save the PDF document
pdf.Save(outputFileName);

I got an error on the “Save” call above saying “Value is null or empty”, which is odd because the outputFileName parameter is valid and not null.
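One thing that may be worth checking here (an assumption, not a confirmed diagnosis of the Aspose.Pdf error): the snippet passes `Path.GetDirectoryName(outputFileName)` to `pdf.HtmlInfo.ImgUrl`, and `Path.GetDirectoryName` returns an empty string when the path has no directory part, such as a bare `"out.pdf"`. The hypothetical helper below resolves the path to an absolute one first, so a bare file name still yields a directory.

```csharp
using System.IO;

// Hypothetical helper: returns a non-empty directory for a file path, even a bare file name.
public static class ImageBase
{
    public static string DirectoryFor(string outputFileName)
    {
        // Path.GetDirectoryName("out.pdf") returns "", which could end up as an empty ImgUrl.
        // Path.GetFullPath resolves the name against the current directory first.
        return Path.GetDirectoryName(Path.GetFullPath(outputFileName));
    }
}
```

Usage would be `pdf.HtmlInfo.ImgUrl = ImageBase.DirectoryFor(outputFileName);`. If `outputFileName` is already an absolute path, this changes nothing, and the error most likely has a different cause.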

Any ideas? Thanks,

Mark.

Hi

Thanks for your request. Please report this problem in the Aspose.Pdf forum; my colleagues from the Aspose.Pdf team will answer you shortly.

Best regards,