Problem when using DocumentBuilder.InsertHtml

alexey.noskov · August 26, 2009, 11:40pm

Hi

Thanks for your request. Please attach your input and output documents and your code here for testing. I will check the issue and provide you with more information.

Best regards.

thavika.kavitha · August 27, 2009, 1:31am

I am providing the information.

AndreyN · August 27, 2009, 4:19am

Hi

Thanks for your request. Could you please attach your code here for testing? I will check the problem on my side and provide you with more information.

Best regards,

thavika.kavitha · August 27, 2009, 4:32am

OK, please check it now.

AndreyN · August 27, 2009, 7:23am

Hi

Thank you for the additional information. There is no direct way to set the width of a table using Aspose.Words, but you can set the width of its cells, so you can calculate the width of each column yourself. The code below demonstrates how to calculate the column width.

// Calculate width of the page
PageSetup ps = builder.CurrentSection.PageSetup;
double pageWidth = ps.PageWidth - ps.RightMargin - ps.LeftMargin;
// Since we know number of columns in our table we can calculate width of each column.
double colWidth = pageWidth / 4;
// Set default column width
builder.CellFormat.Width = colWidth;
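
As a worked example of the same arithmetic, here is a minimal standalone sketch. The page size and margins are assumed values (a US Letter page, 612 pt wide, with 72 pt margins), not numbers taken from the attached document, and the class and method names are hypothetical. In Aspose.Words the real values would come from PageSetup, as in the snippet above.

```csharp
using System;

static class ColumnWidthSketch
{
    // Usable width between the left and right margins, in points.
    public static double UsableWidth(double pageWidth, double leftMargin, double rightMargin)
    {
        return pageWidth - leftMargin - rightMargin;
    }

    // Width of one column when the usable width is shared equally.
    public static double EqualColumnWidth(double usableWidth, int columnCount)
    {
        if (columnCount <= 0)
            throw new ArgumentOutOfRangeException("columnCount");
        return usableWidth / columnCount;
    }

    static void Main()
    {
        // Assumed values: US Letter page (612 pt wide) with 72 pt (1 inch) margins.
        double usable = UsableWidth(612.0, 72.0, 72.0);   // 468
        double colWidth = EqualColumnWidth(usable, 4);    // 468 / 4
        Console.WriteLine(colWidth); // prints 117
    }
}
```

The resulting value is what would be assigned to builder.CellFormat.Width before the cells are inserted.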

Hope this helps.

Best regards,

thavika.kavitha · September 17, 2009, 4:17am

Hi,
I have a problem with a regular expression:
Regex r = new Regex("href\s*=\s*(?:(?:\"(?[^\"]*)\")|(?[^\s]* ))");
This is my code, but it gives an error.

In my HTML page I want the links (hrefs), and I want to write those links into the Word document.

I am giving the starting and ending HTML tags, and I want the links in between.
That means the href link on one line and the title between the tags on the next line.

Please help me.

AndreyN · September 17, 2009, 5:59am

Hi

Thanks for your inquiry. Unfortunately, I am not sure what you would like to achieve. Could you please show me your code and attach your output and expected documents? I will check the issue and provide you with more information.

Best regards,

thavika.kavitha · September 17, 2009, 10:37pm

I already attached the input/output document. Please check it once again.

AndreyN · September 18, 2009, 1:33am

Hi

Thank you for the additional information. What I can see in the document you have attached is HTML code. I suppose this is your input. Is that right? Then, if I understand you correctly, you use your regular expression to process this HTML in some way. Please provide simple code that shows how you process your HTML.

Also, it is not clear from the attached document what the expected output is. Please clarify.

Best regards,

thavika.kavitha · September 18, 2009, 1:37am

http://highoncoding.com/Articles/105_HTML_Screen_Scraping_using_C___Net_WebClient.aspx

like this

using System.Collections; // ArrayList
using System.IO; // If you plan to write in a file
using System.Net;
using System.Text;
using System.Text.RegularExpressions; // Regex

// creates a button
protected System.Web.UI.WebControls.Button Button1;
// creates a byte array
private byte[] aRequestHTML;
// creates a string
private string myString = null;
// creates a datagrid
protected System.Web.UI.WebControls.DataGrid DataGrid1;
// creates a textbox
protected System.Web.UI.WebControls.TextBox TextBox1;
// creates the label
protected System.Web.UI.WebControls.Label Label1;
// creates the arraylist
private ArrayList a = new ArrayList();

Okay, now let's see some button click code that does the actual work.

private void Button1_Click(object sender, System.EventArgs e)
{
    // make an object of the WebClient class
    WebClient objWebClient = new WebClient();
    // gets the HTML from the url written in the textbox
    aRequestHTML = objWebClient.DownloadData(TextBox1.Text);
    // creates UTF8 encoding object
    UTF8Encoding utf8 = new UTF8Encoding();
    // gets the UTF8 encoding of all the html we got in aRequestHTML
    myString = utf8.GetString(aRequestHTML);
    // this is a regular expression to check for the urls
    // (the named group <url> captures the link target)
    Regex r = new Regex("href\\s*=\\s*(?:(?:\"(?<url>[^\"]*)\")|(?<url>[^\\s]* ))");
    // get all the matches depending upon the regular expression
    MatchCollection mcl = r.Matches(myString);

    foreach (Match ml in mcl)
    {
        foreach (Group g in ml.Groups)
        {
            string b = g.Value + "";
            // Add the extracted urls to the array list
            a.Add(b);

        }
    }
    // assign arraylist to the datasource
    DataGrid1.DataSource = a;
    // bind the data to the grid
    DataGrid1.DataBind();

    // The following lines write the downloaded HTML to the file named test.txt
    StreamWriter sw = new StreamWriter(Server.MapPath("test.txt"));
    sw.Write(myString);
    sw.Close();
}

AndreyN · September 18, 2009, 3:17am

Hi

Thank you for the additional information. Now it is clearer what you would like to achieve. However, I have another question: how is this related to Aspose.Words?

If you need to extract all links from your HTML and write them into a Word document, you can try using code like the following:

// Required namespaces: System.IO, System.Text.RegularExpressions, Aspose.Words
// (and System.Drawing for Color if you use the hyperlink variant below).
// Get your HTML string
string html = File.ReadAllText(@"Test001\test.html");
// Create a regular expression which will match links.
Regex urlRegex = new Regex("href\\s*=\\s*[\"']+(http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?)[\"']+");
MatchCollection matches = urlRegex.Matches(html);
// Write matched URLs into the Word document
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
foreach (Match match in matches)
{
    builder.Writeln(match.Groups[1].Value);
    // If you need to insert URLs into the Word document as links, uncomment the next 4 lines and comment out the previous one.
    //builder.Font.Underline = Underline.Single;
    //builder.Font.Color = Color.Blue;
    //builder.InsertHyperlink(match.Groups[1].Value, match.Groups[1].Value, false);
    //builder.Writeln();
}
// Save output document
doc.Save(@"Test001\out.doc");
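
If the link text is also needed (the URL on one line and its title on the next, as asked earlier in this thread), something like the following minimal sketch may help. It uses only System.Text.RegularExpressions. The pattern, the group names "url" and "title", the class and method names, and the sample HTML are all assumptions made for illustration and were not tested against the attached document. A regular expression like this is fragile on real-world HTML, and any nested tags inside the link would stay in the captured title.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LinkExtractionSketch
{
    // Matches <a ... href="url" ...>title</a>.
    // "url" and "title" are group names chosen for this sketch.
    static readonly Regex AnchorRegex = new Regex(
        @"<a\s[^>]*?href\s*=\s*[""'](?<url>[^""']*)[""'][^>]*>(?<title>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the lines to write: each URL followed by its link text.
    public static List<string> ExtractLines(string html)
    {
        List<string> lines = new List<string>();
        foreach (Match match in AnchorRegex.Matches(html))
        {
            lines.Add(match.Groups["url"].Value);
            lines.Add(match.Groups["title"].Value.Trim());
        }
        return lines;
    }

    static void Main()
    {
        // Hypothetical input, not the HTML attached to this thread.
        string html = "<a href=\"http://example.com/a\">First</a> "
                    + "<a class='x' href='https://example.com/b'>Second link</a>";
        foreach (string line in ExtractLines(html))
            Console.WriteLine(line);
    }
}
```

Each returned line could be passed to builder.Writeln instead of Console.WriteLine to place it in the Word document.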

Best regards,

aspose.notifier · January 1, 2014, 6:17am

The issues you have found earlier (filed as WORDSNET-228) have been fixed in this .NET update and this Java update.

This message was posted using Notification2Forum from Downloads module by aspose.notifier.

aspose.notifier · August 9, 2015, 3:27am

The issues you have found earlier (filed as WORDSNET-866) have been fixed in this .NET update and this Java update.


aspose.notifier · February 7, 2019, 4:31pm

The issues you have found earlier (filed as ) have been fixed in this update. This message was posted using BugNotificationTool from Downloads module by MuzammilKhan