Problems when using DocumentBuilder.InsertHtml

Hi,

I am not able to render a table using DocumentBuilder.InsertHtml() in either the Word file or the PDF file. I am using the latest DLLs (Word version 4.0.2.0 and PDF version 3.2.2.0), but my table rendering problem is still not solved. I have sent many emails about this issue, and I received a reply from you saying that it is fixed in the latest DLL. However, when I use the latest DLL there is no change in either the Word file or the PDF file. There are also other issues related to the table:

  1. The table shrinks in the Word file.

  2. The table border color is not rendered in either the Word file or the PDF file.

Please give me a solution as soon as possible, because I have to answer to the client regarding this issue. I am also sending the code and screenshots of the Word and PDF files, which should match the HTML view.

I am trying to render the HTML for Word using InsertHtml() with the following function.

Private Sub BindHTML(ByVal builder As Aspose.Words.DocumentBuilder, ByVal Content As String)

    Content = Content.Replace(" ", "  ")
    Dim _CurrentNode As Aspose.Words.Node = builder.CurrentParagraph
    If Content.IndexOf("<") > -1 And Content.IndexOf(">") > -1 Then
        builder.InsertHtml(Content)
    Else
        builder.Write(System.Web.HttpUtility.HtmlDecode(Content))
    End If
    Do While (Not _CurrentNode Is Nothing)
        If _CurrentNode.NodeType = NodeType.Paragraph Then
            With CType(_CurrentNode, Aspose.Words.Paragraph)
                .ParagraphFormat.SpaceAfterAuto = False
                .ParagraphFormat.SpaceAfter = 0
                Call ChangeFont(.Runs)
            End With
        End If
        _CurrentNode = _CurrentNode.NextSibling
    Loop
End Sub

------------------------------------------------------

The following HTML is used for both Word and PDF.

Lakshmi Chaya SharadaChaya
Chaya Sharada LakshmiChayaChaya
Sharada Lakshmi Chaya

____________________________________________________

I am trying to render the HTML for PDF using InsertHtml() with the following function.

Private Sub BindHTML(ByVal Content As String, ByVal currentNode As Aspose.Pdf.Cell)

    Dim text As Aspose.Pdf.Text
    If currentNode.Paragraphs.Count > 0 Then
        text = currentNode.Paragraphs(0)
    Else
        Exit Sub
    End If
    Content = Content.Replace(" ", "  ")
    If Content.IndexOf("<") > -1 And Content.IndexOf(">") > -1 Then

        If text.Segments.Count = 1 Then
            If text.Segments(0).Content Is Nothing Then
                text.Segments.RemoveAt(0)
            End If
        End If
        Dim doc As New Aspose.Words.Document
        Dim builder As New Aspose.Words.DocumentBuilder(doc)

        builder.InsertHtml(Content)

        builder.CurrentParagraph.ParagraphFormat.SpaceAfterAuto = False
        builder.CurrentParagraph.ParagraphFormat.SpaceAfter = 0

        Dim ObjMemoryStream As New System.IO.MemoryStream
        doc.Save(ObjMemoryStream, Aspose.Words.SaveFormat.FormatAsposePdf)

        Dim tmpPDF As New Aspose.Pdf.Pdf
        tmpPDF.BindXML(ObjMemoryStream, Nothing)
        ObjMemoryStream.Close()

        Dim intSection, intParagraph As Int16
        Dim _Paragraph As Aspose.Pdf.Paragraph
        For intSection = 0 To tmpPDF.Sections.Count - 1
            For intParagraph = 0 To tmpPDF.Sections(intSection).Paragraphs.Count - 1
                _Paragraph = tmpPDF.Sections(intSection).Paragraphs(intParagraph)
                If TypeOf _Paragraph Is Aspose.Pdf.Text Then
                    Call ChangeFont(CType(_Paragraph, Aspose.Pdf.Text).Segments, text.TextInfo.IsTrueTypeFontBold)
                End If
                currentNode.Paragraphs.Add(_Paragraph)
            Next intParagraph
        Next intSection
    Else
        Dim _Segment As Aspose.Pdf.Segment = GetSegment(HttpUtility.HtmlDecode(Content))
        If text.TextInfo.IsTrueTypeFontBold Then
            _Segment.TextInfo.IsTrueTypeFontBold = True
        End If
        text.Segments.Add(_Segment)
    End If
End Sub

With regards,

I Prabhaharan

Hi,

I am not able to render font background color or the horizontal rule tag (the <hr> tag) using DocumentBuilder.InsertHtml() in either the Word file or the PDF file. I am using the latest DLLs (Word version 4.0.2.0 and PDF version 3.2.2.0), but my font background color and horizontal rule problems are still not solved. I have sent many emails about this issue, and I received a reply from you saying that it is fixed in the latest DLL. However, when I use the latest DLL there is no change in either the Word file or the PDF file.

Please give me a solution as soon as possible, because I have to answer to the client regarding this issue. I am also sending the code and screenshots of the Word and PDF files, which should match the HTML view. If possible, please also tell me approximately when I can expect a solution for these issues.

I am trying to render the HTML for Word using InsertHtml() with the following function.

Private Sub BindHTML(ByVal builder As Aspose.Words.DocumentBuilder, ByVal Content As String)

    Content = Content.Replace(" ", "  ")
    Dim _CurrentNode As Aspose.Words.Node = builder.CurrentParagraph
    If Content.IndexOf("<") > -1 And Content.IndexOf(">") > -1 Then
        builder.InsertHtml(Content)
    Else
        builder.Write(System.Web.HttpUtility.HtmlDecode(Content))
    End If
    Do While (Not _CurrentNode Is Nothing)
        If _CurrentNode.NodeType = NodeType.Paragraph Then
            With CType(_CurrentNode, Aspose.Words.Paragraph)
                .ParagraphFormat.SpaceAfterAuto = False
                .ParagraphFormat.SpaceAfter = 0
                Call ChangeFont(.Runs)
            End With
        End If
        _CurrentNode = _CurrentNode.NextSibling
    Loop
End Sub

The following HTML (for the horizontal rule and font background) is used for both Word and PDF.

Font background HTML code:

who is the father of our nation

who is the father of our nation

Horizontal rule HTML code:

Enter text below

Enter text below


I am trying to render the HTML for PDF using InsertHtml() with the following function.

Private Sub BindHTML(ByVal Content As String, ByVal currentNode As Aspose.Pdf.Cell)

    Dim text As Aspose.Pdf.Text
    If currentNode.Paragraphs.Count > 0 Then
        text = currentNode.Paragraphs(0)
    Else
        Exit Sub
    End If
    Content = Content.Replace(" ", "  ")
    If Content.IndexOf("<") > -1 And Content.IndexOf(">") > -1 Then

        If text.Segments.Count = 1 Then
            If text.Segments(0).Content Is Nothing Then
                text.Segments.RemoveAt(0)
            End If
        End If
        Dim doc As New Aspose.Words.Document
        Dim builder As New Aspose.Words.DocumentBuilder(doc)

        builder.InsertHtml(Content)

        builder.CurrentParagraph.ParagraphFormat.SpaceAfterAuto = False
        builder.CurrentParagraph.ParagraphFormat.SpaceAfter = 0

        Dim ObjMemoryStream As New System.IO.MemoryStream
        doc.Save(ObjMemoryStream, Aspose.Words.SaveFormat.FormatAsposePdf)

        Dim tmpPDF As New Aspose.Pdf.Pdf
        tmpPDF.BindXML(ObjMemoryStream, Nothing)
        ObjMemoryStream.Close()

        Dim intSection, intParagraph As Int16
        Dim _Paragraph As Aspose.Pdf.Paragraph
        For intSection = 0 To tmpPDF.Sections.Count - 1
            For intParagraph = 0 To tmpPDF.Sections(intSection).Paragraphs.Count - 1
                _Paragraph = tmpPDF.Sections(intSection).Paragraphs(intParagraph)
                If TypeOf _Paragraph Is Aspose.Pdf.Text Then
                    Call ChangeFont(CType(_Paragraph, Aspose.Pdf.Text).Segments, text.TextInfo.IsTrueTypeFontBold)
                End If
                currentNode.Paragraphs.Add(_Paragraph)
            Next intParagraph
        Next intSection
    Else
        Dim _Segment As Aspose.Pdf.Segment = GetSegment(HttpUtility.HtmlDecode(Content))
        If text.TextInfo.IsTrueTypeFontBold Then
            _Segment.TextInfo.IsTrueTypeFontBold = True
        End If
        text.Segments.Add(_Segment)
    End If
End Sub

With regards,

I Prabhaharan

You should understand that full support of all HTML quirks and dialects is a very complex task. Even some browsers are still not able to do this in full. But we are trying to make our HTML import better, continuously working to improve it, taking into account user feedback and suggestions. It simply takes time to do this and we are also working on many other features as well.

As for your current problems the current state of things is as follows:

Font highlighting with is supported now.

Font highlighting with is not supported yet. Request is logged as issue #1373 in our defect base.

The <hr> tag is not supported yet. Request is logged as issue #1372 in our defect base.

I have also logged the request to support bgColor in tag as issue #1374.

The table indeed looks messed up when inserted into a table cell of your document. However, it is rendered correctly when inserted into an empty document. That is why I thought the issue was fixed. I have relogged this problem as #1375.

We will try to fix all these defects as soon as possible.

Thank you for your patience,

Defect #1375 no longer reproduces with the latest version (4.0.5). Please try the new version and let me know whether it works for you.

I have the same problem. My content contains hyperlinks, and I don’t want them. How can I delete the hyperlinks in the middle of the content? Please tell me.

Hi

Thanks for your request. Could you please attach your HTML here for testing? I will check the issue and provide you more information.

Also, please attach output and expected documents.

Best regards.

Try this:

builder.InsertHtml("<P>Left1&</P>&"+"<BLOCKQUOTE dir=ltr style=MARGIN-RIGHT: 0px>"+"<P>Right1&</P></BLOCKQUOTE>&"+"<P dir=ltr>Left2&</P>&"+"<BLOCKQUOTE dir=ltr style=MARGIN-RIGHT: 0px>"+"<P dir=ltr>Right2&</P></BLOCKQUOTE>&");
<ul><li><a href='http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=pubmed&cmd=Retrieve&><font size=2>Bratasz A, Weir NM, Parinandi NL, Zweier JL, Sridhar R, Ignarro LJ, Kuppusamy P;Reversal to cisplatin</a></li>

This is my content. I don’t want links in the document. How can I remove them?

Hi

Thank you for additional information.

  1. BLOCKQUOTE tags are not supported upon HTML import yet. I linked your request to the appropriate issue.

  2. To remove Hyperlinks you can try using Regular Expressions. For example, see the following code:

string html = File.ReadAllText(@"Test001\test.html");
Regex regex = new Regex("<a[^>]*>(.*?)</a>", RegexOptions.Singleline);
html = regex.Replace(html, "$1");
builder.InsertHtml(html);

Hope this helps.

Best regards.

Thank you very much. In another column, some content comes out like this.
I don’t want that top, underline, and bold formatting.
I have set:

builder.Font.Bold = false;
builder.Underline = Underline.None;

Please help me.

Thanking you,
kavitha

Hi

Thanks for your request. When inserting HTML into a document, Aspose.Words takes all formatting from the HTML. If you need to suppress some formatting, you can try one of the following:

  1. Remove unnecessary formatting in HTML string (before inserting it into a document)

  2. Change formatting of the inserted content after inserting.

  3. Change formatting during inserting. You can do this using NodeInserted event handler. Please see the following link for more information:

https://reference.aspose.com/words/net/aspose.words/inodechangingcallback/methods/nodeinserted

Here is a simple code example:

// Create document and DocumentBuilder
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
// Add NodeInserted event handler.
doc.NodeInserted += new NodeChangedEventHandler(doc_NodeInserted);
// Insert Html
builder.InsertHtml("This is cool text");
// Save output document
doc.Save(@"Test001\out.doc");

void doc_NodeInserted(object sender, NodeChangedEventArgs e)
{
    if (e.Node.NodeType == NodeType.Run)
    {
        ((Run)e.Node).Font.Size = 24;
        ((Run)e.Node).Font.Name = "Arial";
    }
}
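
Option 2 above (changing the formatting after insertion) could look roughly like the sketch below. It assumes that Document.GetChildNodes(NodeType, bool) and the Font.Bold and Font.Underline properties are available in the Aspose.Words version in use, and the sample HTML string is made up for illustration. Note that it resets every run in the document, not only the inserted ones.

```csharp
using Aspose.Words;

class ResetFormattingAfterInsert
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);

        // Hypothetical sample HTML that arrives with bold and underline.
        builder.InsertHtml("<b><u>This is cool text</u></b>");

        // Visit every Run in the document (the 'true' argument includes
        // runs nested inside tables) and clear the unwanted formatting.
        foreach (Run run in doc.GetChildNodes(NodeType.Run, true))
        {
            run.Font.Bold = false;
            run.Font.Underline = Underline.None;
        }

        doc.Save(@"Test001\out.doc");
    }
}
```

If only the inserted content should be changed, the NodeInserted handler above is the more targeted choice.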

Hope this helps.

Best regards.

I am sorry to say
it is not working.

I don’t have unnecessary formatting in my HTML. Here is an example.
**Amyotrophic

December 2007**
Preclinic EN101 may have potential application in other lateral sclerosis.
Source: Amarin

This is my code,
but it comes out underlined and bold.

Anyway, thanks for replying to me.

Hi

Thank you for additional information. Your HTML works fine on my side. I attached output document produced on my side. Here is my code:

Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
string html = File.ReadAllText(@"Test001\test.html");
builder.InsertHtml(html);
doc.Save(@"Test001\out.doc");

Best regards.

Hi, please reply, it is urgent.

My Word document does not save. I get the message:

"There is insufficient memory. Save the document now."
My hard disk has free space.
Please help me.

Hi

Thanks for your inquiry. Could you please show me your code or attach sample application, which will allow me to reproduce the problem? I will check the issue and try to help you.

Best regards,

Hi,
for some projects it works correctly.

for (int l = 0; l <= DS1.Tables[0].Rows.Count - 1; l++)
{

    builder.CellFormat.Borders.LineStyle = LineStyle.Thick;
    builder.InsertCell();
    builder.Writeln(DS1.Tables[0].Rows[l]["zyx"].ToString());
    builder.InsertCell();
    builder.Writeln(DS1.Tables[0].Rows[l]["xyz"].ToString());
    builder.InsertCell();
    builder.Writeln(DS1.Tables[0].Rows[l]["abc"].ToString());
    builder.InsertCell();
    if (DS1.Tables[0].Rows[l]["abc"].ToString().Length > 30)
        builder.RowFormat.HeightRule = HeightRule.Auto;
    else
        builder.RowFormat.HeightRule = HeightRule.Exactly;
    builder.Writeln(DS1.Tables[0].Rows[l]["abd"].ToString());
    builder.EndRow();
}
builder.EndTable();

Thank you for the additional information. I suppose the problem occurs when saving the document to disk, doesn’t it? You could try saving the document to a Stream, checking the size of the saved stream, and then checking whether there is enough free space on your disk.
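
A minimal sketch of that check, assuming the Document.Save(Stream, SaveFormat) overload and SaveFormat.Doc are available in your Aspose.Words version. The output path and the sample content are placeholders.

```csharp
using System;
using System.IO;
using Aspose.Words;

class SaveToStreamCheck
{
    static void Main()
    {
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        builder.Writeln("Sample content"); // placeholder for the real table-building code

        string outputPath = Path.GetFullPath(@"Test001\out.doc"); // placeholder path

        using (MemoryStream stream = new MemoryStream())
        {
            // Save into memory first so the size is known before touching the disk.
            doc.Save(stream, SaveFormat.Doc);
            Console.WriteLine("Document size in bytes: " + stream.Length);

            // Free space on the drive that holds the output path.
            DriveInfo drive = new DriveInfo(Path.GetPathRoot(outputPath));
            Console.WriteLine("Free bytes on target drive: " + drive.AvailableFreeSpace);

            if (stream.Length < drive.AvailableFreeSpace)
            {
                File.WriteAllBytes(outputPath, stream.ToArray());
            }
        }
    }
}
```

If saving to the stream itself fails, the problem is probably not disk space, and the stack trace from that call would help narrow it down.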

Best regards.

<font color="#ff0000"><p align=center><a name='clinical'><a href='#III' class=middlelink><b>Phatails</b></a> </p></font><font color='000080'><b> Curr III </b>(As of August 06, 2009)</font><br><br><font color="#ff0000"><a name='II'></a><font color='000080'></font><b>PhaseDetails</font></b> <font color="#ff0000">&nbsp;&nbsp;<a href='#clinical'class=middlelink>^Top</a> </font><br><br><font color='000080'><b>Inittion</font><bthKline</i><br><br><font color="#ff0000"><a  name='II'></a><font color='000080'></font><b>Phaseils</font></b> &nbsp;&nbsp;<font color="#ff0000"><a href='#clinical'class=middlelink>^Top</a> </font><br><br><b>In-Study with tus.</b><bR><br><br><br><br><br>

I don’t want the red-colored data in my Word document.

Hi

Thanks for your inquiry. As I can see you would like to remove anchors from your HTML string. You can use regular expressions to achieve this. For example see the following code:

string html = File.ReadAllText(@"Test001\test.html");
Regex regex = new Regex("<a[^>]*>(.*?)</a>", RegexOptions.Singleline);
html = regex.Replace(html, "");

You can use the same technique to remove any HTML tag from your string. Please see the following link to learn more about regular expressions:

http://www.codeproject.com/KB/dotnet/regextutorial.aspx
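
As a rough generalization of this technique, the sketch below wraps the pattern in a helper that works for any tag name. StripTag and its parameters are hypothetical names made up for this example, not part of Aspose.Words. The pattern is a simple non-greedy match, so it does not handle a tag nested inside another tag of the same name.

```csharp
using System;
using System.Text.RegularExpressions;

static class HtmlTagStripper
{
    // Hypothetical helper: removes every <tagName ...>...</tagName> pair.
    // keepInnerText = true keeps the text between the tags; false drops it as well.
    public static string StripTag(string html, string tagName, bool keepInnerText)
    {
        string name = Regex.Escape(tagName);
        // \b stops "a" from also matching tags such as <abbr>.
        string pattern = "<" + name + @"\b[^>]*>(.*?)</" + name + @"\s*>";
        Regex regex = new Regex(pattern, RegexOptions.Singleline | RegexOptions.IgnoreCase);
        return regex.Replace(html, keepInnerText ? "$1" : "");
    }

    static void Main()
    {
        string html = "<b>Study</b> <a href='#clinical' class=middlelink>^Top</a><br>";

        // Drop the anchors together with their text.
        Console.WriteLine(StripTag(html, "a", false)); // <b>Study</b> <br>

        // Drop only the anchor tags and keep the link text.
        Console.WriteLine(StripTag(html, "a", true));  // <b>Study</b> ^Top<br>
    }
}
```

The cleaned string can then be passed to builder.InsertHtml as before.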

Best regards.

Thank you so much

The first page of my Word document comes out blank.