How to identify a class in an HTML document and apply page breaks

Hi,

I need to identify a class in the HTML and apply a page break before or after the matching element (page break before, page break after).

Example:

You can see in my PNG that the class name of the div tag is “banner-header”. I need to identify that “banner-header” class name and put a page break before or after that div. Is this possible?

@nethmi, a “div” element can contain different kinds of child content, and it has no direct counterpart in the Aspose.Words DOM after the HTML file is imported. Therefore, inserting a page break after a “div” element is generally not possible. In cases where the “div” element is imported into the Aspose.Words DOM as a Paragraph and its “class” attribute maps to the corresponding paragraph style, you can try to find the element as follows:

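using Aspose.Words;

// Import the HTML and save an unmodified copy for comparison.
Document doc = new Document("ex.html");
DocumentBuilder builder = new DocumentBuilder(doc);
doc.Save("ex_orig.docx");

// The "class" attribute is expected to be imported as a paragraph style with the same name.
Style style = doc.Styles["banner-header"];
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    if (para.ParagraphFormat.Style == style)
    {
        // MoveTo on a paragraph places the cursor at its end,
        // so the page break is inserted after the paragraph's content.
        builder.MoveTo(para);
        builder.InsertBreak(BreakType.PageBreak);
    }
}
doc.Save("ex_result.docx");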
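The snippet above puts the break at the end of each matching paragraph. For a break before the paragraph, one possible approach is to set the paragraph's PageBreakBefore flag instead of inserting a break character. This is only a sketch under the same assumption as above (the class was imported as a paragraph style); the output file name ex_before.docx is a made-up example.

```csharp
using Aspose.Words;

Document doc = new Document("ex.html");

// Styles[...] returns null when no style with this name was imported.
Style style = doc.Styles["banner-header"];
if (style != null)
{
    foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
    {
        if (para.ParagraphFormat.StyleName == style.Name)
        {
            // Paragraph-level formatting: start this paragraph on a new page.
            para.ParagraphFormat.PageBreakBefore = true;
        }
    }
}

// Hypothetical output name, used here only for illustration.
doc.Save("ex_before.docx");
```

Using the formatting flag avoids adding an extra break character to the text, which may be easier to undo later.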

ex.zip (393 Bytes)

Thank you very much. I’ll try this and let you know.

Hi,

This is working. But when I apply the class attribute to the element, I can’t tell whether the paragraph is a heading, a list item, or what its style name is. Why is that? How can I identify it?

I can’t detect the following when I use the class attribute:

paragraph.ParagraphFormat.IsHeading
if (paragraph.ParagraphFormat.StyleName.Contains('p'))

Hi,
I have another question about this. My HTML has two paragraphs with two different class attributes. Please check the image below.

issue.png (25.8 KB)

Check my code below.

public static void BlockLevelPageBreakHandler(Document document, Dictionary<string, Dictionary<string, string>> mediaStyles)
{
    Style style = document.Styles["banner-header"];
    Style style1 = document.Styles["commentary-block"];
    foreach (Paragraph paragraph in document.GetChildNodes(NodeType.Paragraph, true))
    {
        if (paragraph.ParagraphFormat.Style == style)
        {
            // condition of this if clause becomes true
        }
        if (paragraph.ParagraphFormat.Style == style1)
        {
            // condition of this if clause becomes false
        }
    }
}

Why does only the condition if (paragraph.ParagraphFormat.Style == style) become true?

I found the reason for this problem.

paragraph.ParagraphFormat.Style == style
This works only for classes that are defined in the style tag of the HTML. That is, if we put a class attribute on an element, we must also define that class in the HTML’s style tag; otherwise it can’t be identified.

Could you please give me a solution for this?

I need to identify both the class and the tag (p, h1, etc.).

@nethmi

I created a test HTML document based on your screenshot and ran your test code. In my case both conditions are met: the first for the second paragraph and the second for the third paragraph. Please check whether there is a typo somewhere in your test document. Also check under the debugger that neither style nor style1 is null.
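As a rough alternative to the debugger, the null check can be done in code. The sketch below assumes the same ex.html input and the two class names from the question; it prints whether each expected style was imported and then lists every style name in the document, so a typo or a missing style is easy to spot.

```csharp
using System;
using Aspose.Words;

Document doc = new Document("ex.html");

// Class names taken from the question; adjust to your own document.
foreach (string name in new[] { "banner-header", "commentary-block" })
{
    // Styles[...] returns null when no style with this name exists.
    Style style = doc.Styles[name];
    Console.WriteLine(style == null
        ? $"Style '{name}' was not imported."
        : $"Style '{name}' found (type: {style.Type}).");
}

// List all style names that exist in the document after import.
foreach (Style s in doc.Styles)
{
    Console.WriteLine(s.Name);
}
```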

for this one?

@nethmi
In my case, both specified paragraphs are found.

That is not what I was asking, but I got your point. I am using a style sheet, and that is why my code was not working: when a style sheet is present, the code takes the class names from the style sheet. I did not define the “commentary-block” class in my style sheet, only “banner-header”. That is why one of my conditions was not working. So to use your logic, I should either define the classes in a style sheet or use class attributes without a style sheet.

  1. My second question is:
    When I apply the class attribute to the element, I can’t tell whether the paragraph is a heading, a list item, or what its style name is. Why is that? How can I identify it?

I can’t detect the following when I use the class attribute:

paragraph.ParagraphFormat.IsHeading
if (paragraph.ParagraphFormat.StyleName.Contains('p'))

I need to identify both the class and the tag (p, h1, etc.).

I need an answer to my second question.

As an example,
I need to put a page break on the paragraph with the “commentary-block” class.
So I need to identify both the class and the p tag. But when I use your logic, I can’t find the tag name.

@nethmi Unfortunately, such functionality is not available. After an HTML file is imported into the Aspose.Words DOM, information about which tag a paragraph was imported from (<p> or <div>) is not stored, and no callbacks that provide such information are available during the import process. So, unfortunately, this task cannot be solved in a general way.

Thank you very much for your support

@nethmi Please feel free to ask in case of any issues, we will be glad to help you.

Hi,

@alexey.noskov @vadim.saltykov

<p class="paragraph banner-header">
    A well-organized paragraph supports or develops a single controlling idea, which is expressed in a sentence called the topic sentence. A topic sentence has several important functions:
    it substantiates or supports an essay’s thesis statement; it unifies the content of a paragraph and directs the order of the sentences; and it advises the reader of the subject to
    be discussed and how the paragraph will discuss it. Readers generally look to the first few sentences in a paragraph to determine the subject and perspective of the paragraph.
</p>

In the above code I have two class names (class="paragraph banner-header"). I need to identify them separately. Please check the code below.

// StyleName is a string, so it can be compared with a class name directly.
if (paragraph.ParagraphFormat.StyleName == "paragraph")
{
    // condition of this if clause becomes true
}
if (paragraph.ParagraphFormat.StyleName == "banner-header")
{
    // condition of this if clause becomes false
}

Could you please give me a solution?

@nethmi I am afraid there is no way to preserve both style names in the MS Word document model, so only the first style name will be retained. For example, if you have the styles applied like this:

<html>
<head>
    <style>
        .paragraph {
            background-color: aqua;
        }
        .banner-header {
            color:red;
        }
    </style>
</head>
<body>
    <p class="paragraph banner-header">
        A well-organized paragraph supports or develops a single controlling idea, which is expressed in a sentence called the topic sentence. A topic sentence has several important functions:
        it substantiates or supports an essay’s thesis statement; it unifies the content of a paragraph and directs the order of the sentences; and it advises the reader of the subject to
        be discussed and how the paragraph will discuss it. Readers generally look to the first few sentences in a paragraph to determine the subject and perspective of the paragraph.
    </p>
</body>
</html>

Such a paragraph will be exported to MS Word like this:

<w:p>
	<w:pPr>
		<w:pStyle w:val="paragraph" />
		<w:spacing w:after="240" />
		<w:rPr>
			<w:color w:val="FF0000" />
		</w:rPr>
	</w:pPr>
	<w:r>
		<w:rPr>
			<w:color w:val="FF0000" />
			<w:shd w:val="clear" w:color="auto" w:fill="auto" />
		</w:rPr>
		<w:t xml:space="preserve">A well-organized paragraph supports or develops a single controlling idea, which is expressed in a sentence called the topic sentence. A topic sentence has several important functions: it substantiates or supports an essay’s thesis statement; it unifies the content of a paragraph and directs the order of the sentences; and it advises the reader of the subject to be discussed and how the paragraph will discuss it. Readers generally look to the first few sentences in a paragraph to determine the subject and perspective of the paragraph. </w:t>
	</w:r>
</w:p>

As you can see, the paragraph class is applied as the paragraph's style, and the formatting from the banner-header class is applied as direct formatting of the paragraph.

thank you very much
