Free Support Forum - aspose.com

BR tag with page break style is not imported as a separate paragraph using C#

Hi,

We load HTML that was converted from a Word document into Aspose.Words, and everything has worked so far. However, we have come across a scenario where a manual page break inserted in the Word document (page break before) is merged with the next paragraph, even though the two are in separate P tags in the HTML document. This causes problems for our further processing.

For example, the paragraph text looks like: \fTest paragraph\r
where '\f' should appear in a separate paragraph range.

Please let us know how we can separate this, or how to identify that the paragraph is merged.

I also noticed that paragraph.ParagraphFormat.PageBreakBefore is set to false, although it should be true.

Thanks
Manohar

SampleAsposeApp.zip (5.3 MB)

@rtmanohar3

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input document.
  • Please attach the output document that shows the undesired behavior.
  • Please attach the expected output document that shows the desired behavior.
  • Please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end, and attach it here for testing.

As soon as you have these pieces of information ready, we will start investigating your issue and provide you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Hi Tahir,

Thanks for the reply. I have uploaded a sample application with the required input file to reproduce the issue.
Please let me know if you need anything else.

Thanks
Manohar

@rtmanohar3

If you want to remove page breaks from the document, please use the following code example.

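// MyDir is the folder that contains the input file (defined elsewhere).
Document doc = new Document(MyDir + "Sample_Input.html");
// Replace every page break character with an empty string.
doc.Range.Replace(ControlChar.PageBreak, "", new FindReplaceOptions());
doc.Save(MyDir + "20.4.docx");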

If you still face the problem, please ZIP and attach your problematic and expected output Word documents. We will then provide a code example that matches your requirement.

We do not want to remove the page breaks. We just want to write all the paragraph text and page breaks into a custom text file. In this scenario, the page break is in a separate P tag in the input HTML, so we expect to write it to the output file as a separate paragraph. In the Aspose paragraph list, however, this page break and the text of the next paragraph are merged into a single paragraph. This causes a paragraph count mismatch between the input and output files.

This is what the merged paragraph range text looks like:
\fStatement of Acceptance by Escrow Holder\r

We would like to know whether this page break can be split into its own paragraph, or whether there is a way to identify that these paragraphs are merged.

Thanks
Manohar
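
Editor's note: the identification step asked about above can also be done on the caller's side, whatever the library version. The following is a minimal sketch under stated assumptions, not Aspose's recommended approach: the class and method names are hypothetical, and the input is assumed to be the raw paragraph text (for example, the string returned by Paragraph.GetText()), in which a page break appears as '\f' and the paragraph mark as '\r'.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of Aspose.Words): works on plain strings only.
public static class PageBreakSplitter
{
    // Form feed, the same character Aspose.Words exposes as ControlChar.PageBreak.
    private const char PageBreak = '\f';

    // True when the text starts with a page break AND still contains
    // something besides page breaks and paragraph marks after it.
    public static bool IsMerged(string paragraphText)
    {
        if (string.IsNullOrEmpty(paragraphText) || paragraphText[0] != PageBreak)
            return false;
        return paragraphText.Trim('\f', '\r', '\n').Length > 0;
    }

    // Splits the text so that every page break becomes its own segment.
    public static List<string> Split(string paragraphText)
    {
        var parts = new List<string>();
        int start = 0;
        for (int i = 0; i < paragraphText.Length; i++)
        {
            if (paragraphText[i] != PageBreak) continue;
            // Emit any text that precedes this page break.
            if (i > start) parts.Add(paragraphText.Substring(start, i - start));
            parts.Add(PageBreak.ToString());
            start = i + 1;
        }
        // Emit whatever follows the last page break.
        if (start < paragraphText.Length) parts.Add(paragraphText.Substring(start));
        return parts;
    }
}
```

With the text quoted above, IsMerged returns true and Split yields two segments, the page break and the remaining paragraph text, which can then be written to the output file as two entries.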

Using the latest version, Aspose.Words for .NET 20.4, we could not reproduce the paragraph merge issue, so please upgrade to Aspose.Words for .NET 20.4. The shared paragraphs are imported separately into the Aspose.Words DOM.

If you still face the problem, please ZIP and attach the problematic and expected output TXT files along with a simplified code example that reproduces the issue on our end. We will investigate it and provide you with more information.

Thanks, Tahir. We were using v17.7; in v20.4 it appears to work as expected.
However, we noticed a change in 20.4: it no longer picks up the non-breaking space character in a paragraph that contains only a non-breaking space. This character is identified properly in v17.7. In 20.4 the paragraph range text shows only "\r", whereas in 17.7 it shows " \r".
Could you please have a look at it?

Thanks
Manohar

@rtmanohar3

Could you please ZIP and attach the input, problematic output and expected output documents along with screenshots of problematic sections of output? We will investigate the issue and provide you more information on it.

Hi Tahir,

Here is one more issue with Aspose.Words: an empty paragraph is not listed in the DOM paragraph list. I have attached a sample document and marked the affected paragraph in the image.
Could you please let me know whether you can reproduce this? I am using v20.4.

SampleAspose.zip (282.0 KB)

Thanks
Manohar

@rtmanohar3

Unfortunately, your requirement is not clear to us. Could you please share some more detail about your query? We will then provide you more information on it.

Hi Tahir,

There is an empty paragraph at the end of the TOC section in the sample input document, which is marked in an image in the attachment. This paragraph is not imported into the Aspose DOM paragraph list.

Thanks
Manohar

@rtmanohar3

Thanks for sharing the details. Please note that Aspose.Words mimics the behavior of MS Word. The empty P tag and the BR tag below are imported as a section break in the DOM.

<p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
<br style="page-break-before:always; clear:both; mso-break-type:section-break" />

Could you please share some detail about your requirement? We will then answer your query accordingly.

Thanks, Tahir. In Word Online/Office.js they are imported as two separate paragraphs, which causes a paragraph count mismatch for us.
Is it possible to do something like that in Aspose.Words as well?

Hi Tahir,

I have found one more document where an empty paragraph is not imported into the Aspose.Words DOM. Please have a look.

Empty_Para_Doc2.zip (331.4 KB)

Thanks,
Manohar

@rtmanohar3

Please open the attached HTML in MS Word to test your scenario. These paragraphs are not imported by MS Word.

input.zip (202 Bytes)

<html>
    <body>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    <p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt"></p>
    </body>
</html>

In your case, we suggest adding a non-breaking space to the P tag, as shown below, to get the desired output.

<p style="margin-top:0pt; margin-bottom:0pt; font-size:12pt">&nbsp;</p>
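
Editor's note: when the HTML contains many empty P tags, the suggestion above could be applied automatically before the document is loaded. The following is a minimal sketch under stated assumptions, not an Aspose.Words feature: the class and method names are hypothetical, and the regular expression is a simplification that assumes no attribute value contains a '>' character.

```csharp
using System.Text.RegularExpressions;

// Hypothetical pre-processing helper (not part of Aspose.Words).
public static class EmptyParagraphFiller
{
    // Matches a <p ...> element whose content is empty or whitespace only.
    // Group 1 captures the attributes, if any, so they can be kept as-is.
    private static readonly Regex EmptyP = new Regex(
        @"<p(\s[^>]*)?>\s*</p>", RegexOptions.IgnoreCase);

    // Returns the HTML with &nbsp; inserted into every empty P element.
    public static string Fill(string html)
    {
        return EmptyP.Replace(html, m => "<p" + m.Groups[1].Value + ">&nbsp;</p>");
    }
}
```

The resulting string can then be loaded as usual, for example by wrapping its UTF-8 bytes in a MemoryStream and passing that stream to the Document constructor. P elements that already contain text are left unchanged.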