DOCX to HTML conversion issue with DIV tags using Java

Hi @tahir.manzoor

I want to separate the <div> tags one by one. When I convert the document to HTML, it gives me some <div> tags inside a <div> tag, but that is OK with me.
After that, I need to separate the <div> tags from the HTML text.
I want to check whether a <div> is inside a <div>, and if so take all the child <div> elements and store them.
For example:







My expected output would be:





I am attaching my code and the input .docx file.
test.zip (28.1 KB)

@rabinintig

This is the expected behavior of Aspose.Words. The extra DIV contains the roundtrip information for the document. You can disable it by using the HtmlSaveOptions.ExportRoundtripInformation property.

Could you please share some more detail about the use case in which these DIVs are causing an issue for you? We will then provide more information about your query.
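
As a rough illustration of the post-processing step described in the question (checking whether a <div> sits inside another <div> and storing the child <div> elements), the following is a minimal sketch in plain Java. It does not use the Aspose.Words API. The class name ChildDivExtractor and the method flattenDivs are hypothetical, and the regex-based scan assumes well-formed, balanced <div> tags rather than arbitrary HTML. It also assumes that an outer <div> with no nested <div> should be kept as-is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Aspose.Words.
final class ChildDivExtractor {
    // Matches an opening <div ...> or a closing </div>. This is not a full HTML parser.
    private static final Pattern DIV_TAG =
            Pattern.compile("<div\\b[^>]*>|</div\\s*>", Pattern.CASE_INSENSITIVE);

    // Returns the direct child divs of every outermost div.
    // An outermost div that has no nested div is returned unchanged.
    // Text that sits directly in an outer div, outside its child divs, is dropped.
    static List<String> flattenDivs(String html) {
        List<String> result = new ArrayList<>();
        int depth = 0;            // current div nesting level
        int outerStart = -1;      // start offset of the current outermost div
        int childStart = -1;      // start offset of the current direct child div
        boolean outerHasChild = false;

        Matcher m = DIV_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            if (tag.endsWith("/>")) {
                continue;         // ignore self-closing <div/>
            }
            if (!tag.startsWith("</")) {
                depth++;
                if (depth == 1) {
                    outerStart = m.start();
                    outerHasChild = false;
                } else if (depth == 2) {
                    childStart = m.start();
                    outerHasChild = true;
                }
            } else if (depth > 0) {
                if (depth == 2) {
                    result.add(html.substring(childStart, m.end()));
                } else if (depth == 1 && !outerHasChild) {
                    result.add(html.substring(outerStart, m.end()));
                }
                depth--;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<div><div>A</div><div>B</div></div><div>C</div>";
        // Prints each collected div on its own line:
        // <div>A</div>
        // <div>B</div>
        // <div>C</div>
        for (String div : flattenDivs(html)) {
            System.out.println(div);
        }
    }
}
```

For real documents, an HTML parser is likely a more robust choice than a regex scan. The sketch only shows the depth-counting idea.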

Hi @tahir.manzoor
I am adding more details to help you understand.
My input document “test.docx” has some paragraphs, some content controls, and some clauses.
For a better understanding, a paragraph looks like this:
Dear,
This letter will constitute the agreement (the “Agreement”) between you and with respect to the services and materials you have agreed to provide, as more fully described on Schedule A (together, the “Services”). This Agreement is subject to the terms and conditions set forth below.

Some are clauses like these (I am only including the headings):
1. Term; Termination.
2. Materials.
3. Acceptance
So, when I convert the input document to HTML, each paragraph gets a separate div, but only the first clause (1. Term; Termination.) has a div.
The other clauses (Materials, Acceptance) don’t have any div.

So, my expected behavior is that each clause and each paragraph is placed in its own separate div.

PFA
test2.zip (25.6 KB)

@tahir.manzoor
My use case is that whatever clauses and paragraphs are present in my document, each one should be taken as a separate div.
But with the current behavior, as you can see, I get extra divs, and some divs also appear inside other divs.
After adding the “options.setExportXhtmlTransitional(false);” code, my outer div is removed, but the child content does not have any div tag.

Please help me solve this problem.

@rabinintig

We are working on your query and will get back to you soon.

Thanks @tahir.manzoor
I will be waiting for your result.

@rabinintig

We have tested the scenario and have managed to reproduce the same issue on our side. To track its resolution, we have logged this problem in our issue tracking system as WORDSNET-20436. You will be notified via this forum thread once this issue is resolved.

We apologize for the inconvenience.

@tahir.manzoor
Can you please tell me the priority of my problem and the estimated time for resolving it? That would help me a lot,
so I can wait for the problem to be resolved.
Please, Tahir, it is a big problem on my side. If possible, please give me an estimate.

@rabinintig

We try our best to deal with every customer request in a timely fashion, but we unfortunately cannot guarantee a delivery date for every customer issue. Our developers work on issues on a first come, first served basis. We feel this is the fairest and most appropriate way to satisfy the needs of the majority of our customers.

Currently, your issue is pending analysis and is in the queue. Once our product team completes the analysis of your issue, we will be able to provide you with an estimate.

@tahir.manzoor
OK

We would like to inform you that the issue you are facing is actually not a bug in Aspose.Words, so we have closed this issue (WORDSNET-20436) as ‘Not a Bug’.

The first clause (1. Term; Termination) contains a list item inside a content control, but the second and the third are contained only in list items. See the attached image for details: content controls.png (57.1 KB)