Paragraph Format after removing Bookmark

AsposeBookmarkTest.zip (46.8 KB)
Good afternoon Aspose.Words Support, we have recently updated from 20.2.0 to 22.7.0 and we noticed a change in behavior with Bookmarks.

We have a template that has the following structure:

Para - P1
  Run

Para2 - P2
  Run - R2.1
  Run - R2.2
  BookmarkStart - BM1
  Run
  Run
  BookmarkStart - _GoBack
  BookmarkEnd - _GoBack

BookmarkEnd - BM1

Para - P3
  Run - R3.1

If we get a reference to BM1 and then call

bookmark.Text = ""; 
bookmark.Remove();

we are seeing different results starting with Aspose.Words 21.10.0
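For anyone who wants to reproduce this without the attached solution, a minimal sketch of the call sequence could look like the following. It assumes the template is FormatChange.dotx in the working directory and that the bookmark referred to as BM1 is the one named "false" (the name used in the Word steps later in this post); the output file name is made up for illustration.

```csharp
using Aspose.Words;

// Load the template that contains the bookmark.
Document doc = new Document("FormatChange.dotx");

// "false" is the bookmark's actual name (BM1 in the outline above).
Bookmark bookmark = doc.Range.Bookmarks["false"];
if (bookmark != null)
{
    // Clear everything between the bookmark start and end...
    bookmark.Text = "";
    // ...then remove the bookmark itself (its content is already empty).
    bookmark.Remove();
}

// Hypothetical output name; compare this file across Aspose.Words versions.
doc.Save("FormatChange-out.docx");
```

Running the same snippet against 21.9.0 and 21.10.0 and comparing the two output files should show the difference described below.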

In versions 20.2.0 through 21.9.0 the nodes inside P3 were moved into P2; we could tell because the output document used the styling of P2. The output would look like:

Para - P1
  Run

Para2 - P2
  Run - R2.1
  Run - R2.2
  Run - R3.1

After upgrading to 21.10.0 we see that the two Runs at the start of P2 are moved to P3 and P2 is removed:

Para - P1
  Run

Para - P3
  Run - R2.1
  Run - R2.2
  Run - R3.1

Both of these behaviors are different from what Word does when I follow these steps, which conceptually sound the same as what Aspose.Words is doing. The steps I take in Word are:

  1. Go to Insert > Bookmark
  2. Select the bookmark “false” and hit the “Go To” button
  3. Close the Bookmark dialog
  4. Hit the DEL key on the keyboard to remove the text Word has selected
  5. Hit the DEL key on the keyboard again to remove the Bookmark that now has nothing in it

Word will create an output document with the structure

Para - P1
  Run

Para2 - P2
  Run - R2.1
  Run - R2.2
  Run - R3.1

I liked the way Aspose.Words 21.9.0 and earlier handled it, combining P2 and P3 into P2, and many of our Templates have been set up to rely on that. That behavior is what happens in Word when you hit the DEL key twice, and it generates the documents the way our users are used to.

I will attach a sample sln/csproj that shows the code used to test this.

There will also be 4 documents in there

  1. FormatChange.dotx - this is the starting template that contains the Bookmark
  2. FormatChange-20.2.0.docx - this is the output document when using Aspose.Words 20.2.0
  3. FormatChange-21.10.0.docx - this is the output document when using Aspose.Words 21.10.0
  4. FormatChange-WordEdit.docx - this is the output document when using Word to perform the same steps

This change in behavior seems likely to have been caused by WORDSNET-19767 that was created from this forum post
https://forum.aspose.com/t/paragraphformat-issues-after-removing-bookmark-content/207013

Actually, I’m not seeing a way to attach a file to this post. Can you let me know what I’m missing so I can send what you need to reproduce this quickly?

@mike.doerfler You can simply drag and drop the files you would like to attach onto the post, but zip the required resources first. We will check the issue and provide more information once we get your documents.

The original post has been updated with the zip file attached.

@mike.doerfler Thank you for the additional information. I will consult with the responsible developer and provide you more information.

@mike.doerfler We have investigated the issue further and concluded that the current Aspose.Words behavior is correct. If you perform the same scenario using MS Word Automation, the result matches the output produced by Aspose.Words. Here is the MS Word Automation code:

Microsoft.Office.Interop.Word.Application app = new Microsoft.Office.Interop.Word.Application();
Microsoft.Office.Interop.Word.Document doc = app.Documents.Open(@"C:\Temp\in.dotx");
doc.Bookmarks[1].Range.Text = "";
doc.SaveAs2(@"C:\Temp\out_ms.docx", Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatXMLDocument);

out_ms.docx (16.2 KB)
And equivalent Aspose.Words code:

Document doc = new Document(@"C:\Temp\in.dotx");
doc.Range.Bookmarks["false"].Text = "";
doc.Save(@"C:\Temp\out.docx");

out.docx (11.6 KB)
As you can see the output documents are the same. So the fix made by WORDSNET-19767 corrected Aspose.Words behavior to match MS Word behavior.
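One way to check this claim without opening the files in Word is to print each paragraph's formatting and text from both outputs. The sketch below assumes the two files were saved to the paths used above; the `CompareOutputs` class and `Dump` helper are hypothetical names introduced only for this illustration, and left indent is used simply as one formatting attribute that is easy to compare.

```csharp
using System;
using Aspose.Words;

public static class CompareOutputs
{
    // Prints one line per paragraph: left indent in points, then the trimmed text.
    static void Dump(string path)
    {
        Document doc = new Document(path);
        Console.WriteLine(path);
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            Console.WriteLine(
                $"  indent={para.ParagraphFormat.LeftIndent} text=\"{para.GetText().Trim()}\"");
        }
    }

    public static void Main()
    {
        Dump(@"C:\Temp\out.docx");    // Aspose.Words output
        Dump(@"C:\Temp\out_ms.docx"); // MS Word Automation output
    }
}
```

If both files list the same paragraphs with the same indents, the merged paragraph took its formatting from the same source paragraph in both cases.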

That is interesting behavior from the MS Interop API. I think the MS Interop API behavior is incorrect because it doesn’t match what happens when I do the operations through the UI. My users expect the code to do what they manually do. But that is my issue to take up with Microsoft now that I know that the Aspose API aims to be consistent with MS Interop API behavior.

@mike.doerfler Actually, your actions in the MS Word UI do not do the same thing as Aspose.Words and MS Word Automation. When you select and delete the bookmarked text in the UI, the paragraph break is not removed; you remove it with the next action. Internally, however, the paragraphs are represented as follows:

<w:p w14:paraId="4E5C2A3D" w14:textId="2ACAACC4" w:rsidR="00D30726" w:rsidRDefault="00D30726" w:rsidP="00D30726">
	<w:r>
		<w:t xml:space="preserve">Test: </w:t>
	</w:r>
	<w:r>
		<w:tab/>
	</w:r>
	<w:bookmarkStart w:id="0" w:name="false"/>
	<w:r>
		<w:t>some bookmarked text</w:t>
	</w:r>
	<w:r w:rsidR="00473FD6">
		<w:t xml:space="preserve"> with a bool false rule</w:t>
	</w:r>
</w:p>
<w:bookmarkEnd w:id="0"/>
<w:p w14:paraId="40CD3728" w14:textId="2ED5388E" w:rsidR="00D30726" w:rsidRPr="00D30726" w:rsidRDefault="00D30726" w:rsidP="00D30726">
	<w:pPr>
		<w:ind w:left="720"/>
	</w:pPr>
	<w:r>
		<w:t>Some not bookmarked text</w:t>
	</w:r>
</w:p>

If we remove the content between the bookmark start and end, we get this:

<w:p w14:paraId="4E5C2A3D" w14:textId="2ACAACC4" w:rsidR="00D30726" w:rsidRDefault="00D30726" w:rsidP="00D30726">
	<w:r>
		<w:t xml:space="preserve">Test: </w:t>
	</w:r>
	<w:r>
		<w:tab/>
	</w:r>

<w:p w14:paraId="40CD3728" w14:textId="2ED5388E" w:rsidR="00D30726" w:rsidRPr="00D30726" w:rsidRDefault="00D30726" w:rsidP="00D30726">
	<w:pPr>
		<w:ind w:left="720"/>
	</w:pPr>
	<w:r>
		<w:t>Some not bookmarked text</w:t>
	</w:r>
</w:p>

As you can see, the first paragraph becomes invalid and its remaining content must be moved somewhere. Both MS Word Automation and the Aspose.Words API move that content into the next valid paragraph.
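To make the difference between the two outcomes concrete, here is a toy model in plain C#. It is not Aspose.Words code: `Para` and `MergeModel` are hypothetical types that only track which paragraph's formatting survives when two paragraphs are combined, using the run labels from the original post.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: these types are hypothetical and are not part of Aspose.Words.
public sealed class Para
{
    public string Style;
    public List<string> Runs = new List<string>();

    public Para(string style, params string[] runs)
    {
        Style = style;
        Runs.AddRange(runs);
    }

    public override string ToString() => Style + ": " + string.Join(", ", Runs);
}

public static class MergeModel
{
    // GUI-style result (DEL pressed twice): the first paragraph survives and
    // absorbs the runs of the second, so the first paragraph's formatting is kept.
    public static Para MergeIntoFirst(Para first, Para second)
    {
        var merged = new Para(first.Style, first.Runs.ToArray());
        merged.Runs.AddRange(second.Runs);
        return merged;
    }

    // Text = "" result (21.10.0 and later): the first paragraph loses its end,
    // so its leftover runs are placed in front of the second paragraph's runs
    // and the second paragraph's formatting is kept.
    public static Para MergeIntoSecond(Para first, Para second)
    {
        var merged = new Para(second.Style, first.Runs.ToArray());
        merged.Runs.AddRange(second.Runs);
        return merged;
    }
}

public static class Program
{
    public static void Main()
    {
        var p2 = new Para("P2", "R2.1", "R2.2"); // runs left once the bookmark text is cleared
        var p3 = new Para("P3", "R3.1");

        Console.WriteLine(MergeModel.MergeIntoFirst(p2, p3));  // P2: R2.1, R2.2, R3.1
        Console.WriteLine(MergeModel.MergeIntoSecond(p2, p3)); // P3: R2.1, R2.2, R3.1
    }
}
```

The run order is identical in both cases; only the surviving paragraph, and therefore the paragraph formatting, differs.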

The removal of the closing tag for w:p with paraId of 4E5C2A3D is invalid if the document is processed as Xml. With Xml a closing tag just can’t be removed if you are processing it using a DOM and not as a string.

The Word GUI does not appear to put the DOM into an invalid state with a missing closing element. When I perform these steps in the Word GUI, it does not remove the paragraph that has the BookmarkStart in the middle of it.

  1. Open Word Document
  2. Go to Ribbon Insert > Bookmark
  3. Select the “false” bookmark and click “Go To” and then close the bookmark dialog.
  4. Hit the DEL or BACKSPACE key and the Word GUI removes the highlighted text.

In the OpenXml structure that leaves the <w:p> that contains the BookmarkStart and two Runs in it. With the Interop Api it moves the two Runs to the next Paragraph.

If I use the Interop API and remove the content inside the Bookmark with Bookmark.Range.Delete, it behaves like the GUI:

var app = new Application();
app.Visible = true;

var inputFile = new FileInfo("FormatChange.dotx");
var outputPath = inputFile.FullName.Replace(".dotx", "_ms.docx");
var doc = app.Documents.Open(inputFile.FullName);
var docBookmark = doc.Bookmarks[1];
var range = docBookmark.Range;

object unit = WdUnits.wdCharacter;
object count = 0;
range.Delete(ref unit, ref count);
doc.SaveAs2(outputPath, WdSaveFormat.wdFormatXMLDocument);

doc.Close();
app.Quit();

This must mean the Word GUI is deleting the selected Range when I hit the DEL key instead of setting the text inside the bookmark to an empty string.

Oh well, I’ll write code to figure out how to get the behavior that I think is correct based on how the Word GUI works when I hit the DEL button after going to the Bookmark.

@mike.doerfler Yes, you are right: MS Word cannot put the DOM into an invalid state, so when you select the bookmark content, MS Word selects only the content up to the end of the paragraph. After this content is removed the document still has two valid paragraphs, and when you then remove the paragraph break you actually concatenate the two paragraphs.
You can achieve the same using Aspose.Words, for example, see the following code:

Document doc = new Document(@"C:\Temp\in.dotx");

// Get bookmark.
Bookmark bk = doc.Range.Bookmarks["false"];

// Move the bookmark end inside the preceding paragraph, if there is one.
if (bk.BookmarkEnd.PreviousSibling is Paragraph previousParagraph)
    previousParagraph.AppendChild(bk.BookmarkEnd);

// Remove bookmark's content.
bk.Text = "";

// Now move the content of the next paragraph into the paragraph with the bookmark.
Paragraph bkParagraph = (Paragraph)bk.BookmarkEnd.GetAncestor(NodeType.Paragraph);
Paragraph nextParagraph = bkParagraph?.NextSibling as Paragraph;
if (nextParagraph != null)
{
    while (nextParagraph.HasChildNodes)
        bkParagraph.AppendChild(nextParagraph.FirstChild);

    nextParagraph.Remove();
}

// Save output.
doc.Save(@"C:\Temp\out.docx");

Note that the code is for demonstration purposes and covers only the simple case where the bookmark end is between paragraphs.