Free Support Forum - aspose.com

Extracting the bookmark content as rtf ruins fonts and formatting

Hi,


We are using the following code to extract the content of a bookmark in RTF format:

Bookmark FieldBookmark = mDocument.Range.Bookmarks["START" + wField.Id];
BookmarkStart bookmarkStart = FieldBookmark.BookmarkStart;
BookmarkEnd bookmarkEnd = FieldBookmark.BookmarkEnd;

// Firstly extract the content between these nodes including the bookmark.
ArrayList extractedNodes = ExtractContent(bookmarkStart, bookmarkEnd, false);

Aspose.Words.Document BookmarkContent = GenerateDocument(mDocument, extractedNodes);
// mRichTextbox is a RichTextBox.
mRichTextbox.Rtf = getString(BookmarkContent, SaveFormat.Rtf);

// Trim the start and end delimiter out of the report RTF.
mRichTextbox.Select(0, wField.FieldBegin.Length);
if (mRichTextbox.SelectedText == wField.FieldBegin)
    mRichTextbox.SelectedText = String.Empty;
if (mRichTextbox.TextLength - wField.FieldEnd.Length >= 0)
{
    mRichTextbox.Select(mRichTextbox.TextLength - wField.FieldEnd.Length, wField.FieldEnd.Length);
    if (mRichTextbox.SelectedText == wField.FieldEnd)
        mRichTextbox.SelectedText = String.Empty;
}

wField.ReportText = mRichTextbox.Rtf;

private ArrayList ExtractContent(Node startNode, Node endNode, bool isInclusive)
{
    // First check that the nodes passed to this method are valid for use.
    VerifyParameterNodes(startNode, endNode);

    // Create a list to store the extracted nodes.
    ArrayList nodes = new ArrayList();

    // Keep a record of the original nodes passed to this method so we can split marker nodes if needed.
    Node originalStartNode = startNode;
    Node originalEndNode = endNode;

    // Extract content based on block level nodes (paragraphs and tables). Traverse through parent nodes to find them.
    // We will split the content of first and last nodes depending if the marker nodes are inline.
    while (startNode.ParentNode.NodeType != NodeType.Body)
        startNode = startNode.ParentNode;

    while (endNode.ParentNode.NodeType != NodeType.Body)
        endNode = endNode.ParentNode;

    bool isExtracting = true;
    bool isStartingNode = true;
    bool isEndingNode = false;
    // The current node we are extracting from the document.
    Node currNode = startNode;

    // Begin extracting content. Process all block level nodes and specifically split the first and last nodes when needed so paragraph formatting is retained.
    // Method is little more complex than a regular extractor as we need to factor in extracting using inline nodes, fields, bookmarks etc as to make it really useful.
    while (isExtracting)
    {
        // Clone the current node and its children to obtain a copy.
        CompositeNode cloneNode = (CompositeNode)currNode.Clone(true);
        isEndingNode = currNode.Equals(endNode);

        if (isStartingNode || isEndingNode)
        {
            // We need to process each marker separately so pass it off to a separate method instead.
            if (isStartingNode)
            {
                ProcessMarker(cloneNode, nodes, originalStartNode, isInclusive, isStartingNode, isEndingNode);
                isStartingNode = false;
            }

            // Conditional needs to be separate as the block level start and end markers maybe the same node.
            if (isEndingNode)
            {
                ProcessMarker(cloneNode, nodes, originalEndNode, isInclusive, isStartingNode, isEndingNode);
                isExtracting = false;
            }
        }
        else
            // Node is not a start or end marker, simply add the copy to the list.
            nodes.Add(cloneNode);

        // Move to the next node and extract it. If next node is null that means the rest of the content is found in a different section.
        if (currNode.NextSibling == null && isExtracting)
        {
            // Move to the next section.
            Section nextSection = (Section)currNode.GetAncestor(NodeType.Section).NextSibling;
            currNode = nextSection.Body.FirstChild;
        }
        else
        {
            // Move to the next node in the body.
            currNode = currNode.NextSibling;
        }
    }

    // Return the nodes between the node markers.
    return nodes;
}

private Aspose.Words.Document GenerateDocument(Aspose.Words.Document srcDoc, ArrayList nodes)
{
    // Remove empty paragraphs from the end of document.
    if (srcDoc.LastSection.Body.LastChild != null)
    {
        while (!((CompositeNode)srcDoc.LastSection.Body.LastChild).HasChildNodes)
        {
            srcDoc.LastSection.Body.LastParagraph.Remove();
            if (srcDoc.LastSection.Body.LastChild == null)
                break;
        }
    }

    // Create a blank document.
    Aspose.Words.Document dstDoc = new Aspose.Words.Document();
    // Remove the first paragraph from the empty document.
    dstDoc.FirstSection.Body.RemoveAllChildren();

    // Import each node from the list into the new document. Keep the original formatting of the node.
    NodeImporter importer = new NodeImporter(srcDoc, dstDoc, ImportFormatMode.KeepSourceFormatting);

    foreach (Node node in nodes)
    {
        Node importNode = importer.ImportNode(node, true);
        dstDoc.FirstSection.Body.AppendChild(importNode);
    }

    // Return the generated document.
    return dstDoc;
}

private string getString(Aspose.Words.Document doc, SaveFormat format)
{
    // Save document to stream as RTF (you can save it also in HTML for example).
    MemoryStream rtfStream = new MemoryStream();

    SaveOptions options = SaveOptions.CreateSaveOptions(format);
    options.PrettyFormat = true;

    doc.Save(rtfStream, options);

    // Get string from stream.
    return Encoding.UTF8.GetString(rtfStream.GetBuffer());
}

This code ruins the formatting of the bookmark's content.
I am attaching the document.
Bookmark ID: START1222

How can it be fixed?

I already opened a similar post in the past. The problem then was that when a bookmark contained a picture, the picture disappeared after the bookmark's content was extracted in RTF format to a RichTextBox. That bug was fixed in the next version.

Thanks,
Stanislav.

Hi Stanislav,

Thanks for your query. I have worked with your document and have not found an image inside bookmark START1222. Please see the attached image file for details. I have modified the ExtractContent method. Please read my reply in this post.

Please let us know if you have any more queries.

Hi,


First of all, thanks for providing the updated code for the ExtractContent method. I am testing it right now.

I apologize for the misunderstanding, but I don't have an image in this bookmark. I only mentioned the image as an old bug in this area that you fixed more than a year ago.

Now I have a different problem, which I described above. I have a bookmark whose content has some formatting. I can extract the content from the bookmark, but when I extract it as RTF using the code above, the formatting is ruined. I am using a RichTextBox as an intermediate step to get the final RTF. The bookmark ID, as I mentioned above, is START1222. It contains only formatted text.

Thanks,
Stas.

Hi Stanislav,

Thanks for sharing the details. I have used the same code and have not found any issue with the output RTF. Please find the output RTF file in the attachment. I have also displayed the output RTF in a RichTextBox control with no issue.

It would be great if you could describe the RTF formatting issue in detail and also share your output RTF file with us for investigation.


Document doc = new Document(MyDir + "plstemp.doc");

Aspose.Words.Bookmark FieldBookmark = doc.Range.Bookmarks["START1222"];

BookmarkStart bookmarkStart = FieldBookmark.BookmarkStart;

BookmarkEnd bookmarkEnd = FieldBookmark.BookmarkEnd;

// Firstly extract the content between these nodes including the bookmark.

ArrayList extractedNodes = ExtractContent(bookmarkStart, bookmarkEnd, false);

Document doc2 = GenerateDocument(doc, extractedNodes);

doc2.Save(MyDir + "AsposeOut.rtf");

richTextBox1.Rtf = getString(doc2, SaveFormat.Rtf);


Hi,


Thank you for your answer. As I see from your solution, I need to save the document in RTF format before calling the getString method. I will test this code after the weekend and let you know the result.

It is important to note that I am checking the RTF as it is returned from the RichTextBox, not as it looks after the call to GenerateDocument.

Without calling the Save method, I get the following RTF string from the RichTextBox:

{\rtf1\fbidis\ansi\ansicpg1252\deff0\deflang1033\deflangfe2052{\fonttbl{\f0\fswiss\fprq2\fcharset177 Arial;}{\f1\froman\fprq2\fcharset177 Times New Roman;}{\f2\fswiss\fprq2\fcharset0 Arial;}}
\viewkind4\uc1\pard\ltrpar\lang1037\b\i\f0\rtlch\fs24\'f9\'fa\'e9\'e1\'e9\f1 \'e1\'e5\'e3\'f7 \ul\f0\'f2\'e1\'f8\'e9\'fa\ulnone\f1 \'e1\'f4\'e5\'f0\'e8\'e9\'ed \'f9\'e5\'f0\'e9\'ed\lang1033\b0\i0\f2\ltrch\fs20\par
}

Thanks,
Stanislav.

I tested the code with the Save method, and I see that Save does indeed create an RTF file with correct formatting. But when I assign this RTF string to the RichTextBox and read it back from the RichTextBox, the formatting is ruined.

I tested the same code using Word instead of Aspose. The RTF returned from the RichTextBox also has incorrect formatting, but it still looks better than when I use Aspose.


Range wRange = GetRangeFromField(wField, false);
Range wDuplicatedRange = wRange.Duplicate;
// Copy the formatted text of the duplicated range to the clipboard.
wDuplicatedRange.FormattedText.Copy();

RichTextBox rt = new RichTextBox();
rt.Paste();

// Trim the start and end delimiter out of the report RTF.
rt.Select(0, wField.FieldBegin.Length);
if (rt.SelectedText == wField.FieldBegin)
    rt.SelectedText = String.Empty;

if (rt.TextLength - wField.FieldEnd.Length >= 0)
{
    rt.Select(rt.TextLength - wField.FieldEnd.Length, wField.FieldEnd.Length);
    if (rt.SelectedText == wField.FieldEnd)
        rt.SelectedText = String.Empty;
}

wField.ReportText = rt.Rtf;
rt.Clear();


Thanks,
Stanislav.

Hi,


Do you have any new information about this issue?

Thanks,
Stanislav.

Hi Stanislav,

I have tested the scenario and have not found any issue with Aspose.Words. This issue is with RichTextBox control. Please check RichTextBox details.

String strRTF = getString(doc2, SaveFormat.Rtf);


Hi,


Thank you for the answer. My point was that when I load the RTF returned by Word and the RTF returned by Aspose into a RichTextBox, I get different RTF strings back from the RichTextBox.

As I said in my previous post:
I tested the same code using Word instead of Aspose. The RTF returned from the RichTextBox also has incorrect formatting, but it still looks better than when I use Aspose.

Apparently the RichTextBox has some difficulty correctly displaying the RTF tags in the string returned by Aspose.

Thanks,
Stanislav.

Hi Stanislav,

Thanks for your query. I have attached the output of Aspose.Words and RichTextBox with this post. It would be great if you could share exactly what you want to achieve by using Aspose.Words.

Hi,


Thank you for the answer.
I have code that runs on the client using Word in interactive mode and retrieves RTF strings from specific bookmarks in the document. I also have this code running on a server (for batch processing). I can't use Word on the server, so I am using Aspose.
When I compare the RTF strings returned by the client and by the server, using the approach I described above, I can see some differences in their formatting.

I checked the attached files and I see that you don't have the correct fonts on your machine, so I am attaching pictures of what I see when getting RTF from the RichTextBox using Word and Aspose.

Thanks,
Stanislav.

Hi Stanislav,

Thanks for your query. I have worked on the shared scenario and would like to share that the formatting issue is not with Aspose.Words. The following line of code is not returning a string with proper formatting. I will check this issue and update you as soon as possible.


return Encoding.UTF8.GetString(rtfStream.GetBuffer());
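
For reference, here is a minimal sketch of what that line does with a MemoryStream, using plain .NET only (no Aspose.Words calls). MemoryStream.GetBuffer() returns the whole internal buffer, including unused capacity, so the decoded string can carry trailing '\0' characters, while ToArray() returns only the bytes that were written. The class name StreamToStringSketch and the sample RTF text are made up for illustration, and nothing in this thread establishes that the extra bytes are what changes the formatting.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, not part of the code posted in this thread.
public static class StreamToStringSketch
{
    // Decodes only the bytes that were actually written to the stream.
    public static string ReadAll(MemoryStream stream)
    {
        // ToArray() copies exactly stream.Length bytes; GetBuffer() would expose
        // the whole internal buffer, including unused zero-filled capacity.
        return Encoding.UTF8.GetString(stream.ToArray());
    }

    public static void Main()
    {
        var stream = new MemoryStream();
        byte[] rtf = Encoding.ASCII.GetBytes(@"{\rtf1\ansi Hello\par}"); // sample data only
        stream.Write(rtf, 0, rtf.Length);

        string viaBuffer = Encoding.UTF8.GetString(stream.GetBuffer());
        string viaArray = ReadAll(stream);

        // The two strings differ only by the padding that GetBuffer() exposes.
        Console.WriteLine("GetBuffer length: " + viaBuffer.Length);
        Console.WriteLine("ToArray length:   " + viaArray.Length);
        Console.WriteLine(viaBuffer.TrimEnd('\0') == viaArray);
    }
}
```

If getString were changed along these lines, only the last statement would differ: rtfStream.ToArray() in place of rtfStream.GetBuffer().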


Hi,


Thank you for the answer. I will wait for the update.

Stanislav.

Hi Stanislav,

I have managed to reproduce the same issue at my side. I have logged this issue as WORDSNET-6820 in our issue tracking system. I have linked this forum thread to the same issue and you will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

Hi,


Do you have any news about this issue? I would like an estimate of when it can be solved.

Thank you
Stanislav.

Hi Stanislav,

Thanks for your query. Our development team has analyzed this issue and has shared that this issue is not related to Aspose.Words. This issue occurs because RichTextBox doesn’t support Unicode symbols. Hope this answers your query. Please let us know if you have any more queries.

Hi,


Thank you for the answer. What do you mean by "doesn't support Unicode symbols"? RichTextBox works fine with Unicode for me, both with Word and with Aspose. The only problem is the formatting, as I have already described above several times.

Thanks,
Stanislav.

Hi Stanislav,

Thanks for your query.

This issue occurs because RichTextBox doesn't support Unicode symbols. You are converting the document to RTF using the getString(Aspose.Words.Document doc, SaveFormat format) method. The output RTF contains some Unicode symbols that carry information about fonts, and such Unicode symbols are not supported by RichTextBox.

I have asked our development team for details about the Unicode symbols (related to fonts) involved in this issue. As soon as they share any information, I will be more than happy to pass it on to you.

Hi Stanislav,

I have received a response from our development team about the Unicode symbols issue with RichTextBox. Aspose.Words converts all symbols with a value > 255 into \uNumber escapes, and RichTextBox doesn't display these symbols correctly.

Please feel free to ask if you have any question about Aspose.Words, we will be happy to help you.
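
To illustrate what a \uNumber escape looks like, here is a minimal sketch of the general RTF convention: the code unit is written as a signed 16-bit decimal, followed by one fallback character for readers that ignore \u. This is not Aspose.Words code. The names RtfUnicodeSketch and EscapeForRtf are hypothetical, and the sketch escapes everything outside ASCII for simplicity, since the thread only states what happens for values above 255.

```csharp
using System;
using System.Text;

// Hypothetical illustration of RTF \uN escapes; not how Aspose.Words is implemented.
public static class RtfUnicodeSketch
{
    public static string EscapeForRtf(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF control characters must be backslash-escaped.
                sb.Append('\\').Append(c);
            }
            else if (c < 128)
            {
                // Plain ASCII is written as-is.
                sb.Append(c);
            }
            else
            {
                // \uN takes a signed 16-bit decimal, so code units above 32767
                // come out negative. '?' is the single fallback character.
                short signedValue = unchecked((short)c);
                sb.Append("\\u").Append(signedValue).Append('?');
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // U+05E9 is the Hebrew letter shin, decimal 1513.
        Console.WriteLine(EscapeForRtf("a\u05E9{")); // prints a\u1513?\{
    }
}
```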

Hi,


Thank you very much for the answer. Will this issue be fixed in Aspose.Words?

Thanks,
Stanislav.