How to Merge Multiple RTF Chapter (content) blocks into a PDF and build a TOC

Hi

My company has the PDF toolkit and also Aspose.Total v9.7.0.0.

I need to merge a list of content blocks (chapters) in RTF format into a final PDF document that will also have an image as a cover page.

Each chapter also needs to appear in the TOC.

How can I do this?

Thanks

Nitin

Hi Nitin,


Thanks for using our products.

As far as I understand, you need to merge the chapters stored in RTF format and then convert the merged document into PDF format. To accomplish this, please try Aspose.Words for .NET, as it can work with MS Word/RTF documents. I am moving this thread to the respective forum, where my colleagues who look after this product will be in a better position to answer your query.

However, please note that Aspose.Pdf for .NET can create new PDF documents and manipulate existing PDF files. It can also concatenate PDF documents into a single resultant file. So once you have converted the individual RTF files into PDF format using Aspose.Words, you may consider using Aspose.Pdf for .NET to merge these individual PDF files. Before I comment further, please confirm which approach you would like to follow, i.e.

  • Combine the RTF chapters into a single file, add a TOC, add an image as the cover page, and then convert it into PDF format.
  • Or, convert the individual chapters from RTF to PDF format, combine the individual PDF files, create a TOC, and then add an image as the cover page.

Please confirm, as it will help us answer the query more appropriately.

Hi Nayyerv,

I think I like your approach of merging all the RTF blocks into a Word doc (with a TOC generated) and then converting that Word doc into PDF.

This would work just fine.

Can you please give me some code that will do this?

i.e. merge multiple RTF content into Word

Word will auto create the TOC

Then I generate a PDF for the word

Thanks

nitin.mistry@bell.ca:

i.e. merge multiple RTF content into Word

Hi Nitin,

Thanks for your patience.

Please visit the following link for information on Joining and Appending Documents

nitin.mistry@bell.ca:

Word will auto create the TOC

I have asked my fellow worker from Aspose.Words team to answer this query. Soon you will be updated with the required information.

nitin.mistry@bell.ca:

Then I generate a PDF for the word

Please visit the following link for information on How to Convert a Document to PDF

If this does not satisfy your requirement or you have any further queries, please feel free to contact us. We are sorry for the inconvenience.

Hi Nitin,

I am a representative of the Aspose.Words team. Please read the following documentation links for your reference.

http://www.aspose.com/docs/display/wordsnet/How+to++Insert+a+Document+into+another+Document


http://www.aspose.com/docs/display/wordsnet/Specifying+How+a+Document+is+Joined+Together

Please use the following sample code snippet to merge RTF files and convert to PDF file.


Document doc1 = new Document(MyDir + "in.rtf");
Document doc2 = new Document(MyDir + "in2.rtf");
Document doc3 = new Document(MyDir + "in3.rtf");
Document doc4 = new Document(MyDir + "in4.rtf");

// Append the source documents using the original styles found in each source document.
doc1.AppendDocument(doc2, ImportFormatMode.KeepSourceFormatting);
doc1.AppendDocument(doc3, ImportFormatMode.KeepSourceFormatting);
doc1.AppendDocument(doc4, ImportFormatMode.KeepSourceFormatting);

// Update the values of fields and the TOC in the whole document.
doc1.UpdateFields();

doc1.Save(MyDir + "AsposeOut.pdf");


Hope this answers your query. Please let us know if you have any more queries.

Hi Tahir,

This is great sample code. Can you send me a sample where I have the RTF content in a string variable, not in a physical file?

Thanks

Nitin

Hi Nitin,

Thanks for your query. Please use the following code snippet to convert RTF string to PDF file. Please let us know if you have any more queries.


// Requires: using System.IO; using System.Text; using Aspose.Words;
string rtfString = @"RTF string";

Document doc = RtfStringToDocument(rtfString);
doc.Save(MyDir + "AsposeOut.pdf");

private static Document RtfStringToDocument(string rtf)
{
    Document doc = null;

    // Convert RTF string to byte array.
    byte[] rtfBytes = Encoding.UTF8.GetBytes(rtf);

    // Create stream.
    using (MemoryStream rtfStream = new MemoryStream(rtfBytes))
    {
        // Open document from stream.
        doc = new Document(rtfStream);
    }

    return doc;
}


Super, thank you very much. This is exactly what I needed.

So far so good. I have a couple of issues.

  1. The first page of the final .doc file is always blank.

  2. In each document, I have special tags like this
    [#Freedom to Be~http://freedomtobe.com.au#]

This is actually a hyperlink to a web site.
The first part, “Freedom to Be”, is the display text
and the second part, “http://freedomtobe.com.au”, is the actual URL.

I have many links like this.
I need to convert these into proper hyperlinks in my Aspose.Words document.
How do I do this?

============== code ==================

Aspose.Words.Document FinalDoc = new Document();
Aspose.Words.Document docCoverPage = new Document();

DocumentBuilder _DocumentBuilder = new DocumentBuilder(docCoverPage);
_DocumentBuilder.InsertImage(_EBookMaker.CoverPage.CoverPageImageFileName);

FinalDoc.AppendDocument(docCoverPage, ImportFormatMode.KeepSourceFormatting);

Document docTitlePage = AsposeHelperManager.RtfStringToDocument(_EBookMaker.TitlePage.RtfContent);

string txt = docTitlePage.Range.Text;
FinalDoc.AppendDocument(docTitlePage, ImportFormatMode.KeepSourceFormatting);

// render all chapters and sections...
foreach (EBookSectionChapter chapter in _EBookMaker.ChapterList)
{
    Document ChapterDoc = AsposeHelperManager.RtfStringToDocument(chapter.RtfContent);
    FinalDoc.AppendDocument(ChapterDoc, ImportFormatMode.KeepSourceFormatting);
    foreach (EBookSectionChapterSection section in chapter.EBookSectionChapterSectionList)
    {
        Document SectionDoc = AsposeHelperManager.RtfStringToDocument(section.RtfContent);
        FinalDoc.AppendDocument(SectionDoc, ImportFormatMode.KeepSourceFormatting);
    }
}

FinalDoc.UpdateFields();
FinalDoc.Save(saveFileDialog1.FileName, SaveFormat.Doc);

Hi Nitin,

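Thanks for your query. A document created with new Document() already contains one section with an empty paragraph, so appended content starts after that empty page. You can use the Document.RemoveAllChildren() method to remove this first blank page. Please see the code below.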

1. The first page of the final .doc file is always blank.

Document FinalDoc = new Document();
FinalDoc.RemoveAllChildren();

Document docCoverPage = new Document();
DocumentBuilder _DocumentBuilder = new DocumentBuilder(docCoverPage);
_DocumentBuilder.InsertImage(@"image.jpg");
FinalDoc.AppendDocument(docCoverPage, ImportFormatMode.KeepSourceFormatting);

Document docTitlePage = RtfStringToDocument("RTF STring");
string txt = docTitlePage.Range.Text;
FinalDoc.AppendDocument(docTitlePage, ImportFormatMode.KeepSourceFormatting);

FinalDoc.Save(MyDir + "AsposeOut.doc", SaveFormat.Doc);


I am working over your second query (Hyperlinks [#Freedom to Be~http://freedomtobe.com.au#]) and will update you asap.

Thank you for your quick reply!

Now the TOC is not showing. I think this is because I did not specify the chapter's title text. I actually hold this title text in another RTF string. I think I may need to set the style of the title's text, but I do not know how I should do this since the text is in RTF format. See the code below.

Document FinalDoc = new Document();
FinalDoc.RemoveAllChildren();

Document docCoverPage = new Document();
DocumentBuilder _DocumentBuilder = new DocumentBuilder(docCoverPage);
_DocumentBuilder.InsertImage(@"image.jpg");
FinalDoc.AppendDocument(docCoverPage, ImportFormatMode.KeepSourceFormatting);

Document docTitlePage = RtfStringToDocument("RTF STring");
FinalDoc.AppendDocument(docTitlePage, ImportFormatMode.KeepSourceFormatting);

// I want to insert a TOC after my title page here

Document docChapter1Title = RtfStringToDocument("chapter 1 title RTF STring");
// how do i set the style on docChapterTitle?
FinalDoc.AppendDocument(docChapter1Title, ImportFormatMode.KeepSourceFormatting);

Document docChapter1Content = RtfStringToDocument("chapter 1 content RTF STring");
FinalDoc.AppendDocument(docChapter1Content, ImportFormatMode.KeepSourceFormatting);

FinalDoc.Save(MyDir + "AsposeOut.doc", SaveFormat.Doc);

Hi Nitin,

Thanks for your query. It would be great if you could share your document/RTF along with the code for investigation purposes.
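
On the question of how to give an RTF chapter title a heading style, one possible approach is sketched here. It appends the title document first and then sets the destination document's heading style on the paragraphs that were just appended, so the TOC field can pick them up. HeadingHelper and AppendAsHeading are hypothetical names introduced only for illustration; the Aspose.Words members used are AppendDocument, Sections, Body.Paragraphs and ParagraphFormat.StyleIdentifier. It is an assumption, not verified against version 9.5, that the character formatting coming from the RTF is kept as direct formatting on the runs.

```csharp
using Aspose.Words;

public static class HeadingHelper
{
    // Appends titleDoc to target, then applies the given heading style to
    // every non-empty top-level paragraph that was appended.
    public static void AppendAsHeading(Document target, Document titleDoc, StyleIdentifier heading)
    {
        // Sections added by AppendDocument land after the existing ones,
        // so remember where the new sections will start.
        int firstNewSection = target.Sections.Count;
        target.AppendDocument(titleDoc, ImportFormatMode.KeepSourceFormatting);

        for (int i = firstNewSection; i < target.Sections.Count; i++)
        {
            foreach (Paragraph para in target.Sections[i].Body.Paragraphs)
            {
                // Skip empty paragraphs so they do not show up as blank TOC entries.
                if (para.HasChildNodes)
                    para.ParagraphFormat.StyleIdentifier = heading;
            }
        }
    }
}
```

A call such as HeadingHelper.AppendAsHeading(FinalDoc, docChapter1Title, StyleIdentifier.Heading1) would replace the plain AppendDocument call for the chapter title; sub section titles would use StyleIdentifier.Heading2.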

Hi Nitin,

Please use the following code snippet for your second query. Hope this helps you. Please read the following documentation links for your reference. Please let us know if you have any more queries.

2. In each document, I have special tags like this
[#Freedom to Be~http://freedomtobe.com.au#]


Document doc = new Document(MyDir + "in.docx");

DocumentBuilder builder = new DocumentBuilder(doc);

Regex regex = new Regex("\\[#", RegexOptions.IgnoreCase);

FindLinks obj = new FindLinks();

doc.Range.Replace(regex, obj, true);

String link = "";

Node endNode = null;
Node currentNode = null;

ArrayList removenodes = new ArrayList();

foreach (Run run in obj.nodes)
{
    currentNode = run;
    link += currentNode.Range.Text;
    removenodes.Add(currentNode);

    while (!currentNode.Range.Text.Contains("#]"))
    {
        currentNode = currentNode.NextPreOrder(doc);
        removenodes.Add(currentNode);
        link += currentNode.Range.Text;
    }

    String[] LinkNode = link.Split(new Char[] { '~' });
    link = "";

    builder.MoveTo(run);
    builder.InsertHyperlink(LinkNode[0].Replace("[#", ""), LinkNode[1].Replace("#]", ""), false);
}

foreach (Node node in removenodes)
{
    node.Remove();
}

doc.Save(MyDir + "AsposeOut.docx");

/// <summary>
/// This is called during a replace operation each time a match is found.
/// This method records the matched Run node and leaves the document text unchanged.
/// </summary>
public class FindLinks : IReplacingCallback
{
    // Store Matched nodes
    public ArrayList nodes = new ArrayList();

    ReplaceAction IReplacingCallback.Replacing(ReplacingArgs e)
    {
        // This is a Run node that contains either the beginning or the complete match.
        Node currentNode = e.MatchNode;
        nodes.Add(currentNode);
        return ReplaceAction.Skip;
    }
}
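
As a complement to the node-walking code above, the tag format itself can be split into display text and URL with a single regular expression. The sketch below uses only the .NET base class library; LinkTag and LinkTagParser are hypothetical names, and the pattern assumes that the display text contains neither '~' nor ']'. It works on plain text (for example a paragraph's text), so it does not by itself handle a tag that Word has split across several runs.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical holder for one parsed [#text~url#] tag.
public sealed class LinkTag
{
    public string Text { get; private set; }
    public string Url { get; private set; }

    public LinkTag(string text, string url)
    {
        Text = text;
        Url = url;
    }
}

public static class LinkTagParser
{
    // "[#" + display text (no '~' or ']') + "~" + URL (shortest match) + "#]"
    private static readonly Regex TagPattern =
        new Regex(@"\[#(?<text>[^~\]]+)~(?<url>.+?)#\]");

    // Returns every tag found in the input, in order of appearance.
    public static List<LinkTag> Parse(string input)
    {
        var result = new List<LinkTag>();
        foreach (Match m in TagPattern.Matches(input))
        {
            result.Add(new LinkTag(m.Groups["text"].Value.Trim(),
                                   m.Groups["url"].Value.Trim()));
        }
        return result;
    }
}
```

Each returned pair could then be passed to builder.InsertHyperlink(tag.Text, tag.Url, false), as in the snippet above.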

Hi Tahir,

Here’s a complete example of what I want to achieve.

The content (of a book) is held in my Object Model in a List of Objects,
and I want to convert this content into a final Word document:

My Object Model objects are as follows…

Object 1: Filepath to an image (.jpg) file
Object 2: RTF Content (Title and some Copyright information)

Object 3: How We came to Mexico (This is JUST the first chapter’s title text and is in RTF format)
Object 4: RTF Content - This is the content that belongs to the above chapter
Object 5: A Word of Protest - This is the above Chapter’s Sub Section Title Text (In RTF format)
Object 6: RTF Content - Sub Section Content (In RTF format)

Object 7 : Mexico Traditions and Celebrations - This is the 2nd chapter’s Title text in RTF format
Object 8 : RTF Content - This is the content that belongs to the “Mexico Traditions and Celebrations” chapter
Object 9 : Tips for the Traveller in Mexico - This is the above Chapter’s Sub Section Title Text (In RTF format)
Object 10: RTF Content - Sub Section Content (In RTF format)


I would like the above Object model content to be converted into a word document with a TOC on the page after the title page.
Please see the attached word doc which is the final generated word doc I would like to see.

Please note that the chapter’s header text (and also the chapter’s sub section’s header text) is in RTF format because the title text carries formatting such as font, size, justification, etc.


Hi Nitin,

Thanks for sharing the details. I am working over this query and will get back to you soon.

Hi Nitin,

Thanks for sharing the details. I have read your Object Model and suggest the following code snippet. You can take the idea from it and use it in your application. Please also read the following documentation links for your reference.

http://www.aspose.com/docs/display/wordsnet/How+to++Insert+a+Document+into+another+Document
http://www.aspose.com/docs/display/wordsnet/DocumentBuilder+Class
http://www.aspose.com/docs/display/wordsnet/DocumentBuilder+Members




Document FinalDoc = new Document();
DocumentBuilder builder = new DocumentBuilder(FinalDoc);
builder.MoveTo(FinalDoc.FirstSection.Body.FirstParagraph);

//Object 1: Filepath to an image (.jpg) file
Shape image = builder.InsertImage("d:\\Chrysanthemum.jpg");

//Object 2: RTF Content (Title and some Copyright information)
Document docChapter1Title = RtfStringToDocument("chapter 1 title RTF STring");
Paragraph paragraph = builder.InsertParagraph();
InsertDocument(paragraph, docChapter1Title);
builder.Writeln("");

builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading1;
builder.Font.Color = Color.Blue;
builder.Writeln("Table of Contents ");
builder.InsertTableOfContents("\\o \"1-3\" \\h \\z \\u");
builder.Writeln("");

//Object 3: How We came to Mexico (This is JUST the first chapter's title text and is in RTF format)
builder.Writeln("How We Came to Mexico");
builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Normal;
builder.Font.Color = Color.Black;

//Object 4: RTF Content - This is the content that belongs to the above chapter
builder.Writeln(@"It started on the beaches, both in Ixtapa, on the Pacific coast, and Playa del Carmen on the Gulf. In the 90s we began to come down for ....");
builder.Writeln("");

builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Heading2;
builder.Font.Color = Color.Blue;
builder.Writeln("A Word of Protest");
builder.ParagraphFormat.StyleIdentifier = StyleIdentifier.Normal;
builder.Font.Color = Color.Black;
builder.Writeln(@"One of the points I made in the Conclusions section of my book on the expatriate experience, San Miguel de Allende: a Pla .....");

FinalDoc.UpdateFields();
FinalDoc.Save(MyDir + "AsposeOut.doc", SaveFormat.Doc);

public void InsertDocument(Node insertAfterNode, Document srcDoc)
{
    // Make sure that the node is either a paragraph or table.
    if ((!insertAfterNode.NodeType.Equals(NodeType.Paragraph)) &&
        (!insertAfterNode.NodeType.Equals(NodeType.Table)))
        throw new ArgumentException("The destination node should be either a paragraph or table.");

    // We will be inserting into the parent of the destination paragraph.
    CompositeNode dstStory = insertAfterNode.ParentNode;

    // This object will be translating styles and lists during the import.
    NodeImporter importer = new NodeImporter(srcDoc, insertAfterNode.Document, ImportFormatMode.KeepSourceFormatting);

    // Loop through all sections in the source document.
    foreach (Section srcSection in srcDoc.Sections)
    {
        // Loop through all block level nodes (paragraphs and tables) in the body of the section.
        foreach (Node srcNode in srcSection.Body)
        {
            // Let's skip the node if it is a last empty paragraph in a section.
            if (srcNode.NodeType.Equals(NodeType.Paragraph))
            {
                Paragraph para = (Paragraph)srcNode;
                if (para.IsEndOfSection && !para.HasChildNodes)
                    continue;
            }

            // This creates a clone of the node, suitable for insertion into the destination document.
            Node newNode = importer.ImportNode(srcNode, true);

            // Insert new node after the reference node.
            dstStory.InsertAfter(newNode, insertAfterNode);
            insertAfterNode = newNode;
        }
    }
}

Please let us know if you have any more queries.

Hi Tahir,

It did not work completely. Also, I think I need to clarify some more, as your
understanding was not quite what I wanted.

OK, so here goes again…

The content (of a book) is held in my Object Model in a List of Objects,
and I want to convert this content into a final Word document:

My Object Model objects are as follows…

Object 1: Filepath to an image (.jpg) file - This must appear on the FIRST page and resize and fit the page.
forced page break here

Object 2: RTF Content (This is RTF content)
forced page break here

Object 3: RTF Content (This is title of the Chapter1 so this must be of style Heading 1 so that it will appear in the TOC)
Object 4: RTF Content - This is the RTF content (contains text and images) that belongs to the above chapter1

Object 5: RTF Content (This is a sub section title 1 under Chapter1 so this must be of style Heading 2 so that it will appear in the TOC under Chapter 1)
Object 6: RTF Content for section 1 - can contain text and images!

Object 7: RTF Content (This is a sub section title 2 under Chapter1 so this must be of style Heading 2 so that it will appear in the TOC under Chapter 1)
Object 8: RTF Content for section 2 - can contain text and images!


** Please note that there are many, many chapters and sub sections in the whole book, so some form
of iteration is involved.
For example, I could have the following chapters and sections:

- Chapter 1
    section 1a
    section 1b
    section 1c
- Chapter 2
    section 2a
- Chapter 3
    section 3a
    section 3b



I would like the above Object Model content to be converted into a Word document with an automatically
generated TOC on the page AFTER the title page, i.e. the Book Cover Image (page 1), Title & Copyright (page 2), TOC (page 3), followed by all chapters and sections etc.

Please see the previously attached Word doc, which is the final generated Word doc I would like to see.
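
The iteration over chapters and sub sections described above can be sketched independently of Aspose.Words by first flattening the hierarchy into an ordered list of entries, each tagged with the heading level it should receive. All type and member names below (ChapterItem, SectionItem, OutlineEntry, OutlineBuilder) are hypothetical stand-ins for the real Object Model, which is not shown in this thread.

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the book's Object Model.
public sealed class SectionItem
{
    public string TitleRtf;
    public string BodyRtf;
}

public sealed class ChapterItem
{
    public string TitleRtf;
    public string BodyRtf;
    public List<SectionItem> Sections = new List<SectionItem>();
}

// One block of RTF plus the heading level it should get:
// 1 = chapter title, 2 = sub section title, 0 = ordinary body content.
public sealed class OutlineEntry
{
    public int HeadingLevel;
    public string Rtf;

    public OutlineEntry(int headingLevel, string rtf)
    {
        HeadingLevel = headingLevel;
        Rtf = rtf;
    }
}

public static class OutlineBuilder
{
    // Walks chapters and their sections in document order.
    public static List<OutlineEntry> Flatten(IEnumerable<ChapterItem> chapters)
    {
        var entries = new List<OutlineEntry>();
        foreach (ChapterItem chapter in chapters)
        {
            entries.Add(new OutlineEntry(1, chapter.TitleRtf));
            entries.Add(new OutlineEntry(0, chapter.BodyRtf));
            foreach (SectionItem section in chapter.Sections)
            {
                entries.Add(new OutlineEntry(2, section.TitleRtf));
                entries.Add(new OutlineEntry(0, section.BodyRtf));
            }
        }
        return entries;
    }
}
```

Each entry would then be converted with RtfStringToDocument and appended in order; entries with HeadingLevel 1 or 2 would additionally be given the Heading 1 or Heading 2 style so that the TOC field can list them.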

Hi Nitin,

Please accept my apologies for the late response. Thanks for sharing the details. We’re checking the shared scenario and will get back to you soon.

Hi Tahir,

No need to apologize. In fact I THANK YOU! You are doing an awesome job. My company and I (of course) love your products! We have Aspose.Total and it's well worth the money. Plus your support response is the very best!

I look forward to your reply on the above scenario I have given.

FYI...
I am using Aspose.Words version 9.5.0.0