@k.sukumar,
To get the desired result, you can remove the unwanted TOC and the unwanted paragraphs from your output document after producing it. Please check the following code example:
public static void RemoveTableOfContents(Document doc, int index)
{
// Store the FieldStart nodes of TOC fields in the document for quick access.
ArrayList fieldStarts = new ArrayList();
// This is a list to store the nodes found inside the specified TOC. They will be removed
// At the end of this method.
ArrayList nodeList = new ArrayList();
foreach (FieldStart start in doc.GetChildNodes(NodeType.FieldStart, true))
{
if (start.FieldType == FieldType.FieldTOC)
{
// Add all FieldStarts which are of type FieldTOC.
fieldStarts.Add(start);
}
}
// Ensure the TOC specified by the passed index exists.
if (index < 0 || index > fieldStarts.Count - 1)
throw new ArgumentOutOfRangeException(nameof(index), "TOC index is out of range");
bool isRemoving = true;
// Get the FieldStart of the specified TOC.
Node currentNode = (Node)fieldStarts[index];
while (isRemoving)
{
// It is safer to store these nodes and delete them all at once later.
nodeList.Add(currentNode);
currentNode = currentNode.NextPreOrder(doc);
// Once we encounter a FieldEnd node of type FieldTOC then we know we are at the end
// Of the current TOC and we can stop here.
if (currentNode.NodeType == NodeType.FieldEnd)
{
FieldEnd fieldEnd = (FieldEnd)currentNode;
if (fieldEnd.FieldType == FieldType.FieldTOC)
isRemoving = false;
}
}
// Remove all nodes found in the specified TOC.
foreach (Node node in nodeList)
{
node.Remove();
}
}
Document doc = new Document(@"C:\Temp\32p531-val-of-analytical-proc-benzyl alcohol (1).doc");
DocumentBuilder builder = new DocumentBuilder(doc);
// Remove the unwanted TOC (the list of tables) from the document.
RemoveTableOfContents(doc, 1);
// Remove the unwanted paragraphs from the first TOC and from the document body.
int num = 1;
foreach (Node p in doc.GetChildNodes(NodeType.Paragraph, true))
{
if (p.GetText().Contains("LIST OF TABLES"))
{
if (num == 1)
p.Remove();
if (num == 3)
{
builder.MoveTo(p.NextSibling);
p.Remove();
break;
}
num += 1;
}
}
builder.InsertBreak(BreakType.PageBreak);
doc.Save(@"C:\Temp\32p531-val-of-analytical-proc-benzyl alcohol (1)-updated.doc");
Please also check the attached document, produced by the code above.
32p531-val-of-analytical-proc-benzyl alcohol (1)-updated.zip (59.0 KB)
Thanks for the help, @sergey.lobanov, but how do I put this code in Visual Studio? Should I put it after static void Main, or above static void Main? You have sent two code blocks, and I am not sure which one I need to use first. Please help me.
@k.sukumar,
The first block of code is a function that removes the unwanted TOC with the specified index from the document. Please put this function into a class within the namespace of your project.
The second block of code uses the function above to remove the unwanted TOC (in your case, the TOC with index 1), and then removes the unwanted paragraphs (the paragraphs with “LIST OF TABLES” in their text, one from the first TOC and one from the document body) from your document.
Please put this code after the code that produces your document (32p531-val-of-analytical-proc-benzyl alcohol (1).docx).
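As a rough illustration of where the two blocks go, here is a minimal layout sketch. The namespace name TocCleanup and the file paths are placeholders, and the body of RemoveTableOfContents is deliberately left as a stub to be filled with the function posted above. In C#, the order of methods inside a class does not matter, so the function can sit above or below Main.

```csharp
using Aspose.Words;

namespace TocCleanup // placeholder; use your project's own namespace
{
    class Program
    {
        static void Main(string[] args)
        {
            // Second block: runs after your document has been produced.
            Document doc = new Document(@"C:\Temp\input.doc"); // placeholder path

            // Call the function from the first block.
            RemoveTableOfContents(doc, 1);

            // The paragraph clean-up loop from the second block goes here.

            doc.Save(@"C:\Temp\output.doc"); // placeholder path
        }

        // First block: a sibling of Main inside the same class.
        public static void RemoveTableOfContents(Document doc, int index)
        {
            // Stub only: paste the body of the function posted above here.
        }
    }
}
```

With this layout, Main is the entry point and simply calls the function, so only the code in Main decides what runs first.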
Thank you so much, @sergey.lobanov.
Hi @alexey.noskov ,
I’m facing an issue when removing a section break. In the attached document, when I removed the section break on the first page, the header was also removed. I used this code to remove the section break:
private static void RemoveSectionBreaks(Document doc)
{
// Loop through all sections starting from the section that precedes the last one
// And moving to the first section.
for (int i = doc.Sections.Count - 2; i >= 0; i--)
{
// Copy the content of the current section to the beginning of the last section.
doc.LastSection.PrependContent(doc.Sections[i]);
// Remove the copied section.
doc.Sections[i].Remove();
}
}
Can you help me remove only the section break, not the header, on the first page of the document?
QC28738_Output_O-3.2.A.1 Facilities and Equipment-Pfizer Grange Castle.zip (43.3 KB)
Hi Hafeez,
I’m facing an issue when removing a section break. In the attached document, when I removed the section break on the first page, the header was also removed. I used this code to remove the section break.
QC28738_Output_O-3.2.A.1 Facilities and Equipment-Pfizer Grange Castle.zip (43.3 KB)
private static void RemoveSectionBreaks(Document doc)
{
// Loop through all sections starting from the section that precedes the last one
// And moving to the first section.
for (int i = doc.Sections.Count - 2; i >= 0; i--)
{
// Copy the content of the current section to the beginning of the last section.
doc.LastSection.PrependContent(doc.Sections[i]);
// Remove the copied section.
doc.Sections[i].Remove();
}
}
Can you help me remove only the section break, not the header, on the first page of the document?
@k.sukumar The problem occurs because headers and footers are defined in the first section of your document, but in your code you copy all content into the last section and remove the first section. You can modify your code like the following to get the desired result:
private static void RemoveSectionBreaks(Document doc)
{
// Append all content to the first section of the document.
while (doc.Sections.Count > 1)
{
doc.FirstSection.AppendContent(doc.Sections[1]);
doc.Sections[1].Remove();
}
}
However, note that different sections might have different headers/footers. If you remove all section breaks with this code, only the headers/footers of the first section remain.
thank you very much @alexey.noskov.
Hi @alexey.noskov,
I have attached a document in which I am trying to remove the page break on the first page, but it is not removed:
QC28881_Output_3.2.S.4.3.2 Validation of Analyt. Procedure( BET) - Cambrex.zip (20.6 KB)
Can you help me remove the page break on the first page?
I have tried this code, but it did not remove it:
Document doc = new Document(MyDir + "Remove page breaks.docx");
// Retrieve all paragraphs in the document.
NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
foreach (Paragraph para in paragraphs)
{
// If the paragraph has a page break set before, then clear it.
if (para.ParagraphFormat.PageBreakBefore)
para.ParagraphFormat.PageBreakBefore = false;
// Check all runs in the paragraph for page breaks and remove them.
foreach (Run run in para.Runs)
if (run.Text.Contains(ControlChar.PageBreak))
run.Text = run.Text.Replace(ControlChar.PageBreak, string.Empty);
}
@k.sukumar I have checked your document on my side, and your code successfully removes the page break from it. Please make sure you have attached the correct file.
Hi @alexey.noskov,
With this code, any other sections that exist in my document are also removed, but I want only the section break on the first page to be removed. Can you help me with this issue? I am referring to the code you sent me for the document above, which I have attached again.
QC28738_Output_O-3.2.A.1 Facilities and Equipment-Pfizer Grange Castle.zip (43.3 KB)
@k.sukumar In the attached document there is a section break at the very beginning of the document, i.e. there is a section with an empty body. However, this section still has a header/footer, which, I suppose, you would like to keep. In this case you can use code like the following:
Document doc = new Document(@"C:\Temp\in.doc");
// Require at least two sections, because the loop body accesses doc.Sections[1].
while (doc.Sections.Count > 1 && String.IsNullOrEmpty(doc.FirstSection.Body.ToString(SaveFormat.Text).Trim()))
{
doc.FirstSection.AppendContent(doc.Sections[1]);
doc.Sections[1].Remove();
}
doc.Save(@"C:\Temp\out.doc");
Hi @awais.hafeez,
Actually, I am trying to read the existing citations in the document, but I am unable to read them. Can you help me read the citations and add them to a list?
I have attached a document. On its last page you will find a reference page with 20 references. When you double-click a number like 1, 2, or 3, it jumps directly to the endnote with that number.
I want to read those particular citations. Can you help me read them?
We have code that shows how to create citations:
// Open a document containing bibliographical sources that we can find in
// Microsoft Word via References -> Citations & Bibliography -> Manage Sources.
Document doc = new Document(MyDir + "Bibliography.docx");
DocumentBuilder builder = new DocumentBuilder(doc);
builder.Write("Text to be cited with one source.");
// Create a citation with just the page number and the author of the referenced book.
FieldCitation fieldCitation = (FieldCitation)builder.InsertField(FieldType.FieldCitation, true);
// We refer to sources using their tag names.
fieldCitation.SourceTag = "Book1";
fieldCitation.PageNumber = "85";
fieldCitation.SuppressAuthor = false;
fieldCitation.SuppressTitle = true;
fieldCitation.SuppressYear = true;
Assert.AreEqual(" CITATION Book1 \\p 85 \\t \\y", fieldCitation.GetFieldCode());
// Create a more detailed citation which cites two sources.
builder.InsertParagraph();
builder.Write("Text to be cited with two sources.");
fieldCitation = (FieldCitation)builder.InsertField(FieldType.FieldCitation, true);
fieldCitation.SourceTag = "Book1";
fieldCitation.AnotherSourceTag = "Book2";
fieldCitation.FormatLanguageId = "en-US";
fieldCitation.PageNumber = "19";
fieldCitation.Prefix = "Prefix ";
fieldCitation.Suffix = " Suffix";
fieldCitation.SuppressAuthor = false;
fieldCitation.SuppressTitle = false;
fieldCitation.SuppressYear = false;
fieldCitation.VolumeNumber = "VII";
Assert.AreEqual(" CITATION Book1 \\m Book2 \\l en-US \\p 19 \\f \"Prefix \" \\s \" Suffix\" \\v VII", fieldCitation.GetFieldCode());
// We can use a BIBLIOGRAPHY field to display all the sources within the document.
builder.InsertBreak(BreakType.PageBreak);
FieldBibliography fieldBibliography = (FieldBibliography)builder.InsertField(FieldType.FieldBibliography, true);
fieldBibliography.FormatLanguageId = "1124";
Assert.AreEqual(" BIBLIOGRAPHY \\l 1124", fieldBibliography.GetFieldCode());
doc.UpdateFields();
doc.Save(ArtifactsDir + "Field.CITATION.docx");
@k.sukumar It looks like you forgot to attach your document. Could you please attach it here for our reference? We will check it and provide you with more information.
@k.sukumar What you are asking about are footnotes (more precisely, endnotes). You can use code like this to read them:
Document doc = new Document(@"C:\Temp\in.docx");
NodeCollection footnotes = doc.GetChildNodes(NodeType.Footnote, true);
foreach (Footnote note in footnotes)
{
if (note.FootnoteType == FootnoteType.Endnote)
Console.WriteLine(note.ToString(SaveFormat.Text));
}
Hi @alexey.noskov,
Thanks for the help, but I am unable to read the footnotes with it.
However, I tried the following code to get the citations, and it worked:
class Program
{
static void Main(string[] args)
{
Document doc = new Document("E:\\ABC citatinos.docx");
bool status = false;
bool bookmarkEx = false;
LayoutCollector layout = new LayoutCollector(doc);
List<int> lst = new List<int>();
List<object> objlst = new List<object>();
foreach (Section sct in doc.Sections)
{
foreach (Paragraph pr in sct.Body.GetChildNodes(NodeType.Paragraph, true))
{
foreach (Field field in pr.Range.Fields)
{
if (field.Type == FieldType.FieldCitation)
{
FieldCitation citation = (FieldCitation)field;
if (citation.Suffix != null)
{
bookmarkEx = true;
status = true;
}
if (status == true && bookmarkEx)
{
if (layout.GetStartPageIndex(field.Start) != 0)
lst.Add(layout.GetStartPageIndex(field.Start));
}
}
}
}
}
}
}
ABC citatinos.docx (15.5 KB)
But I have an issue reading the text below “Works Cited”. I am able to read the citations above “Works Cited” with the code I pasted above. Will you help me read the citations below “Works Cited”?
@awais.hafeez,
Yes, this is the sample document we are working with:
ABC citatinos.docx (15.5 KB)
This is the document that shows my exact issue.
I am unable to read the citations below “Works Cited”. Will you help me read them?
@k.sukumar The field below “Works Cited” is not a CITATION field but a BIBLIOGRAPHY field. Please see the screenshot:
You can press Alt+F9 in MS Word to see field codes.
So you should add one more condition in your code:
foreach (Field field in pr.Range.Fields)
{
if (field.Type == FieldType.FieldCitation)
{
// Process CITATION field
}
if (field.Type == FieldType.FieldBibliography)
{
// Process BIBLIOGRAPHY field.
}
}
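To make the two branches a bit more concrete, here is a minimal sketch that prints the field code and the displayed text of every CITATION and BIBLIOGRAPHY field in a document. The file path and the class name are placeholders, and what field.Result contains depends on whether the fields in your document have been updated, so treat the output as something to inspect rather than a guaranteed format.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fields;

class ReadCitationFields // placeholder class name
{
    static void Main()
    {
        Document doc = new Document(@"C:\Temp\in.docx"); // placeholder path

        // doc.Range.Fields covers the fields in the whole document.
        foreach (Field field in doc.Range.Fields)
        {
            if (field.Type == FieldType.FieldCitation ||
                field.Type == FieldType.FieldBibliography)
            {
                // The field code is what MS Word shows after pressing Alt+F9.
                Console.WriteLine(field.Type + ": " + field.GetFieldCode().Trim());
                // The result is the text currently displayed in place of the field.
                Console.WriteLine(field.Result);
            }
        }
    }
}
```

Iterating doc.Range.Fields instead of each paragraph's range is only a simplification for this sketch; the per-paragraph loop shown above works with the same two conditions.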