Merge and Consolidate Citations


#1

We are looking to combine multiple documents, each of which may contain citations. Currently the citations stay at their original position at the end of each document we combine. Does Aspose have any support for consolidating these citations into a single list at the end of the document (or at a specified location)? Is there any support for removing duplicate references, or at least a way to access each citation and its corresponding number in the document so that we can do this manually? Thanks for any help that can be provided.


#2

@jferguson9018,

To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:

  • Your simplified input Word documents
  • The Aspose.Words-generated output document showing the undesired behavior
  • Your expected document showing the correct output. You can create the expected document using MS Word. Please also list the steps you performed in MS Word to create it.
  • Please also create a simplified standalone application (source code without compilation errors) that helps us to reproduce your current problem on our end and attach it here for testing.

As soon as you have these pieces of information ready, we will start investigating your scenario and provide you with more information. Thanks for your cooperation.


#3

Thank you for the response. I have put together a quick zip containing the requested information as best I can. I do not have access to the citation plugin used, so I had to generate the result document manually. In this example the same citations are present in both documents, so the result should be a single set of references at the end of the document (duplicates removed and consolidated). If the references differed between the documents, we would also need to renumber when consolidating, so that each item pointing to a reference still points to the correct one after the merge.

The code does nothing special: it just appends the documents and saves the output.

Citations.zip (92.7 KB)


#4

@jferguson9018,

In this case, you can get the ‘expected’ output by using the following code:

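// Requires: using Aspose.Words; using Aspose.Words.Fields;
Document doc = new Document("E:\\Citations\\Citations01.docx");
Document doc1 = new Document("E:\\Citations\\Citations02.docx");

doc.AppendDocument(doc1, ImportFormatMode.KeepSourceFormatting);

// Remove only the first bibliography field (and its "References" heading),
// so that the single remaining list sits at the end of the merged document.
foreach (Field field in doc.Range.Fields)
{
    if (field.Type != FieldType.FieldAddin)
        continue;

    FieldAddIn addIn = (FieldAddIn)field;
    if (!addIn.GetFieldCode().Equals("ADDIN RW.BIB"))
        continue;

    Paragraph para = (Paragraph)addIn.Start.GetAncestor(NodeType.Paragraph);
    if (para != null)
    {
        // The previous sibling may be missing or may not be a paragraph (for example, a table).
        Paragraph prevPara = para.PreviousSibling as Paragraph;
        if (prevPara != null && prevPara.ToString(SaveFormat.Text).Trim().Equals("References"))
        {
            prevPara.Remove();
        }
    }

    addIn.Remove();
    break; // stop after the first bibliography; later ones are kept
}

// Make every section continuous so the appended document does not start on a new page.
foreach (Section sec in doc.Sections)
    sec.PageSetup.SectionStart = SectionStart.Continuous;

doc.Save("E:\\Citations\\19.7.docx");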

#5

Thank you for the reply. I think this will work for this exact scenario, but I am not sure it will work in the following one. Notice that "My first reference" appears in the reference lists of both documents, while the other references are unique. Any number pointing to the 3rd reference in the second document should end up pointing to the 1st reference in the merged document.

First Document References
1. My first reference
2. My second reference
3. My third reference

Second Document References
1. My Second document first reference
2. My Second document second reference
3. My first reference

Desired Merged References
1. My first reference
2. My second reference
3. My third reference
4. My Second document first reference
5. My Second document second reference


#6

@jferguson9018,

I think you can build on the following code to get the desired output. The code removes all of the bibliography FieldAddIn objects ("ADDIN RW.BIB"), collects their unique entries, and manually writes those entries at the end of the document.

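// Requires: using System.Collections.Generic; using System.Drawing;
//           using Aspose.Words; using Aspose.Words.Fields;
Document doc = new Document("E:\\Temp\\Citations\\Citations01.docx");
Document doc1 = new Document("E:\\Temp\\Citations\\Citations02.docx");

doc.AppendDocument(doc1, ImportFormatMode.KeepSourceFormatting);

// Unique bibliography entries, in order of first appearance.
List<string> entries = new List<string>();

// Iterate over a snapshot so that removing fields does not disturb the enumeration.
foreach (Field field in new List<Field>(doc.Range.Fields))
{
    if (field.Type == FieldType.FieldAddin)
    {
        FieldAddIn addIn = (FieldAddIn)field;
        if (addIn.GetFieldCode().Equals("ADDIN RW.BIB"))
        {
            // Each paragraph of the field result is one bibliography entry.
            foreach (string entry in addIn.DisplayResult.Split(new char[] { '\r' }))
            {
                // Skip blank lines and entries that were already collected.
                if (entry.Trim().Length > 0 && !entries.Contains(entry))
                {
                    entries.Add(entry);
                }
            }

            addIn.Remove();
        }
    }
}

DocumentBuilder builder = new DocumentBuilder(doc);
builder.MoveToDocumentEnd();
foreach (string entry in entries)
{
    // Format entries manually, for example as a numbered list:
    // builder.ListFormat.List = doc.Lists.Add(ListTemplate.NumberArabicDot);
    builder.Font.Color = Color.Green;
    builder.Writeln(entry);
}

// Make every section continuous so the appended document does not start on a new page.
foreach (Section sec in doc.Sections)
    sec.PageSetup.SectionStart = SectionStart.Continuous;

doc.Save("E:\\Temp\\Citations\\19.7.docx");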
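
The renumbering asked about in #5 is independent of Aspose.Words, so it can be sketched on plain strings. The following is only a minimal illustration, assuming that two references count as duplicates when their trimmed text is identical; ReferenceMerger and MergeInto are hypothetical names, not part of any library. It builds the merged list and, for each source document, a map from the old reference number to the new one. Applying that map to the in-text citation numbers depends on how the citation add-in stores them, which is not shown in this thread.

```csharp
using System;
using System.Collections.Generic;

public static class ReferenceMerger
{
    // Appends the entries of 'source' that are not yet in 'merged' and returns,
    // for each 1-based number in 'source', the 1-based number of the same
    // reference in 'merged'. Entries are compared by their trimmed text (assumption).
    public static Dictionary<int, int> MergeInto(List<string> merged, IList<string> source)
    {
        var map = new Dictionary<int, int>();
        for (int i = 0; i < source.Count; i++)
        {
            string entry = source[i].Trim();
            int index = merged.IndexOf(entry);
            if (index < 0)
            {
                // First time this reference is seen: append it to the merged list.
                merged.Add(entry);
                index = merged.Count - 1;
            }
            map[i + 1] = index + 1;
        }
        return map;
    }

    public static void Main()
    {
        var first = new[] { "My first reference", "My second reference", "My third reference" };
        var second = new[]
        {
            "My Second document first reference",
            "My Second document second reference",
            "My first reference"
        };

        var merged = new List<string>();
        MergeInto(merged, first);
        Dictionary<int, int> secondMap = MergeInto(merged, second);

        Console.WriteLine(merged.Count);  // 5
        Console.WriteLine(secondMap[1]);  // 4
        Console.WriteLine(secondMap[3]);  // 1 (the duplicate maps back to the first reference)
    }
}
```

With the lists from #5, reference 3 of the second document maps to reference 1 of the merged list, and its two unique references become 4 and 5.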

Hope this helps.