Extract Paragraphs under Different Headings from a Word Document

Hi there,

Thanks for your inquiry. The shared output documents were generated by Aspose.Words v15.8.0. Please upgrade to the latest version, Aspose.Words for Java 15.11.0, and request a temporary license from the following link.

Get temporary license

You are using Aspose.Words without a license. Please note that some limitations apply in evaluation mode. For example, Aspose.Words injects an evaluation watermark at the top of the document, and the document’s content is truncated after a certain number of paragraphs during import or export.

Please read the following article about applying a license.

Applying a License

This will fix the shared issue. Please let us know if you have any more queries.

This template is not working. Please check. The heading style is now Heading_1 instead of Heading 1.

Hi there,

Thanks for your inquiry. The ParagraphFormat.isHeading property returns true only when the paragraph style is one of the built-in Heading styles. In your case, we suggest also checking whether the style name starts with “Heading”. Please see the following code snippet. Hope this helps you.

if (para.hasChildNodes() && (para.getParagraphFormat().getStyle().getName().startsWith("Heading") || para.getParagraphFormat().isHeading()))
{
    Paragraph paragraph = new Paragraph(doc);
    para.getParentNode().insertBefore(paragraph, para);
    builder.moveTo(paragraph);
    builder.startBookmark("bm_extractcontents" + i);
    builder.endBookmark("bm_extractcontents" + i);
    i++;
}
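
For templates that rename the heading styles (for example Heading_1 instead of Heading 1), the name check can also be made a little stricter than a plain startsWith. The following is only a sketch in plain Java: the class name HeadingStyleNameSketch, the method headingLevel, and the accepted naming pattern (the word "Heading", an optional space or underscore, then a digit from 1 to 9) are assumptions for illustration and are not part of Aspose.Words. The style name itself would come from para.getParagraphFormat().getStyle().getName() as in the snippet above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HeadingStyleNameSketch {
    // Assumed naming convention: "Heading", optional space or underscore, one digit 1-9.
    private static final Pattern HEADING_NAME =
            Pattern.compile("Heading[ _]?([1-9])", Pattern.CASE_INSENSITIVE);

    // Returns the heading level encoded in the style name, or 0 if the name
    // does not look like a heading style under the assumed convention.
    static int headingLevel(String styleName) {
        if (styleName == null) {
            return 0;
        }
        Matcher m = HEADING_NAME.matcher(styleName.trim());
        return m.matches() ? Integer.parseInt(m.group(1)) : 0;
    }

    public static void main(String[] args) {
        for (String name : new String[] {"Heading 1", "Heading_1", "Normal"}) {
            System.out.println(name + " -> level " + headingLevel(name));
        }
    }
}
```

Unlike startsWith("Heading"), this sketch rejects names such as "Heading Text Style", so it should be relaxed if a template uses other custom names.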

Thanks for the reply. Yes it worked.

Could you please let me know how to add a file name to an embedded object when exporting a DOCX?

Hi there,

Thanks for your inquiry. It seems that your query is related to WORDSNET-12221. We will inform you via this forum thread once this issue is resolved.

If your query is not related to WORDSNET-12221, please share some more detail about your query. We will then provide you more information on this.

Thanks for such prompt responses. I appreciate the wonderful forum Aspose has.

My one concern since the beginning is that the extract method works only for a few specific templates. Even similar-looking templates do not work. Could you please tell me how I can find out what the problem with my template is?

PFA the template; the extract method again does not work for it. This is the template for which it really needs to work.

I need it asap. Thanks

Hi there,

Thanks for your inquiry. We have tested the scenario using the latest version, Aspose.Words for Java 15.12.0, and have not found any issue. Please use Aspose.Words for Java 15.12.0. We have attached the output documents to this post for your reference.

I bought the license for Aspose. But in my application I have declared a Document object in two places, so do I have to put

License license = new License();
license.setLicense(FilenameUtils.getFullPath(licPath)+"Aspose.Words.lic");

in both places?

Hi there,

Thanks for your inquiry. No, the license only needs to be set once per application. Calling License.setLicense multiple times is not harmful; it simply wastes processor time. Please see the following documentation link for reference.

Applying a License
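
One way to make sure the license call runs only once, no matter how many places create a Document, is to route it through a small run-once guard. This is a minimal sketch in plain Java and not an Aspose.Words API: the class name OneTimeSetupSketch and the method runOnce are hypothetical, and the Runnable passed in is assumed to wrap the two license lines shown above.

```java
final class OneTimeSetupSketch {
    private boolean done = false;

    // Runs the given setup action the first time only; later calls are no-ops.
    // Returns true if the action was executed by this call.
    synchronized boolean runOnce(Runnable setup) {
        if (done) {
            return false;
        }
        setup.run();   // if this throws, 'done' stays false so a later call can retry
        done = true;
        return true;
    }

    public static void main(String[] args) {
        OneTimeSetupSketch licenseGuard = new OneTimeSetupSketch();
        // In a real application the lambda would create the License and call setLicense.
        Runnable applyLicense = () -> System.out.println("license applied");
        licenseGuard.runOnce(applyLicense); // executes the action
        licenseGuard.runOnce(applyLicense); // skipped
    }
}
```

In an application, a single shared instance (for example a static final field) would be called before each place that constructs a Document.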

My find and replace function does not work when the searched text has brackets or any other special symbols. PFA the find and replace code.

Thanks,

priyanka

Hi Priyanka,

Thanks for your inquiry. It would be great if you could share the following details for investigation purposes.

  • Please attach your input Word documents.
  • Please create a standalone Java application (source code without compilation errors) that helps us reproduce your problem on our end.
  • Please attach the output Word file that shows the undesired behavior.
  • Please attach your target Word document showing the desired behavior. You can use Microsoft Word to create your target Word document. We will investigate how you expect your final document to be generated.

Unfortunately, it is difficult to say what the problem is without the Document(s) and simplified application. We need your Document(s) and simple project to reproduce the problem. As soon as you get these pieces of information to us we’ll start our investigation into your issue.

PFA the documents for your reference.

Thanks,

Priyanka

Hi Priyanka,

Thanks for sharing the details. Please use the following code example to replace text with an image. Hope this helps you. Please let us know if you have any more queries.

import com.aspose.words.*;
import java.util.regex.Pattern;

Document doc = new Document(MyDir + "Input Document.docx");

// Brackets are regex metacharacters, so they are escaped in the pattern.
Pattern regex = Pattern.compile("Desert \\(1\\). jpg", Pattern.CASE_INSENSITIVE);
// First pass: insert the image at each match (the callback skips the text replacement).
doc.getRange().replace(regex, new ReplaceEvaluator(MyDir + "Desert (1).jpg"), false);
// Second pass: remove the matched placeholder text.
doc.getRange().replace("Desert (1). jpg", "", false, false);

// Same two passes for a name containing braces and a dollar sign.
regex = Pattern.compile("Sample\\{\\}\\$&.jpg", Pattern.CASE_INSENSITIVE);
doc.getRange().replace(regex, new ReplaceEvaluator(MyDir + "Sample{}$&.jpg"), false);
doc.getRange().replace("Sample{}$&.jpg", "", false, false);

// Save the modified document.
doc.save(MyDir + "Out.docx");

// Callback that inserts the image file at the position of each match.
public class ReplaceEvaluator implements IReplacingCallback
{
    String filepath = "";

    public ReplaceEvaluator(String file)
    {
        filepath = file;
    }

    public int replacing(ReplacingArgs e) throws Exception
    {
        // Get the match node
        Run run = (Run) e.getMatchNode();
        DocumentBuilder builder = new DocumentBuilder((Document) e.getMatchNode().getDocument());
        // Move to the match node and insert the image there
        builder.moveTo(run);
        builder.insertImage(filepath);
        // Leave the matched text untouched; it is removed by the second replace call
        return ReplaceAction.SKIP;
    }
}

Hi,

Thanks for the reply. But the string to be matched is dynamic in my case, so I don’t know when to escape it.

Thanks,

Priyanka

Hi Priyanka,

Thanks for your inquiry. In your case, you need to create the regular expression dynamically according to your requirements. Please read the documentation of the java.util.regex.Pattern class and escape special characters with backslashes as your content requires.
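
When the search text is only known at run time, one option is to let the JDK do the escaping with Pattern.quote, which turns any string into a pattern that matches it literally. The sketch below uses only java.util.regex; the class name DynamicPatternSketch and the method literalPattern are hypothetical names chosen for illustration. The resulting Pattern could then be passed to doc.getRange().replace(...) in place of the hand-escaped patterns above.

```java
import java.util.regex.Pattern;

final class DynamicPatternSketch {
    // Builds a case-insensitive pattern that matches searchText literally,
    // so brackets, braces, '$', '.', and other metacharacters need no manual escaping.
    static Pattern literalPattern(String searchText) {
        return Pattern.compile(Pattern.quote(searchText), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String[] names = {"Desert (1).jpg", "Sample{}$&.jpg"};
        for (String name : names) {
            Pattern p = literalPattern(name);
            boolean found = p.matcher("Insert " + name + " here").find();
            System.out.println(name + " found: " + found);
        }
    }
}
```

Because the whole string is quoted, this approach fits only when the dynamic text should match exactly; it does not allow wildcards inside the search text.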

Thanks for your prompt response. I am facing one more issue. The API is not able to read the headings in the document. PFA the input document.

Hi Priyanka,

Thanks for your inquiry. Please check the following code example. The ParagraphFormat.isHeading property returns true for built-in Heading styles. Could you please share some more detail about your issue? We will then provide you with more information about your query.

Document doc = new Document(MyDir + "test_doc.docx");
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
for (Paragraph para : (Iterable<Paragraph>) paragraphs)
{
    System.out.println(para.getParagraphFormat().isHeading());
    System.out.println(para.getText());
}

Hi,

Please find below the output with Aspose.Words 15.11:

System.out.println(para.getParagraphFormat().isHeading()); //true
System.out.println(para.getText()); //TEST Error! Reference source not found. : Testing

Thanks,

Priyanka

Hi Priyanka,

Thanks for your inquiry. Perhaps you are calling the Document.updateFields method before getting the paragraph’s text. Your input document contains REF fields, which return “Error! Reference source not found.” when they are updated. Please open your document in MS Word and update the fields by pressing F9. You will get the same output.

Please let us know if you have any more queries.

Hi,

Is it possible to convert HTML to Word and read the style/formatting using a VB macro?

Please find below my HTML-to-Word conversion code snippet for your reference.

String outputFile = contextPath+planName+".docx";
Document doc = new Document(file.getCanonicalPath());
DocumentBuilder builder = new DocumentBuilder(doc);
// Insert a table of contents at the beginning of the document.
// builder.insertTableOfContents("\\o \"1-3\" \\h \\z \\u");
builder.insertTableOfContents("TOC \\o \"1-6\" \\h \\z \\u \\h");
builder.insertBreak(BreakType.PAGE_BREAK);
doc.updateFields();
doc.save(outputFile);
return new File(outputFile);

Thanks,

Priyanka