How to split a Word file into separate HTML output files

shivaji_dole · January 7, 2016, 3:51am

Hi,

I am trying to convert a Word document to an HTML file.

My input Word document contains a number of questions and answers, and I want each question/answer pair to be generated as a separate HTML output file. That is, each question/answer pair should go into a different file.

A Q/A pair can also contain images and headings.

I tried a direct conversion, but it generates a single HTML file.

Can you please suggest how I can achieve this?

Thanks in advance.
Shivaji

awais.hafeez · January 8, 2016, 4:18am

Hi Shivaji,

Thanks for your inquiry. Please attach your sample Word document here for testing. We will investigate the scenario on our end and provide you with code to meet this requirement.

Best regards,

shivaji_dole · January 8, 2016, 7:02am

Hi,

Thank you for looking into my request.

Please find attached a document with sample questions/answers, which we intend to use as input to Aspose during the conversion.
If required, we can provide a separator between question/answer pairs.

We want two HTML files to be generated in this case, as there are 2 question/answer pairs in this document.

We have more than 300 Q/A pages to generate.

Please let me know if more details are required.

Thanks.
Shivaji

awais.hafeez · January 11, 2016, 1:36am

Hi Shivaji,

Thanks for your inquiry. Please try using the following code:

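import com.aspose.words.*;
import java.util.ArrayList;

Document doc = new Document(getMyDir() + "sample_q_a_reviewer.docx");

// Collect every paragraph in the document, including nested ones.
NodeCollection paras = doc.getChildNodes(NodeType.PARAGRAPH, true);
ArrayList<Integer> paraNos = new ArrayList<Integer>();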

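// Record the index of every paragraph whose first run is bold, 18 pt text;
// the code treats each such paragraph as the start of a question/answer pair.
for (int i = 0; i < paras.getCount(); i++) {
    Paragraph para = (Paragraph) paras.get(i);
    if (para.getChildNodes().getCount() > 0 && para.getChildNodes().get(0).getNodeType() == NodeType.RUN) {
        Run firstRun = para.getRuns().get(0);
        if (firstRun.getFont().getBold() && firstRun.getFont().getSize() == 18) {
            paraNos.add(i);
        }
    }
}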

for (int i = 0; i < paraNos.size(); i++) {
    Paragraph startPara = (Paragraph) paras.get((int) paraNos.get(i));
    Paragraph endPara;

    // The last pair runs to the end of the document; every other pair
    // runs up to the starting paragraph of the next pair.
    if (i + 1 == paraNos.size())
        endPara = doc.getLastSection().getBody().getLastParagraph();
    else
        endPara = (Paragraph) paras.get((int) paraNos.get(i + 1));

    if (endPara != null) {
        ArrayList extractedNodes = extractContent(startPara, endPara, true);
        Document dstDoc = generateDocument(doc, extractedNodes);

        // The inclusive extraction also copied the next pair's first paragraph, so drop it.
        if (i + 1 != paraNos.size())
            dstDoc.getLastSection().getBody().getLastParagraph().remove();

        // One HTML file per pair: out-Q-0.html, out-Q-1.html, ...
        dstDoc.save(getMyDir() + "out-Q-" + i + ".html");
    }
}
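
To make the boundary logic of the second loop easier to follow, here is a minimal plain-Java sketch that only computes which paragraph indices would end up in which output file. It does not use Aspose.Words at all; the class name QaRangePlanner, the Range type and the plan method are hypothetical names introduced for illustration. It assumes that extracting up to the next starting paragraph and then removing that paragraph is equivalent to a half-open index range, and, as in the loop above, any paragraphs before the first detected start are not assigned to a file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Aspose.Words): plans paragraph ranges per output file. */
final class QaRangePlanner {

    /** Half-open range [start, end) of paragraph indices that form one Q/A pair. */
    static final class Range {
        final int start;
        final int end;

        Range(int start, int end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Each pair starts at its own start index and stops just before the next
     * start index; the last pair stops at the end of the document.
     */
    static List<Range> plan(List<Integer> startIndices, int paragraphCount) {
        List<Range> ranges = new ArrayList<>();
        for (int i = 0; i < startIndices.size(); i++) {
            int start = startIndices.get(i);
            int end = (i + 1 < startIndices.size()) ? startIndices.get(i + 1) : paragraphCount;
            ranges.add(new Range(start, end));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // Made-up example: 12 paragraphs, with pairs starting at indices 0, 4 and 9.
        List<Range> ranges = plan(Arrays.asList(0, 4, 9), 12);
        for (int i = 0; i < ranges.size(); i++) {
            System.out.println("out-Q-" + i + ".html <- paragraphs " + ranges.get(i));
        }
    }
}
```

Running a check like this against the start indices found in the first loop may help confirm that the number of output files matches the number of question/answer pairs before converting all 300+ pages.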

Hope this helps.

PS: Please get the definitions of the extractContent and generateDocument methods from this article:

Extract Content Overview and Code

Best regards,

shivaji_dole · January 12, 2016, 12:50am

Thank you for the reply,

I will try this and post an update here if I come across any further issues.

Thanks,
Shivaji

shivaji_dole · January 13, 2016, 6:07am

I am able to achieve my requirement with the given sample.

Thank you for the help.