Converting a Text stream to docx

write2priyank · December 6, 2011, 10:00am

Hi,
I want to convert a text stream to DOCX. Could you please tell me how to achieve this?

CODE:

Document doc = new Document(getPathToTemplateDoc());
System.out.println("Template fields ::" + doc.getMailMerge().getFieldNames());

NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true, false);
System.out.println(shapes.getCount());

for (Shape shape : (Iterable<Shape>) shapes)
    if (shape != null)
    {
        if (shape.hasImage())
        {
            shape.remove();
        }
    }

ByteArrayOutputStream dstStream = new ByteArrayOutputStream();

doc.save(dstStream, SaveFormat.TEXT);

ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
Document docx = new Document(srcStream);

return docx;

AndreyN · December 6, 2011, 1:04pm

Hello
Thanks for your request. You can use code like the following:

ByteArrayInputStream srcStream = new ByteArrayInputStream(dstStream.toByteArray());
Document docx = new Document();
DocumentBuilder builder = new DocumentBuilder(docx);
try
{
    // You might need to specify a different encoding depending on your plain text files.
    BufferedReader reader = new BufferedReader(new InputStreamReader(srcStream, "UTF8"));
    String line = null;
    // Read plain text "lines" and convert them into paragraphs in the document.
    while ((line = reader.readLine()) != null)
    {
        builder.writeln(line);
    }
    reader.close();
}
finally
{
    if (srcStream != null) srcStream.close();
}
docx.save("C:\\Temp\\out.docx", SaveFormat.DOCX);
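
If you want to check the reading part on its own, here is a minimal sketch that uses only the standard library. The class name TextLines and the method readLines are made up for this illustration and are not part of Aspose.Words. It reads the stream with an explicit charset and lets try-with-resources close the reader:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not an Aspose.Words class.
final class TextLines {
    // Reads every line from the stream using the given charset.
    // try-with-resources closes the reader (and the wrapped stream) on exit.
    static List<String> readLines(InputStream in, Charset charset) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] text = "First paragraph\nSecond paragraph\n".getBytes(StandardCharsets.UTF_8);
        for (String line : readLines(new ByteArrayInputStream(text), StandardCharsets.UTF_8)) {
            // In the Aspose.Words version, builder.writeln(line) would go here.
            System.out.println(line);
        }
    }
}
```

Each returned line would then become one paragraph, as in the loop above.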

Best regards,

write2priyank · December 7, 2011, 3:00am

Thanks for your reply.
There is one more problem. I am trying to remove the images after loading the file, and I need them back when saving the document as DOCX. I am using the following code, but not all of the images are removed; some are still there. What should I do? Can only particular kinds of image shapes be removed, or all of them? And how can I get those images back afterwards?
Here is the code provided by Aspose:

Document doc = new Document("C:\\test\\in.docx");
NodeCollection shapes = doc.getChildNodes(NodeType.SHAPE, true, false);
for (int i = 0; i < shapes.getCount(); i++)
{
    Shape shape = (Shape) shapes.get(i);
    if (shape.hasImage())
    {
        shape.remove();
    }
}

Thank You,

AndreyN · December 7, 2011, 3:46am

Hello
Thanks for your request. It seems there are DrawingML objects inside your DOCX document.
https://reference.aspose.com/words/net/aspose.words.drawing/
You can use DocumentExplorer (an Aspose.Words demo application) to inspect the structure of your document. This demo is included in the Aspose.Words MSI installation package.
So you should remove the DrawingML images too:

// Open source document
Document doc = new Document("C:\\Temp\\in.docx");
// Get collection of Shapes and remove them
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
shapes.Clear();
// Get collection of DrawingML and remove them.
NodeCollection dmls = doc.GetChildNodes(NodeType.DrawingML, true);
dmls.Clear();
// Save the result as DOCX
doc.Save("C:\\Temp\\out.docx");
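
A side note on the index-based loop in the question: if the node collection is a live view of the document (an assumption worth checking in the API reference for your version), removing an item inside a forward loop shifts the remaining items, so the item right after each removed one is skipped. The sketch below shows the same effect with a plain ArrayList; the class and method names are made up for this illustration and nothing here calls Aspose.Words:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: a java.util.List stands in for a live node collection.
final class RemoveWhileIterating {
    // Forward index loop: each removal shifts later elements left,
    // so the element that follows a removed one is never examined.
    static List<String> removeForward(List<String> items, String target) {
        List<String> copy = new ArrayList<>(items);
        for (int i = 0; i < copy.size(); i++) {
            if (copy.get(i).equals(target)) {
                copy.remove(i);
            }
        }
        return copy;
    }

    // Backward index loop: a removal only shifts elements already visited.
    static List<String> removeBackward(List<String> items, String target) {
        List<String> copy = new ArrayList<>(items);
        for (int i = copy.size() - 1; i >= 0; i--) {
            if (copy.get(i).equals(target)) {
                copy.remove(i);
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("image", "image", "text", "image");
        System.out.println(removeForward(nodes, "image"));  // prints [image, text]
        System.out.println(removeBackward(nodes, "image")); // prints [text]
    }
}
```

Clearing the whole collection, as in the code above, or looping from the last index down to the first avoids the skipping.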

Regarding getting the images back, it is not as easy as it sounds. First of all, you need to mark the current position of each image, for example by using Bookmarks [image#1]. Then you can try looping through the original document with images and importing the images into the current document using NodeImporter, for example:
https://reference.aspose.com/words/net/aspose.words/nodeimporter/
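
To make the idea concrete, here is a toy model of the mark-and-restore approach that uses only the standard library. It does not call Aspose.Words at all: a list of strings stands in for the document nodes, entries starting with "IMG:" stand in for images, and the class name, method names and marker format are all made up for this sketch. In a real document the markers would be bookmarks and the restore step would import the image nodes from the original document:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: strings stand in for document nodes.
final class PlaceholderRoundTrip {
    // Replaces each "IMG:" entry with a numbered marker and records
    // the original entry under that marker.
    static List<String> strip(List<String> nodes, Map<String, String> removed) {
        List<String> out = new ArrayList<>();
        for (String node : nodes) {
            if (node.startsWith("IMG:")) {
                String marker = "image#" + (removed.size() + 1);
                removed.put(marker, node);
                out.add(marker);
            } else {
                out.add(node);
            }
        }
        return out;
    }

    // Puts each recorded entry back at the position of its marker.
    static List<String> restore(List<String> nodes, Map<String, String> removed) {
        List<String> out = new ArrayList<>();
        for (String node : nodes) {
            out.add(removed.getOrDefault(node, node));
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> removed = new LinkedHashMap<>();
        List<String> original = Arrays.asList("Intro", "IMG:logo.png", "Body", "IMG:chart.png");
        List<String> stripped = strip(original, removed);
        System.out.println(stripped);                   // prints [Intro, image#1, Body, image#2]
        System.out.println(restore(stripped, removed)); // prints [Intro, IMG:logo.png, Body, IMG:chart.png]
    }
}
```

The point is only that each removed image needs a marker left in its place; without one there is nothing to tell you where to insert it again.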
Best regards,