Read RichTextFormat from doc file

Hi Team,
I have a Word document that contains formatted text (bold, underline, bullets, etc.). This text has to be copied exactly into an Excel file. From Excel, using a pagination technique, I write the data to PPT slides, which are sent to our end users.
I tried embedding the document as an OLE object in both Excel and PPT. But if the document is large, the OLE object is not split across consecutive slides.
Below is my question:

  1. Is there a way to split the OLE object from Java? If not, please suggest a way to accomplish the task.

Thanks.

Hi
Thanks for your request. There is no way to split OLE objects using Aspose.Words. However, in your case, I think, you can create a custom converter from Word to PowerPoint. For instance, you can try using the same technique as suggested here:
https://forum.aspose.com/t/101902
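One piece such a converter would need is a rule for deciding which paragraphs go on which slide. Below is a minimal plain-Java sketch of that pagination step. It assumes every paragraph takes one line and that each slide has a fixed line budget, which is a simplification. `SlidePaginator` and `paginate` are hypothetical names for illustration, not Aspose APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SlidePaginator {

    /**
     * Splits the paragraphs into consecutive groups of at most maxPerSlide
     * entries, preserving their order. Each group stands for one slide.
     */
    public static List<List<String>> paginate(List<String> paragraphs, int maxPerSlide) {
        if (maxPerSlide <= 0) {
            throw new IllegalArgumentException("maxPerSlide must be positive");
        }
        List<List<String>> slides = new ArrayList<>();
        for (int start = 0; start < paragraphs.size(); start += maxPerSlide) {
            int end = Math.min(start + maxPerSlide, paragraphs.size());
            // Copy the sublist so each slide owns its own list
            slides.add(new ArrayList<>(paragraphs.subList(start, end)));
        }
        return slides;
    }
}
```

A real converter would probably measure text height rather than count paragraphs, but the chunking loop would stay the same.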
Hope this could help you to achieve what you need. Please let me know if you need more assistance, I will be glad to help you.
Best regards,

Hi Alexey,

Thanks for your quick reply.
A converter can be a better option. But the link you suggested says very little about the converter and how it is implemented.

Will it be possible for you to provide me that attachment? I am unable to access that.
Secondly, can we also convert a Word document to an Excel sheet using the converter?

Thank you.

Hi
Thanks for your request. I attached the source code here. Also, you can find Word2Excel converter in our samples:
https://releases.aspose.com/words/net
Both converters are for .NET and written in C#. But I think it would not be difficult for you to translate them to Java.
Best regards,

That's very helpful of you, Alexey.

I can see the code to convert Word to PPT. As you said, it will not be difficult to translate it to Java.
But I could not find the Word2Excel converter at the link you provided. Only an Excel-to-Word converter is available there.

Can you please help me get the Word2Excel converter? It is not a problem even if it exists only for .NET.

That would be a great help to me… I have been searching for this for the past two days.

Thanks.

Hi
Thanks for your request. I attached the Word2Excel converter as well. But the problem with conversion from Word to Excel is that an Excel document is one big table, and it is quite difficult to make the converted document look the same as the original one.
Anyway, you are free to change and experiment with this code. Hope this helps.
Best regards,

Thanks for sharing the converter.
Without converting the file, is there a way to directly copy the rich text contents from a Word document to an Excel cell? Could that be possible through a bridge conversion, say Word to HTML and then HTML to Excel, or something like that?
Thanks.

Hi there,
Thanks for your inquiry.
Yes, I believe this is possible using Aspose.Cells. From memory, the Cell class has an “HtmlString” property for this purpose. You can export parts of your document to HTML and import them into a new Worksheet object.
However, you will need to post your query on the Aspose.Cells forum to find the best way to insert HTML (e.g. whether you can import HTML into an entire worksheet).
Thanks,

Hi Team,

I have tried converting the .NET “ConvertDoc2XLS” code to Java. It works perfectly fine for text, but the output does not retain the bullets that exist in the source Doc file.

I tried to find the cause but had no luck. I am attaching the Java code for the converter. Can you please help me handle the bullets?

Thanks.

Hi
Thanks for your request. I think, in your case you should replace list labels with simple text and then convert document to XLS. The following code should help you to achieve this:

[Test]
public void Test001()
{
    // Open document
    Document doc = new Document(@"Test001\in.doc");
    // Replace list labels with plain text
    ReplaceListLabels(doc);
    // Save output document
    doc.Save(@"Test001\out.doc");
}
private void ReplaceListLabels(Document doc)
{
    // Get collection of Paragraphs from the document
    NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);
    DocumentBuilder builder = new DocumentBuilder(doc);
    // Loop through all paragraphs
    foreach(Paragraph par in paragraphs)
    {
        if (par.IsListItem && (par.HasChildNodes || !par.IsEndOfSection))
        {
            if (par.ListFormat.ListLevel.NumberStyle != NumberStyle.Bullet)
            {
                ListLabelsExtractor extractor = ListLabelsExtractor.GetLabelExtractor(par.ListFormat.List);
                // Get label of list item
                string label = extractor.GetListLabel(par.ListFormat.ListLevelNumber) + "\t";
                // Create run that will represent label in the document
                Run labelRun = new Run(doc, label);
                // We should import paragraph indents
                double leftIndent = par.ListFormat.ListLevel.NumberPosition;
                Console.WriteLine(label + "\t" + par.ToTxt());
                // Remove list label
                par.ListFormat.RemoveNumbers();
                // Insert label at the beginning of paragraph
                par.ChildNodes.Insert(0, labelRun);
                // par.ParagraphFormat.ClearFormatting();
                par.ParagraphFormat.LeftIndent = leftIndent;
            }
        }
    }
}

I attached the ListLabelsExtractor class. Hope this helps. Please feel free to ask in case of any issues, we will be glad to assist you.
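The attached class is not reproduced in this thread. Purely as an illustration of how numbered label text could be derived (this is not the attached ListLabelsExtractor), here is a minimal plain-Java sketch. It assumes a simple decimal format such as "1.2." and keeps one counter per list level. `ListLabelCounter` and `nextLabel` are hypothetical names.

```java
import java.util.Arrays;

public class ListLabelCounter {

    // One counter per list level; Word lists have up to nine levels
    private final int[] counters = new int[9];

    /**
     * Returns the label for the next list item at the given zero-based level,
     * for example "1." for the first top-level item or "2.1." for the first
     * child of the second top-level item.
     */
    public String nextLabel(int level) {
        if (level < 0 || level >= counters.length) {
            throw new IllegalArgumentException("level must be between 0 and 8");
        }
        counters[level]++;
        // A new item at this level restarts numbering on all deeper levels
        Arrays.fill(counters, level + 1, counters.length, 0);

        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            // A parent level that never had an item is treated as item 1
            if (counters[i] == 0) {
                counters[i] = 1;
            }
            label.append(counters[i]).append('.');
        }
        return label.toString();
    }
}
```

Calling `nextLabel` once per list paragraph, in document order, with that paragraph's level would give the text to insert as a plain run before the list formatting is removed.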
Best regards,

Hi,
Thanks for sharing the file (ListLabelsExtractor.java).
But this code doesn't handle bullet formatting. Can you please help me handle bullets too?
Currently, if I convert a document that has bullets, the converted Excel file contains some junk characters.
Thanks a lot for your support.

Hi
Thanks for your request. I think you can just replace all bullets in the document with the “\u2022” character (•). You can try inserting a bullet as the default value of the list label.
Please let me know if you need more assistance, I will be glad to help you.
Best regards,

Hi Alexey,
Thank you very much for your continuous support.
As per your suggestion, I tried ‘\u2022’. But the output is still the same, with “*” as the bullet instead of a dot.
I am attaching the Java code (a converted version of the .NET code) for this. Could you please guide me on where exactly this can be fixed? Along with the code change, please also explain the flow so that I can understand it better.
I have almost achieved the output except for the bullet part. If this is achieved too, it will be a really great thing for me.
Thanks once again.

Attachments here. Sorry, missed them in the previous email

Hi
Thanks for your request. Please try using the following code:

/**
 * Get label of current level.
 *
 * @param level List level number
 */
private String GetLevelLabel(int level)
{
    String label = "";
    try
    {
        // Get current level
        ListLevel currentLevel = mList.getListLevels().get(level);
        List currentList = mList;
        // If current level is not linked to another list then we just get label of current level
        // Otherwise we should get label of linked list
        if (currentLevel.getLinkedStyle() != null &&
            currentLevel.getLinkedStyle().getListFormat().getList() != null &&
            !currentList.equals(currentLevel.getLinkedStyle().getListFormat().getList()))
        {
            currentList = currentLevel.getLinkedStyle().getListFormat().getList();
            currentLevel = currentList.getListLevels().get(level);
        }
        // Build list label
        if (currentLevel.getNumberStyle() != NumberStyle.NONE)
        {
            int currentPosition = ListLabelsExtractor.getLabelLists().get(currentList).Labels().get(currentLevel);
            switch (currentLevel.getNumberStyle())
            {
                case NumberStyle.LOWERCASE_LETTER:
                {
                    label = String.valueOf(currentPosition).toLowerCase();
                    break;
                }
                case NumberStyle.UPPERCASE_LETTER:
                {
                    label = String.valueOf(currentPosition).toUpperCase();
                    break;
                }
                case NumberStyle.LOWERCASE_ROMAN:
                {
                    label = GetLowerRoman(currentPosition);
                    break;
                }
                case NumberStyle.UPPERCASE_ROMAN:
                {
                    label = GetLowerRoman(currentPosition).toUpperCase();
                    break;
                }
                default:
                {
                    label = "\u2022";
                    break;
                }
            }
            // We should also set previous level for Linked List
            // Otherwise restart method will work incorrectly
            ListLabelsExtractor.getLabelLists().get(currentList).setPrevLevel(mPrevLevel);
        }
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    return label;
}
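The snippet calls GetLowerRoman, whose body is not shown in this thread. For the letter styles it uses String.valueOf(currentPosition), which yields decimal digits rather than letters. If letter or Roman labels are wanted, conversions along the following lines could be substituted. This is a plain-Java sketch with hypothetical names (`ListLabelFormats`, `toLowerRoman`, `toLowerLetter`). It assumes positions start at 1 and that letters repeat after "z" (aa, bb, ...), as Word appears to do for long lists.

```java
public class ListLabelFormats {

    private static final int[] VALUES =
        {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
        {"m", "cm", "d", "cd", "c", "xc", "l", "xl", "x", "ix", "v", "iv", "i"};

    /** Converts a positive number to lower-case Roman numerals, e.g. 14 to "xiv". */
    public static String toLowerRoman(int number) {
        if (number <= 0) {
            throw new IllegalArgumentException("number must be positive");
        }
        StringBuilder result = new StringBuilder();
        int remaining = number;
        // Greedily subtract the largest value that still fits
        for (int i = 0; i < VALUES.length; i++) {
            while (remaining >= VALUES[i]) {
                result.append(SYMBOLS[i]);
                remaining -= VALUES[i];
            }
        }
        return result.toString();
    }

    /** Converts a positive number to a letter label: 1 to "a", 26 to "z", 27 to "aa", 28 to "bb". */
    public static String toLowerLetter(int number) {
        if (number <= 0) {
            throw new IllegalArgumentException("number must be positive");
        }
        char letter = (char) ('a' + (number - 1) % 26);
        int repeats = (number - 1) / 26 + 1;
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            result.append(letter);
        }
        return result.toString();
    }
}
```

The upper-case variants follow by calling toUpperCase() on the result, in the same way the snippet above does for UPPERCASE_ROMAN.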

Hope this helps. If not, please share also your version of ReplaceListLabels method.
Best regards,

Hi Alexey,
Thanks a lot for all your support. Finally I got it :) Below are the details.
Although the above code returns the dot as the bullet, the line
label = String.format(GetLevelNumberFormat(endLevel), levelNumbers);
again produces the “*” (star) bullet.
Since I only had bullets in my formatted text, I just hardcoded the label as “\u2022”, so I do not need to call GetListLabel in the ListLabelsExtractor. This worked perfectly.
Below is the code for “ReplaceListLabels”:

private void ReplaceListLabels(Document doc)
{
    try
    {
        // Get collection of Paragraphs from the document
        NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
        DocumentBuilder builder = new DocumentBuilder(doc);
        // Loop through all paragraphs
        for (Paragraph par : (Iterable<Paragraph>) paragraphs)
        {
            if (par.isListItem() && (par.hasChildNodes() || !par.isEndOfSection()))
            {
                if (par.getListFormat().getListLevel().getNumberStyle() != 0)
                {
                    ListLabelsExtractor extractor = ListLabelsExtractor.GetLabelExtractor(par.getListFormat().getList());
                    // Get label of list item
                    String label = "\u2022"; //extractor.GetListLabel(par.getListFormat().getListLevelNumber()) + "\t";
                    // Create run that will represent label in the document
                    Run labelRun = new Run(doc, label);
                    // We should import paragraph indents
                    double leftIndent = par.getListFormat().getListLevel().getNumberPosition();
                    // Remove list label
                    par.getListFormat().removeNumbers();
                    // Insert label at the beginning of paragraph
                    par.getChildNodes().insert(0, labelRun);
                    // par.ParagraphFormat.ClearFormatting();
                    par.getParagraphFormat().setLeftIndent(leftIndent);
                }
            }
        }
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
}

Hi Malathi,
It is great that you managed to achieve what you needed. Please feel free to ask in case of any issues. We will be glad to help you.
Best regards,