Scale Images, Show Section Numbers during DOC to HTML Conversion using Java API

I still have difficulty converting the attached doc. Here are the problems.

  1. The bullet number gets reset after 6.1. For example, it should be 6.1.1 not 1.1.1.

  2. Diagrams are not converted (one in the architecture overview and one in the design overview; see the Word doc).

  3. Unordered lists are converted to ordered lists (e.g. the list in 1.2 Audience).

  4. Ordered lists without sub-numbering are converted to sub-numbered lists (e.g. the list in 3.6 Availability).

Could you please try the attached doc and see if the above problems can be fixed?

Thanks,

-Bin

Hi

Thank you for the additional information. As you can see, the system of number restarting is very complicated, and there are many special cases.

Regarding your questions:

  1. I fixed this problem.

  2. This occurs because currently Word drawing objects are not rendered during HTML conversion. This is issue #1146 in our defect database.
    Issue #1146 – Render shapes as image files when exporting to HTML
    As a workaround you can insert diagrams into your document as raster images (JPG, PNG etc).

  3. Fixed by adding an additional condition.

  4. I think this could be fixed by using ListLevel.NumberFormat. I will investigate the issue further and provide you with more information.

Attached is the updated ListLabelsExtractor class, and here is the updated ReplaceListLabels method:

private void ReplaceListLabels(Document doc)
{
    //Get collection of Paragraphs from the document
    NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

    //Loop through all paragraphs
    foreach (Paragraph par in paragraphs)
    {
        if (par.IsListItem && par.HasChildNodes)
        {
            if (par.ListFormat.ListLevel.NumberStyle != NumberStyle.Bullet)
            {
                ListLabelsExtractor extractor = ListLabelsExtractor.GetLabelExtractor(par.ListFormat.List);
                //Get label of list item
                string label = extractor.GetListLabel(par.ListFormat.ListLevelNumber) + "\t";
                //Create run that will represent label in the document
                Run labelRun = new Run(doc, label);

                //We should import paragraph indents
                par.ParagraphFormat.LeftIndent = par.ListFormat.ListLevel.TextPosition;
                par.ParagraphFormat.FirstLineIndent = par.ListFormat.ListLevel.NumberPosition;

                Console.WriteLine(label + "\t" + par.ToTxt());

                //Remove list label
                par.ListFormat.RemoveNumbers();
                //Insert label at the beginning of paragraph
                par.ChildNodes.Insert(0, labelRun);
            }
        }
    }
}
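
For readers without the attachment, here is a minimal Java sketch of the counting idea behind multi-level labels such as 6.1.1. It is not the attached ListLabelsExtractor class: ListLabelCounter and nextLabel are hypothetical names, it handles plain decimal labels only, and showing a skipped level as 1 is an assumption.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; NOT the ListLabelsExtractor attached to this thread. */
final class ListLabelCounter {
    // One running counter per list level (index 0 is the top level).
    private final int[] counters;

    ListLabelCounter(int levelCount) {
        counters = new int[levelCount];
    }

    /** Returns the label of the next list item at the given zero-based level. */
    String nextLabel(int level) {
        counters[level]++;
        // Only deeper levels restart; shallower counters are kept, which is
        // why the item after 6.1 becomes 6.1.1 rather than 1.1.1.
        Arrays.fill(counters, level + 1, counters.length, 0);

        StringBuilder label = new StringBuilder();
        for (int i = 0; i <= level; i++) {
            if (i > 0) {
                label.append('.');
            }
            // Assumption: a level that was skipped is shown as 1.
            label.append(Math.max(counters[i], 1));
        }
        return label.toString();
    }

    public static void main(String[] args) {
        ListLabelCounter counter = new ListLabelCounter(3);
        int[] levels = {0, 1, 2, 2, 1, 0};
        for (int level : levels) {
            System.out.println(counter.nextLabel(level));
        }
    }
}
```

The key design point is that advancing one level resets only the levels below it, so the parent numbers survive across paragraphs.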

Best regards.

  4. I think this could be fixed by using ListLevel.NumberFormat. I will investigate the issue further and provide you with more information.

Any update on that?

Also, could you try to convert the attached document? I found the links are broken after section 6.6. This is the document I’ve been using to test the conversion. My goal is to get it converted with correct bullet numberings.

Thanks,

-Bin

Hi

Thanks for your request. I fixed the issue with numbering, but I haven’t fixed the issue with “ordered list without sub-numberings” yet. There are a few things I will implement tomorrow, and then I will provide you with the solution.

Thanks for your patience.

Best regards.

Hi

Please try using the attached class and the following method to replace list labels.

private void ReplaceListLabels(Document doc)
{
    //Get collection of Paragraphs from the document
    NodeCollection paragraphs = doc.GetChildNodes(NodeType.Paragraph, true);

    //Loop through all paragraphs
    foreach (Paragraph par in paragraphs)
    {
        if (par.IsListItem && par.HasChildNodes)
        {
            if (par.ListFormat.ListLevel.NumberStyle != NumberStyle.Bullet)
            {
                ListLabelsExtractor extractor = ListLabelsExtractor.GetLabelExtractor(par.ListFormat.List);
                //Get label of list item
                string label = extractor.GetListLabel(par.ListFormat.ListLevelNumber) + "\t";

                //Create run that will represent label in the document
                Run labelRun = new Run(doc, label);

                //We should import paragraph indents
                par.ParagraphFormat.LeftIndent = par.ListFormat.ListLevel.TextPosition;
                par.ParagraphFormat.FirstLineIndent = par.ListFormat.ListLevel.NumberPosition;

                //Remove list label
                par.ListFormat.RemoveNumbers();
                //Insert label at the beginning of paragraph
                par.ChildNodes.Insert(0, labelRun);
            }
        }
    }
}

Please let me know in case of any issues.

Best regards.

I tested your latest code using the test doc I sent you. I found that bullet numbers 6.6.2 - 6.6.11 in the TOC are not hyperlinked. Can you try it again?

Also, ordered lists don’t seem to be indented. For example,

1. list1
    a. list a

becomes

1. list1
a. list a

Other than that, it looks good. It’s getting very close.

Thanks,

-Bin

Hello Bin!

Thank you for experimenting with this code. It is only meant to give an idea of how to work around the issue with list labels. Of course, it doesn’t support all features, such as preserving indents. You can adjust the ParagraphFormat of the corresponding paragraphs directly.

Alexey is on a short vacation until January 3. I’ll ask him to take a look at this thread in a few days.
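
A minimal sketch of such an indent adjustment, in plain Java without the library. It assumes that a list level's text position corresponds to the paragraph's left indent and that its number position corresponds to left indent plus first-line indent; please check that against the ListLevel reference before relying on it. ListIndentSketch and its methods are hypothetical names, and the CSS output is only one possible way to carry the values into HTML.

```java
import java.util.Locale;

/** Hypothetical helper for illustration; not part of any library. */
final class ListIndentSketch {

    /**
     * First-line indent in points, relative to the left indent.
     * Assumption: numberPosition = leftIndent + firstLineIndent and
     * textPosition = leftIndent, so a negative result is a hanging indent.
     */
    static double firstLineIndent(double numberPosition, double textPosition) {
        return numberPosition - textPosition;
    }

    /** Inline CSS that expresses the same two indents for an HTML paragraph. */
    static String toCss(double numberPosition, double textPosition) {
        return String.format(Locale.ROOT,
                "margin-left:%.1fpt;text-indent:%.1fpt",
                textPosition,
                firstLineIndent(numberPosition, textPosition));
    }

    public static void main(String[] args) {
        // Example values: label at 18pt, wrapped text at 36pt.
        System.out.println(toCss(18, 36));
    }
}
```

With a label at 18pt and text at 36pt this gives a left indent of 36pt and a first-line indent of -18pt, so the label hangs to the left of the wrapped text.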

Regards,

Hi Bin,

Thanks for your request.

  1. This problem occurs because Aspose.Words does not update the TOC, and the TOC in your document is out of date. If you update the TOC before conversion, everything will work fine. Your document actually contains items 6.6.1-6.6.7, but the TOC also lists items 6.6.8-6.6.11.

  2. Regarding the second problem, I created a new issue, #7061, in our defect database. I will notify you as soon as it is fixed.

Best regards.

Hello Bin!

Thank you for your patience. Issue #7061 turned out to be a consequence of another known issue, #6807. I’m closing the first one and re-linking your thread to the second. We’ll notify you when it is fixed. As a workaround in your case, you can request an embedded or external CSS style sheet on HTML export. See this page:
https://reference.aspose.com/words/net/aspose.words.saving/cssstylesheettype

Regards,

Any update on the following issue?

Issue #1146 – Render shapes as image files when exporting to HTML
This issue is a showstopper for us right now because our customers use shape objects quite often and it’s not acceptable to them that shape objects are not exported to HTML.

Thanks,

-Bin

Hi

Thanks for your request. Unfortunately, this issue is still unresolved. We are currently working on our rendering engine. We are planning to support rendering of shapes separately from the rest of the document. This will allow us to resolve this issue.

Best regards.

We are planning to support rendering of shapes separately from the rest of the document. This will allow us to resolve this issue.

Could you tell me more about this feature? Is it the same as converting Word doc to image for viewing/printing? The problem with this approach is that links in the image won’t be clickable.

Also, is there any timeline for when it will be available?

Thanks,

-Bin

Hi

Thanks for your inquiry. Yes, this task is related to generating images for viewing/printing. Aspose.Words can render a whole document into images; however, there is currently no way to render only a particular shape to an image.

Currently, I cannot provide you with a reliable estimate for this issue. Hopefully, this feature will be supported sometime in Q2-Q3 2009.

Best regards.

Is there any way to pull in the fix for the following issue?

Issue #1146 – Render shapes as image files when exporting to HTML

Since it’s a showstopper for us and our license is up for renewal, our management is questioning whether or not we should renew the license if it’s not fixed.

Thanks,

-Bin

Hello!

Sorry for the long delay. We have discussed issue #1146 and increased its priority. I’ll look into how it can be implemented, but I cannot share an ETA yet. Thank you for your understanding.

Regards,

The issues you have found earlier (filed as 3701) have been fixed in this update.

How about issue #1146? Any update on the ETA?

Thanks,

-Bin

Hi

Thanks for your inquiry. Unfortunately, I cannot provide you with a reliable estimate for this issue at the moment. You will be notified as soon as it is resolved.

Best regards.

Hi,

Do you have any update on the following issue? Last time I was told it will be released in Q2-Q3 2009. Is it available now?

Issue #1146 – Render shapes as image files when exporting to HTML

As you can see, I have been inquiring about this fix for a while. I hope you have given it enough priority.

Thanks,

-Bin

Hi

Thanks for your request. Unfortunately, this feature is still unavailable. You will be notified as soon as it is implemented.

Best regards.