Anchor html

Alex_HU · May 29, 2015, 7:55am

Hi,
I would like to know if I can read the anchors of an HTML file (in Java). That is, can I use Aspose to check where each link in the file points?
Thanks,
Alex

awais.hafeez · June 1, 2015, 4:18am

Hi Alex,

Thanks for your inquiry. Sure, you can achieve this using Aspose.Words. Please try using the following Java code:

// Requires: import com.aspose.words.*;
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);

// Insert a hyperlink pointing to Aspose's website
builder.insertHtml("<a href=\"http://www.aspose.com/\">Home</a>");

// Walk every field in the document and print the hyperlink fields
for (Field field : (Iterable<Field>) doc.getRange().getFields()) {
    if (field.getStart().getFieldType() == FieldType.FIELD_HYPERLINK) {
        FieldHyperlink link = (FieldHyperlink) field;
        System.out.println("---------------");
        System.out.println(link.getAddress()); // link target
        System.out.println(link.getResult());  // displayed link text
    }
}

I hope this helps.

Best regards,

Alex_HU · June 8, 2015, 7:34am

Hi,
Thanks for your reply. I tested your code on my file and got the following output:
---------------
Adr page_0.html
Res (Provide a document title)
---------------
Adr page_1.html
Res Introduction
---------------
Adr page_2.html
Res Follow-up of the evolutions
---------------
Adr page_3.html
Res 1 Scope
---------------
Adr page_3.html
Res 1.1 Identification
---------------
Adr page_3.html
Res 1.2 System overview
---------------
Adr page_3.html
Res 1.3 Document overview […]

However, in my .html file the links also point to an element, for example “page_3.html#Element_223323332…”, and I would like to recover that part as well.

.html file:

Introduction

awais.hafeez · June 9, 2015, 1:36am

Hi Alex,

Thanks for your inquiry. I am afraid Aspose.Words can’t do this kind of document recovery. Aspose.Words builds a Document instance from whatever information is present in the input HTML file. Most of the time, it mimics MS Word behavior. If we can help you with anything else, please feel free to ask.

Best regards,

Alex_HU · June 9, 2015, 4:03am

Hi,
Thanks for your reply.
So, are you saying that it is impossible to recover all the information contained in a hyperlink?

Alex_HU · June 9, 2015, 6:42am

Actually, I tested on a Word document, and there I am able to recover the hyperlink information:

(console output)
Paragraph (Style Normal, java.awt.Color[r=0,g=0,b=0]) : HYPERLINK \l “Element_4040009190” Configure Availability Of Services a
FieldStart (22)
FieldSeparator (23)
FieldEnd (24)

However, with an HTML file, I can’t recover the “#Element_999198187” part, even though it is written in the HTML code (href=“page_3.html#Element_999198187”). How can I fix this?

awais.hafeez · June 10, 2015, 1:51am

Hi Alex,

Thanks for your inquiry. Please ZIP and attach here, for testing, the input HTML file for which MS Word is able to recover the anchor's href attribute information. We will investigate the issue on our end and provide you with more information.

Best regards,

Alex_HU · June 10, 2015, 8:04am

Hi,
Thanks for your help.
Please find my HTML file attached.

awais.hafeez · June 11, 2015, 1:21am

Hi Alex,

Thanks for your inquiry. You can find this information in the FieldHyperlink.SubAddress property.
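To illustrate the split, here is a minimal plain-Java sketch of how an href such as page_3.html#Element_999198187 divides into two parts: the part before the # corresponds to the address printed by getAddress() earlier in this thread, and the part after it is the piece reported under SubAddress (getSubAddress() in the Java API). HrefParts is a hypothetical helper written only for this illustration. It is not an Aspose.Words class and makes no Aspose calls.

```java
// Illustrative sketch only: HrefParts is a hypothetical helper,
// not part of Aspose.Words. It uses nothing beyond java.lang.
final class HrefParts {
    final String address;     // part before '#', e.g. "page_3.html"
    final String subAddress;  // part after '#', or "" when there is no '#'

    private HrefParts(String address, String subAddress) {
        this.address = address;
        this.subAddress = subAddress;
    }

    /** Splits an href at its first '#' into address and sub-address. */
    static HrefParts parse(String href) {
        int hash = href.indexOf('#');
        if (hash < 0) {
            // No fragment: the whole href is the address.
            return new HrefParts(href, "");
        }
        return new HrefParts(href.substring(0, hash), href.substring(hash + 1));
    }

    public static void main(String[] args) {
        HrefParts parts = parse("page_3.html#Element_999198187");
        System.out.println("Address:    " + parts.address);
        System.out.println("SubAddress: " + parts.subAddress);
    }
}
```

In the loop posted earlier, printing the sub-address next to link.getAddress() should give both halves of each link, assuming the HTML import keeps the fragment as it did for the attached file.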

I hope this helps.

Best regards,

Alex_HU · June 11, 2015, 3:25am

Hi,
Thanks for your advice, I fixed my issue!

I would like to know if it is possible to manage the metadata of an HTML file. “Document.CustomDocumentProperties” and “Document.BuiltInDocumentProperties” allow me to manage Word metadata, but when I tried the same code on an HTML file, it could not find the built-in and custom properties. Is that normal?

Regards,

awais.hafeez · June 12, 2015, 1:59am

Hi Alex,

Thanks for your inquiry. We have a class, HtmlSaveOptions, that you can use to specify additional options when saving a Word document to HTML, MHTML or EPUB format. Please refer to the HtmlSaveOptions.ExportDocumentProperties property, which specifies whether built-in and custom document properties are exported to HTML, MHTML or EPUB. The default value is false. I hope this helps.

Best regards,