Fragmented URI link after Doc to Pdf conversion

When I convert Word documents that contain URI links, the links are fragmented in the PDF output (each word or space gets its own link). By contrast, when the same document is converted to PDF with MS Word's built-in conversion, links are split only at line breaks.

It’s painful because we have a manual validation process for output files, and the fragmentation increases the amount of work that needs to be done.
I have included a screenshot, example files and a unit test. Tested with Aspose.Words 11.10 and earlier.

[Test]
public void Doc_To_Pdf_Produces_Unfragmented_Link()
{
    // Build a document with a single hyperlink that is too long to fit on one line of an A5 page.
    Document doc = new Document();
    DocumentBuilder builder = new DocumentBuilder(doc);
    builder.InsertHyperlink("This is link to some content and its realy realy realy realy realy realy realy long long.", "https://www.aspose.com/", false);
    doc.Sections[0].PageSetup.PaperSize = PaperSize.A5;

    // Save the source document and the PDF rendered by Aspose.Words.
    doc.Save("source.doc", SaveFormat.Doc);
    doc.Save("aspose.pdf", SaveFormat.Pdf);
    doc = null;

    // Pages in Aspose.Pdf are indexed from 1. The test expects one link annotation per line, two in total.
    Aspose.Pdf.Document pdf = new Aspose.Pdf.Document("aspose.pdf");
    Assert.AreEqual(2, pdf.Pages[1].Annotations.Count, "LinkAnnotation is fragmented.");
}

Regards
Jacek Bator

Hi Jacek,

Thanks for your inquiry. I have managed to reproduce the same issue on my side and have logged it as WORDSNET-7406 in our issue tracking system. I have linked this forum thread to the issue, and you will be notified here once it is resolved.

We apologize for the inconvenience.

Hello,

Do you have any plans to fix this link fragmentation?

Regards
Jacek Bator

Hi Jacek,

Thanks for your inquiry. Unfortunately, your issue is still unresolved. We apologise once again for the inconvenience.

P.S. Since you reported this issue via the Aspose.Words normal forum, WORDSNET-7406 was logged with ‘Normal Priority’ in our issue tracking system. When an issue is reported via the normal (free) support forum, we make no promises about when a fix can or will be delivered. To avoid this in future, please post your critical issues in the Enterprise Support forum, since only issues posted there are treated with high priority. Please tell us if you would like to use your Enterprise Support subscription to raise the priority of WORDSNET-7406.

Best regards,

Will you please tell me how to raise the priority to an enterprise issue?

Here are my questions.
Hi,
I need your help; this is very urgent.
These questions came up when I split a *.doc file into separate files, one per page.
1. When a paragraph spans two pages and I split the document page by page, I cannot get the exact position of the last word on the page. (P.S. The last word of a page may be an English word or a Chinese word.)
2. The same problem occurs with a table that spans two or more pages.

About enterprise priority: I have to ask my management. We will soon start developing a new version of our product, and I have to ask whether they need this fixed now or want to save enterprise tickets for more important roadblocks we may hit (we already have a partial workaround for this problem in our code).

Regarding 1 and 2: I don’t fully understand what feedback you need. MS Word creates one link when the link/reference text fits on a single line. If the reference/link spans two lines, on the same or on different pages, it creates two links/annotations.

I don’t know your internals, but a naive implementation could be: if there is another run on the page that has the same top/bottom, and the right edge of the earlier run equals the left edge of the later one, put them in the same annotation. Of course, those runs have to be inside the same link/reference field. But I’m not sure what will happen if the runs use fonts of different sizes.
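
To make the idea concrete, here is a minimal sketch of that merging rule. It does not use or reflect anything from Aspose internals: RunBox, LinkMerger and the tolerance parameter are all hypothetical names made up for illustration, and the boxes are assumed to arrive in reading order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: the bounding box of one run on a page,
// tagged with the id of the link/reference field it belongs to.
public struct RunBox
{
    public int LinkId;
    public double Left, Top, Right, Bottom;

    public RunBox(int linkId, double left, double top, double right, double bottom)
    {
        LinkId = linkId;
        Left = left;
        Top = top;
        Right = right;
        Bottom = bottom;
    }
}

public static class LinkMerger
{
    // Merges consecutive boxes that belong to the same link, share the same
    // top/bottom (within the tolerance) and touch horizontally.
    // Assumes the boxes are given in reading order.
    public static List<RunBox> Merge(IList<RunBox> boxes, double tolerance)
    {
        var merged = new List<RunBox>();
        foreach (RunBox box in boxes)
        {
            if (merged.Count > 0)
            {
                RunBox last = merged[merged.Count - 1];
                bool sameLink = last.LinkId == box.LinkId;
                bool sameLine = Math.Abs(last.Top - box.Top) <= tolerance
                             && Math.Abs(last.Bottom - box.Bottom) <= tolerance;
                bool adjacent = Math.Abs(last.Right - box.Left) <= tolerance;

                if (sameLink && sameLine && adjacent)
                {
                    // Extend the previous box instead of starting a new annotation.
                    last.Right = box.Right;
                    merged[merged.Count - 1] = last;
                    continue;
                }
            }
            merged.Add(box);
        }
        return merged;
    }
}
```

For runs with different font sizes the top/bottom values would differ, so this sketch would keep them apart. Testing for vertical overlap instead of equality, and taking the union of the two boxes, might be one way around that, but I have not tried it.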

Regards
Jacek Bator

Hi Xiaohua,

Thanks for your inquiry.

jihu31:
Will you please tell me how to raise the priority to an enterprise issue?

Please read the details about Enterprise Support here:
https://helpdesk.aspose.com/kb/faq/3-Enterprise-Support-Key-Benefits-Conditions

jihu31:
1. When a paragraph spans two pages and I split the document page by page, I cannot get the exact position of the last word on the page. (P.S. The last word of a page may be an English word or a Chinese word.)
2. The same problem occurs with a table that spans two or more pages.

Please note that an MS Word document is a flow document and does not contain any information about its layout into lines and pages. Therefore, technically there is no “Page” concept in a Word document. Pages are created by Microsoft Word on the fly.

Aspose.Words uses its own rendering engine to lay out documents into pages. Please check the DocumentLayoutHelper sample from the offline samples pack. This sample demonstrates how to work with the layout elements of a document and access the pages, lines, spans, etc.

Hope this answers your query. Please let us know if you have any more queries.

Thank you for your time.
I cannot get the right page count from layoutDoc.Pages.Count for the 11.docx file using the sample code from the offline samples pack.
It works fine when 11.docx has no more than 5 pages, but it fails when 11.docx has more than 5 pages.
Something also goes wrong when I load a BP-PJM .doc file with the same code. Error: ArgumentOutOfRangeExcepti occur in LayoutEntities.cs, Line 281.
------>>>>
My sample code is listed below. Please see the attachment for testing.

static void Main(string[] args)
{
    #region 
    ////// set license
    // Aspose.Words.License license = new Aspose.Words.License();
    // 
    //// This line attempts to set a license from several locations relative to the executable and Aspose.Words.dll.
    //// You can also use the additional overload to load a license from a stream, this is useful for instance when the 
    //// license is stored as an embedded resource 
    // try
    // {
    // license.SetLicense("Aspose.Words.lic");
    // }
    // 
    // catch (Exception e)
    // {
    //// We do not ship any license with this example, visit the Aspose site to obtain either a temporary or permanent license. 
    // Console.WriteLine("There was an error setting the license: " + e.Message);
    // }
    #endregion
    string dataDir = Path.GetFullPath("../../Data/");
    // Document doc = new Document(dataDir + "TestFile.docx");
    Document doc = new Document(dataDir + "11.docx");
    // This sample introduces the RenderedDocument class and other related classes which provide an API wrapper for 
    // the LayoutEnumerator. This allows you to access the layout entities of a document using a DOM style API.

    // Create a new RenderedDocument class from a Document object.
    RenderedDocument layoutDoc = new RenderedDocument(doc);
    // The following examples demonstrate how to use the wrapper API. 
    // This snippet returns the third line of the first page and prints the line of text to the console.
    RenderedLine line = layoutDoc.Pages[0].Columns[0].Lines[2];
    Console.WriteLine("Line: " + line.Text);
    // With a rendered line the original paragraph in the document object model can be returned.
    Paragraph para = line.Paragraph;
    Console.WriteLine("Paragraph text: " + para.Range.Text);
    // Retrieve all the text that appears of the first page in plain text format (including headers and footers).
    string pageText = layoutDoc.Pages[0].Text;
    Console.WriteLine();
    // Loop through each page in the document and print how many lines appear on each page.
    foreach (RenderedPage page in layoutDoc.Pages)
    {
        LayoutCollection lines = page.GetChildEntities(LayoutEntityType.Line, true);
        Console.WriteLine("Page {0} has {1} lines.", page.PageIndex, lines.Count);
    }
    // This method provides a reverse lookup of layout entities for any given node (with the exception of runs and nodes in the
    // header and footer).
    Console.WriteLine();
    Console.WriteLine("The lines of the second paragraph:");
    foreach (RenderedLine paragraphLine in layoutDoc.GetLayoutEntitiesOfNode(doc.FirstSection.Body.Paragraphs[1]))
    {
        Console.WriteLine(string.Format("\"{0}\"", paragraphLine.Text.Trim()));
        Console.WriteLine(paragraphLine.Rectangle.ToString());
        Console.WriteLine();
    }
}

Hi Xiaohua,

Thanks for your inquiry. I have tested the scenario and have not found any issue with the latest version of Aspose.Words for .NET, v13.6.0. Please use the latest version of Aspose.Words for .NET.

Issue 1: Page count issue (11.docx). Please see the attached image for the results with the latest version.

RenderedDocument layoutDoc = new RenderedDocument(doc);
int pages = layoutDoc.Pages.Count;

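Issue 2: Error: ArgumentOutOfRangeExcepti occur in LayoutEntities.cs. Your code uses index 2 (Lines[2]), which throws this exception when the first column of the first page has fewer than three lines. Please use the following code snippet, which reads the last line of that column instead, to avoid this exception.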

RenderedLine line = layoutDoc.Pages[0].Columns[0].Lines[layoutDoc.Pages[0].Columns[0].Lines.Count - 1];
Console.WriteLine("Line: " + line.Text);
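
Note that an index of Count - 1 would still be out of range for an empty collection. As a rough sketch only, a small guard like the one below could be used when a fixed index is wanted. TryGetAt and SafeIndexing are hypothetical names, not part of Aspose.Words or the sample, and the sketch assumes a plain IList<T>; the collection type used by the sample may need a similar check written against its own Count and indexer.

```csharp
using System.Collections.Generic;

public static class SafeIndexing
{
    // Hypothetical helper: returns true and the item when the index is valid,
    // otherwise returns false without throwing.
    public static bool TryGetAt<T>(IList<T> items, int index, out T item)
    {
        if (items != null && index >= 0 && index < items.Count)
        {
            item = items[index];
            return true;
        }

        item = default(T);
        return false;
    }
}
```

With such a guard, the code can print the third line when it exists and skip it otherwise, instead of failing on short pages.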

The issues you have found earlier (filed as WORDSNET-7406) have been fixed in this .NET update and this Java update.
