Not returning the correct page number using bookmark object

Hi,

I am getting the page number by using the bookmark object, but it does not return the correct value. For example, if the bookmark text is on page 6 of my original document, the Aspose DLL returns 7. I am getting these kinds of discrepancies in my results.

@gaurav.budhiraja Aspose.Words requires the fonts to build the document layout. If Aspose.Words cannot locate the required fonts, they are substituted, which might lead to layout differences. You can implement IWarningCallback to get a notification when font substitution is performed.

Also, could you please attach the problematic document along with code that will allow us to reproduce the problem? We will check the issue and provide you with more information.
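
As an illustration of the IWarningCallback suggestion above, here is a minimal sketch that collects font substitution warnings while the layout is built. The class name FontSubstitutionCollector and the input path are assumptions made for this example; they do not come from the attached project.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;

// Hypothetical helper: stores the description of every font substitution warning.
class FontSubstitutionCollector : IWarningCallback
{
    public List<string> Messages { get; } = new List<string>();

    public void Warning(WarningInfo info)
    {
        // Keep only font substitution warnings; other warning types are ignored.
        if (info.WarningType == WarningType.FontSubstitution)
            Messages.Add(info.Description);
    }
}

class Program
{
    static void Main()
    {
        // Assumed input path, used for illustration only.
        Document doc = new Document(@"C:\Temp\in.docx");

        FontSubstitutionCollector warnings = new FontSubstitutionCollector();
        doc.WarningCallback = warnings;

        // Building the layout is the step that triggers font substitution warnings.
        doc.UpdatePageLayout();

        foreach (string message in warnings.Messages)
            Console.WriteLine(message);
    }
}
```

If the list stays empty after the layout is built, no substitution was reported for the document in that environment.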

Hi @alexey.noskov

Please find attached the code for getting the bookmark page numbers. There are two solutions: TestAddin, an MS Word add-in used to add bookmarks to the document, and an API that gets the bookmarks from the active document using Aspose. When I run the solution, I get different page numbers once the document is processed with Aspose.

Please find the sample document and the discrepancy document attached.

Discrepency.docx (14.4 KB)
file-sample_1MB.docx (1002.7 KB)

Please find the link to the code below:

page-numbering-service.zip

Please let me know if you have any questions.

@gaurav.budhiraja Thank you for the additional information. I have checked your document and found that the font names in it are not specified quite correctly: several font names are specified in a single entry, separated by a semicolon. Aspose.Words substitutes such fonts, since it cannot find them by exact name:

Font 'DejaVu Sans' has not been found. Using 'Times New Roman' font instead. Reason: font info substitution.
Font 'Open Sans;Arial' has not been found. Using 'Times New Roman' font instead. Reason: default font substitution.
Font 'Droid Sans Fallback' has not been found. Using 'Times New Roman' font instead. Reason: default font substitution.
Font 'Droid Sans Fallback' has not been found. Using 'Times New Roman' font instead. Reason: default font substitution.
Font 'Liberation Serif;Times New Roma' has not been found. Using 'Times New Roman' font instead. Reason: default font substitution.
Font 'Liberation Serif;Times New Roma' has not been found. Using 'Times New Roman' font instead. Reason: default font substitution.

Also, as you can see, the DejaVu Sans and Droid Sans Fallback fonts are not available on my side. Could you please attach these fonts here for testing?
As I mentioned, font substitution might lead to layout differences and, as a result, to incorrect page number detection.

Hi @alexey.noskov,

I don’t have these fonts; I just downloaded the sample document from the internet. Is there any way to ignore the fonts and get the right page number by bookmark name?

@gaurav.budhiraja No, the layout cannot be built correctly without the fonts. MS Word documents are flow documents and do not contain any information about document layout. The layout is built on the fly by the consumer application and depends on the fonts available on the system. So if the fonts used in the document are not available on your side, you cannot be sure MS Word shows the document the same way as on the machine where the document was created. If you need to make sure an MS Word document is displayed the same way in any environment, you can embed the fonts used in the document. In this case, both MS Word and Aspose.Words can use these fonts and build the correct layout.
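
For reference, a minimal sketch of embedding fonts when saving with Aspose.Words. It assumes the code runs on a machine where the fonts used in the document are actually installed, since a font that is missing there cannot be embedded; the file paths are placeholders.

```csharp
using Aspose.Words;

// Assumed input path, used for illustration only.
Document doc = new Document(@"C:\Temp\in.docx");

// Ask Aspose.Words to embed the TrueType fonts used in the document on save.
doc.FontInfos.EmbedTrueTypeFonts = true;

// Assumed output path; the saved file carries the embedded fonts with it.
doc.Save(@"C:\Temp\out-with-fonts.docx");
```

The same setting is available in MS Word itself in the save options, so the document author can embed the fonts without any code.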

PS: Looks like you were unlucky and downloaded a buggy document.

Hi @alexey.noskov,

Can we get the page number by converting the Word document to PDF using Aspose.PDF and then getting the page number of the added bookmark?

@gaurav.budhiraja Conversion to PDF will give you the same result, since LayoutCollector uses the same layout engine that Aspose.Words uses to convert flow documents to PDF.

Hi @alexey.noskov,

So there is no way to achieve this functionality using Aspose?

@gaurav.budhiraja Of course, you can get the bookmark page number using Aspose.Words. For example, see the following simple code:

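using System;
using Aspose.Words;
using Aspose.Words.Layout;

Document doc = new Document(@"C:\Temp\in.docx");
// LayoutCollector maps document nodes to the pages they are laid out on.
LayoutCollector collector = new LayoutCollector(doc);

foreach (Bookmark bk in doc.Range.Bookmarks)
{
    // GetStartPageIndex and GetEndPageIndex return 1-based page indices.
    Console.WriteLine("{0} - start {1}; end {2}",
        bk.Name,
        collector.GetStartPageIndex(bk.BookmarkStart),
        collector.GetEndPageIndex(bk.BookmarkEnd));
}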

But to get the correct page number, the correct document layout must be built, and that requires the fonts used in the document. Without the fonts it is impossible to build the correct layout in MS Word, in Aspose.Words, or in any other tool.

Hi @alexey.noskov,

How can we build the correct document layout from an MS Word document?

Please make this thread public.

@gaurav.budhiraja

As I already mentioned, the fonts used in the document are required to build the correct layout. You can either install the fonts in the system or put the required fonts into a folder and set that folder as a fonts source. Please see our documentation to learn how to specify the fonts location.
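
A minimal sketch of the second option, assuming the required fonts have been copied to a folder such as C:\Temp\Fonts (a hypothetical path chosen for this example). System fonts stay available, and the folder is added as an extra source:

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Fonts;
using Aspose.Words.Layout;

// Hypothetical folder that holds the fonts used in the document.
string fontsFolder = @"C:\Temp\Fonts";

FontSettings fontSettings = new FontSettings();
fontSettings.SetFontsSources(new FontSourceBase[]
{
    new SystemFontSource(),                  // fonts installed in the system
    new FolderFontSource(fontsFolder, true)  // extra folder, subfolders included
});

// Assumed input path, used for illustration only.
Document doc = new Document(@"C:\Temp\in.docx");
doc.FontSettings = fontSettings;

// The layout is now built with the fonts from both sources.
LayoutCollector collector = new LayoutCollector(doc);
foreach (Bookmark bk in doc.Range.Bookmarks)
{
    Console.WriteLine("{0} - start {1}; end {2}",
        bk.Name,
        collector.GetStartPageIndex(bk.BookmarkStart),
        collector.GetEndPageIndex(bk.BookmarkEnd));
}
```

The font settings are assigned before LayoutCollector is used, so that the layout is not built with substituted fonts first.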

I have made the topic public.

Hi @alexey.noskov,

Is there any way to remove the evaluation text from the document when converting the byte array to an Aspose document with an unlicensed library?

@gaurav.budhiraja If you would like to test Aspose.Words without the evaluation version limitations, you can request a temporary 30-day license.

Hi @alexey.noskov,

Now I am using a paid Aspose license, and my document content is in Times New Roman, but I am still not getting the correct page number.

Please find TestDoc.docx, which was saved with the Aspose library code. The original document has 3 pages, but after saving with Aspose it has 4.

Please find the attached document TestDoc.docx (15.8 KB)
Discrepency.docx (14.2 KB)
file-sample_1MB.docx (1.4 MB)
Please find the sample code below:

License license = new();
license.SetLicense(Path.Combine(Directory.GetCurrentDirectory(), "license/license.lic"));

var document = new Document(new MemoryStream(documentContents));

int d = document.PageCount;

document.Save("TestDoc.docx", SaveFormat.Docx);

GetSubstitutionWithoutSuffixes(document);


var builder = new DocumentBuilder(document);
var result = new List<BookmarkPageNumber>();
foreach (var bookmarkName in bookmarkNames)
{
    var item = new BookmarkPageNumber(bookmarkName);

    var bookmark = document.Range.Bookmarks[bookmarkName];
    if (bookmark == null)
        continue;

    item.BookmarkText = bookmark.Text;

    builder.MoveToBookmark(bookmark.Name, true, false);
    var page = builder.InsertField("PAGE");
    document.UpdatePageLayout();
    page.Update();

    if (int.TryParse(page.Result, out var pageNumber))
        item.PageNumberText = pageNumber.ToString();

    result.Add(item);

    page.Remove();
}

return result;

@gaurav.budhiraja Thank you for the additional information. As I can see, you are inserting, updating, and then deleting a PAGE field to get the page number of the bookmark. This might have side effects, since the PAGE field has its own content and can affect the document layout. It is much easier to use LayoutCollector to get the page number where the bookmark is placed:

Document doc = new Document(@"C:\Temp\in.docx");
LayoutCollector collector = new LayoutCollector(doc);

foreach (Bookmark bk in doc.Range.Bookmarks)
{
    Console.WriteLine("{0} - start {1}; end {2}",
        bk.Name,
        collector.GetStartPageIndex(bk.BookmarkStart),
        collector.GetEndPageIndex(bk.BookmarkEnd));
}

Also, I have checked your Discrepency.docx document and see the following:

DISCREPANCY: BookmarkName: PNT_Main_029_1; BookmarkText: .; Truth: 1; Service: 2

I checked in MS Word, and the PNT_Main_029_1 bookmark is on the second page, just as Aspose.Words returns.
The same applies to the following:

DISCREPANCY: BookmarkName: PNT_Main_064_2; BookmarkText: . ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_065_2; BookmarkText: Aliquam ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_066_2; BookmarkText: venenatis; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_067_2; BookmarkText: , ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_068_2; BookmarkText: vel ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_069_2; BookmarkText: , ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_070_2; BookmarkText: , ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_071_2; BookmarkText: Class ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_072_2; BookmarkText: . ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_073_2; BookmarkText: imperdiet ; Truth: 2; Service: 3
DISCREPANCY: BookmarkName: PNT_Main_074_2; BookmarkText: at ; Truth: 2; Service: 3

Bookmarks from PNT_Main_064_2 to PNT_Main_074_2 are on the third page.
Here is the output produced by the code I have provided:

PNT_Main_001_1 - start 1; end 1
PNT_Main_002_1 - start 1; end 1
PNT_Main_003_1 - start 1; end 1
PNT_Main_004_1 - start 1; end 1
PNT_Main_005_1 - start 1; end 1
PNT_Main_006_1 - start 1; end 1
PNT_Main_007_1 - start 1; end 1
PNT_Main_008_1 - start 1; end 1
PNT_Main_009_1 - start 1; end 1
PNT_Main_010_1 - start 1; end 1
PNT_Main_011_1 - start 1; end 1
PNT_Main_012_1 - start 1; end 1
PNT_Main_013_1 - start 1; end 1
PNT_Main_014_1 - start 1; end 1
PNT_Main_015_1 - start 1; end 1
PNT_Main_016_1 - start 1; end 1
PNT_Main_017_1 - start 1; end 1
PNT_Main_018_1 - start 1; end 1
PNT_Main_019_1 - start 1; end 1
PNT_Main_020_1 - start 1; end 1
PNT_Main_021_1 - start 1; end 1
PNT_Main_022_1 - start 1; end 1
PNT_Main_023_1 - start 1; end 1
PNT_Main_024_1 - start 1; end 1
PNT_Main_025_1 - start 1; end 1
PNT_Main_026_1 - start 1; end 1
PNT_Main_027_1 - start 1; end 1
PNT_Main_028_1 - start 1; end 1
PNT_Main_029_1 - start 2; end 2
PNT_Main_030_2 - start 2; end 2
PNT_Main_031_2 - start 2; end 2
PNT_Main_032_2 - start 2; end 2
PNT_Main_033_2 - start 2; end 2
PNT_Main_034_2 - start 2; end 2
PNT_Main_035_2 - start 2; end 2
PNT_Main_036_2 - start 2; end 2
PNT_Main_037_2 - start 2; end 2
PNT_Main_038_2 - start 2; end 2
PNT_Main_039_2 - start 2; end 2
PNT_Main_040_2 - start 2; end 2
PNT_Main_041_2 - start 2; end 2
PNT_Main_042_2 - start 2; end 2
PNT_Main_043_2 - start 2; end 2
PNT_Main_044_2 - start 2; end 2
PNT_Main_045_2 - start 2; end 2
PNT_Main_046_2 - start 2; end 2
PNT_Main_047_2 - start 2; end 2
PNT_Main_048_2 - start 2; end 2
PNT_Main_049_2 - start 2; end 2
PNT_Main_050_2 - start 2; end 2
PNT_Main_051_2 - start 2; end 2
PNT_Main_052_2 - start 2; end 2
PNT_Main_053_2 - start 2; end 2
PNT_Main_054_2 - start 2; end 2
PNT_Main_055_2 - start 2; end 2
PNT_Main_056_2 - start 2; end 2
PNT_Main_057_2 - start 2; end 2
PNT_Main_058_2 - start 2; end 2
PNT_Main_059_2 - start 2; end 2
PNT_Main_060_2 - start 2; end 2
PNT_Main_061_2 - start 2; end 2
PNT_Main_062_2 - start 2; end 2
PNT_Main_063_2 - start 2; end 2
PNT_Main_064_2 - start 3; end 3
PNT_Main_065_2 - start 3; end 3
PNT_Main_066_2 - start 3; end 3
PNT_Main_067_2 - start 3; end 3
PNT_Main_068_2 - start 3; end 3
PNT_Main_069_2 - start 3; end 3
PNT_Main_070_2 - start 3; end 3
PNT_Main_071_2 - start 3; end 3
PNT_Main_072_2 - start 3; end 3
PNT_Main_073_2 - start 3; end 3
PNT_Main_074_2 - start 3; end 3
PNT_Main_075_3 - start 3; end 3
PNT_Main_076_3 - start 3; end 3
PNT_Main_077_3 - start 3; end 3
PNT_Main_078_3 - start 3; end 3
PNT_Main_079_3 - start 3; end 3
PNT_Main_080_3 - start 3; end 3
PNT_Main_081_3 - start 3; end 3
PNT_Main_082_3 - start 3; end 3
PNT_Main_083_3 - start 3; end 3
PNT_Main_084_3 - start 3; end 3
PNT_Main_085_3 - start 3; end 3
PNT_Main_086_3 - start 3; end 3
PNT_Main_087_3 - start 3; end 3
PNT_Main_088_3 - start 3; end 3
PNT_Main_089_3 - start 3; end 3
PNT_Main_090_3 - start 3; end 3
PNT_Main_091_3 - start 3; end 3
PNT_Main_092_3 - start 3; end 3
PNT_Main_093_3 - start 3; end 3
PNT_Main_094_3 - start 3; end 3
PNT_Main_095_3 - start 3; end 3
PNT_Main_096_3 - start 3; end 3
PNT_Main_097_3 - start 3; end 3
PNT_Main_098_3 - start 3; end 3
PNT_Main_099_3 - start 3; end 3
PNT_Main_100_3 - start 3; end 3

As I can see, there is no discrepancy in the detected page numbers between Aspose.Words and MS Word 2019 on my side.

Hi @alexey.noskov

Now I am using the code above:

var document = new Document(new MemoryStream(documentContents));
LayoutCollector collector = new LayoutCollector(document);
var result = new List<BookmarkPageNumber>();
foreach (var bookmarkName in bookmarkNames)
{
    Bookmark bookmark = document.Range.Bookmarks[bookmarkName];
    BookmarkPageNumber item = new(bookmarkName)
    {
        BookmarkText = bookmark.Name,
        StartPageNumber = collector.GetStartPageIndex(bookmark.BookmarkStart).ToString(),
        EndPageNumber = collector.GetEndPageIndex(bookmark.BookmarkEnd).ToString()
    };
    result.Add(item);
}

There is still a discrepancy in the result.

Please find the attached document Discrepency.docx (14.5 KB)

In the Word document (see the screenshot), the bookmark text "." is on page 1, but your code returns 2.

@gaurav.budhiraja Which version of MS Word do you use? I have checked in MS Word 2019 and, as I mentioned, the PNT_Main_029_1 bookmark is on the second page, just as Aspose.Words returns.

Could you please save your document to PDF in MS Word and attach it here for our reference?