Aspose PageBreaks not matching with WordObjectModel Page breaks

@Gayatri_K,

The issues you have found earlier (filed as WORDSNET-14844) have been fixed in this Aspose.Words for .NET 17.7 update and this Aspose.Words for Java 17.7 update.

Hi,

We are not converting the document to PDF. We are using the RenderedDocument class to read the content of the document.

@Gayatri_K,

Thanks for your inquiry. Please use the latest version of Aspose.Words for .NET 17.7. Hope this helps you.

If you still face the problem, please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end, and attach it here for testing.

PS: Please ZIP and upload the application.

Hi,

Please find attached the console application: SampleApp.zip (66.7 KB)

@Gayatri_K,

Thanks for your inquiry. We have tested the scenario using the following code example and have not found the shared issue. We have saved the text of each page into a TXT file and attached the files to this post for your reference: output.zip (5.6 KB)

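// AsposeDoc is the already loaded Aspose.Words Document.
RenderedDocument layoutDoc = new RenderedDocument(AsposeDoc);
int i = 1; // 1-based page counter used in the output file names
foreach (RenderedPage page in layoutDoc.Pages)
{
    // Write each page's text to its own file and echo it to the console.
    System.IO.File.WriteAllText(@"c:\output" + i + ".txt", page.Text);
    Console.WriteLine(page.Text);
    Console.WriteLine("-----------------------------------------------");
    i++;
}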

Hi,

We don’t need only the page text; we need the hyperlink field text as well. Please find below the code we use to get text such as footnotes, comments, etc. It would be helpful if we could get the hyperlink field text in a similar way.

private string GetLineText(RenderedLine line)
{
    string lineText = string.Empty;

    int firstRun = 1;
    foreach (RenderedSpan span in line.Spans)
    {
        switch (span.Kind)
        {
            case "TAB":
            {
                lineText += ControlChar.Tab;
                break;
            }
            case "SECTION":
            {
                lineText += ControlChar.SectionBreak;
                break;
            }
            case "PARAGRAPH":
            {
                lineText += ControlChar.ParagraphBreak;
                break;
            }
            case "FOOTNOTEREFERENCE":
            {
                lineText += ControlChar.SpaceChar;
                break;
            }
            case "FIELDSTART":
            {
                if (firstRun == 1)
                {
                    //get fieldCode text from parentNode
                    string fsText = (span.ParentNode).GetText();
                    string fcText = getFieldCodeText((FieldStart)(span.ParentNode));
                    lineText += string.Concat(fsText, fcText);
                }
                firstRun++;
                break;
            }
            case "FIELDSEPARATOR":
            {
                lineText += ControlChar.FieldSeparatorChar; //span.Text;
                break;
            }
            case "FIELDEND":
            {
                lineText += span.ParentNode.GetText();
                firstRun = 1;
                break;
            }
        }
    }

    return lineText;
}

@Gayatri_K,

Thanks for your inquiry. In your case, we suggest the following solution. Hope this helps you.

  1. Clone the document.
  2. Iterate through all fields.
  3. Move the cursor to the field and insert the field code at the position of the field. Please use the Field.GetFieldCode method to get the text between the field start and the field separator.
  4. Use the RenderedPage.Text to get the desired output.
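
The four steps above can be sketched roughly as follows. This is only an illustration, not a definitive implementation: it assumes the Aspose.Words for .NET package is referenced, "Sample.docx" is a placeholder path, and the class name is hypothetical. Because writing field codes into the copy adds text, the copy may paginate differently from the original, so the sketch prints both page counts.

```csharp
using System;
using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Fields;

class FieldCodeInlineSketch // hypothetical name, for illustration only
{
    static void Main()
    {
        // "Sample.docx" is a placeholder path (assumption).
        Document source = new Document("Sample.docx");

        // Step 1: clone the document so the original stays untouched.
        Document copy = source.Clone();
        DocumentBuilder builder = new DocumentBuilder(copy);

        // Step 2: snapshot the fields first, so that inserting text
        // does not interfere with enumerating the collection.
        List<Field> fields = new List<Field>(copy.Range.Fields);

        foreach (Field field in fields)
        {
            // Step 3: move the cursor to just before the field start and
            // write the field code (the text between field start and
            // field separator) as plain text.
            builder.MoveToField(field, false);
            builder.Write(field.GetFieldCode());
        }

        // Step 4: 'copy' can now be passed to RenderedDocument (from the
        // DocumentLayoutHelper sample) to read RenderedPage.Text.
        // The extra text may push content onto a later page.
        Console.WriteLine("Original pages: " + source.PageCount);
        Console.WriteLine("Copy pages:     " + copy.PageCount);
    }
}
```

If the two page counts differ, the page boundaries reported for the copy should not be expected to match those of the original document.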

Hi Team,

We tried the above steps and are able to get the field code, but the page break is still not coming in the correct place.

We have a property in RenderedDocument.cs, i.e.

public LayoutEntity Parent
{
    get
    {
        return mParent;
    }
}

and a method

private void ProcessLayoutElements(LayoutEntity current)
{
    do
    {
        LayoutEntity child = current.AddChildEntity(mEnumerator);

        if (mEnumerator.MoveFirstChild())
        {
            current = child;

            ProcessLayoutElements(current);
            mEnumerator.MoveParent();

            current = current.Parent;
        }
    } while (mEnumerator.MoveNext());
}

where the Parent value is not coming out correctly.

Please find attached the sample document.

The third page break should come after line 9.17, but it is coming after line 9.15. Sample.zip (32.9 KB)

@Gayatri_K,

You want each page’s text along with the field codes. In this case, the added field code text may cause a page’s text to flow onto the next page.

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Please attach the output documents that show the undesired behavior.
  • Please attach the expected output documents that show the desired behavior.

We will investigate the issue and provide you more information on this along with code. Thanks for your cooperation.

Hi,

Please see our previous reply. Please let me know if you still need anything from our side.

@Gayatri_K,

You shared a new document (Sample.docx) in this forum thread that contains a TOC field. The application you shared is the DocumentLayoutHelper utility. Please share how you are using this utility to reproduce the issue.

We have tested the scenario again using the following code example and have not found any issue with the third page’s output.

foreach (RenderedPage page in layoutDoc.Pages)
{
    // Collect this page's text line by line; start empty for every page.
    string pagetext = string.Empty;
    foreach (RenderedLine item in page.GetChildEntities(LayoutEntityType.Line, true))
    {
        pagetext += item.Text;
    }
    System.IO.File.WriteAllText(@"c:\output" + i + ".txt", page.Text);
    Console.WriteLine(pagetext);
    Console.WriteLine("-----------------------------------------------");
    i++;
}

We need the simplified code example that you are using to reproduce the issue along with problematic output and expected output documents. We will then provide you more information about your query along with code to get desired output.

Hi, please find the sample application: Sample application.zip (563.8 KB)
We are no longer facing the issue with the hyperlink. But the property which we have in RenderedDocument.cs, i.e.

public LayoutEntity Parent
{
    get
    {
        return mParent;
    }
}
and a method

private void ProcessLayoutElements(LayoutEntity current)
{
    do
    {
        LayoutEntity child = current.AddChildEntity(mEnumerator);

        if (mEnumerator.MoveFirstChild())
        {
            current = child;

            ProcessLayoutElements(current);
            mEnumerator.MoveParent();

            current = current.Parent;
        }
    } while (mEnumerator.MoveNext());
}

where the Parent value is not coming out correctly.

@Gayatri_K,

Thanks for your inquiry. It is nice to hear that your problem with the hyperlink has been resolved.

Regarding your query about LayoutEntity, where the Parent value is incorrect: we have tested the scenario using the following code snippet and have not found the shared issue.

foreach (RenderedPage page in layoutDoc.Pages)
{
    foreach (LayoutEntity item in page.GetChildEntities(LayoutEntityType.Line, true))
    {
        Console.WriteLine("**" + item.Parent);
    }
}

We have also tested the scenario using the following code snippet to get the field codes of fields. We have not found any issue with ParentNode.

    foreach (RenderedSpan span in ((RenderedLine)child).Spans)
    {
        switch (span.Kind)
        {
            case "FIELDSTART":
                {
                    //get fieldCode text from parentNode
                    string fsText = (span.ParentNode).GetText();
                    Console.WriteLine(((FieldStart)span.ParentNode).GetField().GetFieldCode());
                    break;
                }
        }
    }

Please note that RenderedDocument is not part of the Aspose.Words API. We suggest you read about the classes of the Aspose.Words.Layout namespace.
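
As a rough illustration of what the Aspose.Words.Layout namespace offers on its own, the sketch below uses LayoutCollector to report the pages on which each paragraph starts and ends. It is a sketch under stated assumptions rather than a recommended implementation: "Sample.docx" is a placeholder path, the class name is hypothetical, and the 40-character preview is arbitrary.

```csharp
using System;
using Aspose.Words;
using Aspose.Words.Layout;

class ParagraphPageSketch // hypothetical name, for illustration only
{
    static void Main()
    {
        // "Sample.docx" is a placeholder path (assumption).
        Document doc = new Document("Sample.docx");

        // The collector records the node-to-page mapping while the
        // document is laid out, so create it before updating the layout.
        LayoutCollector collector = new LayoutCollector(doc);
        doc.UpdatePageLayout();

        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            // Page indexes are 1-based; 0 means the node could not be
            // mapped to a page.
            int startPage = collector.GetStartPageIndex(para);
            int endPage = collector.GetEndPageIndex(para);

            // Short preview of the paragraph text (arbitrary length).
            string preview = para.GetText().Trim();
            if (preview.Length > 40)
            {
                preview = preview.Substring(0, 40);
            }

            Console.WriteLine("pages {0}-{1}: {2}", startPage, endPage, preview);
        }
    }
}
```

A paragraph whose start and end page differ spans a page break, which may help when comparing the reported break positions with those seen in Microsoft Word.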