Extract Content between Pages

A simple test document is attached. It produces the error that I included in the above post.

Hi there,

Thanks for your inquiry.

I can’t reproduce any problem on my side. Make sure that the values you pass to your method are within the valid page range (1 to 4 in the case of your document).
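
As a rough illustration of that check, here is a minimal sketch of a guard you could call before extracting pages. The class and method names are hypothetical and are not part of Aspose.Words or the PageFinder sample. The sketch assumes you obtain the page count yourself, for example from the loaded document's PageCount property.

```csharp
using System;

// Hypothetical helper, not part of Aspose.Words or the PageFinder sample.
public static class PageRangeCheck
{
    // Throws if the requested range falls outside pages 1..pageCount
    // or if the start page comes after the end page.
    public static void EnsureValid(int startPage, int endPage, int pageCount)
    {
        if (startPage < 1 || endPage > pageCount || startPage > endPage)
        {
            throw new ArgumentOutOfRangeException(
                nameof(startPage),
                $"Requested pages {startPage}-{endPage}, but the document has pages 1-{pageCount}.");
        }
    }
}
```

For the 4-page test document, PageRangeCheck.EnsureValid(1, 4, 4) passes, while a start page of 0 or an end page of 5 throws.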

Thanks,

I am definitely using a valid page range. May I ask, are you running this code in a Windows Form or a Web form? Because it actually does work in a Windows form…however, I need it to work in a Web form. It doesn’t make any sense to me why the DOCX files don’t work in both…the code is identical. I must be missing SOMETHING.

Hi there,

Thanks for this additional information.

Could you please attach a quick sample application which reproduces the issue here? I will take a further look into this for you.

Thanks,

Attached is the web code and the test document

It seems you are using an older version of PageFinder in your web application while you're using the newer one in your console application. Please make sure to use the new version found here: https://forum.aspose.com/t/77148

Thanks,

Thank you, that did help with the page numbers. However, it now seems to be ignoring the headers.

Attached is the current web code and test document

Hi there,

Thanks for this additional information.

This occurs because these headers are linked to the previous section. Since these sections are moved on their own to the new document they no longer display the content from the previous section’s header. You will see that the footer is not linked so it does not have this problem.

You can work around this by copying any linked headers and footers from the previous section.

You need to add the method below to the class and call it somewhere in the constructor:

/// <summary>
/// Creates a proper copy of any linked headers/footers into the sections of the document.
/// </summary>
private void CopyLinkedHeaderFooters()
{
    foreach (Section section in mOrigDoc)
    {
        // The first section has no previous section to copy from.
        if (section == mOrigDoc.FirstSection)
            continue;

        HeaderFooterCollection previousHeaderFooters = ((Section)section.PreviousSibling).HeadersFooters;

        foreach (HeaderFooter headerFooter in previousHeaderFooters)
        {
            // A missing header/footer of this type means it is linked to the previous section.
            if (section.HeadersFooters[headerFooter.HeaderFooterType] == null)
            {
                // Clone it so this section carries its own copy when moved to the new document.
                HeaderFooter newHeaderFooter = (HeaderFooter)headerFooter.Clone(true);
                section.HeadersFooters.Add(newHeaderFooter);
            }
        }
    }
}

Thanks,

Thank you, this did get rid of the error. I finished the web application and began testing on various documents and began getting a different error.

I get “Cannot insert a node of this type at this location” on the following line:

Field fieldStart = builder.InsertField("PAGE", "1");

This happens no matter what pages I select to extract.

The problem is, due to the sensitive nature of the documents that I am testing, I cannot send you a sample.

Hi there,

Thanks for your inquiry.

I’m afraid without the input document it’s hard to know what the problem is. You can sanitize your document by replacing any confidential data with dummy data but you need to make sure the issue is still reproducible with the modified document.

Thanks,

OK, I think I was able to sanitize it enough to avoid problems and still retain the error. See attached.

Hi there,

Thanks for attaching your document here for testing.

I’m afraid however the document you attached is empty (0 bytes long) which causes an exception on load. Could you please check you attached the correct document to this thread?

Thanks,

Sorry about that. This one should work.

Hi there,

Thanks for the additional information.

I managed to reproduce the issue on my side. I have made a fix for this in the existing attachment.

Thanks,

Thank you. I have a new error for you (see attached). Thank you so much for all of your help and quick responses.

Hi there,

Thanks for your inquiry.

I managed to reproduce the issue on my side. The issue occurs because a PAGE field within an InlineStory (a comment or footnote node) causes an exception when fields are updated.

Please try using the following workaround for the time being. You need to add the following code to the InsertFieldsAtParagraphLevel method.

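// Condition to use inside InsertFieldsAtParagraphLevel:
if (!IsHeaderFooterType(para) && !IsInlineStory(para))

// Helper method to add to the class:
private bool IsInlineStory(Node node)
{
    // True when the node sits inside a comment or footnote.
    return node.GetAncestor(typeof(InlineStory)) != null;
}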

If I can help with anything else, please feel free to ask.

Thanks,

I have a new issue. I have a feeling this may relate to document size, but when I click the button to submit, I get a 500 error. Debugging returns no errors. I have attached a sample document using images to increase the document size.

Hi there,

Thanks for your inquiry.

I got the same issue on my side using your project. According to the Windows Event Viewer, the details behind the error are: “Post size exceeds allowed limits”. Most likely your web.config does not allow requests over a certain size to be processed, so the issue probably has nothing to do with Aspose.Words.
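
If that is the cause, the settings usually involved are the ASP.NET request length and, on IIS 7 or later, the request filtering limit. Below is a minimal sketch of the relevant web.config entries. The 50 MB values are example values only, and I am assuming your application does not already define these elements elsewhere.

```xml
<configuration>
  <system.web>
    <!-- maxRequestLength is in kilobytes; 51200 KB = 50 MB (example value) -->
    <httpRuntime maxRequestLength="51200" />
  </system.web>
  <system.webServer>
    <security>
      <requestFiltering>
        <!-- maxAllowedContentLength is in bytes; 52428800 bytes = 50 MB (example value) -->
        <requestLimits maxAllowedContentLength="52428800" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```

The two limits are checked independently, so an upload has to fit within both of them.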

Thanks,

Thank you, that fixed it.

I have come across another issue. I have a document that is approximately 500 pages. It does not exceed the size limit. My problem is that it is taking 20+ minutes for each parsed document to be completed…once you get to the 3rd cycle, the app times out. Now I know you will suggest increasing the timeout, but I don’t really want this app to take HOURS to run on one document (especially when we usually have multiple documents that we have to parse). The problem is that the ExtractContentBetweenPages function is looping through the ENTIRE document each time. Is there any way to jump out of the loop once you have passed the page number you designated as the “end page”?

Please use the code I attached in my previous post, it is the same.

Thanks for all of your help.
Alicia Gontarek

Hi Alicia,

Thanks for your inquiry.

I’m afraid I’m not quite sure what you mean. That particular function starts at “startPage” and the loop finishes at “endPage”. It doesn’t seem to continue to the end of the document. Could you please clarify the issue?

Thanks,