Extract Content between Pages

AGontarek · September 28, 2011, 11:11am

Simple test document attached. This produces the error that I included in the above post.

adam.skelton · September 29, 2011, 8:58am

Hi there,

Thanks for your inquiry.

I can’t reproduce any problem on my side. Make sure that values you pass to your method are within the valid page range (1 to 4 in the case of your document).
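A quick way to rule this out is to check the range before calling the extraction code. The snippet below is only a sketch: PageRangeGuard and Validate are hypothetical names, not part of Aspose.Words or the PageFinder sample, and it assumes pages are 1-based and inclusive, as described above. The page count would normally come from the loaded document (for example its PageCount property).

```csharp
using System;

// Hypothetical helper, not part of Aspose.Words or the PageFinder sample.
public static class PageRangeGuard
{
    // Throws if the requested 1-based, inclusive range falls outside 1..pageCount.
    public static void Validate(int startPage, int endPage, int pageCount)
    {
        if (pageCount < 1)
            throw new ArgumentOutOfRangeException("pageCount", "The document has no pages.");

        if (startPage < 1 || startPage > pageCount)
            throw new ArgumentOutOfRangeException("startPage",
                "Start page must be between 1 and " + pageCount + ".");

        if (endPage < startPage || endPage > pageCount)
            throw new ArgumentOutOfRangeException("endPage",
                "End page must be between the start page and " + pageCount + ".");
    }
}
```

For the 4-page test document, PageRangeGuard.Validate(1, 4, 4) passes, while a start page of 0 or an end page of 5 throws before any extraction is attempted.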

Thanks,

AGontarek · September 29, 2011, 2:17pm

I am definitely using a valid page range. May I ask, are you running this code in a Windows Form or a Web form? Because it actually does work in a Windows form…however, I need it to work in a Web form. It doesn’t make any sense to me why the DOCX files don’t work in both…the code is identical. I must be missing SOMETHING.

adam.skelton · September 29, 2011, 3:13pm

Hi there,

Thanks for this additional information.

Could you please attach a quick sample application which reproduces the issue here? I will take a further look into this for you.

Thanks,

AGontarek · September 30, 2011, 9:04am

Attached is the web code and the test document

adam.skelton · October 1, 2011, 5:05am

It seems you are using an older version of PageFinder in your web application while you're using the newer one in your console application. Please make sure to use the new version found here: https://forum.aspose.com/t/77148

Thanks,

AGontarek · October 11, 2011, 12:56pm

Thank you, that did help with the page numbers. However, it now seems to be ignoring the headers.

Attached is the current web code and test document

adam.skelton · October 12, 2011, 2:39am

Hi there,

Thanks for this additional information.

This occurs because these headers are linked to the previous section. Since these sections are moved on their own to the new document they no longer display the content from the previous section’s header. You will see that the footer is not linked so it does not have this problem.

You can work around this by copying any linked headers/footers from the previous section.

You need to add the method below to the class and call it somewhere in the constructor:

/// <summary>
/// Creates a proper copy of any linked headers/footers into the sections of the document.
/// </summary>
private void CopyLinkedHeaderFooters()
{
    foreach (Section section in mOrigDoc)
    {
        if (section == mOrigDoc.FirstSection)
            continue;
        HeaderFooterCollection previousHeaderFooters = ((Section)section.PreviousSibling).HeadersFooters;

        foreach (HeaderFooter headerFooter in previousHeaderFooters)
        {
            if (section.HeadersFooters[headerFooter.HeaderFooterType] == null)
            {
                HeaderFooter newHeaderFooter = (HeaderFooter)previousHeaderFooters[headerFooter.HeaderFooterType].Clone(true);
                section.HeadersFooters.Add(newHeaderFooter);
            }
        }
    }
}

Thanks,

AGontarek · October 18, 2011, 9:39am

Thank you, this did get rid of the error. I finished the web application and began testing on various documents and began getting a different error.

I get “Cannot insert a node of this type at this location” on the following line:

Field fieldStart = builder.InsertField("PAGE", "1");

This happens no matter what pages I select to extract.

The problem is, due to the sensitive nature of the documents that I am testing, I cannot send you a sample.

adam.skelton · October 18, 2011, 5:35pm

Hi there,

Thanks for your inquiry.

I’m afraid without the input document it’s hard to know what the problem is. You can sanitize your document by replacing any confidential data with dummy data but you need to make sure the issue is still reproducible with the modified document.

Thanks,

AGontarek · October 19, 2011, 11:19am

OK, I think I was able to sanitize it enough to avoid problems and still retain the error. See attached.

adam.skelton · October 19, 2011, 10:42pm

Hi there,

Thanks for attaching your document here for testing.

I’m afraid however the document you attached is empty (0 bytes long) which causes an exception on load. Could you please check you attached the correct document to this thread?

Thanks,

AGontarek · October 20, 2011, 9:11am

Sorry about that. This one should work.

adam.skelton · October 22, 2011, 5:23am

Hi there,

Thanks for the additional information.

I managed to reproduce the issue on my side. I have made a fix for this in the existing attachment.

Thanks,

AGontarek · October 27, 2011, 10:00am

Thank you. I have a new error for you (see attached). Thank you so much for all of your help and quick responses.

adam.skelton · October 28, 2011, 5:41am

Hi there,

Thanks for your inquiry.

I managed to reproduce the issue on my side. The issue occurs because a PAGE field within an InlineStory (comment or footnote node) causes an exception when fields are updated.

Please try the following workaround for the time being. You need to add the following code to the InsertFieldsAtParagraphLevel method.

if (!IsHeaderFooterType(para) && !IsInlineStory(para))


private bool IsInlineStory(Node node)
{
    return node.GetAncestor(typeof(InlineStory)) != null;
}

If I can help with anything else, please feel free to ask.

Thanks,

AGontarek · November 17, 2011, 2:53pm

I have a new issue. I have a feeling this may relate to document size, but when I click the button to submit, I get a 500 error. Debugging returns no errors. I have attached a sample document using images to increase the document size.

adam.skelton · November 18, 2011, 4:01am

Hi there,

Thanks for your inquiry.

I got the same issue on my side using your project. According to the Windows Event Viewer, the details behind the error are: “Post size exceeds allowed limits”. Most likely your web.config does not allow uploads over a certain size, so this probably has nothing to do with Aspose.Words.
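As a rough sketch of the kind of web.config change that usually addresses this error, the fragment below raises the request size limits. The 50 MB values are only an assumed example, so pick a limit that suits your documents. The second block applies only when the site is hosted on IIS 7 or later.

```xml
<configuration>
  <system.web>
    <!-- maxRequestLength is in kilobytes; the ASP.NET default is 4096 (4 MB).
         51200 KB = 50 MB (example value). -->
    <httpRuntime maxRequestLength="51200" />
  </system.web>
  <system.webServer>
    <security>
      <requestFiltering>
        <!-- maxAllowedContentLength is in bytes (IIS 7+).
             52428800 bytes = 50 MB (example value). -->
        <requestLimits maxAllowedContentLength="52428800" />
      </requestFiltering>
    </security>
  </system.webServer>
</configuration>
```

If both settings are present, the smaller of the two is the effective limit, so they should be kept in step.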

Thanks,

AGontarek · December 7, 2011, 10:05am

Thank you, that fixed it.

I have come across another issue. I have a document that is approximately 500 pages. It does not exceed the size limit. My problem is that it is taking 20+ minutes for each parsed document to be completed…once you get to the 3rd cycle, the app times out. Now I know you will suggest increasing the timeout, but I don’t really want this app to take HOURS to run on one document (especially when we usually have multiple documents that we have to parse). The problem is that the ExtractContentBetweenPages function is looping through the ENTIRE document each time. Is there any way to jump out of the loop once you have passed the page number you designated as the “end page”?

Please use the code I attached in my previous post; it is the same.

Thanks for all of your help.
Alicia Gontarek

adam.skelton · December 8, 2011, 5:51am

Hi Alicia,

Thanks for your inquiry.

I’m afraid I’m not quite sure what you mean. That particular function starts at “startPage” and the loop finishes at “endPage”. It doesn’t seem to continue to the end of the document. Could you please clarify the issue?

Thanks,