LayoutEnumerator role in PDF generation

PraneethS · December 3, 2014, 4:23am

Hey there,
We’ve got some questions about the role of LayoutEnumerator in PDF generation:
#1. Could you please explain the role LayoutEnumerator plays when generating a PDF?

#2. If we create the LayoutEnumerator instance before generating the PDF, does this add time to PDF generation, or should the total time stay the same (since a LayoutEnumerator is always instantiated internally anyway)?

#3. We assume that creating a LayoutEnumerator instance is an expensive operation and that it is always done internally when converting DOCX to PDF. Is this assumption correct?

Thanks.

PraneethS · December 3, 2014, 8:51am

One more question:
#4. Is it possible to generate the PDF directly from a LayoutEnumerator instance rather than through Document.save()? Our goal is to separate the LayoutEnumerator work from PDF generation to reduce PDF creation time.

tahir.manzoor · December 4, 2014, 1:00am

Hi Praneeth,

Thanks for your inquiry. Aspose.Words uses the LayoutEnumerator and LayoutCollector classes while rendering a document to PDF. Please note that Aspose.Words uses its own rendering engine to lay out documents into pages. The Aspose.Words.Layout namespace provides classes that let you find out, once the document is formatted into pages, on which page and where on that page particular document elements are positioned.

When you convert a Word document to PDF, Aspose.Words builds the page layout of the document and has to create an APS (Aspose Page Specification) model in memory. The LayoutCollector class collects a mapping of document nodes to layout objects and computes the page numbers of nodes.

If the page layout model of the document hasn’t been built yet, LayoutEnumerator calls UpdatePageLayout to build it.

Whenever the document is updated and a new page layout model is created, a new LayoutEnumerator must be used to access it.
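To make the last point concrete, here is a minimal Java sketch under a simplified model. Every class in it (`SketchDocument`, `PageLayoutModel`, `SketchLayoutEnumerator`) is hypothetical and not part of Aspose.Words. The document caches one layout model, builds it lazily, and drops it on any edit; an enumerator stays bound to the model that existed when it was constructed, so after an edit only a new enumerator sees the rebuilt layout.

```java
// Hypothetical, simplified model; NOT the Aspose.Words API.
final class PageLayoutModel {
    final int builtAtRevision;              // document revision this layout reflects
    PageLayoutModel(int builtAtRevision) { this.builtAtRevision = builtAtRevision; }
}

final class SketchDocument {
    private int revision = 0;
    private PageLayoutModel layout;         // cached layout; null until built

    void edit() {
        revision++;
        layout = null;                      // an edit invalidates the cached layout
    }

    int revision() { return revision; }

    // Stand-in for "call UpdatePageLayout only if no layout exists yet".
    PageLayoutModel layoutBuildingIfNeeded() {
        if (layout == null) layout = new PageLayoutModel(revision);
        return layout;
    }
}

final class SketchLayoutEnumerator {
    private final SketchDocument doc;
    private final PageLayoutModel model;    // the layout current at construction time

    SketchLayoutEnumerator(SketchDocument doc) {
        this.doc = doc;
        this.model = doc.layoutBuildingIfNeeded();
    }

    // True once the document has been edited after this enumerator was created.
    boolean isStale() { return model.builtAtRevision != doc.revision(); }
}
```

In this model an old enumerator never "follows" the document; creating a fresh one after the edit is the only way to see the new layout, which matches the rule stated above.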

Hope this answers your queries. Please let us know if you have any more queries.

PraneethS · December 8, 2014, 7:45am

Hey Tahir,
If we create a LayoutEnumerator instance just before converting the document to PDF, the time spent in doc.save() goes down. How does the LayoutEnumerator instance get associated with doc.save()?

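import com.aspose.words.Document;
import com.aspose.words.PdfSaveOptions;

// doc is an already-loaded com.aspose.words.Document
final PdfSaveOptions options = new PdfSaveOptions();
// Show Word bookmarks at level 1 of the PDF document outline
options.getOutlineOptions().setDefaultBookmarksOutlineLevel(1);
try {
    doc.save("docx.pdf", options);
} catch (Exception e) {
    throw new RuntimeException("Problem while saving PDF.", e);
}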

tahir.manzoor · December 9, 2014, 12:53am

Hi Praneeth,

Thanks for your inquiry. We are in communication with our development team about your query and will get back to you as soon as possible.

PraneethS · December 10, 2014, 12:56am

Thanks for looking into this, please make sure the communication covers:

How does the Document instance use that information? What happens if the LayoutEnumerator instance is garbage collected just before Document.save() is called?

How can we tie the two together so that we can use a LayoutEnumerator instance for some preprocessing without hurting overall system performance and without side effects (such as memory being overwritten, the LayoutEnumerator instance being garbage collected, etc.)?

Thanks.

tahir.manzoor · December 10, 2014, 1:23am

Hi Praneeth,

Thanks for your inquiry. The LayoutEnumerator class enumerates the page layout entities of a document. Like the PageCount property, it internally invokes the Document.UpdatePageLayout method.
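A simplified, hypothetical Java model of this point: a PageCount-style query and enumerator construction go through the same lazy build, so whichever runs first pays for the layout and the other reuses it. None of these classes are Aspose.Words types; the build counter exists only to make the reuse visible, and the one-page layout is a placeholder.

```java
// Hypothetical, simplified model; NOT the Aspose.Words API.
final class SketchDocument {
    private int[] pageLayout;   // cached layout (one entry per page); null until built
    private int builds = 0;     // counts the expensive builds

    // In this model, always (re)builds, standing in for UpdatePageLayout.
    void updatePageLayout() {
        pageLayout = new int[1]; // placeholder: pretend the content fits on one page
        builds++;
    }

    // Builds only when no layout exists yet.
    private void ensurePageLayout() {
        if (pageLayout == null) updatePageLayout();
    }

    // PageCount-style query: needs the layout, so it may trigger the build.
    int getPageCount() {
        ensurePageLayout();
        return pageLayout.length;
    }

    // Used by the enumerator below; same lazy path as getPageCount().
    void prepareForEnumeration() { ensurePageLayout(); }

    int builds() { return builds; }
}

final class SketchLayoutEnumerator {
    SketchLayoutEnumerator(SketchDocument doc) { doc.prepareForEnumeration(); }
}
```

Read against the reply above, this suggests that the speed-up seen in doc.save() comes from the layout having already been built, not from the enumerator object itself; in the real library, calling doc.getPageCount() or doc.updatePageLayout() would presumably warm the layout the same way, though that is worth confirming by timing.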

Could you please share some details about your requirements? We will update you accordingly.

PraneethS · December 10, 2014, 1:56am

I have shared details several times, but I can add more if that helps:

We convert DOCX to PDF, and we need a LayoutEnumerator instance for layout information such as header/footer rectangles. As you know, LayoutEnumerator is costly, and we don’t want to increase PDF generation time because it is already an expensive operation.

While experimenting, we saw that when we create a LayoutEnumerator instance before converting to PDF (with the code already shared), the time is redistributed: PDF generation via doc.save() somehow reuses the LayoutEnumerator information. We don’t know how, because LayoutEnumerator has nothing like LayoutEnumerator.getDocument(), and Document.save() (for PDF) takes no LayoutEnumerator either.

So what doesn’t make sense to us is how Document.save() (for PDF) uses the LayoutEnumerator instance. We don’t know the rules. We’re afraid that if the LayoutEnumerator information is lost or garbage collected just before the actual PDF generation, we would pay the time penalty during PDF generation after all. In short, we are worried about the state changing between LayoutEnumerator creation and the actual PDF generation.

The question actually boils down to -
"Can we assume that the state of the Document instance (docx) changes once we call new LayoutEnumerator(docx) before saving, no matter where we do it, and that we need not fear it being garbage collected?"

Given all this, we’re in the process of making code changes and need to finish them before tomorrow. We have priority support for Oracle, but for various reasons we could not use it. It would be very helpful to get a proper explanation as soon as possible (by end of day today would be much appreciated).

Let me know if you need some more information.

Thanks.

tahir.manzoor · December 10, 2014, 11:54pm

Hi Praneeth,

Thanks for sharing the detail. We are in communication with our development team about your query and will get back to you as soon as possible.

tahir.manzoor · December 11, 2014, 11:04am

Hi Praneeth,

Thanks for your patience. I have received a response from our development team. Please check the following information about your query.

#1. LayoutEnumerator is a helper class that works with a document instance. It holds no state information for the document, nor does it store any document data. The enumerator holds a reference to the document, but the document neither uses nor references the enumerator instance. It does not matter whether an enumerator was ever created or has since been garbage collected.

#2. Creating a LayoutEnumerator should not add time to PDF generation. However, when a LayoutEnumerator is created, it builds the page layout model of the document if one does not exist yet, and this is time-consuming. That build will not happen again inside the Save method (which reduces Save time) unless the save options passed to Save have non-default values for members that affect the page layout model of the document, in which case the page layout model is rebuilt.

#3. PDF generation does not create a LayoutEnumerator internally; it builds the page layout model. LayoutEnumerator, on the other hand, also builds the page layout model if one does not exist yet for the document.
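The three answers can be condensed into a small hypothetical Java model (not the Aspose.Words implementation): the enumerator keeps a reference to the document while the document never references the enumerator, the layout is built at most once when Save uses default options, and a non-default, layout-affecting save option forces a rebuild. `SketchSaveOptions.layoutAffectingSetting` is an invented stand-in for whichever real option members affect layout.

```java
// Hypothetical, simplified model; none of these classes are Aspose.Words types.
final class SketchSaveOptions {
    // Invented stand-in for option members that affect page layout; null = default.
    final String layoutAffectingSetting;
    SketchSaveOptions(String layoutAffectingSetting) {
        this.layoutAffectingSetting = layoutAffectingSetting;
    }
}

final class SketchDocument {
    private Object pageLayout;   // cached page layout model; null until built
    private int builds = 0;      // counts the expensive builds

    private void buildPageLayout() {
        pageLayout = new Object();
        builds++;
    }

    // Called by the enumerator: builds only if no layout exists yet (#2, #3).
    void ensurePageLayout() {
        if (pageLayout == null) buildPageLayout();
    }

    // Reuses the cached layout unless a layout-affecting option is non-default (#2).
    void save(SketchSaveOptions options) {
        if (pageLayout == null || options.layoutAffectingSetting != null) {
            buildPageLayout();
        }
        // rendering the PDF from pageLayout would happen here
    }

    int builds() { return builds; }
}

final class SketchLayoutEnumerator {
    private final SketchDocument doc;   // enumerator -> document; never the reverse (#1)

    SketchLayoutEnumerator(SketchDocument doc) {
        this.doc = doc;
        doc.ensurePageLayout();
    }
}
```

Because the cached layout lives on the document, discarding the enumerator right after construction changes nothing for Save. Whether the `setDefaultBookmarksOutlineLevel(1)` call in the snippet earlier in the thread counts as layout-affecting is not stated here; timing doc.save() with and without a preceding new LayoutEnumerator(doc) is a practical way to check.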

Now, back to the question “How does the document utilize the layout enumerator’s information?”. Both the document and the layout enumerator use the page layout model information instead, and that model is referenced from the document. When a LayoutEnumerator is created, it checks whether a page layout model exists for the document; if it does not, the enumerator builds it. In this sense, the state of the document changes when an enumerator is created before Save has been called for the document. If a page layout model already exists, creating a LayoutEnumerator has no effect on the document.

If you need to use an enumerator to analyze the document before it is saved, make sure you change neither the document itself nor the options that affect its output (page layout), since either change invalidates the analyzed data and therefore the results of that analysis.
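One defensive way to follow this advice, sketched under stated assumptions: record what the analysis depended on (a document revision counter and the layout-affecting save option) and refuse to save if either has changed. `SketchDocument`, `AnalysisStamp`, `AnalyzeThenSave` and the revision counter are all hypothetical; this thread mentions no such counter in Aspose.Words, so a real application would have to track its own edits.

```java
import java.util.Objects;

// Hypothetical, simplified model; none of these types are Aspose.Words classes.
final class SketchDocument {
    private int revision = 0;            // bumped by the application on every edit
    void edit() { revision++; }
    int revision() { return revision; }
}

// What the analysis assumed: the document revision and the layout-affecting
// save option (hypothetical; null = default) it expects Save to use.
final class AnalysisStamp {
    final int revision;
    final String layoutOption;
    AnalysisStamp(int revision, String layoutOption) {
        this.revision = revision;
        this.layoutOption = layoutOption;
    }
}

final class AnalyzeThenSave {
    // Stand-in for: new LayoutEnumerator(doc) + reading header/footer rectangles.
    static AnalysisStamp analyze(SketchDocument doc, String layoutOption) {
        return new AnalysisStamp(doc.revision(), layoutOption);
    }

    // Refuses to save if anything the analysis depended on has changed.
    static void save(SketchDocument doc, AnalysisStamp stamp, String layoutOption) {
        if (stamp.revision != doc.revision())
            throw new IllegalStateException("document edited after analysis");
        if (!Objects.equals(stamp.layoutOption, layoutOption))
            throw new IllegalStateException("layout-affecting save option changed");
        // the real doc.save(...) call would go here
    }
}
```

Failing fast here is a design choice: silently re-running the analysis would hide the extra layout build that the whole exercise is meant to avoid.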