Docx Document taking too long for conversion

@RChilli_Nidhi Thank you for additional information.
It takes about 4 seconds to convert 1648786390257-7614b692-f582-4613-8a4f-a4d0c1bb6e57.docx to PDF and about 2 seconds to render 1657520490930-Randhir Pawar_IT PM-CIMB.docx.
I checked conversion to PDF in MS Word, and it takes approximately the same time. I investigated the documents because it looked suspicious that documents with mostly textual content are so large. I found that 1657520490930-Randhir Pawar_IT PM-CIMB.docx contains strange shapes that represent horizontal lines on the first page. With these shapes, the document.xml file inside the DOCX is about 4MB; after removing them, it is about 98KB, and conversion to PDF takes less than 1 second:

// Load the document, remove all group shapes, then save to PDF.
Document doc = new Document("C:\\Temp\\1657520490930-Randhir Pawar_IT PM-CIMB.docx");
doc.getChildNodes(NodeType.GROUP_SHAPE, true).clear();
doc.save("C:\\Temp\\out.pdf");

Note that this is far too much XML for simple horizontal lines, so it looks like interpreting these shapes is what takes the time.

The same problem is in the 1648786390257-7614b692-f582-4613-8a4f-a4d0c1bb6e57.docx document.
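If you want to check other documents for the same symptom, here is a minimal sketch that reads the uncompressed size of word/document.xml straight from the DOCX package, which is an ordinary ZIP archive. It uses only the Java standard library; the class name DocxPartSize and its partSize method are hypothetical helpers and not part of Aspose.Words.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper (not an Aspose.Words API): reports the size of a part inside a DOCX package.
class DocxPartSize {

    // Returns the uncompressed size in bytes of the named part, or -1 if the part is absent.
    static long partSize(Path docx, String partName) throws IOException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry(partName);
            return entry == null ? -1 : entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java DocxPartSize <file.docx>");
            return;
        }
        long bytes = partSize(Paths.get(args[0]), "word/document.xml");
        System.out.println("word/document.xml: " + bytes + " bytes uncompressed");
    }
}
```

A main document part of several megabytes in a mostly textual document is a hint that it contains heavy drawing markup like the shapes described above.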

Thanks for the reply.

The shapes are required for us.
Additionally, sorry for the confusion: we are not converting DOC/DOCX to PDF. We are extracting the text from the document.

Below is my code:

License license = new License();
InputStream streamLicense = licenceStream();
license.setLicense(streamLicense);
LoadOptions opts = new LoadOptions();
opts.setResourceLoadingCallback(new HandleResourceLoadingCallback());
doc = new Document(filedat, opts);
streamLicense.close();
filedat.close();
//getpages
pageCount = doc.getPageCount();

For the shared documents, the time is spent on the last line, i.e. doc.getPageCount().
Could you please check it?

@RChilli_Nidhi doc.getPageCount() rebuilds the document layout, which is the same operation performed when saving to PDF. I understand that the lines are required, but if they were inserted properly as one simple shape, processing them would not take so much time. About 1MB of XML for a simple horizontal line is too much when it can be represented with a simple line shape that takes only a short XML fragment:

<w:drawing>
	<wp:anchor distT="0" distB="0" distL="114300" distR="114300" simplePos="0" relativeHeight="251659264" behindDoc="0" locked="0" layoutInCell="1" allowOverlap="1" wp14:anchorId="10707A79" wp14:editId="7FA4C897">
		<wp:simplePos x="0" y="0"/>
		<wp:positionH relativeFrom="column">
			<wp:posOffset>38100</wp:posOffset>
		</wp:positionH>
		<wp:positionV relativeFrom="paragraph">
			<wp:posOffset>336550</wp:posOffset>
		</wp:positionV>
		<wp:extent cx="5905500" cy="38100"/>
		<wp:effectExtent l="0" t="0" r="19050" b="19050"/>
		<wp:wrapNone/>
		<wp:docPr id="1" name="Straight Connector 1"/>
		<wp:cNvGraphicFramePr/>
		<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
			<a:graphicData uri="http://schemas.microsoft.com/office/word/2010/wordprocessingShape">
				<wps:wsp>
					<wps:cNvCnPr/>
					<wps:spPr>
						<a:xfrm flipV="1">
							<a:off x="0" y="0"/>
							<a:ext cx="5905500" cy="38100"/>
						</a:xfrm>
						<a:prstGeom prst="line">
							<a:avLst/>
						</a:prstGeom>
					</wps:spPr>
					<wps:style>
						<a:lnRef idx="1">
							<a:schemeClr val="dk1"/>
						</a:lnRef>
						<a:fillRef idx="0">
							<a:schemeClr val="dk1"/>
						</a:fillRef>
						<a:effectRef idx="0">
							<a:schemeClr val="dk1"/>
						</a:effectRef>
						<a:fontRef idx="minor">
							<a:schemeClr val="tx1"/>
						</a:fontRef>
					</wps:style>
					<wps:bodyPr/>
				</wps:wsp>
			</a:graphicData>
		</a:graphic>
	</wp:anchor>
</w:drawing>

Or even less if you use a paragraph border:

<w:p w14:paraId="0B9C4D2A" w14:textId="62DA62A8" w:rsidR="002B2CA3" w:rsidRDefault="002B2CA3" w:rsidP="00685195">
	<w:pPr>
		<w:pBdr>
			<w:bottom w:val="single" w:sz="12" w:space="1" w:color="auto"/>
		</w:pBdr>
	</w:pPr>
</w:p>

So, if you have control over document creation, avoid using complicated shapes to draw simple horizontal lines in your documents. This will improve performance in both MS Word and Aspose.Words.
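To see which markup is inflating a particular document, one option is to count the elements in word/document.xml by name. The sketch below does this with the standard Java StAX parser; DocxElementCounter and its methods are hypothetical names used for illustration only and are not part of Aspose.Words. Element names are reported with the prefixes used in the file, so the exact names of the dominant shape elements will depend on how the document was produced.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical diagnostic helper (not an Aspose.Words API).
class DocxElementCounter {

    // Counts start elements by prefixed name, e.g. "w:p" -> 120.
    static Map<String, Integer> countElements(InputStream xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // DTDs are not needed here
        XMLStreamReader reader = factory.createXMLStreamReader(xml);
        Map<String, Integer> counts = new TreeMap<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String prefix = reader.getPrefix();
                    String name = (prefix == null || prefix.isEmpty())
                            ? reader.getLocalName()
                            : prefix + ":" + reader.getLocalName();
                    counts.merge(name, 1, Integer::sum);
                }
            }
        } finally {
            reader.close();
        }
        return counts;
    }

    // Opens the DOCX as a ZIP archive and counts the elements of its main document part.
    static Map<String, Integer> countInDocx(Path docx) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docx);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return countElements(in);
            }
        }
    }
}
```

Calling countInDocx on a problem document and sorting the result by count should show whether shape-related elements far outnumber the paragraphs and runs that carry the actual text.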

Hi,

I am testing a few more documents using Aspose 22X and am getting an invalid page count.
Attaching the documents for your reference.

Can you please check it and let me know the issue?

(Attachment c36aa3f1c592ceedc34612efca6e965e.doc is missing)

(Attachment C.V.BLeinerDec2013c.doc is missing)

(Attachment c6g8iqek6zuier69.doc is missing)

CA_Resume_Analytics.docx (29.6 KB)

File Name                              Aspose Page Count   Actual Page Count
C.V.BLeinerDec2013c.doc                3                   2
c36aa3f1c592ceedc34612efca6e965e.doc   3                   2
c6g8iqek6zuier69.doc                   4                   3
CA_Resume_Analytics.docx               3                   2

PageCount.zip (68.5 KB)

@RChilli_Nidhi I have checked CA_Resume_Analytics.docx and the returned page count is correct: 2 pages. The other documents were not attached. Please zip them and attach the archive.
Also, please make sure you are using Aspose.Words in licensed mode. In evaluation mode, Aspose.Words injects an evaluation message at the beginning of the document, and this might lead to an incorrect page count.

Sharing the other resumes. Please share your insights!

PageCount.zip (68.5 KB)

@RChilli_Nidhi

  • C.V.BLeinerDec2013c.doc returns 2 pages
  • c6g8iqek6zuier69.doc returns 3 pages
  • c36aa3f1c592ceedc34612efca6e965e.doc returns 2 pages
  • CA_Resume_Analytics.docx returns 2 pages

The number of pages returned by Aspose.Words is correct and matches the number of pages in MS Word. Here is the code I used for testing:

Document doc = new Document(@"C:\Temp\in.docx");
Console.WriteLine(doc.PageCount);

Please note that Aspose.Words has to build the document layout to calculate the number of pages in the document. The fonts used in the document are required to do this. If Aspose.Words cannot find the fonts used in the document, it substitutes the missing fonts. This might lead to layout differences and, as a result, an incorrect page count. You can implement IWarningCallback to get a notification when font substitution is performed.
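To get a rough idea of which fonts a DOCX expects before running it through layout, you can read the font names declared in its word/fontTable.xml part. The sketch below uses only the Java standard library; DocxFontLister and its methods are hypothetical names and not part of Aspose.Words. It assumes a DOCX package (it does not apply to binary DOC files), and the font table lists declared fonts, which is not necessarily the exact set used in the text.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper (not an Aspose.Words API): lists fonts declared in a DOCX font table.
class DocxFontLister {

    // Collects the "name" attribute of every <font> element in the given font table XML.
    static Set<String> fontNamesFromXml(InputStream fontTableXml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // DTDs are not needed here
        XMLStreamReader reader = factory.createXMLStreamReader(fontTableXml);
        Set<String> names = new TreeSet<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "font".equals(reader.getLocalName())) {
                    // A null namespace means the attribute namespace is not checked.
                    String name = reader.getAttributeValue(null, "name");
                    if (name != null) {
                        names.add(name);
                    }
                }
            }
        } finally {
            reader.close();
        }
        return names;
    }

    // Opens the DOCX as a ZIP archive; returns an empty set if there is no font table part.
    static Set<String> fontNamesInDocx(Path docx) throws IOException, XMLStreamException {
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            ZipEntry entry = zip.getEntry("word/fontTable.xml");
            if (entry == null) {
                return new TreeSet<>();
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return fontNamesFromXml(in);
            }
        }
    }
}
```

Comparing this list with the fonts installed on the machine that runs the conversion can help explain a page count that differs from MS Word.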

Yes, got your point. Is there a way to test the Aspose document conversion to verify whether it resolves our concerns?
Without a satisfactory result, we cannot move forward.

@RChilli_Nidhi Sure, you can request a temporary 30-day license to test Aspose.Words without the evaluation version limitations.

Thanks for the reply!

Let me check with team and get back to you on this

Hi,

We are planning to buy the latest Aspose.Words for Java, but we need clarification on one point.

A period of 1 year is mentioned. What does this actually mean?
Could you please clarify this?

@RChilli_Nidhi If you purchased a license for Aspose.Words, it means you have a 1-year subscription for free upgrades to any new Aspose.Words version that comes out. Any Aspose.Words version released before the subscription expiry date can be used perpetually with your license.

Thanks,

Can we still utilise the same licence after a year? Is there a limitation on it, or may we use the same licence for however long we want?

@RChilli_Nidhi Yes, you can use the same license perpetually with any Aspose.Words version released before the license expiration date.

Okay, Thank you!

Hi Alexey,

We are looking to buy the Aspose.PDF License.
Is there a way to get the temporary license for testing the PDF Document Conversion?

@RChilli_Nidhi Sure, please request a temporary license as described here:
https://purchase.aspose.com/temporary-license

Okay, thanks!

A post was split to a new topic: PDF document taking too long for conversion