Hello, would it be possible to add a ‘Frames’ class to Aspose.Words?
In VBScript I can access this class and get the Paragraphs and Runs within each frame.
Thank you
@CQcesar Text frames in MS Word documents are paragraphs with special properties set. You can identify such paragraphs using the FrameFormat.IsFrame property.
Thank you for the response, Alexey. I wonder if there is a better way to check all paragraphs inside a Frame.
Right now, I am doing this:
// Compare the frame settings of two consecutive paragraphs; skip pairs that differ.
FrameFormat sourceFrame = pars[i - 1].FrameFormat;
FrameFormat targetFrame = pars[i].FrameFormat;
if (sourceFrame.HorizontalPosition != targetFrame.HorizontalPosition ||
    sourceFrame.VerticalPosition != targetFrame.VerticalPosition ||
    sourceFrame.Height != targetFrame.Height ||
    sourceFrame.Width != targetFrame.Width) continue;
I’d appreciate any suggestions.
Thank you,
Cesar
@CQcesar Is your goal to detect whether paragraphs belong to the same text frame? If possible, please attach your input document here for our reference. We will check it and provide you with more information.
Hey Alexey, how can I upload the document as confidential?
@CQcesar You can safely attach documents in the forum. Only you, as a topic starter, and Aspose staff can access the attachments.
Here you go. My main goal is to remove unneeded linebreaks, and I think the best way is to access each frame and loop through each paragraph with the same formatting. This is an OCR’d document and requires some formatting fixes, so if you have any method that could help with that, it would be helpful too. Thanks!
REFERRALMAILARCHIVE_197118_274681_1737038952486-Aspose.docx (6.6 MB)
@CQcesar Thank you for the additional information. Yes, unfortunately, the only way to determine whether paragraphs belong to the same frame is to check the equality of all frame properties. For example, the following is the XML representation of two paragraphs from the same frame:
<w:p w14:paraId="2569E061" w14:textId="77777777" w:rsidR="004F0EE7" w:rsidRDefault="008A6ECB">
<w:pPr>
<w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/>
<w:widowControl w:val="0"/>
<w:autoSpaceDE w:val="0"/>
<w:autoSpaceDN w:val="0"/>
<w:spacing w:before="0" w:after="0" w:line="224" w:lineRule="exact"/>
<w:jc w:val="left"/>
<w:rPr>
<w:rFonts w:ascii="ALWUFV+ArialMT"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
</w:pPr>
<w:r>
<w:rPr>
<w:rFonts w:ascii="ALWUFV+ArialMT"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
<w:t>Utilization Management MAPD</w:t>
</w:r>
</w:p>
<w:p w14:paraId="69D1C1E0" w14:textId="77777777" w:rsidR="004F0EE7" w:rsidRDefault="008A6ECB">
<w:pPr>
<w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/>
<w:widowControl w:val="0"/>
<w:autoSpaceDE w:val="0"/>
<w:autoSpaceDN w:val="0"/>
<w:spacing w:before="5" w:after="0" w:line="224" w:lineRule="exact"/>
<w:jc w:val="left"/>
<w:rPr>
<w:rFonts w:ascii="QJBLNT+ArialMT"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
</w:pPr>
<w:r>
<w:rPr>
<w:rFonts w:ascii="QJBLNT+ArialMT"/>
<w:color w:val="000000"/>
<w:sz w:val="20"/>
</w:rPr>
<w:t>P.O. Box 660694</w:t>
</w:r>
</w:p>
As you can see, both paragraphs have the same frame properties:
<w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/>
but the frame does not have an ID or anything similar.
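To illustrate the comparison described above, here is a rough sketch in plain Python (standard library only, not the Aspose.Words API) that groups consecutive paragraphs whose w:framePr attributes are all equal. The sample markup is trimmed down from the XML above, the third paragraph is a made-up placeholder, and the names SAMPLE, frame_key, paragraph_text and group_by_frame are hypothetical helpers introduced only for this example.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed sample: two paragraphs with identical w:framePr, followed by a
# hypothetical third paragraph whose frame has a different w:y.
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/></w:pPr>
    <w:r><w:t>Utilization Management MAPD</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/></w:pPr>
    <w:r><w:t>P.O. Box 660694</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="2500"/></w:pPr>
    <w:r><w:t>Placeholder text in another frame</w:t></w:r>
  </w:p>
</w:body>
"""


def frame_key(paragraph):
    """Return all w:framePr attributes as a comparable tuple, or None if unframed."""
    frame = paragraph.find(f"{{{W}}}pPr/{{{W}}}framePr")
    if frame is None:
        return None
    return tuple(sorted(frame.attrib.items()))


def paragraph_text(paragraph):
    """Concatenate the text of every w:t element inside the paragraph."""
    return "".join(t.text or "" for t in paragraph.iter(f"{{{W}}}t"))


def group_by_frame(body):
    """Group consecutive framed paragraphs that share identical frame properties."""
    paragraphs = body.findall(f"{{{W}}}p")
    return [
        [paragraph_text(p) for p in run]
        for key, run in groupby(paragraphs, key=frame_key)
        if key is not None  # skip runs of paragraphs that are not in a frame
    ]


groups = group_by_frame(ET.fromstring(SAMPLE))
```

Because there is no frame ID, two separate frames that happen to have exactly the same properties and sit next to each other would end up in one group, so treat the result as a heuristic.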
Thanks, Alexey, for your prompt response. I was checking the internal document.xml and I see a structure that is a bit different from what you showed above. Does Aspose.Words have any function that fixes OCR’d documents?
I see in your XML that w14:textId, w:rsidR, and w:rsidRDefault have the same ID numbers in both paragraphs. Maybe that could be the ID we need for getting the frame?
No, unfortunately, Aspose.Words does not have such functionality.
w14:textId
Specifies a version identifier for the paragraph. Unfortunately, there is no public API in Aspose.Words to access this attribute.
w:rsidR
Specifies a unique identifier used to track the editing session when the paragraph was added to the main document. There is no access to this attribute through the Aspose.Words API either.
w:rsidRDefault
Specifies a unique identifier used for all runs in this paragraph which do not explicitly declare an rsidR attribute.
As you can see, none of these attributes are frame identifiers.
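As a small illustration of why the rsid values cannot stand in for a frame ID, the sketch below (plain Python, standard library only) counts the distinct rsid pairs and the distinct frame settings in a sample. The sample is an assumption made for the example: the first two paragraphs reuse attribute values from the XML above, while the third paragraph and its w:y value are invented to represent a second frame added in the same editing session. The names SAMPLE, qname and count_distinct are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumed sample: three paragraphs sharing the same rsid values; the third one
# is invented and sits in a different frame (different w:y).
SAMPLE = f"""
<w:body xmlns:w="{W}">
  <w:p w:rsidR="004F0EE7" w:rsidRDefault="008A6ECB">
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/></w:pPr>
  </w:p>
  <w:p w:rsidR="004F0EE7" w:rsidRDefault="008A6ECB">
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="1082"/></w:pPr>
  </w:p>
  <w:p w:rsidR="004F0EE7" w:rsidRDefault="008A6ECB">
    <w:pPr><w:framePr w:w="2968" w:wrap="auto" w:hAnchor="text" w:x="1702" w:y="2500"/></w:pPr>
  </w:p>
</w:body>
"""


def qname(local):
    """Build the namespace-qualified name ElementTree uses for w:* items."""
    return f"{{{W}}}{local}"


def count_distinct(body):
    """Return (number of distinct rsid pairs, number of distinct frame settings)."""
    rsids = set()
    frames = set()
    for p in body.iter(qname("p")):
        rsids.add((p.get(qname("rsidR")), p.get(qname("rsidRDefault"))))
        frame = p.find(f"{qname('pPr')}/{qname('framePr')}")
        if frame is not None:
            frames.add(tuple(sorted(frame.attrib.items())))
    return len(rsids), len(frames)


rsid_count, frame_count = count_distinct(ET.fromstring(SAMPLE))
```

In this assumed sample there is one rsid pair but two different frame settings, which is the situation to expect whenever several frames are written in one editing session.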