Good afternoon!
We ran into a problem while working with Aspose.Words for Java (DOCX and DOC documents).
We need to measure the distance from the text (table, image, element) to the header and footer (see Fig. “Distance between text (table, figure) and header (footer).png”):
The distance from the upper border of the text to the lower border of the header (there is always a table with visible borders in the header)
The distance from the lower border of the text to the upper border of the footer (there is always a table with visible borders in the footer)
Because we could not find a way to measure this distance with the standard Aspose.Words for Java functionality, we did the following:
Convert a DOCX or DOC file to PDF
Measure the distance pixel by pixel in the resulting file
But this approach ran into problems:
For example, we took the file “Example.docx”.
The file “Example.doc” (from point 1) was converted to PDF. The conversion result (Example_1.pdf) matches the source file.
The file “Example.doc” (from point 1) was converted to PDF again. The conversion result (Example_2.pdf) does not match the source file: on pages 8-11, the header and footer are missing.
Request:
Could you please tell us whether there is a way to measure the distance from the text (table, image, element) to the header and footer using Aspose.Words for Java? Maybe there is a way to access the MS Word Ruler tool?
What could cause the header and footer to disappear when converting DOC / DOCX to PDF, and how can it be solved?
For example, the following code prints the bounding rectangle, [(left, top)] and [(width, height)], of every Shape (image) in the body of the first section of a Word document:
import com.aspose.words.*;

Document doc = new Document("E:\\Temp\\in.docx");
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
for (Shape shape : (Iterable<Shape>) doc.getFirstSection().getBody().getChildNodes(NodeType.SHAPE, true)) {
    // Move the enumerator to the layout entity that corresponds to this shape.
    enumerator.setCurrent(collector.getEntity(shape));
    // Read the bounding rectangle of that entity on its page.
    String left = String.format("%.2f", enumerator.getRectangle().getX());
    String top = String.format("%.2f", enumerator.getRectangle().getY());
    String width = String.format("%.2f", enumerator.getRectangle().getWidth());
    String height = String.format("%.2f", enumerator.getRectangle().getHeight());
    System.out.print("[(x, y) = (" + left + ", " + top + ")]");
    System.out.println(" AND [(width, height) = (" + width + ", " + height + ")]");
}
You can use the same logic to calculate the coordinates of any node in a Word document.
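Once two such rectangles are known, the distance itself is simple arithmetic. Below is a minimal sketch that uses only java.awt.geom.Rectangle2D and no Aspose.Words calls. It assumes that both rectangles belong to the same page, that they are measured from the top-left corner of the page with y growing downwards, and that the units are points. The class name LayoutGaps, its method names, and the sample coordinates are hypothetical and serve only as illustration.

```java
import java.awt.geom.Rectangle2D;

/** Vertical gaps between a body element and the header/footer on one page. */
final class LayoutGaps {
    private LayoutGaps() {}

    /** Distance from the bottom edge of the header to the top edge of the element. */
    static double gapBelowHeader(Rectangle2D header, Rectangle2D element) {
        return element.getMinY() - header.getMaxY();
    }

    /** Distance from the bottom edge of the element to the top edge of the footer. */
    static double gapAboveFooter(Rectangle2D element, Rectangle2D footer) {
        return footer.getMinY() - element.getMaxY();
    }

    public static void main(String[] args) {
        // Hypothetical rectangles (x, y, width, height); y grows downwards.
        Rectangle2D header = new Rectangle2D.Double(56.7, 28.0, 481.9, 40.0);
        Rectangle2D text = new Rectangle2D.Double(56.7, 92.0, 481.9, 600.0);
        Rectangle2D footer = new Rectangle2D.Double(56.7, 770.0, 481.9, 40.0);
        System.out.println(gapBelowHeader(header, text)); // 24.0
        System.out.println(gapAboveFooter(text, footer)); // 78.0
    }
}
```

A negative result would mean that the element overlaps the header or footer, which may be worth reporting separately when checking a document against layout rules.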
Secondly, after an initial test with the latest licensed version of Aspose.Words for Java, i.e. 20.1, we were unable to reproduce this issue (the one shown on pages 8-11 of “Example_2.pdf”) on our end. We used the following simple code to produce “awjava-20.1.pdf”:
Good afternoon!
Thanks, updating to the new version helped.
Question about obtaining coordinates in MS Word: is it possible to get the coordinates of the elements of the first line on each page? If so, can you give an example?
We managed to get the lines, but a new problem arose: “null” objects began to appear in the text of the lines, although no other tags or objects are visible in the document. When the text is taken from a paragraph (not a line), the “null” objects do not appear.
Example:
Source document: ExampleDoc.docx
The result of getting the lines on page 4 of the original document (pay attention to the first 2 lines): RenderedDocumentResultPage4.txt
Can you tell me, please, what could be the problem? Files are attached in the archive Example.zip (81.2 KB)
@awais.hafeez,
Good afternoon!
Unfortunately, this option is not suitable for us, because it does not always correctly cover all special cases (for example, when a paragraph is divided into several pages).
We are satisfied with the option of finding the first / last line on the page and working with the resulting text. But we do not know how to solve the following problems:
“null” appears in the text of lines that contain no such objects.
Getting the coordinates of all characters of the found line.
See quote
In addition, please note that we cannot modify the document using Aspose.Words. The document should only be checked for compliance with the rules. The original structure must remain unchanged.
When you convert a Word document to PDF format for example by using the following simple two lines of code, Aspose.Words should preserve/retain all elements in Word document, their position/layout, their formatting etc in generated PDF on its own.
Document doc = new Document(dataDir + "input.doc");
doc.save(dataDir + "output.pdf");
You do not need to write any additional code to calculate/determine the coordinates of different document elements by yourself. If you find any misplacement or content overlapping in generated PDF, that may well be because of some bug in Aspose.Words’ API which needs to be fixed.
Generally, Aspose.Words mimics the behavior of MS Word, i.e. if you convert your Word documents (DOC, DOCX files etc.) to PDF format by using Aspose.Words, the output will look similar to what MS Word produces. We strive to ensure that all conversions are performed with high fidelity, exactly as Microsoft Word® would have done them. If you still find any issues during conversion, please feel free to report them in this forum and we will fix the issue(s) in Aspose.Words’ API. Hope this helps.
Converting to PDF does not interest us. It is important for us to obtain information from a file in the Word format for solving the tasks described above. Converting to PDF will not help us in this case:
Too costly code runtime
Does not provide the functionality that we need
Again. In solving our problems, we encountered the problems described in quote:
Can you help us with their solution? Directly in the form in which the problem is described. Please help us solve these problems.
Is it still unclear what the problem is and how to solve it? We are eagerly waiting for an answer, because development of this part of our project has stopped until this task is resolved.
Thanks for being patient. We have logged the following tickets in our issue tracking system and linked them with your thread, so you will be notified as soon as the work on these tickets is completed.
WORDSNET-19858: Code to measure the distance between body text and the Header/Footer
WORDSNET-19860: Code to get page coordinates of every character on a Line
WORDSJAVA-2297: To fix RenderedDocument example producing unwanted content and ‘NULLs’
@awais.hafeez
Good afternoon!
Is there any information on the above issues? It seems that the tasks are closed:
WORDSNET-19858: Status: Closed
WORDSNET-19860: Status: Closed
Can you tell me about the results of these tasks?
It might be possible to find the distance between the last inline content in the text column and the first content line in the footer. You may use Aspose.Words’ Layout APIs, i.e. LayoutEnumerator + LayoutCollector, to achieve this on your end.
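As a rough illustration of that idea, the sketch below takes a flat list of laid-out lines and computes, for each page, the distance from the bottom of the lowest body line to the top of the first footer line. It works on lines rather than paragraphs, so a paragraph split across pages needs no special handling. It is plain Java with no Aspose.Words calls: the Line class, the FooterDistance class, and the way the lines are collected (for example by walking the layout with LayoutEnumerator) are assumptions made for this example, as is the convention that y grows downwards from the top of the page.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class FooterDistance {
    private FooterDistance() {}

    /** Hypothetical record of one laid-out line: page number, footer flag, bounds on the page. */
    static final class Line {
        final int page;
        final boolean inFooter;
        final Rectangle2D bounds;

        Line(int page, boolean inFooter, Rectangle2D bounds) {
            this.page = page;
            this.inFooter = inFooter;
            this.bounds = bounds;
        }
    }

    /**
     * For every page that has both body and footer lines, returns the distance
     * from the bottom of the lowest body line to the top of the highest footer line.
     */
    static Map<Integer, Double> distanceToFooterByPage(List<Line> lines) {
        Map<Integer, Double> lastBodyBottom = new TreeMap<>();
        Map<Integer, Double> firstFooterTop = new TreeMap<>();
        for (Line line : lines) {
            if (line.inFooter) {
                // Keep the smallest top edge among the footer lines of this page.
                firstFooterTop.merge(line.page, line.bounds.getMinY(), Math::min);
            } else {
                // Keep the largest bottom edge among the body lines of this page.
                lastBodyBottom.merge(line.page, line.bounds.getMaxY(), Math::max);
            }
        }
        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, Double> entry : lastBodyBottom.entrySet()) {
            Double footerTop = firstFooterTop.get(entry.getKey());
            if (footerTop != null) {
                result.put(entry.getKey(), footerTop - entry.getValue());
            }
        }
        return result;
    }
}
```

Pages without a footer line are simply left out of the result. The distance to the header can be computed the same way by swapping the roles: the highest top edge of the body lines against the lowest bottom edge of the header lines.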
Secondly, the Layout model of Aspose.Words does not record character positions, and the layout does not have characters as such but glyphs. So, the requested functionality to ‘get page coordinates of every character on a Line’ is not available.
So, regarding WORDSNET-19858 and WORDSNET-19860, we have completed the work on these issues and decided to close them with the “Won’t fix” status. I am afraid we will not be able to implement these features in Aspose.Words’ API. We apologize for the inconvenience.