The difference between them is the origin of the coordinates. Word has the upper left corner as the origin, while PDF has the lower left corner as the origin.
@supeiwei Yes, the coordinates returned by LayoutEnumerator have the upper left corner as the origin. But knowing the page size, you can easily convert them to use the lower left corner as the origin.
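As a minimal sketch of that conversion (the class and method names here are hypothetical, and it uses only `java.awt.geom.Rectangle2D`, not any Aspose type): the X coordinate and the size stay the same, and the new Y is the page height minus the old Y minus the rectangle height.

```java
import java.awt.geom.Rectangle2D;

class OriginFlip {
    // Converts a rectangle measured from the upper left corner (Y grows downward)
    // into the same rectangle measured from the lower left corner (Y grows upward).
    // pageHeight must be in the same units as the rectangle (points here).
    static Rectangle2D.Double toBottomLeftOrigin(Rectangle2D topLeftRect, double pageHeight) {
        double y = pageHeight - topLeftRect.getY() - topLeftRect.getHeight();
        return new Rectangle2D.Double(
                topLeftRect.getX(), y, topLeftRect.getWidth(), topLeftRect.getHeight());
    }

    public static void main(String[] args) {
        // A 200 x 20 pt box placed 100 pt below the top of a 792 pt tall page.
        Rectangle2D box = new Rectangle2D.Double(72, 100, 200, 20);
        Rectangle2D.Double flipped = toBottomLeftOrigin(box, 792);
        System.out.println(flipped.getX() + ", " + flipped.getY()); // prints "72.0, 672.0"
    }
}
```

Applying the same function twice returns the original rectangle, which is a quick way to check the conversion.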
Will the page size change after converting word to pdf?
@supeiwei No, page size will not be changed. After rendering to PDF page size remains the same as in the original MS Word document.
But when I use the rectangular coordinates obtained by parsing word to draw on PDF, the display is wrong.
@supeiwei It is hard to tell what might cause the problem on your side. If you generate the PDF using Aspose.Words and calculate the coordinates in the same environment, there should not be any problems.
I have attached my code, could you please help me see where is the problem
for (Page page : doc.getPages()) {
double pageHeight = page.getMediaBox().getHeight();
int page_num = page.getNumber();
for (double[] bbox : bboxs) {
int num = (int) bbox[4];
if (num == page_num) {
double pdfX = bbox[0];
double pdfY = pageHeight - (bbox[1] + bbox[3]); // Convert Y (flip it)
// Convert Word's rectangle to PDF rectangle
double pdfWidth = bbox[2];
double pdfHeight = bbox[3];
// Create a PDF rectangle using the converted coordinates
com.aspose.pdf.Rectangle pdfRectangle = new com.aspose.pdf.Rectangle(pdfX, pdfY, pdfWidth, pdfHeight);
DrawRectangleOnPage(pdfRectangle, page);
}
}
}
public static void DrawRectangleOnPage(Rectangle rectangle, Page page) {
page.getContents().add(new GSave());
page.getContents().add(new ConcatenateMatrix(1, 0, 0, 1, 0, 0));
page.getContents().add(new SetRGBColorStroke(1, 0, 0));
page.getContents().add(new SetLineWidth(1));
page.getContents().add(new Re(rectangle.getLLX(), rectangle.getLLY(), rectangle.getWidth(), rectangle.getHeight()));
page.getContents().add(new ClosePathStroke());
page.getContents().add(new GRestore());
}
The parameters of bbox are the x, y, width, height and page number of the Rectangle2D, respectively.
@supeiwei In the Aspose.PDF Rectangle, the third and fourth parameters are not width and height, but coordinates. Please see the following code that properly draws rectangles in the output PDF:
Document doc = new Document("C:\\Temp\\in.docx");
int tmpBkIndex = 0;
for (Paragraph p : (Iterable<Paragraph>)doc.getChildNodes(NodeType.PARAGRAPH, true))
{
// LayoutCollector and LayoutEnumerator work with nodes only in the main document body.
if (p.getAncestor(NodeType.HEADER_FOOTER) != null)
continue;
if (p.getAncestor(NodeType.SHAPE) != null)
continue;
String tmpBkName = "_tmp_" + tmpBkIndex;
tmpBkIndex++;
// Wrap the paragraph into a temporary bookmark.
p.prependChild(new BookmarkStart(doc, tmpBkName));
p.appendChild(new BookmarkEnd(doc, tmpBkName));
}
// Use LayoutCollector and LayoutEnumerator to calculate paragraph bounds.
LayoutCollector collector = new LayoutCollector(doc);
LayoutEnumerator enumerator = new LayoutEnumerator(doc);
// Collect rectangles.
HashMap<Integer, List<Rectangle2D>> rects = new HashMap<Integer, List<Rectangle2D>>();
for (Bookmark bk : doc.getRange().getBookmarks())
{
if (!bk.getName().startsWith("_tmp"))
continue;
enumerator.setCurrent(collector.getEntity(bk.getBookmarkStart()));
while (enumerator.getType() != LayoutEntityType.LINE)
enumerator.moveParent();
// Get the rectangle occupied by the first line of the paragraph.
Rectangle2D rect1 = enumerator.getRectangle();
// Now do the same with the last line.
enumerator.setCurrent(collector.getEntity(bk.getBookmarkEnd()));
while (enumerator.getType() != LayoutEntityType.LINE)
enumerator.moveParent();
Rectangle2D rect2 = enumerator.getRectangle();
// The union of the rectangles is the region occupied by the paragraph.
Rectangle2D result = rect1.createUnion(rect2);
int pageIndex = enumerator.getPageIndex();
if (!rects.containsKey(pageIndex))
rects.put(pageIndex, new ArrayList<Rectangle2D>());
rects.get(pageIndex).add(result);
}
// Save as PDF
doc.save("C:\\Temp\\out.pdf");
// open the output PDF using Aspose.PDF and draw rectangles.
com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document("C:\\Temp\\out.pdf");
for (com.aspose.pdf.Page page : pdfDoc.getPages())
{
double pageHeight = page.getMediaBox().getHeight();
int page_num = page.getNumber();
// Get the rectangles for this page; skip pages without any collected paragraphs.
List<Rectangle2D> bboxs = rects.get(page_num);
if (bboxs == null)
continue;
for (Rectangle2D bbox : bboxs)
{
// Create a PDF rectangle using the converted coordinates
double x = bbox.getX();
double y = pageHeight - bbox.getY() - bbox.getHeight();
double x2 = x + bbox.getWidth();
double y2 = y + bbox.getHeight();
com.aspose.pdf.Rectangle pdfRectangle = new com.aspose.pdf.Rectangle(x, y, x2, y2);
DrawRectangleOnPage(pdfRectangle, page);
}
}
pdfDoc.save("C:\\Temp\\out_pdf.pdf");
public static void DrawRectangleOnPage(com.aspose.pdf.Rectangle rectangle, com.aspose.pdf.Page page) {
page.getContents().add(new com.aspose.pdf.operators.GSave());
page.getContents().add(new com.aspose.pdf.operators.ConcatenateMatrix(1, 0, 0, 1, 0, 0));
page.getContents().add(new com.aspose.pdf.operators.SetRGBColorStroke(1, 0, 0));
page.getContents().add(new com.aspose.pdf.operators.SetLineWidth(1));
page.getContents().add(new com.aspose.pdf.operators.Re(rectangle.getLLX(), rectangle.getLLY(), rectangle.getWidth(), rectangle.getHeight()));
page.getContents().add(new com.aspose.pdf.operators.ClosePathStroke());
page.getContents().add(new com.aspose.pdf.operators.GRestore());
}
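To make the difference between the two parameter conventions concrete, here is a small sketch using plain arrays rather than Aspose classes. The helper name `toCorners` is hypothetical; the input follows the `{x, y, width, height, pageNumber}` layout described in the question, and the output is the lower left and upper right corner pair that the answer above passes to the rectangle constructor.

```java
class BboxToCorners {
    // Input:  {x, y, width, height, pageNumber} with the origin at the upper left corner.
    // Output: {llx, lly, urx, ury} with the origin at the lower left corner.
    static double[] toCorners(double[] bbox, double pageHeight) {
        double llx = bbox[0];
        double lly = pageHeight - bbox[1] - bbox[3]; // flip Y
        double urx = llx + bbox[2];                  // right edge, not the width
        double ury = lly + bbox[3];                  // top edge, not the height
        return new double[] { llx, lly, urx, ury };
    }

    public static void main(String[] args) {
        double[] corners = toCorners(new double[] { 72, 100, 200, 20, 1 }, 792);
        // Prints "72.0 672.0 272.0 692.0"
        System.out.println(corners[0] + " " + corners[1] + " " + corners[2] + " " + corners[3]);
    }
}
```

Passing the width and height (200 and 20) in place of the last two values would instead describe a box whose upper right corner is at (200, 20), below its own lower left corner, which is why the rectangles in the original code were drawn in the wrong place.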
out_pdf.pdf (28.0 KB)
thank you very much
Hello, I have encountered a new problem. When the paper size of the Word file is Letter, the coordinates in the PDF do not correspond. Can you help me find out what the problem is?
@supeiwei I am afraid it is difficult to answer your question. Aspose.Words preserves page size upon rendering document to PDF, so there should not be any problems with page size. If possible please attach the problematic input and output documents here for testing. We will check the issue on our side and provide you more information.
@supeiwei Thank you for additional information. Unfortunately, I do not see any problems on my side. Here are PDF documents produced by the above provided code:
out.pdf (45.8 KB)
out_pdf.pdf (46.0 KB)
I modified the logic a bit because I wanted to get the correct width and not the width of the entire row.
@supeiwei Most likely, there is something wrong with your code. As far as I can see, there is a vertical shift of the bounding boxes, so most likely there is a mistake in the conversion of the Y coordinates.
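Purely as an illustration, and not a diagnosis of the code in question: one way a uniform vertical shift can arise is when the Y flip uses a page height that does not match the actual page. The sketch below assumes, hypothetically, that the A4 height (about 841.89 pt) is used for a Letter page (792 pt); every flipped box then moves by the same amount, the difference between the two heights. The class and method names are made up for this example.

```java
class VerticalShiftCheck {
    static final double A4_HEIGHT_PT = 841.89;   // 297 mm, rounded
    static final double LETTER_HEIGHT_PT = 792.0; // 11 in

    // Lower edge of a box in bottom-left-origin coordinates.
    static double flipY(double topY, double boxHeight, double pageHeight) {
        return pageHeight - topY - boxHeight;
    }

    public static void main(String[] args) {
        double topY = 100, boxHeight = 20;
        double correct = flipY(topY, boxHeight, LETTER_HEIGHT_PT);
        double wrong = flipY(topY, boxHeight, A4_HEIGHT_PT); // assumed mistake
        // The shift does not depend on topY or boxHeight, only on the two page heights.
        System.out.printf("shift = %.2f pt%n", wrong - correct);
    }
}
```

A shift that is constant for every box on the page points at the page height used in the conversion; a shift that varies from box to box points elsewhere, for example at a layout that differs between the measured document and the rendered one.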
public static void convertWordFilesToPDF(String inputDir, String outputDir)
{
File folder = new File(inputDir);
// Iterate over all files in the folder
for (File file : folder.listFiles()) {
// Check whether the file is a Word document
if (file.isFile() && file.getName().matches("(?i).*\\.docx?$"))
{
try
{
// Load the Word document
com.aspose.words.Document doc = new com.aspose.words.Document(file.getAbsolutePath());
doc.acceptAllRevisions();
doc.getChildNodes(NodeType.COMMENT, true).clear();
doc.getChildNodes(NodeType.FOOTNOTE, true).clear();
doc.getChildNodes(NodeType.HEADER_FOOTER, true).clear();
// Build the output PDF file path
String outputFilePath = outputDir + "\\" + file.getName().replaceAll("(?i)\\.docx?$", ".pdf");
// Convert the Word document to PDF
PdfSaveOptions pdfSaveOptions = new PdfSaveOptions();
pdfSaveOptions.setEmbedFullFonts(true); // Embed fonts
pdfSaveOptions.setPreserveFormFields(true);
// Pass the options to save(); otherwise the settings above have no effect.
doc.save(outputFilePath, pdfSaveOptions);
System.out.println("Converted: " + file.getName() + " to PDF.");
}
catch (Exception e)
{
System.err.println("Failed to convert " + file.getName() + ": " + e.getMessage());
}
}
}
}
The above is my code for converting Word to PDF. I found that in some files, the paragraph layout of the converted PDF is inconsistent with that of the Word document.
@supeiwei In your code you accept revisions and remove comments, footnotes and headers/footers. You should do the same before calculating the paragraphs' coordinates.