We are having problems converting HTML documents to PDF using Java: we cannot get word breaking to work as expected. We need the text to be broken at spaces and not in the middle of words. How can this be achieved?
Here is a sample of the input HTML:
<!DOCTYPE html>
<html lang="en"
xmlns="http://www.w3.org/1999/xhtml"
xmlns:th="http://www.thymeleaf.org"
>
<head>
<title></title>
<meta charset="utf-8"/>
<meta name="viewport"
content="width=device-width, initial-scale=1"/>
<style>
body {
size:A4;
font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
}
#class2 {
font-size: 64px;
font-weight: bold;
word-break: break-word;
width: 100%;
}
#class1{
text-align: center;
width: 100%;
}
</style>
</head>
<body>
<div id="class1">
<div id="class2">
This text should be broken by space
</div>
</div>
<div id="class1">
<div id="class2">
PI
</div>
</div>
</body>
</html>
In the first case, the text “This text should be broken by space” should be broken only at spaces; in the second case, the text “PI” should not be broken at all.
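To make the expected behavior concrete, here is a minimal sketch of a greedy wrap that breaks only at spaces and never inside a word. It is an illustration only and not how Aspose lays out text: the class name `SpaceWrapSketch`, the method `wrap`, and measuring line width in characters instead of rendered glyph widths are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the expected wrapping; not part of any Aspose API.
public class SpaceWrapSketch {

    /**
     * Greedy wrap: lines are broken only at spaces. A word that is longer than
     * maxChars is kept whole on its own line instead of being split.
     * Width is counted in characters (an assumption; a real renderer uses glyph widths).
     */
    public static List<String> wrap(String text, int maxChars) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // happens only for blank input
            }
            // Start a new line if adding " word" would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            current.append(word);
        }
        if (current.length() > 0) {
            lines.add(current.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Breaks fall between words only.
        System.out.println(wrap("This text should be broken by space", 12));
        // A short word is never split, even if the limit is smaller than the word.
        System.out.println(wrap("PI", 1));
    }
}
```

With a limit of 12 characters the first sample wraps between words only, and “PI” stays on a single line regardless of the limit, which is the behavior we expect from the converted PDF.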
The page margins are set from the following constants:
public static final int MARGIN_TOP_PT = 30;
public static final int MARGIN_RIGHT_PT = 10;
public static final int MARGIN_BOTTOM_PT = 40;
public static final int MARGIN_LEFT_PT = 10;
However, the margin values only affect the exact place where the line is broken. Here is an example of the output when the margins are not set: cover_wo_margins.pdf (29.9 KB)
We are using Aspose version 21.8, but the behavior is the same with version 22.6.
We also noticed that in some places text is broken into multiple lines if its last letter is one of the following: I, i, j, l.
As in my example, the word PI is split across lines, and if we use Pl/Pi/Pj instead, it is split as well. If the last letter is a different one, e.g. t, the text stays on one line (Pt).
I also added another sample HTML file with an analogous situation: if the last letter in a table cell is l/i/j/I, it is moved to the next line, regardless of how much free space is still available on the current line. We can see this happening with the word “Real” in some cells. If the letter “l” is replaced with any letter other than l/i/j/I, the word is not broken into multiple lines.
Here is the sample HTML (zipped): letter_l_break.zip (3.3 KB)
I reduced it to contain less data, but the line breaking described above can still be observed in the table header cells after conversion to PDF.
@arjana
We have opened the following new ticket(s) in our internal issue tracking system and will deliver the fixes according to the terms mentioned in Free Support Policies.
Issue ID(s): PDFJAVA-42747
You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.