Hello,
I have a TIFF image that I need to convert into a PDF document, and in the PDF I need to add a text layer so that a user can select text from the document.
I’m using Java 8 with aspose-pdf version 22.5.
Following the documentation, I’m using this code to load the image into the document:
FileInputStream imageStream = new FileInputStream(new File("myFile.tif"));
Document document = new Document();
Page page = document.getPages().add();
page.getResources().getImages().add(imageStream);
double lowerLeftX = 0f;
double lowerLeftY = 0f;
double upperRightX = page.getPageInfo().getWidth();
double upperRightY = page.getPageInfo().getHeight();
page.getContents().add(new GSave());
Rectangle rectangle = new Rectangle(lowerLeftX, lowerLeftY, upperRightX, upperRightY);
Matrix matrix = new Matrix(new double[]{rectangle.getURX() - rectangle.getLLX(), 0, 0,
rectangle.getURY() - rectangle.getLLY(), rectangle.getLLX(), rectangle.getLLY()});
page.getContents().add(new ConcatenateMatrix(matrix));
XImage ximage = page.getResources().getImages().get_Item(page.getResources().getImages().size());
page.getContents().add(new Do(ximage.getName()));
page.getContents().add(new GRestore());
For the text layer I have some objects that carry the information. I have a class Text, which represents a whole word like ‘something’, and another class Symbol, which contains a single letter, like the ‘s’ in ‘something’. Both classes also have coordinates x1, y1, where the pair (x1, y1) is the top-left corner, plus a width and a height.
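For reference, the shape of these two classes is roughly the following. This is only a minimal sketch: the getter names match the ones used in my snippets, but the shared `Box` base class and the constructors are assumptions made for illustration, not my real code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical base class: (x, y) is the top-left corner of the box.
class Box {
    private final float x;
    private final float y;
    private final float width;
    private final float height;

    Box(float x, float y, float width, float height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    float getX() { return x; }
    float getY() { return y; }
    float getWidth() { return width; }
    float getHeight() { return height; }
}

// A single letter, e.g. the 's' in 'something'.
class Symbol extends Box {
    private final String text;

    Symbol(String text, float x, float y, float width, float height) {
        super(x, y, width, height);
        this.text = text;
    }

    String getText() { return text; }
}

// A whole word together with the symbols it is made of.
class Text extends Box {
    private final String text;
    private final List<Symbol> symbols;

    Text(String text, float x, float y, float width, float height, List<Symbol> symbols) {
        super(x, y, width, height);
        this.text = text;
        // Defensive copy so the word's symbols cannot be changed from outside.
        this.symbols = Collections.unmodifiableList(new ArrayList<>(symbols));
    }

    String getText() { return text; }
    List<Symbol> getSymbols() { return symbols; }
}
```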
What I need to do is place that information, ideally, at the exact coordinates. My first approach (1) was:
for (Text word : getWords()) {
float x = word.getX();
float y = pageHeight - word.getY() - word.getHeight();
float h = word.getHeight();
float w = word.getWidth();
// need to scale everything
float balancedX = (float) (upperRightX * x / pageWidth);
float balancedW = (float) (upperRightX * w / pageWidth);
float balancedY = (float) (upperRightY * y / pageHeight);
float balancedH = (float) (upperRightY * h / pageHeight);
TextFragment textFragment = new TextFragment(word.getText());
textFragment.setPosition(new Position(balancedX, balancedY));
textFragment.getTextState().setFontSize(balancedH);
textFragment.getTextState().setForegroundColor(Color.fromArgb(99, 255, 0, 0)); // for debugging purposes
page.getParagraphs().add(textFragment);
}
But the text is not aligned correctly. As far as I can tell, the cause is the font: the width of the rendered word depends on the font used, and since I cannot know which font the original document uses, I have to use one single font for every document.
Suppose I see the word ‘something’ in the PDF and I double-click at the start of the word: the selection I see is narrower than the true width of the word in the document. If I make the text invisible, a user seeing that selection would think that not the whole word is selected.
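To make the mismatch concrete: if the word has some natural width in the single font I use, and the box from my data has a different width, the text would have to be scaled horizontally by the ratio of the two. Below is a minimal sketch of that arithmetic in plain Java, with no Aspose calls. The class and method names are hypothetical, and `naturalWidth` is assumed to come from measuring the string in the chosen font at the chosen size (I have not confirmed which call does this in 22.5). The result is expressed as a percentage because that is how PDF defines horizontal text scaling, with 100 meaning unscaled.

```java
// Hypothetical helper: computes how much a word would need to be scaled
// horizontally so that its rendered width matches the box width.
class HorizontalScalingSketch {

    /**
     * @param naturalWidth width of the word as rendered in the chosen font (page units)
     * @param targetWidth  width of the word's box on the page (same units)
     * @return horizontal scaling as a percentage, where 100 means "no scaling"
     */
    static double scalingPercent(double naturalWidth, double targetWidth) {
        if (naturalWidth <= 0 || targetWidth <= 0) {
            throw new IllegalArgumentException("widths must be positive");
        }
        return 100.0 * targetWidth / naturalWidth;
    }
}
```

For example, a word that renders 150 units wide but should cover a 200-unit box would need about 133% horizontal scaling. Whether such a value can be applied to a TextFragment so that the selection matches the box is exactly the part I am unsure about.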
I also tried using TextBoxField, but the result was close to approach 1:
TextBoxField textBoxField = new TextBoxField(page, new Rectangle(balancedX, balancedY, balancedX + balancedW, balancedY + balancedH));
textBoxField.setValue(word.getText());
Border border = new Border(textBoxField);
border.setWidth(1);
textBoxField.setBorder(border);
textBoxField.setColor(Color.Empty);
document.getForm().add(textBoxField, 1);
The second approach I tried (2) is this:
for (Text word : getWords()) {
for (Symbol symbol : word.getSymbols()) {
String symbolText = symbol.getText();
float x = symbol.getX();
float y = pageHeight - symbol.getY() - symbol.getHeight();
float h = symbol.getHeight();
// scaling
float balancedX = (float) (upperRightX * x / pageWidth);
float balancedY = (float) (upperRightY * y / pageHeight);
float balancedH = (float) (upperRightY * h / pageHeight);
TextFragment textFragment = new TextFragment(symbolText);
textFragment.setPosition(new Position(balancedX, balancedY));
textFragment.getTextState().setForegroundColor(Color.fromArgb(99, 255, 0, 0)); // debug
textFragment.getTextState().setFontSize(balancedH);
textFragment.getTextState().setCharacterSpacing(1);
page.getParagraphs().add(textFragment);
}
}
In this approach I use a TextFragment to place a single letter at a time on the PDF document. It’s definitely more precise, but the problem with this solution is that I cannot double-click a word in the PDF to select the whole word, because most of the time (I’d say 80%) the letters are separated from each other.
Ideally, I would have a word like ‘something’ ‘stretched’ to a width that I specify, e.g. 200px, so that when I double-click the word (at the beginning or anywhere else) the whole word is selected, and not only part of it as in the first approach.
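One way I can imagine the stretching, shown only as a sketch: keep the whole word in a single TextFragment as in approach 1, and spread the missing width evenly over the gaps between letters, passing the result to `setCharacterSpacing` instead of the fixed `1` I used in approach 2. The helper below is plain Java with hypothetical names, and it assumes that the natural width of the word in the chosen font is already known and is in the same units as the target width.

```java
// Hypothetical helper: distributes the difference between the target width
// and the natural width evenly over the gaps between the letters of a word.
class CharSpacingSketch {

    /**
     * @param word         the word to stretch, e.g. "something"
     * @param naturalWidth width of the word as rendered without extra spacing
     * @param targetWidth  width the word should cover on the page (same units)
     * @return extra spacing per gap; negative if the word has to be squeezed,
     *         0 for words with fewer than two characters (no gaps to adjust)
     */
    static double extraSpacingPerGap(String word, double naturalWidth, double targetWidth) {
        // Count code points so that characters outside the BMP count once.
        int glyphs = word.codePointCount(0, word.length());
        if (glyphs < 2) {
            return 0.0;
        }
        return (targetWidth - naturalWidth) / (glyphs - 1);
    }
}
```

For ‘something’ (9 letters, 8 gaps) with a natural width of 150 and a target of 200, this gives 6.25 extra units per gap. I have not verified that a viewer still treats such a spaced-out fragment as one word on double-click, which is part of what I am asking.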
Is this achievable in some way?
Thank you.