Stretch text

user1254 · April 15, 2024, 4:40pm

Hello,
I have a TIFF image that I need to convert into a PDF document, and I need to add a text layer to the PDF so that a user can select data from the document.
I’m using Java 8 with aspose-pdf version 22.5.

Following the documentation, I’m using this code to load the image into the document:

FileInputStream imageStream = new FileInputStream(new File("myFile.tif"));
Document document = new Document();
Page page = document.getPages().add();
page.getResources().getImages().add(imageStream);

double lowerLeftX = 0f;
double lowerLeftY = 0f;
double upperRightX = page.getPageInfo().getWidth();
double upperRightY = page.getPageInfo().getHeight();

page.getContents().add(new GSave());
Rectangle rectangle = new Rectangle(lowerLeftX, lowerLeftY, upperRightX, upperRightY);
Matrix matrix = new Matrix(new double[]{rectangle.getURX() - rectangle.getLLX(), 0, 0,
		rectangle.getURY() - rectangle.getLLY(), rectangle.getLLX(), rectangle.getLLY()});

page.getContents().add(new ConcatenateMatrix(matrix));
XImage ximage = page.getResources().getImages().get_Item(page.getResources().getImages().size());

page.getContents().add(new Do(ximage.getName()));

page.getContents().add(new GRestore());

For the text layer I have some objects that carry the information. I have a class Text, which represents a whole word like ‘something’, and another class Symbol, which contains a single letter, like the ‘s’ in ‘something’. Both classes have x1 and y1 coordinates, where the pair (x1, y1) is the top-left corner, plus a width and a height.
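As an illustration of the coordinate handling this implies, here is a minimal sketch that maps a box given in image pixels with a top-left origin onto a PDF page, whose origin is at the bottom left. The class BoxMapper and its method are hypothetical names for illustration only and are not part of aspose-pdf. The sketch assumes that the image fills the whole page.

```java
/** Hypothetical helper (not an aspose-pdf class): maps a pixel box with a
 *  top-left origin onto a PDF page whose origin is at the bottom left. */
final class BoxMapper {

    /**
     * @param x      left edge of the box in image pixels
     * @param yTop   top edge of the box in image pixels, measured from the top
     * @param w      box width in image pixels
     * @param h      box height in image pixels
     * @return {llx, lly, urx, ury} in page units
     */
    static double[] toPdfRect(double x, double yTop, double w, double h,
                              double imageWidth, double imageHeight,
                              double pageWidth, double pageHeight) {
        // Scale factors from image pixels to page units (assumes the image fills the page).
        double sx = pageWidth / imageWidth;
        double sy = pageHeight / imageHeight;

        // Flip the vertical axis: the bottom of the box is measured from the bottom of the image.
        double llx = x * sx;
        double lly = (imageHeight - yTop - h) * sy;

        return new double[] { llx, lly, llx + w * sx, lly + h * sy };
    }
}
```

For a word, the returned rectangle's width is the target width that the text layer should cover, and its height is a rough upper bound for the font size.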

What I need to do is to put that information, ideally, at the exact coordinates. The first approach I tried is this:

for (Text word : getWords()) {
	float x = word.getX();
	float y = pageHeight - word.getY() - word.getHeight();
	float h = word.getHeight();
	float w = word.getWidth();

	// need to scale everything
	float balancedX = (float) (upperRightX * x / pageWidth);
	float balancedW = (float) (upperRightX * w / pageWidth);
	float balancedY = (float) (upperRightY * y / pageHeight);
	float balancedH = (float) (upperRightY * h / pageHeight);

	TextFragment textFragment = new TextFragment(word.getText());
	textFragment.setPosition(new Position(balancedX, balancedY));
	textFragment.getTextState().setFontSize(balancedH);
	textFragment.getTextState().setForegroundColor(Color.fromArgb(99, 255, 0, 0));	// for debugging purposes
	
	page.getParagraphs().add(textFragment);
}

But the text is not aligned correctly. As far as I know, the reason is the font used; however, I cannot know which font each document uses, so I need to use one single font for every document.
Suppose I see the word ‘something’ in the PDF and I double click at the start of the word: the selection I see is narrower than the true width of the word in the image. If I made the text invisible, a user seeing that selection would think that not the whole word is selected.

I also tried using TextBoxField, but the result was close to that of the first approach:

TextBoxField textBoxField = new TextBoxField(page, new Rectangle(balancedX, balancedY, balancedX + balancedW, balancedY + balancedH));
textBoxField.setValue(word.getText());
Border border = new Border(textBoxField);
border.setWidth(1);
textBoxField.setBorder(border);
textBoxField.setColor(Color.Empty);
document.getForm().add(textBoxField, 1);

The second approach I tried is this:

for (Text word : getWords()) {
	for (Symbol symbol : word.getSymbols()) {
		String symbolText = symbol.getText();
		
		float x = symbol.getX();
		float y = pageHeight - symbol.getY() - symbol.getHeight();
		float h = symbol.getHeight();

		// scaling
		float balancedX = (float) (upperRightX * x / pageWidth);
		float balancedY = (float) (upperRightY * y / pageHeight);
		float balancedH = (float) (upperRightY * h / pageHeight);

		TextFragment textFragment = new TextFragment(symbolText);
		textFragment.setPosition(new Position(balancedX, balancedY));
		textFragment.getTextState().setForegroundColor(Color.fromArgb(99, 255, 0, 0));	// debug
		textFragment.getTextState().setFontSize(balancedH);
		textFragment.getTextState().setCharacterSpacing(1);
		page.getParagraphs().add(textFragment);
	}
}

In this approach I use a TextFragment to put a single letter at a time on the PDF document. It’s definitely more precise, but the problem with this solution is that I cannot double click on a word in the PDF to select the whole word, because most of the time (I’d say 80%) the letters are separated from each other.
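One possible way to keep a single fragment per word while still covering the word's box is to widen the gaps between glyphs using the character spacing already set in the snippet above. The following is only a sketch of the arithmetic: CharSpacing is a hypothetical helper, and naturalWidth is assumed to be the width of the word at the chosen font and size with no extra spacing, measured by whatever means the library offers.

```java
/** Hypothetical helper (not an aspose-pdf class): computes the extra space to
 *  put between glyphs so that a word spans a given target width. */
final class CharSpacing {

    /**
     * @param naturalWidth width of the word with zero character spacing
     * @param targetWidth  width the word should cover, in the same units
     * @param glyphCount   number of glyphs in the word
     * @return spacing to add after each glyph; 0 for words with fewer than two glyphs
     */
    static double forWidth(double naturalWidth, double targetWidth, int glyphCount) {
        if (glyphCount < 2) {
            return 0.0; // a single glyph has no gap to widen
        }
        // The extra width is shared among the (glyphCount - 1) gaps between glyphs.
        return (targetWidth - naturalWidth) / (glyphCount - 1);
    }
}
```

The result could be passed to setCharacterSpacing on the word's text state. Note that the glyphs keep their natural shape and only the gaps grow, so with large values a viewer may still decide that the letters belong to separate words.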

Ideally I would have a word like ‘something’ that is ‘stretched’ to a width that I specify, e.g. 200px, so that when I double click the word (at the beginning or anywhere else) I can see that the whole word is selected and not only a part of it, as in the first approach.
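In PDF terms, this kind of stretching corresponds to the horizontal scaling text state parameter (the Tz operator), which is a percentage with a default of 100. A minimal sketch of the calculation follows. HorizontalScale is a hypothetical helper, naturalWidth is assumed to be the word's width at the chosen font and size with 100% scaling, and whether aspose-pdf 22.5 exposes horizontal scaling on the text state is an assumption to check against its API reference.

```java
/** Hypothetical helper (not an aspose-pdf class): computes the horizontal
 *  scaling percentage needed to stretch a word to a target width. */
final class HorizontalScale {

    /**
     * @param naturalWidth width of the word at 100% horizontal scaling
     * @param targetWidth  width the word should cover, in the same units
     * @return scaling percentage, where 100 means unchanged
     */
    static double percent(double naturalWidth, double targetWidth) {
        if (naturalWidth <= 0 || targetWidth <= 0) {
            throw new IllegalArgumentException("widths must be positive");
        }
        return 100.0 * targetWidth / naturalWidth;
    }
}
```

For example, a word that naturally measures 80 units and should cover 200 units needs a scaling of 250. Because the word stays in one fragment, a double click should select it as a whole, as in the first approach.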

Is this achievable in some way?

Thank you.

asad.ali · April 15, 2024, 11:33pm

@user1254

This requires investigation on our side. If possible, could you please share the sample TIFF and the output PDF documents for our reference, so that we can log an investigation ticket and share the ID with you?

user1254 · April 16, 2024, 10:09am

Hello,
I made a sample document with random information.
I ran tests 1 and 2. The input file is named test.tif, the result of test 1 is test_words.pdf, and the result of test 2 is test_symbols.pdf.
It looks like I cannot upload tif files, so I’ll upload a jpg version of the tif file.
test.jpg (120,1 KB)
test_symbols.pdf (1,1 MB)

test_words.pdf (1,0 MB)

Test 1: the text in red is not placed correctly on the document (again, as far as I know, because of the font), but if I open the PDF (Adobe Reader in my case) and double click at the start or in the middle of the word ‘CONIUGATA’, the whole word is selected, which is fine.
Ideally, though, I could ‘stretch’ the text to a certain width that I specify.

Test 2: the text in red is placed almost perfectly (for me it is more than enough, at least), but if I double click the word ‘CONIUGATA’ at the start, only the first 3 letters are selected, not the whole word. If I double click in the middle of the word, say at the letter ‘U’, only that letter is selected. You can see this behaviour in most of the words in the document. I don’t understand why the first 3 letters of the word ‘CONIUGATA’ are connected as a single string but the other letters are not.
Ideally, if I double click on any letter of a word, the whole word should be selected. This solution is definitely the one I would prefer.

If you need more information, please tell me.

asad.ali · April 16, 2024, 4:35pm

@user1254

We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): PDFJAVA-43821

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.