Version: Aspose.PDF.dll 10.7.0.0 (.NET)
Goals
- Calculate per-character text position (x, y, width, height)
- Extract the text flow direction (left-to-right, top-to-bottom, right-to-left, bottom-to-top, or some other angle).
Repro
- Download PDF: VerticalText.pdf (146.8 KB)
- Open it in Aspose.PDF and run code like this:
```csharp
// Open the sample document.
var doc = new Aspose.Pdf.Document("VerticalText.pdf");
// Make a text fragment absorber to grab page text.
var textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber();
textFragmentAbsorber.ExtractionOptions.FormattingMode =
    Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Pure;
// Grab the text of the first page (Pages is 1-based).
doc.Pages[1].Accept(textFragmentAbsorber);
// Iterate over the text char by char.
foreach (Aspose.Pdf.Text.TextFragment fragText in textFragmentAbsorber.TextFragments) {
    foreach (Aspose.Pdf.Text.TextSegment segText in fragText.Segments) {
        // Characters is indexed from 1 here; Text is an ordinary 0-based string.
        for (int iChar = 1; iChar <= segText.Text.Length; iChar++) {
            Aspose.Pdf.Text.CharInfo charInfo = segText.Characters[iChar];
            char ch = segText.Text[iChar - 1];
        }
    }
}
```
If you inspect charInfo in the debugger, you'll see values like:
- charInfo.Position.XIndent = 709.3 (double)
- charInfo.Position.YIndent = 539.97999996688 (double)
- charInfo.Rectangle.Width = 7.5116190600965638E-08 (double)
- charInfo.Rectangle.Height = 1.4572799500456313E-07 (double)
Results
- The width and height are nearly zero (on the order of 1E-07), which can't be correct for a visible character.
- I can't find a property on TextFragment or TextSegment that tells me the actual text flow direction (e.g. 0 degrees, 90 degrees, etc.).
Expected
- The width and height values are reasonable.
- The XIndent/YIndent values are correct for all characters after the first one as well.
- Some property exposes the text flow direction.
Things I’ve tried
- Both the Raw and Pure TextFormattingMode values.
- The AbcPdf component, which correctly returns the coordinates and the text flow direction (e.g. 0, 90, 180, 270 degrees).
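Until a direction property is available, one possible workaround is to estimate the flow angle from the origins of two consecutive characters (for example their charInfo.Position.XIndent/YIndent values). This is only a sketch under assumptions: TextFlowEstimator, EstimateAngle and SnapToQuadrant are hypothetical helpers, not part of Aspose.PDF, and the approach only works if the per-character positions are correct, which the results above suggest may not hold for this file.

```csharp
using System;

// Hypothetical helper, not part of Aspose.PDF: estimates text flow direction
// from the origins of two consecutive characters.
public static class TextFlowEstimator
{
    // Returns the angle in degrees in [0, 360), measured counterclockwise from
    // the positive x axis. PDF user space has y pointing up, so left-to-right
    // text gives 0, bottom-to-top 90, right-to-left 180 and top-to-bottom 270.
    public static double EstimateAngle(double x1, double y1, double x2, double y2)
    {
        double degrees = Math.Atan2(y2 - y1, x2 - x1) * 180.0 / Math.PI;
        if (degrees < 0)
        {
            degrees += 360.0;
        }
        // Guard against a tiny negative angle rounding up to exactly 360.
        return degrees >= 360.0 ? 0.0 : degrees;
    }

    // Snaps an angle to the nearest multiple of 90 degrees (0, 90, 180 or 270).
    public static int SnapToQuadrant(double degrees)
    {
        return (int)(Math.Round(degrees / 90.0) * 90.0) % 360;
    }
}
```

For two adjacent characters prev and cur, the call would look like TextFlowEstimator.EstimateAngle(prev.Position.XIndent, prev.Position.YIndent, cur.Position.XIndent, cur.Position.YIndent). Snapping to 90-degree steps mirrors the 0/90/180/270 values AbcPdf reports; the raw angle is still available for rotated text at other angles.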
Closing
Any help with figuring out how to extract the text flow direction and coordinates of vertical text would be greatly appreciated. If this isn't supported yet, please let me know whether it's planned and, if so, roughly when it might be implemented.
Thanks!