Aspose.PDF - Getting Character Coordinates for Vertical Text - Problems

Version Aspose.PDF.dll - 10.7.0.0 (.NET)


Goals

  1. Calculate per-character text position (x, y, width, height)
  2. Extract text flow direction (left-to-right, top-to-bottom, right-to-left, bottom-to-top, or some other angle)

Repro

  1. Download PDF: VerticalText.pdf (146.8 KB)
  2. Open it in Aspose.PDF and run code like this:

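// Load the sample document (assumes VerticalText.pdf was saved next to the executable).
var doc = new Aspose.Pdf.Document("VerticalText.pdf");

// Make a text fragment absorber to grab page text.
var textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber();
textFragmentAbsorber.ExtractionOptions.FormattingMode = Aspose.Pdf.Text.TextOptions.TextExtractionOptions.TextFormattingMode.Pure;

// Grab text from the first page (page indexes are 1-based).
doc.Pages[1].Accept(textFragmentAbsorber);

// Iterate over text char by char.
foreach (Aspose.Pdf.Text.TextFragment fragText in textFragmentAbsorber.TextFragments) {
    foreach (Aspose.Pdf.Text.TextSegment segText in fragText.Segments) {
        // Characters is indexed from 1, while the Text string is indexed from 0.
        for (int iChar = 1; iChar <= segText.Text.Length; iChar++) {
            Aspose.Pdf.Text.CharInfo charInfo = segText.Characters[iChar];
            char ch = segText.Text[iChar - 1];
            // Inspect charInfo.Position and charInfo.Rectangle here.
        }
    }
}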

If you look at the charInfo content you’ll see something like:

  • charInfo.Position.XIndent 709.3 double
  • charInfo.Position.YIndent 539.97999996688 double
  • charInfo.Rectangle.Width 7.5116190600965638E-08 double
  • charInfo.Rectangle.Height 1.4572799500456313E-07 double

Results

  • The width/height are nearly zero. This isn’t correct.
  • I can’t find a property on TextFragments or TextSegments that tells me what the actual text flow direction is (e.g. 0 degrees, 90 degrees, etc.).

Expected

  • The width/height values are reasonable.
  • The x/y indents are correct for all chars after the first.
  • A property exists that tells me the text direction.

Things I’ve tried

  • Raw and Pure TextFormattingMode
  • The AbcPdf module, which correctly returns coordinates and direction of text flow (e.g. 0, 90, 180, 270 degrees).

Closing
Any help with figuring out how to extract the text flow and coordinates of vertical text would be greatly appreciated. If this is something that’s not yet supported, then if you have any idea when it might be implemented, please include that.

Thanks!

@BustaPose

Thanks for contacting support.

I tested the scenario with your PDF using Aspose.Pdf for .NET 17.7 and observed that the values the API returned were different from those you shared. These are the values I observed in our environment:

Position
XIndent = 712.05999999046321
YIndent = 539.97999996688

Rectangle
Height = 12.143999958038307
Width = 7.5116190600965638E-08

Furthermore, with the latest version of the API, you can also determine or set the rotation of a TextFragment through the TextFragment.TextState.Rotation property. I also tried to determine the rotation of the text inside your PDF, but the API did not return the correct value. I have therefore logged an issue as PDFNET-43095 in our issue tracking system.
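
While the Rotation value is unreliable, one possible stopgap is to infer the flow direction from the positions of two consecutive characters. The following is only a minimal sketch in plain C# and not an Aspose API: the FlowDirection class and its method names are hypothetical, and it assumes that the reported XIndent/YIndent values can be trusted (which the original report questions) and that y grows upward, as in PDF user space.

```csharp
using System;

// Hypothetical helper, not part of Aspose.PDF.
public static class FlowDirection
{
    // Angle in degrees, in [0, 360), of the vector from (x0, y0) to (x1, y1).
    // Assumes x grows to the right and y grows upward.
    // Returns 0 when both points coincide, so callers should skip that case.
    public static double AngleDegrees(double x0, double y0, double x1, double y1)
    {
        double deg = Math.Atan2(y1 - y0, x1 - x0) * 180.0 / Math.PI;
        return (deg + 360.0) % 360.0;
    }

    // Snaps an angle to the nearest of 0, 90, 180 or 270 degrees.
    public static int SnapToRightAngle(double angleDegrees)
    {
        return (int)Math.Round(angleDegrees / 90.0) * 90 % 360;
    }
}
```

To use it, pass the Position.XIndent and Position.YIndent of two neighbouring CharInfo entries from the same segment. Under the assumptions above, 0 would mean left-to-right, 90 bottom-to-top, 180 right-to-left and 270 top-to-bottom.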

We will investigate the issue further and keep you posted on the status of its correction. Please be patient and spare us a little time. We are sorry for the inconvenience.

Thank you for the quick reply, Asad.

Your height looks much better (glad to see that), but the width still looks wrong (it’s almost 0). Would you agree that this seems like another issue that needs to be bugged?

And thank you for taking a look at this! I’m looking forward to the updates.

@BustaPose

Thanks for writing back.

I have created an investigation ticket, PDFNET-43096, in our issue tracking system for the issue that the width of characters is returned as almost zero. The product team will investigate it further and we will keep you updated on the status of its resolution. Please be patient and spare us a little time.

We are sorry for the inconvenience.

The issues you found earlier (filed as PDFNET-43095 and PDFNET-43096) have been fixed in Aspose.PDF for .NET 19.11.