How to write text to a PDF document based on single word positions?

What is the specific difference between writing text with TextFragment and writing text with TextSegment? Why do single characters overlap? See why.png (247.0 KB)

The following are all the documents and code used:
Scanned file: test.jpg (387.5 KB)
OCR JSON data file: ocr.d.json.zip (12.2 KB)

Using TextFragment to generate the PDF:

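import com.aspose.pdf.Document
import com.aspose.pdf.Position
import com.aspose.pdf.TextBuilder
import com.aspose.pdf.TextFragment

val doc = Document()
// Convert the ocr.d.json positions to Aspose.PDF positions (see Ocr.convert)
val pageData = Ocr.convert("ocr.d.json")
val page = doc.pages.add()
page.setPageSize(pageData.width, pageData.height) // ocr.d.json -> {width,height}

val textBuilder = TextBuilder(page)
for (textData in pageData.textFragments) {
  val text = TextFragment().also {
    it.text = textData.text // ocr.d.json -> prism_wordsInfo -> {word}
    // ocr.d.json -> prism_wordsInfo.pos
    // (left,top),(right,top),(right,bottom),(left,bottom)
    // Assumed: the lower-left corner computed in Ocr.createTextFragment is used as the position
    it.position = Position(textData.position.x, textData.position.y)
    // Max(leftBottom.y - leftTop.y, rightBottom.y - rightTop.y)
    it.textState.fontSize = textData.fontSize
  }

  textBuilder.appendText(text)
}
doc.save("new-ocr-text-fragment.pdf")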

This is the resulting file: new-ocr-text-fragment.pdf (147.9 KB)
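The comments in the snippet above describe two conversions: flipping the OCR y coordinate (top-left origin) into PDF space (bottom-left origin), and estimating the font size from the word's corner points. A minimal sketch of both is shown here, assuming one OCR pixel maps to one PDF unit, which matches setting the page size from the JSON width and height. Point, toPdfPoint and fontSizeFromQuad are hypothetical names used for illustration only; they are not part of Aspose.PDF or the OCR output.

```kotlin
import kotlin.math.max

// Hypothetical helper type for this sketch (not an Aspose.PDF class).
data class Point(val x: Double, val y: Double)

/** Converts a point from top-left-origin image space to bottom-left-origin PDF space. */
fun toPdfPoint(p: Point, pageHeight: Double): Point = Point(p.x, pageHeight - p.y)

/**
 * Estimates the font size as the taller of the two vertical edges of the word box.
 * This is only an approximation: the box height is not necessarily the font's em size.
 */
fun fontSizeFromQuad(leftTop: Point, rightTop: Point, rightBottom: Point, leftBottom: Point): Float =
    max(leftBottom.y - leftTop.y, rightBottom.y - rightTop.y).toFloat()
```

For example, on a page of height 1000, a word whose left-bottom corner is at (100, 220) in OCR coordinates lands at (100, 780) in PDF coordinates, and a box whose vertical edges are both 20 units tall gives a font size of 20.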

Using TextSegment:

val textBuilder = TextBuilder(page)
for (textData in pageData.textFragments) {
  val tf = TextFragment()

  // ocr.d.json - prism_wordsInfo.charInfo
  for (segmentData in textData.segments) {
    tf.segments.add(TextSegment().also {
      it.text = segmentData.text
      // ocr.d.json - prism_wordsInfo.charInfo {x, pageHeight - y}
      it.position = Position(segmentData.position.x, segmentData.position.y)
      it.textState.font = FontRepository.findFont("Arial")
      // ocr.d.json - prism_wordsInfo.charInfo {h}
      it.textState.fontSize = segmentData.fontSize
    })
  }
  textBuilder.appendText(tf)
}

This is the resulting file: new-ocr-text-segment.pdf (147.8 KB)

@shiyajun

A TextFragment corresponds to the text between a pair of BT/ET operators, and a TextSegment is the text drawn by a single Tj operator inside that BT/ET block. BT is the operator that starts a block of text output, and ET marks its end. The text between these operators is shown by a text-showing operator (such as Tj or TJ). The main point is that one TextFragment contains several TextSegments.
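To make the operator structure concrete, here is a rough sketch that prints what such a text object could look like in a PDF content stream: one BT/ET pair per fragment and one Tj per segment. It is an illustration only and does not claim to reproduce what Aspose.PDF actually writes. Segment, escapePdfString, renderFragment and the font resource name F1 are all hypothetical, and literal strings with Tj as used here only suit simple single-byte fonts.

```kotlin
// Hypothetical model for this sketch (not an Aspose.PDF class).
data class Segment(val text: String, val x: Double, val y: Double, val fontSize: Double)

/** Escapes the characters that are special inside a PDF literal string. */
fun escapePdfString(s: String): String =
    s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

/** Renders one fragment as a single BT/ET text object with one Tj per segment. */
fun renderFragment(segments: List<Segment>, fontName: String = "F1"): String = buildString {
    appendLine("BT")                               // begin text object
    for (s in segments) {
        appendLine("/$fontName ${s.fontSize} Tf")  // select font and size
        appendLine("1 0 0 1 ${s.x} ${s.y} Tm")     // set absolute text position
        appendLine("(${escapePdfString(s.text)}) Tj") // show this segment's text
    }
    append("ET")                                   // end text object
}
```

Calling renderFragment with two segments yields a single BT ... ET block containing two Tf/Tm/Tj groups, which mirrors the fragment-contains-segments relationship described above.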

Furthermore, the recommended approach for adding text with position attributes is to use TextFragments. However, we will also investigate the scenario in further detail. Would you kindly share the complete code snippet in which you read the .json file and build the pageData/textData objects? This would help us test the scenario accordingly.

Maven dependency:

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.62</version>
</dependency>

Call Ocr.convert("ocr.d.json") to get the data.

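import com.alibaba.fastjson.JSON
import com.alibaba.fastjson.JSONObject
import java.io.File
import java.io.FileNotFoundException
import java.nio.file.Files
import java.nio.file.Paths

class Ocr {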
    companion object {
        fun convert(fileName: String): PageBo {
            if (Files.notExists(Paths.get(fileName))) {
                throw FileNotFoundException("json file not found")
            }

            val json = JSON.parseObject(File(fileName).readText())
            val result = PageBo().also {
                it.height = json.getDouble("height")
                it.width = json.getDouble("width")
                it.content = json.getString("content")
            }

            json.getJSONArray("prism_wordsInfo").forEach { text ->
                val word = text as JSONObject
                val textFragment = createTextFragment(word, result)
                result.textFragments.add(textFragment)
            }

            return result
        }

        private fun createTextFragment(word: JSONObject, page: PageBo): TextFragmentBo {
            val positionArray = word.getJSONArray("pos")
            val leftTop = positionArray[0] as JSONObject
            val rightTop = positionArray[1] as JSONObject
            val rightBottom = positionArray[2] as JSONObject
            val leftBottom = positionArray[3] as JSONObject

            // llx,lly
            val position = PositionBo().also { p ->
                p.x = leftBottom.getDouble("x")
                p.y = page.height - leftBottom.getDouble("y")
            }

            val fontSize = (leftBottom.getDouble("y") - leftTop.getDouble("y"))
                    .coerceAtLeast(rightBottom.getDouble("y") - rightTop.getDouble("y"))
                    .toFloat()

            val segments = mutableListOf<SegmentBo>()
            word.getJSONArray("charInfo").forEach {
                val segment = it as JSONObject
                segments.add(createSegment(segment, page))
            }

            return TextFragmentBo().also {
                it.text = word.getString("word")
                it.position = position
                it.rotate = word.getIntValue("direction")
                it.segments = segments
                it.fontSize = fontSize
            }
        }

        private fun createSegment(segment: JSONObject, page: PageBo): SegmentBo {
            val position = PositionBo().also { p ->
                // llx,lly
                p.x = segment.getDouble("x")
                p.y = page.height - segment.getDouble("y")
            }

            return SegmentBo().also {
                it.position = position
                it.text = segment.getString("word")
                it.fontSize = segment.getFloat("h")
            }
        }

    }
}

class PageBo {
    var height: Double = 0.0
    var width: Double = 0.0
    var content: String = Strings.EMPTY
    var textFragments: MutableList<TextFragmentBo> = mutableListOf()
}

class TextFragmentBo {
    var text: String = Strings.EMPTY
    var fontSize: Float = 14F
    var rotate: Int = 0
    var position: PositionBo = PositionBo()
    var segments: MutableList<SegmentBo> = mutableListOf()

}

class SegmentBo {
    var text: String = Strings.EMPTY
    var fontSize: Float = 0F
    var position: PositionBo = PositionBo()
}

class PositionBo {
    var x: Double = 0.0
    var y: Double = 0.0
}

@shiyajun

We are looking into the scenario and will get back to you in a while. Please give us some time.
