Extract bolded text from PDF

Hi, I am using Aspose.PDF to extract text from a PDF. How do I extract only the bold text? Thanks!

@yjsdfsdf

We need to investigate this requirement. Could you please share a sample PDF so that we can test the scenario in our environment and address it accordingly?

Of course. Some of the text in the attached PDF is bold, and I’d like to find that bold text and its location. Thanks!

Desktop.zip (118.8 KB)

@yjsdfsdf

We are checking it and will get back to you shortly.

@yjsdfsdf

Please try the code snippet below and let us know if you face any issues:

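import com.aspose.pdf.FontStyles;
import com.aspose.pdf.TextFragment;

// Load the PDF document (dataDir is the folder that contains the file)
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(dataDir + "ExtractBoldText.pdf");

// Create a TextFragmentAbsorber object to extract text fragments
com.aspose.pdf.TextFragmentAbsorber textAbsorber = new com.aspose.pdf.TextFragmentAbsorber();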

// Accept the absorber for all the pages
pdfDocument.getPages().accept(textAbsorber);

for (TextFragment textFragment:textAbsorber.getTextFragments())
{
    if (textFragment.getTextState().getFontStyle() == FontStyles.Bold)
    {
        System.out.println(textFragment.getText());
        System.out.println(textFragment.getPosition().getXIndent());
        System.out.println(textFragment.getPosition().getYIndent());
    }
}
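
If the font style check above misses text that is visibly bold, one possible fallback is to look at the font name, since PDFs often render bold text with a separate font such as "Arial-BoldMT". The following is only a sketch of that heuristic: BoldFontNameCheck and looksBold are hypothetical names, not part of Aspose.PDF, and a name-based check can give false positives or negatives.

```java
import java.util.Locale;

// Hypothetical helper, not part of Aspose.PDF: guesses boldness from a font name.
final class BoldFontNameCheck {

    // Returns true when the font name contains "bold" in any letter case,
    // e.g. "Arial-BoldMT" or "ABCDEF+Calibri,Bold". This is a heuristic only.
    static boolean looksBold(String fontName) {
        if (fontName == null) {
            return false;
        }
        return fontName.toLowerCase(Locale.ROOT).contains("bold");
    }

    public static void main(String[] args) {
        System.out.println(looksBold("Arial-BoldMT"));
        System.out.println(looksBold("Arial"));
    }
}
```

Inside the loop, the helper could be combined with the style check by passing in the fragment's font name, assuming it is available from the text state (for example through getFont().getFontName(); please verify this against the API reference for your version).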