Extract bolded text from PDF

yjsdfsdf · July 30, 2023, 7:09am

Hi, I am using Aspose.PDF to extract text from a PDF. How can I extract only the bold text? Thanks!

asad.ali · July 30, 2023, 5:46pm

We need to investigate this requirement. Could you please share your sample PDF so that we can test the scenario in our environment and address it accordingly?

yjsdfsdf · July 31, 2023, 12:09am

Of course. As you can see, some of the text in the PDF uses bold fonts. I would like to find that bold text and its location on the page. Thanks!

Desktop.zip (118.8 KB)

asad.ali · July 31, 2023, 6:53pm

@yjsdfsdf

We are checking it and will get back to you shortly.

asad.ali · July 31, 2023, 7:21pm

@yjsdfsdf

Please try the code snippet below and let us know if you face any issues:

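import com.aspose.pdf.Document;
import com.aspose.pdf.FontStyles;
import com.aspose.pdf.TextFragment;
import com.aspose.pdf.TextFragmentAbsorber;

public class ExtractBoldText
{
    public static void main(String[] args)
    {
        // Placeholder folder that contains the input PDF (adjust to your environment)
        String dataDir = "path/to/your/files/";

        // Load PDF document
        Document pdfDocument = new Document(dataDir + "ExtractBoldText.pdf");

        // Create a TextFragmentAbsorber to collect every text fragment
        TextFragmentAbsorber textAbsorber = new TextFragmentAbsorber();

        // Accept the absorber for all the pages
        pdfDocument.getPages().accept(textAbsorber);

        for (TextFragment textFragment : textAbsorber.getTextFragments())
        {
            // FontStyles values are bit flags, so a bitwise test also catches Bold + Italic
            if ((textFragment.getTextState().getFontStyle() & FontStyles.Bold) != 0)
            {
                // Text of the bold fragment and its position on the page
                System.out.println(textFragment.getText());
                System.out.println(textFragment.getPosition().getXIndent());
                System.out.println(textFragment.getPosition().getYIndent());
            }
        }
    }
}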
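The FontStyles check only works when the PDF reports the bold style for a fragment. In some documents the weight is carried only by the embedded font itself (a font name such as "Arial-BoldMT"), so a name-based fallback may help. Below is a minimal, library-free sketch of such a check. The class BoldCheck, its method isBold and the list of name markers are hypothetical, and the BOLD_FLAG value of 1 is an assumption that it matches FontStyles.Bold; please verify both against your own PDFs.

```java
import java.util.Locale;

// Hypothetical helper (not part of Aspose.PDF): decides whether a text fragment
// should be treated as bold, given its style flags and its font name.
final class BoldCheck {
    // Assumption: mirrors the value of com.aspose.pdf.FontStyles.Bold (a bit flag).
    static final int BOLD_FLAG = 1;

    private BoldCheck() {}

    static boolean isBold(int fontStyle, String fontName) {
        // 1) Style flags: a bitwise test also matches combined styles such as Bold + Italic.
        if ((fontStyle & BOLD_FLAG) != 0) {
            return true;
        }
        // 2) Heuristic fallback: some embedded fonts encode the weight only in the name,
        //    e.g. "Arial-BoldMT" (hypothetical list of markers, adjust as needed).
        if (fontName == null) {
            return false;
        }
        String name = fontName.toLowerCase(Locale.ROOT);
        return name.contains("bold") || name.contains("black") || name.contains("heavy");
    }

    public static void main(String[] args) {
        System.out.println(isBold(1, "Arial"));         // true  (Bold flag)
        System.out.println(isBold(3, "Arial"));         // true  (Bold + Italic flags)
        System.out.println(isBold(0, "Arial-BoldMT"));  // true  (name heuristic)
        System.out.println(isBold(0, "TimesNewRoman")); // false
        System.out.println(isBold(2, null));            // false (Italic only, no name)
    }
}
```

With Aspose.PDF, it could be called inside the loop above as BoldCheck.isBold(textFragment.getTextState().getFontStyle(), textFragment.getTextState().getFont().getFontName()), assuming your version exposes Font.getFontName(). The name test is a heuristic and can misfire on fonts whose family name happens to contain one of the markers.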