Bold Content Extraction

We currently hold a permanent license for Aspose.PDF. We would like to know whether Aspose.PDF offers a feature for extracting bold text from a PDF, and whether it is available under our current license.

It would be of great help. Documentation links or suggestions would also help. Please let me know if you have any questions.

@Hamza_Ghojaria

Can you please share a sample PDF document with us so that we can test some code sample to extract bold text from it and share our feedback with you accordingly?

Could you please download the PDF from this link and look at the bold text on page 97?

@Hamza_Ghojaria

Please check whether the sample code snippet below helps:

// Load the PDF; dataDir is the folder that contains the sample file
Document pdfDocument = new Document(dataDir + "pcaob-release-no.-2023-003---noclar.pdf");

// Collect the text fragments on page 97.
// To scan the whole document, loop over pdfDocument.Pages instead.
var textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber();
pdfDocument.Pages[97].Accept(textFragmentAbsorber);
Aspose.Pdf.Text.TextFragmentCollection textFragments = textFragmentAbsorber.TextFragments;

// Print only the fragments whose font style is bold
foreach (var item in textFragments)
{
    if (item.TextState.FontStyle == FontStyles.Bold)
    {
        Console.WriteLine(item.Text);
    }
}

Can you please help with the same in Python?

@Hamza_Ghojaria

Please check and try the code sample below:

import aspose.pdf as ap

# Load the PDF document
document = ap.Document("input.pdf") 

# Instantiate a TextFragmentAbsorber object
txtAbsorber = ap.text.TextFragmentAbsorber()

# Search text
document.pages[97].accept(txtAbsorber) 

# Get reference to the found text fragments
textFragmentCollection = txtAbsorber.text_fragments

# Loop over the found text fragments and print the bold ones (Bold = 1)
for txtFragment in textFragmentCollection:
    if txtFragment.text_state.font_style == 1:
        print(txtFragment.text)

import aspose.pdf as ap

#Load the PDF document
document = ap.Document("input.pdf") 

#Instantiate a TextFragmentAbsorber object
txtAbsorber = ap.text.TextFragmentAbsorber()

#Search text
document.pages[97].accept(txtAbsorber) 

#Get reference to the found text fragments
textFragmentCollection = txtAbsorber.text_fragments

#Loop over the found text fragments and print each font style and text
for txtFragment in textFragmentCollection:
    print(txtFragment.text_state.font_style)
    print(txtFragment.text)

    if txtFragment.text_state.font_style == FontStyles.BOLD:
        print(txtFragment.text)

txtFragment.text_state.font_style returns a value of 0, 1, 2, or 3, as you can see from the print statements above. How can it be compared against FontStyles.Bold? Also, FontStyle.Bold is not valid in Python. Can you please share updated code?

@Hamza_Ghojaria

The integer values for the styles are as below:

  • Regular = 0
  • Bold = 1
  • Italic = 2

We have updated the code snippet as well.
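
As a rough sketch of how the integer values above could be checked in Python: the list covers 0, 1, and 2, so this assumes that the value 3 is Bold and Italic combined as bit flags (1 | 2), which is not confirmed in this thread. The constant names and the is_bold helper are hypothetical and not part of the Aspose API; they only work on the plain integers that font_style prints as.

```python
# Integer values listed in this thread
REGULAR = 0
BOLD = 1
ITALIC = 2


def is_bold(font_style) -> bool:
    """Return True when the bold bit is set in font_style.

    Assumption: styles combine as bit flags, so 3 means Bold | Italic.
    A plain `== 1` comparison would skip bold-italic fragments.
    """
    return (int(font_style) & BOLD) != 0


# Check the four values reported for font_style
for value in (0, 1, 2, 3):
    print(value, is_bold(value))
```

In the loop from the earlier sample, the condition would then read `if is_bold(txtFragment.text_state.font_style):`. If font_style cannot be converted with int() in your version, comparing against the listed integers directly is the fallback.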