Text extraction with all its formatting

markandeyp91 · August 28, 2014, 2:30am

I have a technical query regarding text extraction.

I want to get the text with all its formatting.

Can you please help me with some sample code?

By formatting I mean font details, color details and so on.

What if I want to extract all of the text content without using any regex?

codewarior · August 29, 2014, 4:34am

Hi Markandey,

Thanks for contacting support.

Aspose.Pdf for .NET supports searching the contents of an entire PDF file, but to do so you need to use a regular expression. Please note that when you traverse the whole PDF file, you can get the formatting information of each TextFragment. When you extract the text, however, it comes out as plain text, because TextDevice and related classes extract only the content, irrespective of its formatting. The result also depends on the format of the output file: when the contents are extracted to a plain text file, the formatting is lost. However, you can convert the PDF file to DOC or HTML format, and with this approach the formatting of the TextFragments is preserved.
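The difference between searching for fragments and extracting plain text can be pictured with a small toy model. This is not the Aspose.Pdf API: `Fragment`, `search_fragments` and `extract_plain_text` are hypothetical names, and the model is written in Python only to illustrate the idea that each matched fragment carries its own font and color, while a plain-text export keeps only the characters. The catch-all pattern `[\s\S]+` is likewise an assumption about how a regular expression could be made to match every non-empty fragment.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical stand-in for a text fragment: its text plus formatting.
    text: str
    font_name: str
    font_size: float
    color: str  # hex RGB string, e.g. "#FF0000"


def search_fragments(fragments, pattern):
    """Return every fragment whose text matches the regex, formatting intact."""
    regex = re.compile(pattern)
    return [f for f in fragments if regex.search(f.text)]


def extract_plain_text(fragments):
    """Mimic a plain-text export: only the characters survive."""
    return "".join(f.text for f in fragments)


page = [
    Fragment("Invoice ", "Arial-Bold", 14.0, "#000000"),
    Fragment("overdue", "Arial", 12.0, "#FF0000"),
]

# A catch-all pattern matches every non-empty fragment, so the search
# doubles as a "give me everything, with formatting" traversal.
for f in search_fragments(page, r"[\s\S]+"):
    print(f"{f.text!r}: {f.font_name} {f.font_size}pt {f.color}")

# The plain-text export drops font and color information entirely.
print(extract_plain_text(page))
```

In the real library the fragments come from searching the document itself, so the exact members that expose font and color should be taken from the Aspose.Pdf documentation rather than from this model.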

For further details, please visit