Hi Team,
I am trying to convert a PDF to TXT, but when I convert it I lose the formatting. Is there a way to keep formatting such as bold and font size intact?
I am using the code below.
using System.IO;
using Aspose.Pdf;
// Load the source PDF.
Document doc = new Document(@"C:\Users\abcd\Downloads\Ppart1\Batch 9.pdf");
Aspose.Pdf.Text.TextAbsorber textAbsorber = new Aspose.Pdf.Text.TextAbsorber();
// Replace form fields with their values so they are extracted as page text.
doc.Flatten();
// Extract the text of every page; the result is a plain string with no font or style data.
doc.Pages.Accept(textAbsorber);
// Write the extracted text to a .txt file.
File.WriteAllText(@"C:\Users\abcd\Downloads\TextFilesForPart1\Batch 9.txt", textAbsorber.Text);
@dewan.ishi
Could you please share your sample source PDF and the expected .txt file with us? Also, please tell us which application or utility you want to use to view the output .txt file with all formatting. We will test the scenario in our environment and address it accordingly.
@dewan.ishi
A plain .txt file cannot store formatting such as bold or font size. To retain formatting, you need to convert the PDF document into a file format that supports it, e.g. DOC/DOCX, Excel, HTML, etc. Please check the following articles in the API documentation to convert PDF into other file formats supported by Aspose.PDF.
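As a rough illustration of the conversion suggested above, here is a minimal sketch that saves a PDF as DOCX and as HTML with Aspose.PDF for .NET. The file paths, class name, and method names are placeholders chosen for this example, not values from your project, and the save options are left at their defaults, so you may need to adjust them for your documents.

```csharp
using Aspose.Pdf;

class PdfToFormattedOutput
{
    // Hypothetical helper: saves the PDF as DOCX, which keeps fonts, sizes and bold text.
    static void SaveAsDocx(string pdfPath, string docxPath)
    {
        using (Document doc = new Document(pdfPath))
        {
            DocSaveOptions options = new DocSaveOptions
            {
                Format = DocSaveOptions.DocFormat.DocX
            };
            doc.Save(docxPath, options);
        }
    }

    // Hypothetical helper: saves the PDF as HTML, another format that carries styling.
    static void SaveAsHtml(string pdfPath, string htmlPath)
    {
        using (Document doc = new Document(pdfPath))
        {
            doc.Save(htmlPath, new HtmlSaveOptions());
        }
    }

    static void Main()
    {
        // Placeholder paths; replace them with your own files.
        SaveAsDocx(@"C:\Temp\input.pdf", @"C:\Temp\output.docx");
        SaveAsHtml(@"C:\Temp\input.pdf", @"C:\Temp\output.html");
    }
}
```

Each helper loads the PDF separately so that one conversion cannot affect the other. If you only need one target format, a single call is enough.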
If the PDF contains scanned pages, you can convert those pages to images and perform an OCR operation on the obtained images using Aspose.OCR.
Once TextAbsorber returns a string, you can save it to a .txt file using any method you find convenient.
If you have any further questions, please feel free to ask.