
Free Support Forum - aspose.com

Extract PDF Text divided by paragraphs using Aspose.PDF for .NET - keep original formatting

Hi Aspose team,
I’ve used the PDF-to-text extraction function from Aspose.Pdf.Facades.dll (v.6.0.0), but the paragraphs in the result have changed: the function splits each paragraph with CRLF line breaks. Is there any way to keep the original paragraphs?

Thank you in advance.


Hi,

Please share your sample code and template file here so that we can reproduce the issue. This will help us understand and identify the problem sooner.

Thank You & Best Regards,

Hi,
This is the sample code:
[C#.NET]
using System.IO;
using System.Text;
using Aspose.Pdf.Facades;

// Bind the source PDF and extract its text
PdfExtractor pdfExtractor = new PdfExtractor();
pdfExtractor.BindPdf("input.pdf");
pdfExtractor.ExtractText();

// Write the extracted text into a memory stream
MemoryStream tempMemoryStream = new MemoryStream();
pdfExtractor.GetText(tempMemoryStream);

// Rewind the stream and read the text back as a string
string text = "";
using (StreamReader streamReader = new StreamReader(tempMemoryStream, Encoding.Unicode))
{
    streamReader.BaseStream.Seek(0, SeekOrigin.Begin);
    text = streamReader.ReadToEnd();
}
File.WriteAllText("output_aspose.txt", text, Encoding.UTF8);

Thank you in advance.


Hello Nuch,

Thanks for sharing the resource files.
I have tested the scenario and can reproduce the same problem. I have logged it in our issue tracking system as PDFNEWNET-29944 so that it can be corrected. We will investigate this issue in detail and will keep you updated on the status of the fix.

We apologize for the inconvenience.


@natnapaporn.thin

We would like to share with you that you can now extract entire paragraphs from PDF documents using Aspose.PDF for .NET.
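For illustration, here is a minimal sketch of paragraph-level extraction. It assumes a recent Aspose.PDF for .NET release that ships the ParagraphAbsorber class in the Aspose.Pdf.Text namespace; that class is not part of v6.0.0, so please check the class and property names against the API reference for your installed version. The file name "input.pdf" is a placeholder, and joining the lines of a paragraph with a space (rather than CRLF) is our own choice to keep each paragraph on one line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class ExtractParagraphs
{
    static void Main()
    {
        // "input.pdf" is a placeholder path; replace it with your own file.
        Document document = new Document("input.pdf");

        // ParagraphAbsorber (assumed available in your Aspose.PDF version)
        // analyses the page layout and groups text fragments into paragraphs.
        ParagraphAbsorber absorber = new ParagraphAbsorber();
        absorber.Visit(document);

        foreach (PageMarkup markup in absorber.PageMarkups)
        {
            foreach (MarkupSection section in markup.Sections)
            {
                foreach (MarkupParagraph paragraph in section.Paragraphs)
                {
                    StringBuilder paragraphText = new StringBuilder();

                    // Each paragraph is a list of lines; each line is a list of fragments.
                    foreach (List<TextFragment> line in paragraph.Lines)
                    {
                        foreach (TextFragment fragment in line)
                        {
                            paragraphText.Append(fragment.Text);
                        }
                        // Join the lines of one paragraph with a space instead of CRLF.
                        paragraphText.Append(' ');
                    }

                    // Print one paragraph per line, followed by a blank line.
                    Console.WriteLine(paragraphText.ToString().TrimEnd());
                    Console.WriteLine();
                }
            }
        }
    }
}
```

With this approach the line breaks inside a paragraph are dropped and only the paragraph boundaries detected by the absorber remain. How well those boundaries match the original layout depends on the document, so please test it with your own files.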