I was wondering whether Aspose.Pdf for .NET could fix our issue. We currently have some bad PDFs where the right side is sometimes cut off. We believe our current PDF-writing software renders the text region too small, so the text is sometimes partially cut off. I have included the bad PDF. Is there a way to fix this, and is there a way to detect through code that the text would be cut off?
Hi Chris,
Thanks for your interest in our APIs.
During PDF manipulation, Aspose.Pdf can only analyze the contents actually present inside the document. To determine whether text is truncated, you may consider extracting the text from the PDF file and matching it against the input contents that were used to generate the document. When you run the following code snippet, you will notice that truncated strings are returned, which shows that the contents were cut off on the right side of the page.
I am afraid Aspose.Pdf cannot fix the truncated contents issue in the PDF document.
[C#]
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
// Note: in older releases TextExtractionOptions may live in Aspose.Pdf.Text.TextOptions
using Aspose.Pdf.Text;

// Open the document
Document pdfDocument = new Document("c:/pdftest/Cut+Off.PDF");
// String to hold the extracted text
string extractedText = "";
foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        // Create the text device
        TextDevice textDevice = new TextDevice();
        // Set the text extraction mode (Raw or Pure)
        TextExtractionOptions textExtOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
        textDevice.ExtractionOptions = textExtOptions;
        // Convert the page and save its text to the stream
        textDevice.Process(pdfPage, textStream);
        // Close the memory stream
        textStream.Close();
        // Get the text from the memory stream (ToArray still works after Close)
        extractedText += Encoding.Unicode.GetString(textStream.ToArray());
    }
}
// Write the extracted text to a file
File.WriteAllText("c:/pdftest/Extracted_Cut+Off.txt", extractedText);
Hey Nayyer,
Hi Chris,
Thanks for sharing the details and sorry for the delayed response.
I am afraid we cannot update the dimensions of the TextBox containing the text inside the PDF file. However, to accomplish this requirement, you may consider reducing the font size of the text so that it fits inside the TextArea. Please take a look at the following code snippet.
However, I have observed that in your particular scenario (using the source file which you have shared), only the first few words are being re-sized. For the sake of correction, I have logged it as PDFNEWNET-38744 in our issue tracking system. We will further look into the details of this issue and will keep you updated on the status of correction. Please be patient and spare us a little time. We are sorry for your inconvenience.
[C#]
using Aspose.Pdf;
using Aspose.Pdf.Text;

Document pdfDocument = new Document("c:/pdftest/Cut+Off.PDF");
// Create a TextFragmentAbsorber that matches every run of non-whitespace characters
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");
// Set the text search option to enable regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;
// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);
// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Reduce the font size so that the text takes up less width
    textFragment.TextState.FontSize = 8;
}
pdfDocument.Save("c:/pdftest/TextResized_output.pdf");
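The snippet above hard-codes a font size of 8. If you know how wide a line currently is and how much width is actually available, a size could instead be derived proportionally. The sketch below is only an illustration under the assumption that text width scales roughly linearly with font size. `FontFit` and `FittedFontSize` are hypothetical names, not part of Aspose.Pdf, and both width values have to come from your own knowledge of the layout.

```csharp
using System;

public static class FontFit
{
    // Returns a font size at which text of measuredWidth should fit into availableWidth.
    // Hypothetical helper: assumes width grows linearly with font size.
    public static double FittedFontSize(double currentSize, double measuredWidth,
                                        double availableWidth, double minSize)
    {
        // Nothing to do when the text already fits.
        if (measuredWidth <= availableWidth)
        {
            return currentSize;
        }
        // No usable space: fall back to the smallest size we allow.
        if (availableWidth <= 0)
        {
            return minSize;
        }
        // Scale the size by the ratio of available to measured width.
        double scaled = currentSize * availableWidth / measuredWidth;
        // Never go below the minimum readable size.
        return Math.Max(scaled, minSize);
    }
}
```

The result could then be assigned to `textFragment.TextState.FontSize` in the loop in place of the fixed value.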
The issues you have found earlier (filed as PDFNET-38744) have been fixed in Aspose.PDF for .NET 18.11.