Fixed Pdf Cut Off

ctran-1 · May 20, 2015, 11:52am

Was wondering if Aspose.Pdf for .NET could fix our issue. We currently have some bad PDFs where the right side is sometimes getting cut off. We believe that our current PDF writing software is rendering the text region too small, so the text sometimes gets partially cut off. I have included the bad PDF. Is there a way to fix this, and is there a way to detect through code that the text would be cut off?

codewarior · May 21, 2015, 5:31am

Hi Chris,

Thanks for your interest in our APIs.

During PDF manipulation, Aspose.Pdf can only analyze the contents actually present inside the document. To determine whether the text is truncated, you may consider extracting the text from the PDF file and comparing it with the input contents that were used to generate the document. When using the following code snippet, you will notice that truncated strings are returned, which shows that the contents were truncated on the right side of the page.

I am afraid Aspose.Pdf cannot fix the truncated contents issue in the PDF document.

[C#]

//required namespaces
using System.IO;
using System.Text;
using Aspose.Pdf;
using Aspose.Pdf.Devices;
using Aspose.Pdf.Text;

//open document
Document pdfDocument = new Document("c:/pdftest/Cut+Off.PDF");

//string to hold extracted text
string extractedText = "";

foreach (Page pdfPage in pdfDocument.Pages)
{
    using (MemoryStream textStream = new MemoryStream())
    {
        //create text device
        TextDevice textDevice = new TextDevice();

        //set text extraction options - set text extraction mode (Raw or Pure)
        TextExtractionOptions textExtOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
        textDevice.ExtractionOptions = textExtOptions;

        //convert a particular page and save text to the stream
        textDevice.Process(pdfPage, textStream);

        //close memory stream
        textStream.Close();

        //get text from memory stream (ToArray still works after Close)
        extractedText += Encoding.Unicode.GetString(textStream.ToArray());
    }
}

//write the extracted text to a file
File.WriteAllText("c:/pdftest/Extracted_Cut+Off.txt", extractedText);
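
The comparison step described above (matching the extracted text against the input contents) could look roughly like the following. This is only a minimal sketch in plain C# with no Aspose calls: the class name `TruncationCheck`, the method `FindMissingLines`, and the sample strings are all hypothetical, and it assumes you still have the original input text as a list of lines. Note that it only detects text that is missing from the file itself, not text that is present but visually clipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TruncationCheck
{
    // Collapse every run of whitespace to a single space so that padding
    // added during text extraction does not cause false mismatches.
    private static string Normalize(string text)
    {
        return Regex.Replace(text, @"\s+", " ").Trim();
    }

    // Returns the expected lines that do not appear in full in the extracted text.
    // (Hypothetical helper, not part of any library.)
    public static List<string> FindMissingLines(IEnumerable<string> expectedLines, string extractedText)
    {
        string haystack = Normalize(extractedText);
        var missing = new List<string>();
        foreach (string line in expectedLines)
        {
            string needle = Normalize(line);
            if (needle.Length == 0)
            {
                continue; // nothing to look for on blank lines
            }
            if (haystack.IndexOf(needle, StringComparison.Ordinal) < 0)
            {
                missing.Add(needle);
            }
        }
        return missing;
    }

    public static void Main()
    {
        // Made-up sample data: the first line lost its tail during generation.
        string[] expected = { "Invoice total: 1,250.00 USD", "Due date: 2015-06-01" };
        string extracted = "Invoice total: 1,250.0\nDue date: 2015-06-01\n";

        foreach (string line in FindMissingLines(expected, extracted))
        {
            Console.WriteLine("Possibly truncated: " + line);
        }
    }
}
```

In practice, `extracted` would be the `extractedText` string produced by the snippet above, and `expected` would come from whatever data was fed to the PDF writer.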

ctran-1 · May 21, 2015, 10:11am

Hey Nayyer,

The content is there. All the letters are there: if you highlight the text and cut and paste it into a plain text editor, it displays all the characters. We also opened the PDF in Adobe, and if you edit the PDF it shows that the box it created to draw the content in was too small, which is what causes the letters on the right-hand side to be partially cut off. If you make the box a little bit bigger, it shows all the text. Is there anything in Aspose that would be able to recognize that the box was drawn too small for the text and needs to be made bigger? And is there any way that Aspose can make that box bigger?

Thanks for all the help.

codewarior · May 25, 2015, 12:46am

Hi Chris,

Thanks for sharing the details and sorry for the delayed response.

I am afraid we cannot update the dimensions of the TextBox containing the text inside the PDF file. However, to work around this, you may consider resizing the text to a smaller size so that it fits inside the text area. Please take a look at the following code snippet.

However, I have observed that in your particular scenario (using the source file which you have shared), only the first few words are being resized. For the sake of correction, I have logged it as PDFNEWNET-38744 in our issue tracking system. We will look further into the details of this issue and will keep you updated on the status of the correction. Please be patient and spare us a little time. We are sorry for the inconvenience.

[C#]

Document pdfDocument = new Document("c:/pdftest/Cut+Off.PDF");

// Create a TextFragmentAbsorber that matches every run of non-whitespace characters
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(@"[\S]+");

// Set text search option to specify regular expression usage
TextSearchOptions textSearchOptions = new TextSearchOptions(true);
textFragmentAbsorber.TextSearchOptions = textSearchOptions;

// Accept the absorber for all the pages
pdfDocument.Pages.Accept(textFragmentAbsorber);

// Get the extracted text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Reduce the font size so the text fits inside its box
    textFragment.TextState.FontSize = 8;
}

pdfDocument.Save("c:/pdftest/TextResized_output.pdf");
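
The snippet above hard-codes a font size of 8. If the width of the text and the width of its box are known, a size that just fits can be estimated instead. The following is only a rough sketch in plain C# with no Aspose calls: `FontFit`, `FitFontSize`, and the `minSize` floor are hypothetical names, the widths are plain numbers supplied by the caller (this thread does not show how to read the box width from the file), and it assumes the rendered width scales linearly with font size.

```csharp
using System;

public static class FontFit
{
    // Estimate a font size at which text of the given width fits the box.
    // Assumption: rendered width is proportional to font size.
    // (Hypothetical helper, not part of any library.)
    public static double FitFontSize(double currentSize, double textWidth, double boxWidth, double minSize = 4.0)
    {
        if (currentSize <= 0 || textWidth <= 0 || boxWidth <= 0)
        {
            throw new ArgumentException("All measurements must be positive.");
        }
        if (textWidth <= boxWidth)
        {
            return currentSize; // already fits, leave the size alone
        }
        double scaled = currentSize * boxWidth / textWidth;
        // Never go below the floor, to keep the text readable.
        return Math.Max(minSize, scaled);
    }

    public static void Main()
    {
        // Made-up numbers: 10pt text that is 250 units wide in a 200-unit box.
        double size = FitFontSize(10.0, 250.0, 200.0);
        Console.WriteLine("Suggested font size: " + size);
    }
}
```

The result could then be assigned to `textFragment.TextState.FontSize` in the loop above in place of the fixed value, subject to the resizing limitation described earlier in this reply.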

aspose.notifier · November 15, 2018, 4:57pm

The issues you have found earlier (filed as PDFNET-38744) have been fixed in Aspose.PDF for .NET 18.11.