I want to convert text reports to PDF. Most of the conversion is done, such as switching to landscape and setting a fixed-width font (Courier New).
However, during the conversion tabs and form feeds (new page) are being ignored.
Without having to read each line and then decide whether I need to create a new PDF page, is it possible to tell either the TextFragment or the Pdf.Page to create a new page on a form feed (ASCII 12)?
I suspect that the missing tabs might be a more difficult issue.
Code from your documentation:
TextFragment text = new TextFragment(tr.ReadToEnd());
page.Paragraphs.Add(text);
The Rectangle member of a TextFragment instance gives the position of the text on the PDF page. You can also retrieve the page's rectangle coordinates with the Rect member of the Page instance, compare the two, and then decide whether to add a new page. To add an empty page, call the Add method of the PageCollection class as follows:
Document pdfDocument = new Document(dataDir + "input.pdf");
pdfDocument.Pages.Add();
If this does not help, then kindly send the complete code along with the form feed data and an expected output PDF. We will investigate your scenario in our environment, and share our findings with you.
We are afraid that it is not supported. An enhancement has been logged under the ticket ID PDFNET-44710 in our issue tracking system. We have linked your post to this ticket and will keep you informed of any updates. As a workaround, you can insert the text into the PDF document and find the rectangle position of each form feed character (ASCII 12). Those positions can then be used to add page breaks as follows:
Document doc = new Document("source.pdf");
Document dest = new Document();
PdfFileEditor fileEditor = new PdfFileEditor();
fileEditor.AddPageBreak(doc, dest, new PdfFileEditor.PageBreak[] { new PdfFileEditor.PageBreak(1, 450) });
dest.Save("dest.pdf");
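Another option is to handle the control characters in the text itself, before it is added to the PDF. The following is only a minimal sketch using the standard library: the `ReportText` class and its method names are hypothetical, and the default tab width of 8 is an assumption that should match whatever produced the report. It splits the report on form feed and expands tabs to spaces, which keeps columns aligned in a fixed-width font.

```csharp
using System.Text;

// Hypothetical helper, not part of Aspose.PDF.
public static class ReportText
{
    // Splits the raw report on form feed (ASCII 12, '\f').
    // Each element holds the text of one page.
    public static string[] SplitPages(string report)
    {
        return report.Split('\f');
    }

    // Replaces each tab with spaces up to the next tab stop.
    // tabWidth = 8 is an assumption; adjust it to the report's layout.
    public static string ExpandTabs(string text, int tabWidth = 8)
    {
        var result = new StringBuilder(text.Length);
        int column = 0;
        foreach (char c in text)
        {
            if (c == '\t')
            {
                int spaces = tabWidth - (column % tabWidth);
                result.Append(' ', spaces);
                column += spaces;
            }
            else
            {
                result.Append(c);
                // A line break resets the column counter.
                column = (c == '\n' || c == '\r') ? 0 : column + 1;
            }
        }
        return result.ToString();
    }
}
```

Each element returned by `SplitPages` could then go onto its own page using the calls already shown in this thread: `pdfDocument.Pages.Add()` to create the page and `page.Paragraphs.Add(new TextFragment(...))` to add the expanded text.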
I think that I can find the form feeds with
// "\f" is the C# escape for form feed (ASCII 12); C# has no octal escapes such as '\12'
Aspose.Pdf.Text.TextFragmentAbsorber tfa = new Aspose.Pdf.Text.TextFragmentAbsorber("\f");
assuming that the initial text ingestion did not do anything with the character.
string dataDir = @"C:\Pdf\test830\";
Document doc = new Document(dataDir + "input.pdf");
// Create a TextFragmentAbsorber to find all instances of the input search phrase
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("Split");
// Accept the absorber for all the pages
doc.Pages.Accept(textFragmentAbsorber);
// Get the first extracted text fragment (Aspose collections are 1-based)
TextFragment textFragment = textFragmentAbsorber.TextFragments[1];
Document dest = new Document();
PdfFileEditor fileEditor = new PdfFileEditor();
fileEditor.AddPageBreak(doc, dest, new PdfFileEditor.PageBreak[] { new PdfFileEditor.PageBreak(textFragment.Page.Number, textFragment.Rectangle.URY) });
dest.Save(dataDir + "output.pdf");
This is the ZIP of input and output PDF documents: files.zip (172.4 KB)
That code does work to a limited extent, however I still have 2 issues.
The first is that to process all of the page breaks in a report (think 100+), PdfFileEditor is going to be VERY slow and messy, reprocessing the document from source to destination for every page. Is there a way to do them all at once?
The second is related to the form feed character. I don’t seem to be able to find it using the absorber. Attached are 2 sample files, the original text and the PDF generated from it, using the code below:
Aspose.Pdf.Document pdfDocument = new Aspose.Pdf.Document();
Aspose.Pdf.Page pdfPage = pdfDocument.Pages.Add();
pdfPage.PageInfo.IsLandscape = true;
StreamReader sr = new StreamReader(reportFilename);
Aspose.Pdf.Text.TextFragment pdfText = new Aspose.Pdf.Text.TextFragment(sr.ReadToEnd());
pdfPage.Paragraphs.Add(pdfText);
If I use
Aspose.Pdf.Text.TextFragmentAbsorber tfa = new Aspose.Pdf.Text.TextFragmentAbsorber("Date");
pdfDocument.Pages.Accept(tfa);
I get 4 fragments (as expected); however, if I change "Date" to the form feed character (ASCII 12), I get zero fragments. So did the form feed get loaded from the text file into the PDF, and if so, how do I locate it with the TextFragmentAbsorber?
We have recorded both issues under the same ticket ID PDFNET-44710. To process 100+ page splits, you can keep the page splits in a hash table. If the process is slow, please share all the details of the scenario so that we can replicate the performance issue in our environment. This will help us find the root cause and fix the problem. Once ticket PDFNET-44710 is resolved, you will be able to split pages at every ASCII page break character at once.
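Since AddPageBreak takes an array of PageBreak objects, one way to avoid reprocessing the document for each split may be to collect every break first and make a single call. The sketch below reuses the "Split" marker and the paths from the earlier sample; the `BatchPageBreaks` class name is hypothetical, and it is an assumption that several breaks (possibly on the same source page) are applied as expected in one pass.

```csharp
using System.Collections.Generic;
using Aspose.Pdf;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

public static class BatchPageBreaks
{
    public static void Run()
    {
        string dataDir = @"C:\Pdf\test830\";
        Document doc = new Document(dataDir + "input.pdf");

        // Find every occurrence of the marker text on all pages.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("Split");
        doc.Pages.Accept(absorber);

        // Collect one page break per occurrence instead of calling
        // AddPageBreak once per fragment.
        var pageBreaks = new List<PdfFileEditor.PageBreak>();
        foreach (TextFragment fragment in absorber.TextFragments)
        {
            pageBreaks.Add(new PdfFileEditor.PageBreak(
                fragment.Page.Number, fragment.Rectangle.URY));
        }

        // Apply all breaks in a single call.
        Document dest = new Document();
        PdfFileEditor fileEditor = new PdfFileEditor();
        fileEditor.AddPageBreak(doc, dest, pageBreaks.ToArray());
        dest.Save(dataDir + "output.pdf");
    }
}
```

This only helps where the absorber can find the marker; as reported above, the form feed character itself is not located that way.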