[.NET] Remove line numbers from PDF file


#1

Hello,

if we have a PDF generated from an editable source (Word, LaTeX, …) that includes line numbers, is it possible to remove the line numbers from that PDF document using Aspose.PDF for .NET?

Thanks

Best regards,


#2

@stefano.giannone.frontiers

Thanks for contacting support.

In order to test the scenario and provide our feedback, we need a sample PDF document. Would you please share your PDF document with us? This would help us understand your requirements and assist you accordingly.


#3

Latex_Sample.pdf (89.4 KB)
Word_Sample.pdf (127.2 KB)

Here are the two sample PDFs with line numbers. One was generated from a Word file, and the other from a LaTeX file.

We need to strip the line numbers from both, but we don't have the source files. So, if possible, we would like to do this working directly on the PDF files themselves.

Thanks


#4

@stefano.giannone.frontiers

Thanks for sharing sample PDF documents.

We tried to meet your requirement using the following code snippet, but without much success, as the output PDF had formatting issues. For your reference, here is the code we tried along with the generated output:

using System.Text.RegularExpressions;
using Aspose.Pdf;
using Aspose.Pdf.Text;

// Hypothetical path: the folder that contains the sample PDF files
var dataDir = "./";

// ".+" with regular-expression search enabled matches every text line
var textFragmentAbsorber = new TextFragmentAbsorber(".+");
textFragmentAbsorber.TextSearchOptions = new TextSearchOptions(true);

Document pdfDocument = new Document(dataDir + "Word_Sample.pdf");
pdfDocument.Pages.Accept(textFragmentAbsorber);

// Assumes the line number is the last token of each extracted fragment
var trailingNumber = new Regex(@"\s*\d+\s*$");

foreach (TextFragment textFragment in textFragmentAbsorber.TextFragments)
{
    if (textFragment.Text.Trim() == "")
        continue;

    // Remove only the trailing number, including multi-digit numbers;
    // Replace(lastDigit, "") would also delete that digit elsewhere in the line
    if (trailingNumber.IsMatch(textFragment.Text))
    {
        textFragment.Text = trailingNumber.Replace(textFragment.Text, "");
    }
}

pdfDocument.Save(dataDir + "test18.12.out.pdf");

test18.12.out.pdf (194.6 KB)
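As a minimal, Aspose-independent sketch of the text-cleanup step only, the C# helper below removes just the trailing number from a line of extracted text. The class and method names are hypothetical, and it assumes, as the snippet above does, that the line number ends up as the last token of each text fragment; writing the result back to the PDF would still go through `TextFragment.Text`.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineNumberStripper
{
    // Matches a run of digits at the very end of the text, together with any
    // surrounding whitespace. Assumption: the line number is the last token
    // of each extracted text line.
    private static readonly Regex TrailingNumber =
        new Regex(@"\s*\d+\s*$", RegexOptions.Compiled);

    // Removes the trailing line number (all of its digits) and nothing else,
    // so other occurrences of the same digit in the line are preserved.
    public static string StripTrailingLineNumber(string line)
    {
        return TrailingNumber.Replace(line, "");
    }

    public static void Main()
    {
        string[] lines =
        {
            "The results are shown in Table 2 below 12", // multi-digit number
            "Page 3 shows the setup 3",                  // same digit earlier in the line
            "14",                                        // number extracted on its own
            "No number here",                            // left unchanged
        };

        foreach (string line in lines)
        {
            // Brackets make an empty result visible in the console output
            Console.WriteLine("[" + StripTrailingLineNumber(line) + "]");
        }
    }
}
```

Note that a text line that legitimately ends in a number (a year, for example) would lose it as well, so a check on where the fragment sits on the page may still be needed before relying on this heuristic.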

We have logged an investigation ticket as PDFNET-45791 in our issue tracking system to investigate further whether your requirements can be achieved. We will keep you posted on the ticket's resolution as soon as we have updates. Please spare us a little time.

We are sorry for the inconvenience.


#5

That’s great. Thanks for your effort.

I'll wait for your feedback.

Best regards,
Stefano


#6

@stefano.giannone.frontiers

Thanks for your feedback.

We will surely let you know as soon as some additional updates are available.


#7

Hello,

any update on this?

We're waiting on this to decide whether to proceed with buying a license for Aspose.PDF.

Thanks for your effort.

Best regards,
Stefano


#8

@stefano.giannone.frontiers

Thanks for your inquiry.

I am afraid the investigation ticket logged earlier is not yet resolved because of previously logged issues that are pending in the queue. However, we have recorded your concerns and will definitely consider them during the investigation. As soon as definite updates are available, we will let you know. Please spare us a little time.

We are sorry for the inconvenience.


#9

Hello,

thanks for the time you’re dedicating to this.

In the meantime, we would like to try to find a solution ourselves, based on the sample code you provided.

However, we can't, because we receive an exception saying ‘At most 4 elements (for any collection) can be viewed in evaluation mode.’ (see attached screenshot).

Capture.PNG (18.9 KB)

Is there a way for us to test the code without this restriction? Perhaps you could provide a temporary license key to use just for this investigation. Once we come up with a solution, we can buy a proper license.

Also, just to mention: we don't mind much if the resulting PDF loses some formatting. What matters most to us is removing the line numbers from the text of the PDF.

Thanks again for your support.

Best regards,
Stefano


#10

@stefano.giannone.frontiers

Thanks for getting back to us.

Please consider applying for a 30-day temporary license, with which you can use the API features without any restriction.