Error on highlight pdf

wwwflytorinoitcasellechiudeil2018incalodel22quattordicesimoscaloinitaliaGRUPPOUVETBLUEPANORAMAWEB.pdf (1.2 MB)
Hi all,

I am highlighting text in a PDF with your API, and sometimes the highlight is wrong when the matched text wraps onto a new line. As you can see in the attached PDF, I try to highlight "blue panorama", but the highlight runs on to the edge of the page instead of continuing on the next line.

Best regards

@francescoes

Thank you for contacting support.

Would you please share the source PDF file along with a narrowed-down code snippet, so that we can try to reproduce and investigate the issue in our environment?

Here are the code snippet and the raw PDF: article_raw_temp.pdf (1.2 MB)
As a reminder, the phrase to highlight is "blue panorama".

Public Sub WriteHighLight(ByVal PathPDFSource As String, ByVal KeyWord As String)

    Try
        Dim licence As Aspose.Pdf.License = New Aspose.Pdf.License
        licence.SetLicense(Application.StartupPath + "\lib\Aspose.Pdf.lic")

        'Preprocessing keywords
        KeyWord = KeyWord.Trim()
        KeyWord = KeyWord.Replace("(?i) ", "(?i)")
        KeyWord = KeyWord.Replace(" |", "|")
        KeyWord = KeyWord.Replace(" ", "\s+")
        KeyWord = KeyWord.Replace("(?i)", "\b(?i)")
        KeyWord = KeyWord.Replace("|", "\b|")
        KeyWord = KeyWord + "\b"

        Dim Document As New Document(PathPDFSource)

        Dim TextFragmentAbsorber As New TextFragmentAbsorber(KeyWord)

        'Set text search option to specify regular expression usage
        Dim TextSearchOptions As New TextSearchOptions(True)

        TextFragmentAbsorber.TextSearchOptions = TextSearchOptions

        Document.Pages.Accept(TextFragmentAbsorber)

        Dim TextFragmentCollection1 As TextFragmentCollection = TextFragmentAbsorber.TextFragments
        If TextFragmentCollection1.Count > 0 Then
            Dim Objxml As New ClsXml
            Objxml.CreateFileXml(System.IO.Path.GetFileNameWithoutExtension(PathPDFSource))
            For Each TextFragment As TextFragment In TextFragmentCollection1

                'Write the XML file used to highlight the article
                Dim FreeText As New HighlightAnnotation(TextFragment.Page, New Aspose.Pdf.Rectangle(TextFragment.Position.XIndent, TextFragment.Position.YIndent, TextFragment.Position.XIndent + TextFragment.Rectangle.Width, TextFragment.Position.YIndent + TextFragment.Rectangle.Height))
                Objxml.createNode(TextFragment.Page.Number.ToString, TextFragment.Position.XIndent.ToString, TextFragment.Position.YIndent.ToString, (TextFragment.Position.XIndent + TextFragment.Rectangle.Width).ToString, (TextFragment.Position.YIndent + TextFragment.Rectangle.Height).ToString)

                FreeText.Opacity = 0.9
                FreeText.Color = Aspose.Pdf.Color.FromArgb(156, 246, 242)

                TextFragment.Page.Annotations.Add(FreeText)
            Next
            Objxml.closeFileXml()
        End If

        Document.Save(PathPDFSource)
    Catch ex As Exception
        MyLog.Error(ex.Message)

        Dim Sendmail As New ClsMail
        Sendmail.fsettabody = " Conversione Web2PDF Timer1_Tick() -  errore " + ex.Message + " <br>" + "fWriteHighLight() -  errore " + KeyWord
        Sendmail.SendEmail()
    End Try

End Sub


@francescoes

Thank you for sharing requested data.

Would you please share a sample application containing SSCCE code? The shared snippet depends on the custom class ClsXml and cannot be executed. Kindly share the requested application so that we can proceed further.

Here is the clean code. Thank you!

Public Sub WriteHighLight(ByVal PathPDFSource As String, ByVal KeyWord As String)
Try
    Dim licence As Aspose.Pdf.License = New Aspose.Pdf.License
    licence.SetLicense(Application.StartupPath + "\lib\Aspose.Pdf.lic")

    'Preprocessing keywords
    KeyWord = KeyWord.Trim()
    KeyWord = KeyWord.Replace("(?i) ", "(?i)")
    KeyWord = KeyWord.Replace(" |", "|")
    KeyWord = KeyWord.Replace(" ", "\s+")
    KeyWord = KeyWord.Replace("(?i)", "\b(?i)")
    KeyWord = KeyWord.Replace("|", "\b|")
    KeyWord = KeyWord + "\b"

    Dim Document As New Document(PathPDFSource)

    Dim TextFragmentAbsorber As New TextFragmentAbsorber(KeyWord)

    'Set text search option to specify regular expression usage
    Dim TextSearchOptions As New TextSearchOptions(True)

    TextFragmentAbsorber.TextSearchOptions = TextSearchOptions

    Document.Pages.Accept(TextFragmentAbsorber)

    Dim TextFragmentCollection1 As TextFragmentCollection = TextFragmentAbsorber.TextFragments
    If TextFragmentCollection1.Count > 0 Then
        
        For Each TextFragment As TextFragment In TextFragmentCollection1

            'Write the XML file used to highlight the article
            Dim FreeText As New HighlightAnnotation(TextFragment.Page, New Aspose.Pdf.Rectangle(TextFragment.Position.XIndent, TextFragment.Position.YIndent, TextFragment.Position.XIndent + TextFragment.Rectangle.Width, TextFragment.Position.YIndent + TextFragment.Rectangle.Height))
            
            FreeText.Opacity = 0.9
            FreeText.Color = Aspose.Pdf.Color.FromArgb(156, 246, 242)

            TextFragment.Page.Annotations.Add(FreeText)
        Next
        
    End If

    Document.Save(PathPDFSource)
Catch ex As Exception
    'MyLog.Error(ex.Message)

    
End Try

End Sub

@francescoes

We have been able to reproduce the issue in our environment. A ticket with ID PDFNET-45978 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

Hi, is it possible to get a time frame for the resolution? Fixing this problem is very important for our work.

@francescoes

The issue you reported was only recently logged in our issue management system. It is currently pending behind previously logged tickets and will be investigated in its turn, which can take a few months. We appreciate your patience and understanding in this regard.

However, we also offer Paid Support, where issues are investigated with higher priority. Customers with a Paid Support subscription report their issues there, and those issues are investigated urgently. If your reported issue is a blocker, please consider subscribing to Paid Support. For further information, please visit Paid Support FAQs.

OK. We recently updated the library to the latest version under another account. Does that change the time frame, or is it the same? Is there some kind of guarantee with our purchase?

best regards

@francescoes

Please note that we were only describing the Paid Support subscription. It does not affect your access to the latest versions of the API, because your license stays the same. However, your issues would then be scheduled sooner than free support tickets. You may contact our sales team or go through Paid Support Policies for further information.

Hi, is it possible to know the status of the fix, please? More than one month has passed.

best regards

@francescoes

We are afraid it has not been scheduled yet, owing to previously logged tickets in the queue. It may take a few more months to resolve. Thank you for your patience.

Six months have passed. What is the schedule for fixing this problem in your library?

@francescoes

We are afraid it has not been resolved yet. We will let you know as soon as significant progress is made.

Is it possible to know whether the problem is resolved? We have waited more than six months for an important fix to your library.

@francescoes

We are afraid it is still unresolved and an ETA is not available yet. We have recorded your concerns and will try to schedule it soon. We are really thankful for your patience.

@francescoes

We have investigated the issue and found that it is not a bug. In the common scenario, a text search returns text on a single line, and then:

  • TextFragment contains one or more TextSegment objects on a single line;
  • TextFragment.Rectangle bounds all of the found text and matches the TextSegment.Rectangle(s);
  • TextFragment.Position is where the text starts, and it matches the lower-left corner of both TextFragment.Rectangle and the first TextSegment.Rectangle.

A search result that spans multiple lines changes this. The objects keep their meaning, but they no longer coincide:

  • TextFragment contains at least two TextSegment objects on different lines;
  • each TextSegment.Rectangle still gives the position of its own segment: one at the end of the first line and another at the start of the second line;
  • TextFragment.Rectangle bounds all of the found text, so it covers both lines;
  • TextFragment.Position is still where the text starts. It matches the lower-left corner of the first TextSegment.Rectangle, but no longer the lower-left corner of TextFragment.Rectangle.

Please see Figure1.png (36.1 KB).

Therefore the construction New Aspose.Pdf.Rectangle(TextFragment.Position.XIndent, TextFragment.Position.YIndent, TextFragment.Position.XIndent + TextFragment.Rectangle.Width, TextFragment.Position.YIndent + TextFragment.Rectangle.Height) will not work with multi-line text.
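To make the geometry concrete, here is a minimal sketch that reproduces the arithmetic without using Aspose.PDF at all. The Box type, the method names, and all coordinates are made up for illustration; Box merely stands in for a PDF rectangle, and the numbers are not taken from the attached file. It assumes, as described above, that the fragment rectangle is the bounding box of its segments and that the position is the lower-left corner of the first segment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a PDF rectangle given by its lower-left and
// upper-right corners. It only models the geometry; it is not Aspose.Pdf.Rectangle.
public readonly struct Box
{
    public double Llx { get; }
    public double Lly { get; }
    public double Urx { get; }
    public double Ury { get; }

    public Box(double llx, double lly, double urx, double ury)
    {
        Llx = llx; Lly = lly; Urx = urx; Ury = ury;
    }

    public double Width => Urx - Llx;
    public double Height => Ury - Lly;

    public override string ToString() => $"({Llx}, {Lly}, {Urx}, {Ury})";
}

public static class MultiLineHighlightDemo
{
    // Bounding box of all segments; plays the role of TextFragment.Rectangle.
    public static Box Union(IEnumerable<Box> segments)
    {
        var list = segments.ToList();
        return new Box(list.Min(s => s.Llx), list.Min(s => s.Lly),
                       list.Max(s => s.Urx), list.Max(s => s.Ury));
    }

    // The construction from the original snippet: start at the lower-left corner
    // of the first segment (the role of TextFragment.Position) and add the
    // width and height of the whole fragment.
    public static Box FromPositionAndSize(Box firstSegment, Box fragment) =>
        new Box(firstSegment.Llx, firstSegment.Lly,
                firstSegment.Llx + fragment.Width,
                firstSegment.Lly + fragment.Height);

    public static void Main()
    {
        // Made-up coordinates: "Blue" ends the first line, "Panorama" starts the second.
        var blue = new Box(500, 700, 540, 712);
        var panorama = new Box(50, 686, 130, 698);

        var fragment = Union(new[] { blue, panorama });
        var wrong = FromPositionAndSize(blue, fragment);

        Console.WriteLine($"Fragment rectangle: {fragment}"); // (50, 686, 540, 712)
        Console.WriteLine($"Position + size:    {wrong}");    // (500, 700, 990, 726)
    }
}
```

With these numbers the fragment is 490 units wide because it spans from the start of the second line to the end of the first. Adding that width to the x-coordinate of "Blue" pushes the right edge to 990, far beyond where either word ends, which is consistent with a highlight that runs to the edge of the page. Using the fragment rectangle directly, or one rectangle per segment, avoids the problem.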

Please consider the following code snippets and select one of the alternatives:

Document doc = new Document(dataDir + @"article_raw_temp.pdf");
TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"Blue\r\nPanorama");
//Set text search option to specify regular expression usage
TextSearchOptions options = new TextSearchOptions(true);
absorber.TextSearchOptions = options;
doc.Pages.Accept(absorber);

foreach (TextFragment fragment in absorber.TextFragments)
{                
    Annotation annot = new HighlightAnnotation(fragment.Page, (Rectangle)fragment.Rectangle.Clone());                
    annot.Color = Aspose.Pdf.Color.FromArgb(156, 246, 242);
    fragment.Page.Annotations.Add(annot);
}
doc.Save(dataDir + "45987_out_fragments_highlighted.pdf");

Result: 45987_out_fragments_highlighted.pdf (1.2 MB)

OR

Document doc = new Document(dataDir + @"article_raw_temp.pdf");
TextFragmentAbsorber absorber = new TextFragmentAbsorber(@"Blue\r\nPanorama");
//Set text search option to specify regular expression usage
TextSearchOptions options = new TextSearchOptions(true);
absorber.TextSearchOptions = options;
doc.Pages.Accept(absorber);

foreach (TextFragment fragment in absorber.TextFragments)
{
    foreach (TextSegment segment in fragment.Segments)
    {
        Annotation annot = new HighlightAnnotation(fragment.Page, (Rectangle)segment.Rectangle.Clone());
        annot.Color = Aspose.Pdf.Color.FromArgb(156, 246, 242);
        fragment.Page.Annotations.Add(annot);
    }
}
doc.Save(dataDir + "45987_out_segments_highlighted.pdf");

Result: 45987_out_segments_highlighted.pdf (1.2 MB)