Aspose.PDF for .NET: TextFragmentAbsorber fails to find the ° symbol

hsheng12 · April 19, 2021, 12:39pm

With Aspose.PDF for .NET, searching for the ° symbol through TextFragmentAbsorber finds nothing.
This is the code being run:
Document pdfDocument = new Document(filepath);

// Search one page for the degree sign; Pages is 1-based.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("°");
pdfDocument.Pages[pageindex].Accept(textFragmentAbsorber);

// The original also called pdfDocument.Pages.Accept(textFragmentAbsorber),
// which repeats the search over every page of the document.

Is there any other way to find a symbol such as ° in a PDF?
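
One thing worth ruling out early is a character mismatch: several different Unicode characters render as a small raised circle, and the text stored in the PDF may not use the one typed into the search string. The sketch below is a plain JavaScript helper (the name `codePoints` is made up for this illustration) that lists the code points of a string, for example a selection copied from the viewer, so it can be compared with the search term. It only diagnoses the input and is not a fix for the Aspose.PDF behaviour itself.

```javascript
// Hypothetical helper: list the Unicode code points contained in a string.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(codePoints("\u00B0")); // degree sign
console.log(codePoints("\u2103")); // degree Celsius, a single character
console.log(codePoints("\u00BA")); // masculine ordinal indicator, a lookalike
```

If the selection and the search term show different code points, the search string can be adjusted before it is passed to the absorber.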

asad.ali · April 19, 2021, 6:31pm

@hsheng12

Could you also share your sample PDF file with us, so that we can test the scenario in our environment and address it accordingly?

hsheng12 · April 20, 2021, 2:08am

32543-8.pdf (446.6 KB)
Here is the PDF file.
@asad.ali

asad.ali · April 20, 2021, 9:33pm

@hsheng12

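We were able to reproduce the issue in our environment while testing the scenario with version 21.4 of the API. It has therefore been logged in our issue tracking system as PDFNET-49793. We will investigate it in detail and keep you posted on the status of the fix. Please be patient and allow us some time.

We are sorry for the inconvenience.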

hsheng12 · April 21, 2021, 3:47am

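How long will we have to wait for an update?
Alternatively, could you tell me how to obtain the coordinates of the corresponding PDF page from the web page in the browser, so that we can use them to add annotations?
Thank you!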
@asad.ali

asad.ali · April 21, 2021, 8:31pm

@hsheng12

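The ticket was logged in our issue tracking system only recently and will be resolved on a first-come, first-served basis.

As for adding an annotation at a specific position on a page: do you want to capture mouse clicks in the web browser to determine the position, and then add the annotation with Aspose.PDF? Please share more details of your use case so that we can give feedback accordingly.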

hsheng12 · May 6, 2021, 3:38pm

@asad.ali
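We preview the PDF file in the web browser with PDF.js and add PDF annotations based on the coordinates of the text selected with the mouse. However, the coordinates we get in the browser are offset from the Aspose.PDF coordinates.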

asad.ali · May 6, 2021, 7:40pm

@hsheng12

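You may see different coordinates in the output PDF because the PDF format uses a coordinate system in which (0,0) is the bottom-left corner of the page. Please share a PDF together with the coordinate values you obtained from the mouse click and the position where you expect the annotation to appear. We will investigate the feasibility further and share our feedback with you.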
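
The offset described above usually comes from two differences: browser positions are measured from the top-left corner in CSS pixels and depend on the zoom level, while PDF positions are measured from the bottom-left corner in points. The following is a minimal sketch of that conversion, assuming an unrotated page whose page box starts at (0, 0). The function and parameter names are invented for this illustration. In a PDF.js viewer, the viewport object's own `convertToPdfPoint(x, y)` method serves the same purpose and is probably the better choice in real code.

```javascript
// Convert a point measured from the top-left of the rendered page (CSS pixels)
// to PDF user space (points, origin at the bottom-left).
// Assumptions: the page is not rotated and its page box starts at (0, 0).
// `scale` is the zoom factor used for rendering (1 means 1 px per point).
function viewportToPdfPoint(xPx, yPx, pageHeightPt, scale) {
  return {
    x: xPx / scale,
    y: pageHeightPt - yPx / scale, // flip the vertical axis
  };
}

// Convert a selection box (top-left corner plus size, in pixels) to a PDF
// rectangle given as lower-left and upper-right corners.
function viewportRectToPdfRect(box, pageHeightPt, scale) {
  const topLeft = viewportToPdfPoint(box.left, box.top, pageHeightPt, scale);
  const bottomRight = viewportToPdfPoint(
    box.left + box.width, box.top + box.height, pageHeightPt, scale);
  return { llx: topLeft.x, lly: bottomRight.y, urx: bottomRight.x, ury: topLeft.y };
}

// An A4 page (595 x 842 pt) rendered at 150 % zoom.
const rect = viewportRectToPdfRect(
  { left: 150, top: 300, width: 90, height: 30 }, 842, 1.5);
console.log(rect); // { llx: 100, lly: 622, urx: 160, ury: 642 }
```

The four values are in the same order as the arguments of `Aspose.Pdf.Rectangle(llx, lly, urx, ury)`, so they could be posted to the server and used there without any text search.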

hsheng12 · May 7, 2021, 2:45pm

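We display the PDF file in the web browser with PDF.js and let customers add annotations on the web page.
Currently, the user selects a short run of characters on the PDF page. We then pass the selected characters and the page number to TextFragmentAbsorber in Aspose.Pdf to check whether that text exists on the given page, and from the match we obtain its coordinates.
This is the code we use at present: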

      // Requires: using System; using Aspose.Pdf; using Aspose.Pdf.Annotations; using Aspose.Pdf.Text;
      public bool AddPDFAnnotation(string filepath, int pageindex, string Annotation, string dw)
        {
            try
            {
                License license = new License();
                license.SetLicense("Aspose.Pdf.lic");

                // Open the document
                Document pdfDocument = new Document(filepath);

                // Search for the selected text (dw) on the requested page only; Pages is 1-based.
                // The original also ran the absorber over every page, so the first
                // match could come from a different page than the one annotated.
                TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(dw);
                pdfDocument.Pages[pageindex].Accept(textFragmentAbsorber);

                if (textFragmentAbsorber.TextFragments.Count == 0)
                {
                    return false;
                }

                // TextFragments is 1-based, so [1] is the first match
                Aspose.Pdf.Rectangle rect = textFragmentAbsorber.TextFragments[1].Rectangle;

                // Create the annotation at the position of the matched text
                TextAnnotation textAnnotation = new TextAnnotation(pdfDocument.Pages[pageindex], rect);
                textAnnotation.Title = "注释标题";   // "Annotation title"
                textAnnotation.Subject = "注释主题"; // "Annotation subject"
                textAnnotation.State = AnnotationState.Accepted;
                textAnnotation.Contents = Annotation;
                textAnnotation.Open = true;
                textAnnotation.Icon = TextIcon.Key;
                Border border = new Border(textAnnotation);
                border.Width = 5;
                border.Dash = new Dash(1, 1);
                textAnnotation.Border = border;

                // Add the annotation to the page's annotation collection
                pdfDocument.Pages[pageindex].Annotations.Add(textAnnotation);
                // Save the output file
                pdfDocument.Save(filepath);
                return true;
            }
            catch (Exception ex)
            {
               // Loger.Error(ex, " ");
                return false;
            }
        }

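The code above has a flaw, though: as soon as the user's selection contains spaces or special symbols, or is too long, TextFragmentAbsorber fails to find the text in the PDF, so no coordinates can be obtained.

We hope you can give us a way to handle this problem.

Alternatively, please tell me how to get precise coordinates from the PDF file displayed in PDF.js that map to valid coordinates in the PDF file, so that we can add annotations.
Thank you very much!

This is the format of the PDF files our customers produce: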
32543-8.pdf (446.6 KB)
微信截图_20210507224424.jpg (148.0 KB)
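
A possible stopgap for selections that contain spaces or special symbols is to send a tolerant regular expression instead of the literal text. Text selected in a viewer often carries whitespace that is not present in the PDF's text stream, or the other way round. The sketch below (the name `selectionToPattern` is hypothetical) escapes regex metacharacters and lets every run of whitespace match zero or more whitespace characters. On the server, Aspose.PDF can treat the search phrase as a regular expression when the absorber is given `new TextSearchOptions(true)`. Whether this resolves the failures on this particular file has not been verified in this thread.

```javascript
// Turn selected text into a regular-expression pattern in which every regex
// metacharacter is escaped and any run of whitespace in the selection matches
// zero or more whitespace characters in the document.
function selectionToPattern(selected) {
  return selected
    .trim()
    .split(/\s+/)
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("\\s*");
}

const pattern = selectionToPattern("GC-14C, DB-1 (H2)");
console.log(pattern);
console.log(new RegExp(pattern).test("GC-14C,DB-1(H2)")); // true
```

The pattern stays literal apart from whitespace, so it still distinguishes different strings. It does not help when the same text occurs several times on a page.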

asad.ali · May 9, 2021, 11:15pm

@hsheng12

Thank you for sharing the file. We are testing the scenario and will get back to you shortly.

hsheng12 · May 11, 2021, 2:29am

@asad.ali Could you give us a time frame for a reply? This is urgent on our side.

asad.ali · May 11, 2021, 11:23pm

@hsheng12

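Thank you for your patience.

We have tested the scenario in our environment with version 21.4 of the API.

We searched for a piece of text, for example "MS0001D1", and added a text annotation for it with the call "pdfDocument.Pages[1].Annotations.Add(textAnnotation, true);". The output was generated correctly.

It looks like the pages inside the PDF are rotated, so the rotation should be taken into account when adding annotations. As shown above, you can do this by setting the second parameter to "true" when adding the annotation. The complete code and the generated output are attached for your reference.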

// Requires: using Aspose.Pdf; using Aspose.Pdf.Annotations; using Aspose.Pdf.Text;
Document pdfDocument = new Document(dataDir + "32543-8.pdf");

// Search page 1 for the sample text; Pages and TextFragments are 1-based
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("MS0001D1");
pdfDocument.Pages[1].Accept(textFragmentAbsorber);
Aspose.Pdf.Rectangle rect = textFragmentAbsorber.TextFragments[1].Rectangle;

// Create the annotation at the position of the first match
TextAnnotation textAnnotation = new TextAnnotation(pdfDocument.Pages[1], rect);
textAnnotation.Title = "注释标题";   // "Annotation title"
textAnnotation.Subject = "注释主题"; // "Annotation subject"
textAnnotation.State = AnnotationState.Accepted;
textAnnotation.Contents = "Annotation";
textAnnotation.Open = true;
textAnnotation.Icon = TextIcon.Key;
Border border = new Border(textAnnotation);
border.Width = 5;
border.Dash = new Dash(1, 1);
textAnnotation.Border = border;

// Add the annotation to the page; the second argument (true) takes the page rotation into account
pdfDocument.Pages[1].Annotations.Add(textAnnotation, true);
// Save the output file
pdfDocument.Save(dataDir + "output.pdf");

output.pdf (447.0 KB)
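
Because the pages are rotated, a position picked in the browser also has to be un-rotated before it refers to the page's own coordinate system. The sketch below shows that mapping for the four right-angle rotations, assuming clockwise display rotation and a page box that starts at (0, 0). All names are illustrative. It produces coordinates in the unrotated page space. Which space the server-side call expects when `Annotations.Add(textAnnotation, true)` is used is not settled in this thread, so that choice should be checked against the sample file.

```javascript
// Map a point on the displayed (rotated) page back to unrotated PDF user space.
// xPx, yPx: pixels from the top-left corner of the page as displayed.
// page.width, page.height: unrotated page size in points.
// page.rotation: clockwise display rotation in degrees (0, 90, 180 or 270).
// Assumption: the page box starts at (0, 0).
function rotatedViewportToPdfPoint(xPx, yPx, page, scale) {
  const u = xPx / scale;
  const v = yPx / scale;
  const rotation = ((page.rotation % 360) + 360) % 360;
  switch (rotation) {
    case 0:   return { x: u, y: page.height - v };
    case 90:  return { x: v, y: u };
    case 180: return { x: page.width - u, y: v };
    case 270: return { x: page.width - v, y: page.height - u };
    default:
      throw new RangeError("rotation must be a multiple of 90 degrees");
  }
}

// A portrait A4 page shown rotated by 90 degrees at 100 % zoom.
const rotatedPage = { width: 595, height: 842, rotation: 90 };
console.log(rotatedViewportToPdfPoint(100, 50, rotatedPage, 1)); // { x: 50, y: 100 }
```

With rotation 0 the function reduces to the usual vertical flip, so one code path can serve both rotated and unrotated pages.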

hsheng12 · May 13, 2021, 2:40pm

@asad.ali
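Thank you for your answer.
This is the approach we are already using, but it has shortcomings:
1. ℃ is not supported, and temperature is a very important parameter for our customers. (You have already answered this one: it is not supported at present.)
2. When the same text appears more than once on a page, the position is inaccurate. The program cannot know which occurrence the user selected; it simply picks the first one.
For example, on page 3 of 32543-8.pdf, "纯度" ("purity") and "GC-14C,DB-1,H2,80℃,10℃/min10min" each appear several times, so locating them by text goes wrong.
The search also fails when we select a long string of text and look it up with TextFragmentAbsorber.

So we would like you to provide a way to add annotations from the coordinates of the selected region when the PDF is previewed in the browser with PDF.js or a similar viewer, rather than by searching for text with TextFragmentAbsorber.
Thank you!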
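
If the text search is kept for the time being, the duplicate-text problem in point 2 can be narrowed by combining both sources of information: take every match the search returns and keep the one closest to where the user actually selected. The sketch below assumes the matches are available as rectangles with lower-left and upper-right corners and that the selection has already been converted to PDF coordinates. The function name and the sample numbers are made up.

```javascript
// Pick the candidate rectangle whose centre is closest to the selected point.
// Rectangles use PDF corners: llx, lly (lower-left) and urx, ury (upper-right).
function nearestRect(candidates, point) {
  let best = null;
  let bestDist = Infinity;
  for (const r of candidates) {
    const cx = (r.llx + r.urx) / 2;
    const cy = (r.lly + r.ury) / 2;
    const dist = Math.hypot(cx - point.x, cy - point.y);
    if (dist < bestDist) {
      bestDist = dist;
      best = r;
    }
  }
  return best; // null when there are no candidates
}

// Three occurrences of the same text on one page (made-up positions).
const matches = [
  { llx: 100, lly: 700, urx: 140, ury: 712 },
  { llx: 100, lly: 500, urx: 140, ury: 512 },
  { llx: 300, lly: 500, urx: 340, ury: 512 },
];
console.log(nearestRect(matches, { x: 125, y: 505 })); // the second rectangle
```

The same comparison can run on the server once the selected point is sent along with the text. It only works if the browser coordinates have first been converted to the page's coordinate system.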

asad.ali · May 17, 2021, 9:52pm

@hsheng12

We are looking into the issue and will get back to you shortly.

asad.ali · June 13, 2021, 9:38pm

@hsheng12

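Regarding your request above, we have logged a separate issue, PDFNET-50074, in our issue management system. We will look into it further and let you know as soon as we have news about its feasibility. Please allow us some time.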