Dear Aspose support team,
I create a searchable PDF with Aspose.PDF for .NET version 21.7.0 in C# as follows:
const string dataDir = @"C:\Temp\_\Aspose";
var document = new Aspose.Pdf.Document();
var page = document.Pages.Add();
var width = 2550.0 * 72.0 / 300.0;
var height = 3300.0 * 72.0 / 300.0;
page.SetPageSize(width, height);
page.PageInfo.Margin.Bottom = 0;
page.PageInfo.Margin.Top = 0;
page.PageInfo.Margin.Left = 0;
page.PageInfo.Margin.Right = 0;
using (var stream = File.OpenRead(System.IO.Path.Combine(dataDir, "0.tif")))
{
    page.AddImage(File.ReadAllText(@"C:\Temp\_\Aspose\0.hOCR.html"), stream, new Aspose.Pdf.Rectangle(0, 0, width, height));
}
document.Convert(@"C:\Temp\_\Aspose\log.xml", Aspose.Pdf.PdfFormat.PDF_A_1A, Aspose.Pdf.ConvertErrorAction.Delete);
using (var output = File.OpenWrite(System.IO.Path.Combine(dataDir, "pdf.pdf")))
{
    document.Save(output);
}
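For clarity, the page size in the code is the scan's pixel dimensions converted to PDF points (1/72 inch) at 300 DPI. A minimal standalone sketch of that conversion follows; the class and method names are hypothetical and not part of Aspose.PDF:

```csharp
using System;

// Hypothetical helper (not an Aspose.PDF type): converts scan pixels to PDF points.
public static class PageSizeSketch
{
    // A PDF user-space unit (point) is 1/72 inch by default.
    private const double PointsPerInch = 72.0;

    public static double PixelsToPoints(double pixels, double dpi)
    {
        if (dpi <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(dpi), "DPI must be positive.");
        }

        // Same arithmetic as in the post: pixels * 72 / dpi.
        return pixels * PointsPerInch / dpi;
    }
}
```

For a 2550 x 3300 pixel scan at 300 DPI, this gives 612 x 792 points, which is US Letter.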
Creating the searchable PDF this way works perfectly. Now I would like to add tags, preferably automatically, but I have not found a way to do so. The following Convert call did not produce the desired result:
document.Convert(@"C:\Temp\_\Aspose\log.xml", Aspose.Pdf.PdfFormat.PDF_UA_1, Aspose.Pdf.ConvertErrorAction.Delete);
I came across the following approach:
[Document Accessibility with Aspose.PDF for .NET](https://contentlab.io/aspose-pdf-net-accessibility/)
However, with that approach the OCR text coordinates are lost: it seems that tagged content can only be added in “flow” and not at fixed coordinates. I also cannot find an option to make the text invisible, as it is in the source document.
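To illustrate what I mean by “flow”, here is a rough sketch of the tagged-content approach as I understand it. It assumes the Aspose.Pdf.Tagged API (`TaggedContent`, `CreateParagraphElement`, `SetText`, `AppendChild`); the class name, method name, and title string are hypothetical placeholders:

```csharp
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Tagged;

// Hypothetical helper for illustration only.
public static class TaggedFlowSketch
{
    public static void AppendParagraphs(Document document, params string[] paragraphs)
    {
        ITaggedContent tagged = document.TaggedContent;
        tagged.SetTitle("OCR result");   // placeholder title
        tagged.SetLanguage("en-US");

        StructureElement root = tagged.RootElement;
        foreach (string text in paragraphs)
        {
            // Each paragraph is appended to the structure tree and laid out
            // in document flow; no rectangle or position is passed here.
            ParagraphElement paragraph = tagged.CreateParagraphElement();
            paragraph.SetText(text);
            root.AppendChild(paragraph);
        }
    }
}
```

In this sketch I do not see where a position rectangle or an invisible text rendering mode could be supplied, which is exactly what the hOCR layer would need.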
Does Aspose.PDF for .NET provide a way to tag the OCR text while keeping it invisible and at its fixed coordinates?