Free Support Forum - aspose.com

Replace image with text in pdf page using Aspose.PDF for .NET

I have a scenario where a PDF page contains both text and an image, and I use Aspose.OCR to convert the image into text. Now I'm stuck: there is no function to replace the image with text, because the existing function only replaces an image with another image. As a workaround, I tried deleting the image and appending the text, but when I use PdfContentEditor.CreateText or PdfContentEditor.CreateFreeText, both functions display the text as an image inside the page. Any suggestions to fix this issue?

@YEHIAAHMED

Would you kindly share your sample PDF document along with an expected output PDF? We will test the scenario in our environment and address it accordingly.

I need to convert a non-searchable PDF to a searchable PDF using OCR and .NET. For example, page 5 of the attached PDF file contains [text - image - text]. I use Aspose.PDF and OCR to extract this image and convert it to text, but I cannot write the text back at the same position on the same page. When I tried to convert the whole page to an image, convert that back to text, and write it to the PDF, I lost the original formatting of the text on this page.
Change Management Plan v0.4.pdf (737.6 KB)

@YEHIAAHMED

We have checked the source file you shared. It contains images with graphics/icons and text. We regret to say that the API currently does not offer any feature to replace images with text in the way you require.

As we requested earlier, would you kindly provide an expected output PDF as well? This would help us understand your complete requirements and investigate whether they are feasible to achieve using the API.

I've attached an image named OriginalFile, which is just a screenshot of page 5 of the shared PDF, and another image named ExpectedOutput that describes the desired result. Below is the code I'm using to try to solve the issue. Because there is no function to replace an image with text, I first get the position of the image and save its X and Y coordinates, then delete the image and try to create a TextFragment at the same position.
The deletion works, but the code that writes the text to the PDF does not. It is inside the region named "WriteTextToPDF" (it only works if I first save the edited PDF and reload it, which takes more time).
ExpectedOutput.png (30.0 KB)
OriginalFile.png (78.2 KB)
Here is the full code:

public static void ConvertOCRToPDF(string pdfFileName, string rootPath)
{
{

            // Initialize license object
            Aspose.Pdf.License license = new Aspose.Pdf.License();
            FileStream myStream = new FileStream(rootPath + "Aspose.Pdf.lic", FileMode.Open);
            license.SetLicense(myStream);
            Aspose.OCR.License ocrLicense = new Aspose.OCR.License();
            FileStream ocrMyStream = new FileStream(rootPath + "Aspose.OCR.lic", FileMode.Open);
            ocrLicense.SetLicense(ocrMyStream);
            //Start PDF Processing
            var pdfDocument = new Aspose.Pdf.Document(rootPath + pdfFileName);
            var ocrEngine = new Aspose.OCR.OcrEngine();
            int pageCount = 1;
            Document document = new Document();

            while (pageCount <= pdfDocument.Pages.Count)
            {

                MemoryStream ms = new MemoryStream();
                //Instantiate PdfExtractor object
                PdfExtractor extractor = new PdfExtractor();
                //Bind the input PDF document to extractor
                extractor.BindPdf(rootPath + pdfFileName);
                //Process page by page
                extractor.StartPage = pageCount;
                extractor.EndPage = pageCount;
                ImagePlacementAbsorber abs = new ImagePlacementAbsorber();
                pdfDocument.Pages[pageCount].Accept(abs);

                extractor.ExtractImage();
                ImagePlacementAbsorber abstst = new ImagePlacementAbsorber();
                if (extractor.HasNextImage())
                {
                    pdfDocument.Pages[pageCount].Accept(abstst);
                    System.Drawing.Rectangle curRect = new System.Drawing.Rectangle();

                    foreach (ImagePlacement imagePlacement in abs.ImagePlacements)
                    {
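                        curRect.Width = Convert.ToInt32(imagePlacement.Rectangle.Width);
                        curRect.Height = Convert.ToInt32(imagePlacement.Rectangle.Height);
                        // Position comes from the placement rectangle (lower-left corner), not the image resolution
                        curRect.X = Convert.ToInt32(imagePlacement.Rectangle.LLX);
                        curRect.Y = Convert.ToInt32(imagePlacement.Rectangle.LLY);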
                    }

                    MemoryStream curImageStream = new MemoryStream();
                    extractor.GetNextImage(curImageStream);

                    // Get the image using ImagePlacement object
                    //XImage xImage = pdfDocument.Pages[pageCount].Resources.Images[1];

                    ocrEngine.Image = Aspose.OCR.ImageStream.FromStream(curImageStream, Aspose.OCR.ImageStreamFormat.Jpg);
                    System.Drawing.Image img = System.Drawing.Image.FromStream(curImageStream);
                    img.Save(rootPath + pdfFileName + pageCount + ".Jpeg", ImageFormat.Jpeg);
                    string converstionText = string.Empty;
                    if (ocrEngine.Process())
                    {
                        converstionText = ocrEngine.Text.ToString();
                    }
                    PdfContentEditor pdfContentEditor = new PdfContentEditor();
                    pdfContentEditor.BindPdf(rootPath + pdfFileName);
                    // First Delete the image on the current page
                    pdfContentEditor.DeleteImage(pageCount, new int[] { 1 });

                    // This part of the code has no effect on the PDF
                    #region WriteTextToPDF
                    Aspose.Pdf.Page pdfPage = pdfDocument.Pages[pageCount];
                    TextFragment textFragment = new TextFragment(converstionText);
                    textFragment.Position = new Position(curRect.X, curRect.Y);
                    // Set text properties
                    textFragment.TextState.FontSize = 12;
                    textFragment.TextState.Font = FontRepository.FindFont("TimesNewRoman");
                    textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray);
                    textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Red);
                    TextBuilder textBuilder = new TextBuilder(pdfPage);                
                    textBuilder.AppendText(textFragment);
                    #endregion

                    // Save output PDF
                    pdfContentEditor.Save(rootPath + "ReplaceImage_out.pdf");

                    //I just return after the first page processing to view the result
                    return;

                }
                pageCount = pageCount + 1;
            }

        }

    }

@YEHIAAHMED

Please try using a FloatingBox as an alternative approach. You can place it at the image location and add text inside it. For example, please check the following code snippet, which places a floating box on page 5 and adds text inside it.

// Place a FloatingBox over the image area and add the text inside it
#region WriteTextToPDF
TextFragment textFragment = new TextFragment("Test Text");
// Set text properties
textFragment.TextState.FontSize = 12;
textFragment.TextState.Font = FontRepository.FindFont("TimesNewRoman");
textFragment.TextState.BackgroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.LightGray);
textFragment.TextState.ForegroundColor = Aspose.Pdf.Color.FromRgb(System.Drawing.Color.Red);

FloatingBox floatbox = new FloatingBox(curRect.Width, curRect.Height);
floatbox.BackgroundColor = Aspose.Pdf.Color.Red;
floatbox.Left = -15;
// because the page has margins
floatbox.Top = curRect.Top - 45;
floatbox.Margin = new MarginInfo(0, 0, 0, 0);
floatbox.Paragraphs.Add(textFragment);
pdfDocument.Pages[pageCount].Paragraphs.Add(floatbox);
#endregion
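
Regarding the original WriteTextToPDF region: in the code posted in the question, the text is appended to pdfDocument, while the output is saved through a PdfContentEditor that was bound separately to the file path, so the appended text never reaches the saved file. A minimal sketch of one possible fix is to let the editor and the TextBuilder share a single Document instance and save that instance. The helper name ReplaceFirstImageWithText, its parameters, and the baseline offset are hypothetical and have not been tested against the attached file. It also assumes that image index 1 is the first placement found on the page, and it does not wrap long OCR output.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.Facades;
using Aspose.Pdf.Text;

public static class ReplaceImageSketch
{
    // Hypothetical helper (not part of Aspose.PDF): deletes the first image on a page
    // and writes the given text where the image used to be.
    public static void ReplaceFirstImageWithText(
        string inputPath, string outputPath, int pageNumber, string text)
    {
        var pdfDocument = new Document(inputPath);
        Page page = pdfDocument.Pages[pageNumber];

        // Record where the first image is placed before it is removed.
        var absorber = new ImagePlacementAbsorber();
        page.Accept(absorber);

        Aspose.Pdf.Rectangle imageRect = null;
        foreach (ImagePlacement placement in absorber.ImagePlacements)
        {
            imageRect = placement.Rectangle;
            break; // only the first placement is used in this sketch
        }
        if (imageRect == null)
        {
            return; // no image on this page, nothing to replace
        }

        // Bind the editor to the SAME Document rather than to the file path,
        // so the deletion and the new text end up in one object.
        var editor = new PdfContentEditor();
        editor.BindPdf(pdfDocument);
        // Assumption: image index 1 corresponds to the placement recorded above.
        editor.DeleteImage(pageNumber, new int[] { 1 });

        const float fontSize = 12;
        var fragment = new TextFragment(text);
        fragment.TextState.FontSize = fontSize;
        // PDF coordinates start at the bottom-left corner and Position is the text
        // baseline, so start one font size below the top edge of the old image.
        fragment.Position = new Position(imageRect.LLX, imageRect.URY - fontSize);

        new TextBuilder(page).AppendText(fragment);

        // Save the shared Document so both changes are written out together.
        pdfDocument.Save(outputPath);
    }
}
```

With this arrangement there should be no need to save and reload the file between deleting the image and adding the text, since both operations act on the same in-memory document.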

Thanks for your reply. I will try this solution.