Word to PDF hidden text behavior

Hello

I have a Word document with certain text hidden. I’m exporting this document to PDF. Can ASPOSE find hidden text in PDF for this specific scenario?

Thanks
DK

@D_K

You can use the LayoutOptions.ShowHiddenText property to get or set whether hidden text in the document is rendered.

Moreover, the Font.Hidden property returns true for hidden text. Please use the following code example to render the hidden text in the output PDF.

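// Load the Word document (MyDir is assumed to be the folder that holds the input file).
Document doc = new Document(MyDir + "input.docx");
// Render hidden text in the layout so that it appears in the exported PDF.
doc.LayoutOptions.ShowHiddenText = true;
doc.Save(MyDir + "20.3.pdf");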

If you still face problems, please share more details about your query and we will answer accordingly.

Hello Tahir

Thanks for your response

Let me just clarify one point.

I already have PDF documents that were generated using third-party software by converting Word to PDF. The original Word document had hidden text.

My question is: can Aspose see hidden text in a PDF file?

Thanks
DK

@D_K

Please note that Aspose.Words does not read PDF files, and it does not provide an API to detect whether text in a PDF is hidden.

Perhaps you can achieve your requirement using Aspose.PDF. Could you please ZIP and attach your input Word and PDF files and the expected output? We will then provide you with more information about your query.

Tahir

I was referring to Aspose.PDF.

The ask is very simple. Attached are two files. One is a Word document with some hidden text.
The other is the same Word document saved as PDF.

My question is: can I programmatically access the hidden text in the PDF?

Thanks
DK

Word_doc_with_hidden_text_saved_as_Pdf.pdf (179.4 KB)
Aspose_pdf_hidden_text.zip (177.8 KB)

@D_K

In your case, we suggest the following solution.

  1. Load the document into Aspose.Words’ DOM.
  2. Iterate over the Run nodes of the document and check whether the text of each Run node is hidden. You can check the hidden font formatting using the Run.Font.Hidden property.
  3. Bookmark the Run node.
  4. Save the document to PDF.
  5. Use Aspose.PDF to detect the text of bookmarks.
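Steps 1 to 4 above could look roughly like the following. This is only a minimal sketch, assuming Aspose.Words for .NET; the file names and the "hidden_" bookmark prefix are hypothetical placeholders. Step 5 is left out, because whether and how the bookmarks surface in the PDF depends on the PDF save options, which this sketch leaves at their defaults.

```csharp
using Aspose.Words;

class BookmarkHiddenRuns
{
    static void Main()
    {
        // Hypothetical input path; replace with your own document.
        Document doc = new Document("input.docx");

        int index = 0;

        // Take a snapshot of the Run nodes so that inserting bookmark
        // nodes does not interfere with the iteration.
        foreach (Node node in doc.GetChildNodes(NodeType.Run, true).ToArray())
        {
            Run run = (Run)node;

            // Font.Hidden is true when the run is formatted as hidden text.
            if (!run.Font.Hidden)
                continue;

            // Hypothetical naming scheme: hidden_0, hidden_1, ...
            string name = "hidden_" + index++;

            // Wrap the hidden run in a bookmark.
            run.ParentNode.InsertBefore(new BookmarkStart(doc, name), run);
            run.ParentNode.InsertAfter(new BookmarkEnd(doc, name), run);
        }

        // Save to PDF; the format is inferred from the file extension.
        doc.Save("output.pdf");
    }
}
```

Note that this approach only helps when you control the Word-to-PDF conversion yourself; it cannot recover hidden text from a PDF that was already produced elsewhere.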

We have moved this forum thread to Aspose.Total forum where you will be guided appropriately regarding Aspose.PDF.

@D_K

We have checked both the Word and PDF files and were unable to find any hidden text (we selected all text using Ctrl + A and pasted it into Notepad). The only text obtained was ‘1000000’, which is visible. Would you kindly highlight the hidden text in the shared documents, or share how you are able to extract it manually? We will then proceed to assist you accordingly.

Asad

Thanks for looking into this.

The Word document Sample_Word_Doc_With_Hidden_Text has hidden text.

If you open the document using Word and press Ctrl-Shift-8 you will see 1000000

Tahir

Thanks for your time to look into this.

I have no control over how the Word document is created. I just know there is hidden text in it. I also don’t have control over how this Word document is converted to PDF.

My task is to find hidden text in the PDF output.

I’m guessing this hidden text doesn’t make it to the PDF file.

You can very easily replicate my steps end to end as follows:

  1. Start Microsoft Word
  2. Type the following string: “HiddenText-VisibleText”
  3. Highlight “HiddenText” and press Ctrl-Shift-H
  4. Save this document as PDF file
  5. Try finding the string “HiddenText” using Aspose PDF API
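For reference, step 5 could be tried with something like the sketch below. It assumes Aspose.PDF for .NET and its TextFragmentAbsorber class; the file name is a hypothetical placeholder for the PDF saved in step 4.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FindTextInPdf
{
    static void Main()
    {
        // Hypothetical path to the PDF saved from Word in step 4.
        Document pdf = new Document("HiddenText-VisibleText.pdf");

        // Search all pages for the phrase that was hidden in Word.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("HiddenText");
        pdf.Pages.Accept(absorber);

        // A count of zero means the phrase is not present in the PDF at all.
        Console.WriteLine("Matches found: " + absorber.TextFragments.Count);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            Console.WriteLine("Page " + fragment.Page.Number + ": " + fragment.Text);
        }
    }
}
```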

@D_K

Thanks for sharing more details.

We were unable to find the hidden text in the PDF document converted by Aspose.Words and have logged an investigation ticket as PDFNET-47816 in our issue tracking system. We will investigate the scenario in detail and let you know as soon as we have additional updates. Please spare us some time.

We are sorry for the inconvenience.

Hello Asad, any luck resolving the issue?

@D_K

Regretfully, the issue is still pending analysis. It has been logged under the normal support model and will be investigated and resolved on a first-come, first-served basis. We will let you know as soon as we have definite updates in this regard. Please spare us some time.

@D_K

We have investigated the issue and found that it is not a bug in Aspose.PDF. The point is that ‘Word_doc_with_hidden_text_saved_as_Pdf.pdf’ contains no hidden text. It contains only the visible text ‘1000000’ (see Contents.png).

Contents.png (18.2 KB)

Sample_Word_Doc_With_Hidden_Text.docx actually contains the hidden text <Tag> (twice). So, it seems the issue is related to Aspose.Words, in that it does not include this text in the PDF as invisible text or as another kind of object (bookmark, annotation, etc.). We will review it further from the Aspose.Words perspective and get back to you.

@D_K

Please note that Aspose.Words mimics the behavior of MS Word. If you convert your document to PDF using MS Word, you will get the same output.