HTML file to tagged PDF

Hi Aspose,

Is it possible to automatically convert an HTML file into a tagged PDF document? Ideally, I could insert an HTML fragment into the document and have it tagged automatically, but creating a PDF document directly from an HTML file or HTML text would also work. Please let me know what my options are.

Br, Thomas

@thoms31

We need to investigate your requirements in detail. Could you please provide a sample HTML file and the expected output PDF so that we can proceed accordingly?

@asad.ali

Any update on this?

@Arjun7766

As requested in our earlier response, we need a sample HTML file along with the expected output PDF to investigate this requirement. Could you please provide them so that we can log an investigation ticket and share the ID with you?

@asad.ali I am unable to upload the file because of file extension restrictions, so I have included the sample HTML below. I am using Aspose.Words (DocumentBuilder) to construct a dynamic Word document and then converting it into a PDF file. Is it possible to convert it to a tagged PDF? I have tried converting the Word document into HTML and loading it into Aspose.PDF, but I am still unable to generate a tagged PDF. Could you please suggest the best way to create a tagged PDF, and let me know whether Aspose has an auto-tagging feature?

NOTE: I am using the Adobe Pro accessibility tool to verify this, and I am getting a "tagged content failed" error.

<html>
<body>
	<p> User Details : </p>
	<table>
		<tr>
			<td> User Name : </td> <td> Arjun7766</td>
		</tr>
		<tr>
			<td> First Name : </td> <td> Arjun </td>
		</tr>
		<tr>
			<td> Last Name : </td> <td> G </td>
		</tr>
	</table>
</body>
</html>

@Arjun7766

Aspose.PDF for .NET can convert an ordinary PDF into a tagged PDF. Please try the code snippet below after converting your Word file into PDF, and let us know if you face any issues:

Aspose.Pdf.Document pdfDoc2 = new Aspose.Pdf.Document(dataDir + "input.pdf");
pdfDoc2.Convert(dataDir + "log.xml", PdfFormat.PDF_UA_1, ConvertErrorAction.Delete);
pdfDoc2.Save(dataDir + "tagged.pdf");
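To see what the conversion could not fix, the result of the conversion and a follow-up validation can be checked. This is a minimal sketch that assumes the same input.pdf as above; the dataDir value and the log file names are placeholders, and passing this check does not guarantee a clean report in other accessibility checkers.

```csharp
using System;
using Aspose.Pdf;

class ConvertAndValidate
{
    static void Main()
    {
        // Placeholder folder (assumption); point this at your own files.
        string dataDir = "./";

        using (var pdfDoc = new Document(dataDir + "input.pdf"))
        {
            // Convert reports whether the conversion succeeded;
            // the problems it met are written to the XML log.
            bool converted = pdfDoc.Convert(dataDir + "convert-log.xml",
                                            PdfFormat.PDF_UA_1,
                                            ConvertErrorAction.Delete);
            pdfDoc.Save(dataDir + "tagged.pdf");
            Console.WriteLine("Converted without errors: " + converted);
        }

        using (var tagged = new Document(dataDir + "tagged.pdf"))
        {
            // Validate the saved file against PDF/UA-1 and log the findings.
            bool valid = tagged.Validate(dataDir + "validate-log.xml",
                                         PdfFormat.PDF_UA_1);
            Console.WriteLine("Passes PDF/UA-1 validation: " + valid);
        }
    }
}
```

The two XML logs list the individual problems and can be attached to a support request.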

@asad.ali
Thank you for the quick reply. I tried the code you provided and tested the output PDF in both the Adobe Pro accessibility checker and PAC 2021. It was not completely tagged and has WCAG errors. Please find the attached file, which contains sample code for two scenarios and a PDF accessibility report.

Aspose.zip (43.7 KB)

@Arjun7766

Please also share the sample source .docx file so that we can test the scenario in our environment and address it accordingly, together with the output PDF generated at your end.

@asad.ali Sure, I will share the documents. I have a few more questions:

  1. Is it possible to place two tagged divs (div1 and div2) side by side (horizontally), each taking 50% of the width? (Refer to the code below.)

  2. Is it possible to create a tagged ordered/unordered list (ol/ul) using tagged content?

  3. I am loading HTML into Aspose.Words with DocumentBuilder.InsertHtml, and some of the inline HTML styles and global CSS are removed when I try to place two divs (div1 and div2) side by side, each taking 50% of the width.

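> using Aspose.Pdf;
> using Aspose.Pdf.Tagged;
>
> // Create PDF Document
> var document = new Document();
> // Get Content for work with TaggedPdf
> ITaggedContent taggedContent = document.TaggedContent;
> var rootElement = taggedContent.RootElement;
> // Create the two div elements and attach them to the structure tree
> var div1 = taggedContent.CreateDivElement();
> var div2 = taggedContent.CreateDivElement();
> rootElement.AppendChild(div1);
> rootElement.AppendChild(div2);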

@Arjun7766

These requirements need further investigation, and for that we also need a sample of the expected output PDF. Please share it along with the sample source file requested earlier. We will log investigation tickets accordingly and share the ID with you.

For the Aspose.Words question above (styles lost with InsertHtml), please create a post in the Aspose.Words category, where you will be assisted from the Aspose.Words perspective.

@asad.ali

We are trying to achieve the design below in a PDF using tagged content:

UserName: Arjun7766 Active: true
First Name: Arjun Last Name: G
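One possible way to approximate this layout is a tagged table with four cells per row rather than two side-by-side divs. The following is only a sketch under that assumption: it uses the tagged content API of Aspose.PDF for .NET, the class name, title, and output file name are made up for illustration, and whether the result passes the Adobe or PAC checks has not been confirmed in this thread.

```csharp
using Aspose.Pdf;
using Aspose.Pdf.LogicalStructure;
using Aspose.Pdf.Tagged;

class TaggedUserDetails
{
    static void Main()
    {
        using (var document = new Document())
        {
            // Tagged content entry point, with document-level metadata.
            ITaggedContent taggedContent = document.TaggedContent;
            taggedContent.SetTitle("User Details");   // illustrative title
            taggedContent.SetLanguage("en-US");

            StructureElement rootElement = taggedContent.RootElement;

            // A table element keeps the label/value pairs in a defined order.
            TableElement table = taggedContent.CreateTableElement();
            rootElement.AppendChild(table);
            TableTBodyElement body = table.CreateTBody();

            // Two rows, each holding two label/value pairs (values from the post).
            string[][] rows =
            {
                new[] { "UserName:", "Arjun7766", "Active:", "true" },
                new[] { "First Name:", "Arjun", "Last Name:", "G" },
            };

            foreach (string[] row in rows)
            {
                TableTRElement tr = body.CreateTR();
                foreach (string text in row)
                {
                    TableTDElement td = tr.CreateTD();
                    td.SetText(text);
                }
            }

            // Hypothetical output name.
            document.Save("user-details-tagged.pdf");
        }
    }
}
```

Using one table keeps each label next to its value in the reading order, which is the main reason to prefer it here over positioned divs.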

@Arjun7766

An investigation ticket, PDFNET-50694, has been logged in our issue tracking system for your specific requirements. We will look into its details and keep you posted on the status of its resolution. Please be patient and give us some time.

We are sorry for the inconvenience.