Convert PDF file tables to html string issue

VenkateshBT · September 20, 2019, 7:29am

Hi,

I need to import PDF file content into my application. When we convert a PDF file that contains tables to HTML, the tables come out as DIVs with styles, so we cannot edit the cell data or copy an entire table. Basically, we are not getting the HTML string in table format, i.e. "<table></table>"; instead we get "<div></div>".

Could you please tell me whether there is an API to get the entire PDF content as HTML with the appropriate HTML elements?

We have purchased an Aspose.Total license.

IntegrateTableWithDatabase.pdf (44.6 KB)
Out.zip (32.1 KB)

I have attached the PDF file and its output. But I want the table data as "<table>" elements in the HTML string.

Note: I have used quotes ("") to display the HTML tags due to display problems. Please ignore these quotes.
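
One quick way to confirm this in the converted output is to count the opening tags in the HTML string. The following is only a minimal sketch, assuming the converted HTML has already been read into a string; the `HtmlTagCounter` class and its `CountOpeningTags` method are hypothetical helper names built on the standard .NET `Regex` class, not part of Aspose.PDF.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of Aspose.PDF): counts opening tags in an HTML string.
public static class HtmlTagCounter
{
    // Counts opening tags such as <table> or <table class="x">, ignoring case.
    // Closing tags such as </table> are not counted.
    public static int CountOpeningTags(string html, string tagName)
    {
        // \b keeps "<table" from also matching longer tag names that start with "table".
        string pattern = "<" + Regex.Escape(tagName) + @"\b[^>]*>";
        return Regex.Matches(html, pattern, RegexOptions.IgnoreCase).Count;
    }
}
```

For the attached output, a count of zero for "table" together with a non-zero count for "div" would match the behavior described above.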

asad.ali · September 20, 2019, 6:34pm

@VenkateshBT

We used the following code snippet with Aspose.PDF for .NET 19.9 and observed the same behavior that you described.

```
using Aspose.Pdf;

// dataDir is the folder that holds the input PDF and receives the output HTML.
var pdf = new Document(dataDir + "IntegrateTableWithDatabase.pdf");
HtmlSaveOptions options = new HtmlSaveOptions
{
    FontSavingMode = HtmlSaveOptions.FontSavingModes.AlwaysSaveAsTTF,
    FontEncodingStrategy = HtmlSaveOptions.FontEncodingRules.DecreaseToUnicodePriorityLevel,
    PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly,
    TryMergeAdjacentSameBackgroundImages = false,
    RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsPngImagesEmbeddedIntoSvg
};
// Save the document as HTML; the tables come out as styled <div> elements.
pdf.Save(dataDir + "output199.html", options);
```

We have logged an enhancement ticket as PDFNET-47006 in our issue tracking system for your requirement of generating HTML with appropriate tags. As PDF to HTML conversion is a feature of the Aspose.PDF API, we will investigate the feasibility of this requirement further and will let you know as soon as we have an update. Please spare us a little time.

We are sorry for the inconvenience.