Any way to get tables when converting from PDF to HTML?

Hi, when I convert a PDF that has tables to HTML, it uses divs to create the tables.


Is there any way to get it to use standard HTML table markup? E.g. <table> etc.

Thanks

Hi Brian,


Thanks for contacting support.

Currently, when exporting a PDF file to HTML format, the page elements are rendered in <div> tags. However, please share your source/input PDF file so that we can test your scenario in our environment.

Sure, see attached.


We have Aspose.Words as well. Is converting from PDF to Word and then from Word to HTML an option to get tables?

Hi Brian,


Thanks for sharing the resource files.

I have tested the scenario and am able to reproduce the problem: the table is represented by DIV tags instead of standard HTML table markup. I have logged it in our issue tracking system as PDFNEWNET-39027. We will investigate this issue in detail and keep you updated on the status of a correction. We apologize for the inconvenience.


However, as a workaround, you may consider converting the PDF file to an Excel worksheet and then transforming the Excel file to HTML format using Aspose.Cells. For more information, please visit
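
A minimal sketch of the two-step workaround described above, assuming both Aspose.PDF for Java and Aspose.Cells for Java are on the classpath and licensed. The class name `PdfTablesToHtml` and the helper `withExtension` are hypothetical names used for illustration only, and how well the table structure survives the PDF-to-Excel step will depend on the source PDF.

```java
public class PdfTablesToHtml {

    // Hypothetical helper: swaps the file extension, e.g. "report.pdf" -> "report.xls".
    static String withExtension(String path, String ext) {
        int dot = path.lastIndexOf('.');
        int sep = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // Only treat the dot as an extension separator if it is in the file name itself
        String base = (dot > sep) ? path.substring(0, dot) : path;
        return base + "." + ext;
    }

    static void convert(String pdfFile) throws Exception {
        String xlsFile = withExtension(pdfFile, "xls");
        String htmlFile = withExtension(pdfFile, "html");

        // Step 1: Aspose.PDF saves the PDF content as an Excel spreadsheet
        com.aspose.pdf.Document pdf = new com.aspose.pdf.Document(pdfFile);
        pdf.save(xlsFile, new com.aspose.pdf.ExcelSaveOptions());

        // Step 2: Aspose.Cells loads the spreadsheet and saves it as HTML
        com.aspose.cells.Workbook workbook = new com.aspose.cells.Workbook(xlsFile);
        workbook.save(htmlFile, com.aspose.cells.SaveFormat.HTML);
    }

    public static void main(String[] args) throws Exception {
        // Expects the path of the source PDF as the first argument
        convert(args[0]);
    }
}
```

The intermediate spreadsheet is written next to the source PDF so that it can be inspected if the resulting HTML does not look as expected.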

Just checking: is there any update on whether you plan to support HTML tables when converting from PDF to HTML?

Hi Brian,


Thanks for your inquiry. I am afraid the issue you reported above is still not resolved. It is pending investigation, as the product team is busy resolving other issues in the queue that were reported earlier. We will notify you as soon as we make significant progress towards resolving it.

We are sorry for the inconvenience caused.

Best Regards,

Hi,

I am using aspose-pdf version 18.4 (Java). I am getting the same issue when converting PDF to HTML: tables come out as divs. Do you have an updated version that solves this issue?

@hashmiya,

Please send us your source PDF and code. We will investigate your scenario in our environment, and share our findings with you.

Hi, I shared the code and documents in a private mail. Please have a look at them. I hope you can update me on the issue as soon as possible.

@hashmiya,

I have not received your email message in my inbox. Please share the email address you used to send it. You can also send a private message in this forum or create another private thread to share files.

public static void main(String[] args) throws Exception {

	setLicenceForAspose();
	
	convertpdfToHtmlintoSingleSource();
	
	System.out.println("Completed");
}

/**
 * Sets the Aspose.Words and Aspose.PDF licenses.
 * @throws Exception if a license file cannot be read or applied
 */
public static void setLicenceForAspose() throws Exception {

	com.aspose.words.License wordLicense = new com.aspose.words.License();
	com.aspose.pdf.License pdfLicense = new com.aspose.pdf.License();

	// Open the license files; try-with-resources closes both streams afterwards
	try (FileInputStream fstreamWord = new FileInputStream(LicencePath + "Aspose.Words.lic");
	     FileInputStream fstreamPdf = new FileInputStream(LicencePath + "Aspose.Pdf.lic")) {

		// Set the licenses through the stream objects
		wordLicense.setLicense(fstreamWord);
		pdfLicense.setLicense(fstreamPdf);
	}
}

/**
 * Function to convert Pdf to html. The html should be a single resource.
 */
public static void convertpdfToHtmlintoSingleSource() {
	
	// For complete examples and data files, please go to https://github.com/aspose-pdf/Aspose.Pdf-for-Java
	com.aspose.pdf.Document doc = new com.aspose.pdf.Document(pdfPath + "pdfToHtmlSample.pdf");

	com.aspose.pdf.HtmlSaveOptions newOptions = new com.aspose.pdf.HtmlSaveOptions();

	// Enable option to embed all resources inside the HTML
	newOptions.PartsEmbeddingMode = com.aspose.pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

	// This is just optimization for IE and can be omitted
	newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
	newOptions.RasterImagesSavingMode = com.aspose.pdf.HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
	newOptions.FontSavingMode = com.aspose.pdf.HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

	// Output path for the single, self-contained HTML file
	String outHtmlFile = htmlPath + "pdfToHtmlSample.html";
	doc.save(outHtmlFile, newOptions);
}

pdfToHtmlSample.pdf (110.8 KB)

@hashmiya,

We managed to replicate the said problem in our environment. We have logged an enhancement ticket ID PDFJAVA-37758 in our issue tracking system. We have linked your post to this ticket and will keep you informed regarding any available updates.

Thank you for the update.
Is there any alternate solution for this?

@hashmiya,

There is no workaround at the moment. We will let you know if one becomes available.