Hi Team,
I have been using Aspose.Words to convert a Word document to an HTML string. Can I get a similar example for converting a PDF to an HTML string?
Below is the example for the Word document.
private static string WordDocumentToHtml(Stream fileStream)
{
    var document = new Aspose.Words.Document(fileStream);
    var options = new Aspose.Words.Saving.HtmlSaveOptions()
    {
        ExportImagesAsBase64 = true,
        UseHighQualityRendering = true
    };
    using (var output = new MemoryStream())
    {
        document.Save(output, options);
        var html = Encoding.UTF8.GetString(output.GetBuffer(), 0, (int)output.Length);
        return html;
    }
}
I need a method that returns the converted HTML as a string. We need this HTML string to display in the TinyMCE editor. One problem we face is that TinyMCE does not support the g and svg elements of HTML.
Thanks,
Mukesh Singh
@msingh02
Thank you for contacting support.
You may use the code snippet below to convert a PDF document to HTML with Base64-embedded images using the Aspose.PDF for .NET API.
private static string PDFDocumentToHTML(Stream fileStream)
{
    var document = new Aspose.Pdf.Document(fileStream);
    Aspose.Pdf.HtmlSaveOptions htmlOptions = new Aspose.Pdf.HtmlSaveOptions();
    htmlOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;
    htmlOptions.LettersPositioningMethod = HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
    htmlOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
    using (var output = new MemoryStream())
    {
        document.Save(output, htmlOptions);
        var html = Encoding.UTF8.GetString(output.GetBuffer(), 0, (int)output.Length);
        return html;
    }
}
We hope this will be helpful. Please feel free to contact us if you need any further assistance.
Thanks, @Farhan.Raza,
I tried your example, but I get an exception at run time:
System.ApplicationException: 'Inconsistent saving options detected : 'CustomStrategyOfCssUrlCreation','CustomCssSavingStrategy','CustomResourceSavingStrategy' may not be null when requested saving to stream!'
Please help.
Thanks,
Mukesh Singh
@msingh02
Thank you for the details.
We have investigated the scenario and would like to update you that converting PDF to HTML in a stream is possible only by using saveOptions.CustomHtmlSavingStrategy.
Change the save call document.save(resultOutputStream, saveOptions); as follows:
// document.save(resultOutputStream, saveOptions);
final ByteArrayOutputStream _outputHtmlStream = resultOutputStream;
saveOptions.CustomHtmlSavingStrategy = new HtmlSaveOptions.HtmlPageMarkupSavingStrategy()
{
    public void invoke(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo)
    {
        // Redirect the generated HTML markup into the in-memory stream.
        savingToStream(htmlSavingInfo, _outputHtmlStream);
    }
};
String outHtmlFile = System.getProperty("java.io.tmpdir"); // Use any directory that exists.
document.save(outHtmlFile, saveOptions);
And add the savingToStream method:
private static void savingToStream(HtmlSaveOptions.HtmlPageMarkupSavingInfo htmlSavingInfo, ByteArrayOutputStream stream)
{
    byte[] resultHtmlAsBytes;
    try
    {
        resultHtmlAsBytes = new byte[htmlSavingInfo.ContentStream.available()];
        htmlSavingInfo.ContentStream.read(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
        stream.write(resultHtmlAsBytes, 0, resultHtmlAsBytes.length);
    }
    catch (IOException e)
    {
        e.printStackTrace();
    }
}
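As a side note, the savingToStream method above sizes its buffer from available() and calls read() once. In general, InputStream.available() only reports how many bytes can be read without blocking, and a single read() may return fewer bytes than requested, so a more defensive copy step could loop until the end of the stream. The following is a minimal sketch under that assumption, using only java.io. The class name StreamCopySketch and the helpers copyAll and toUtf8String are hypothetical and not part of the Aspose API, and decoding as UTF-8 assumes the HTML was written in that encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; none of these names come from the Aspose API.
class StreamCopySketch {

    // Copies everything from 'in' to 'out', looping until read() reports end of stream.
    static void copyAll(InputStream in, ByteArrayOutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    // Decodes the collected bytes as UTF-8 (assumption: the HTML is UTF-8 encoded).
    static String toUtf8String(ByteArrayOutputStream out) {
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for htmlSavingInfo.ContentStream in the snippet above.
        InputStream content = new ByteArrayInputStream(
                "<html><body>Hello</body></html>".getBytes(StandardCharsets.UTF_8));
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        copyAll(content, collected);
        System.out.println(toUtf8String(collected));
    }
}
```

Inside savingToStream, the three lines in the try block could then be replaced by a call such as copyAll(htmlSavingInfo.ContentStream, stream), and the final HTML string obtained from the ByteArrayOutputStream once document.save has returned.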
We hope this will resolve the problem you are currently facing. Please let us know if you need any further assistance.
@Farhan.Raza It looks like this program is for Java. I need a C# example.
Thanks
Mukesh Singh