Greetings!
I have an issue converting PDF files that contain forms to HTML. I tried two ways of doing it, but didn’t get a satisfactory result. Could you please help me? Am I doing something wrong, or is there some kind of bug in the Aspose conversion?
Source file for both examples:
1003 - Unmarried Addendum.pdf (69.1 KB)
First try - direct conversion with Aspose.PDF:
Result file:
1003 - Unmarried Addendum_pdf.7z (434.9 KB)
using (var document = new Aspose.Pdf.Document("1003 - Unmarried Addendum.pdf"))
{
    var htmlOptions = new Aspose.Pdf.HtmlSaveOptions
    {
        SaveFullFont = true,
        DocumentType = Aspose.Pdf.HtmlDocumentType.Html5,
        UseZOrder = true,
        AntialiasingProcessing = Aspose.Pdf.HtmlSaveOptions.AntialiasingProcessingType.TryCorrectResultHtml,
        CompressSvgGraphicsIfAny = true,
        FontSavingMode = Aspose.Pdf.HtmlSaveOptions.FontSavingModes.SaveInAllFormats,
        TrySaveTextUnderliningAndStrikeoutingInCss = true,
        TryMergeAdjacentSameBackgroundImages = true,
        LettersPositioningMethod = Aspose.Pdf.HtmlSaveOptions.LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss,
        SaveShadowedTextsAsTransparentTexts = false,
        SaveTransparentTexts = true,
        PartsEmbeddingMode = Aspose.Pdf.HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml,
        ConvertMarkedContentToLayers = true,
        FixedLayout = false,
        RasterImagesSavingMode = Aspose.Pdf.HtmlSaveOptions.RasterImagesSavingModes.AsPngImagesEmbeddedIntoSvg,
        HtmlMarkupGenerationMode = Aspose.Pdf.HtmlSaveOptions.HtmlMarkupGenerationModes.WriteAllHtml,
    };
    // Save the output HTML
    document.Save("output.html", htmlOptions);
}
This code results in a total loss of styles and some strange artifacts that are probably supposed to be parts of the selected inputs in the PDF, but the inputs themselves are missing.
Second try - converting the PDF file to DOCX, and then converting that to HTML:
Result file:
1003 - Unmarried Addendum.7z (30.7 KB)
using (var document = new Aspose.Pdf.Document("1003 - Unmarried Addendum.pdf"))
{
    // Instantiate DocSaveOptions object
    Aspose.Pdf.DocSaveOptions savePdfOptions = new Aspose.Pdf.DocSaveOptions
    {
        Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX,
        Mode = Aspose.Pdf.DocSaveOptions.RecognitionMode.Flow,
        TryMergeAdjacentSameBackgroundImages = true,
        //RelativeHorizontalProximity = 1f,
        RecognizeBullets = true,
        ReSaveFonts = true,
        ConvertType3Fonts = true,
    };
    // temporaryFileOut is defined elsewhere in my code (target for the intermediate DOCX)
    document.Save(temporaryFileOut, savePdfOptions);
}
// Load the intermediate DOCX with Aspose.Words
Aspose.Words.Document doc = new Aspose.Words.Document(temporaryFileOut);
// Set different properties of HtmlSaveOptions class
Aspose.Words.Saving.HtmlSaveOptions saveOptions = new Aspose.Words.Saving.HtmlSaveOptions();
saveOptions.AllowNegativeIndent = true;
saveOptions.DmlEffectsRenderingMode = DmlEffectsRenderingMode.Fine;
saveOptions.DmlRenderingMode = DmlRenderingMode.Fallback;
saveOptions.ImlRenderingMode = ImlRenderingMode.Fallback;
saveOptions.ExportPageSetup = true;
saveOptions.CssStyleSheetType = CssStyleSheetType.Inline;
saveOptions.ExportPageMargins = true;
saveOptions.ImageResolution = 90;
saveOptions.ExportImagesAsBase64 = true;
saveOptions.ExportFontsAsBase64 = true;
saveOptions.ExportDocumentProperties = true;
saveOptions.ExportHeadersFootersMode = ExportHeadersFootersMode.PerSection;
saveOptions.HtmlVersion = Aspose.Words.Saving.HtmlVersion.Html5;
saveOptions.AllowEmbeddingPostScriptFonts = true;
saveOptions.Encoding = Encoding.UTF8;
saveOptions.ExportFontResources = true;
saveOptions.ExportListLabels = ExportListLabels.ByHtmlTags;
saveOptions.ExportShapesAsSvg = true;
saveOptions.PrettyFormat = true;
// Save output HTML
doc.Save("result.html", saveOptions);
OK, things are not so bad here: formatting and styling are fine, but all form elements are moved away from the positions where they should be.
Could you please advise whether there’s anything I’m missing in my code?
Thank you!