Free Support Forum - aspose.com

Is there a way to preserve html document structure?

Hi. I have some sample code that imports and exports an .html file:

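```csharp
using Aspose.Words;
// In newer Aspose.Words versions, LoadOptions and the resource-loading types live in Aspose.Words.Loading.

class Program
{
    static void Main()
    {
        // Skip images while loading; all other resources load as usual.
        var loadOptions = new LoadOptions { ResourceLoadingCallback = new ResourceLoadingCallback() };

        var document = new Document("source.html", loadOptions);
        document.Save("target.html", SaveFormat.Html);
    }
}

public sealed class ResourceLoadingCallback : IResourceLoadingCallback
{
    public ResourceLoadingAction ResourceLoading(ResourceLoadingArgs args)
    {
        return args.ResourceType == ResourceType.Image
            ? ResourceLoadingAction.Skip
            : ResourceLoadingAction.Default;
    }
}
```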
Everything works fine, but if you open the source and target files in a text editor, you will see that their structures are quite different. Is there a way to tell Aspose.Words to preserve the HTML structure of the source file in the target file as much as possible?

P.S. I’m using Aspose.Words for .NET 14.4.0.0

Hi Artem,

Thanks for your inquiry. Aspose.Words supports importing and exporting HTML documents. You can load such documents into the Document Object Model, edit them or add new content, and convert them to any supported format, such as DOCX, PDF, or image formats.
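As a minimal sketch of that load, edit, and convert workflow (assuming a file named source.html exists; the output file names and the paragraph text are placeholders), the code below loads the HTML into the Document Object Model, appends a paragraph with DocumentBuilder, and saves the result in two formats. The output format is inferred from each file's extension.

```csharp
using Aspose.Words;

class Program
{
    static void Main()
    {
        // Load the HTML file into the Aspose.Words Document Object Model.
        var document = new Document("source.html");

        // Append a paragraph at the end of the document (placeholder text).
        var builder = new DocumentBuilder(document);
        builder.MoveToDocumentEnd();
        builder.Writeln("Paragraph added with DocumentBuilder.");

        // Save in other formats; the format is chosen from the file extension.
        document.Save("output.docx");
        document.Save("output.pdf");
    }
}
```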

Please note that Aspose.Words mimics MS Word's behavior. When you open and save an HTML document with Aspose.Words, it is converted with high fidelity: if you open the input and output documents in a browser or in MS Word, their layout will be the same. The HTML markup itself, however, is regenerated from the Document Object Model rather than copied from the source file, so the text of the two files can differ even when they render identically.
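If the goal is output that is easier to read and compare with the source in a text editor, a minimal sketch is to pass HtmlSaveOptions when saving. The specific settings below are an illustrative choice, and the file names are placeholders. These options only change how Aspose.Words writes its own markup; they do not restore the source file's original tags.

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

class Program
{
    static void Main()
    {
        var document = new Document("source.html");

        var saveOptions = new HtmlSaveOptions(SaveFormat.Html)
        {
            // Indent the generated markup so it is easier to read and diff.
            PrettyFormat = true,
            // Write CSS into a <style> element instead of inline style attributes.
            CssStyleSheetType = CssStyleSheetType.Embedded
        };

        document.Save("target.html", saveOptions);
    }
}
```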

Moreover, some HTML features may be lost during processing. You can find a list of limitations on HTML import and export here:

http://www.aspose.com/docs/display/wordsnet/Save+in+the+HTML+%28.HTML%2C+.XHTML%2C+.MHTML%29+Format