Text in Textareas Hidden, Embedded CSS

We use Aspose.Words to convert HTML documents to Word documents and then convert them to PDFs. We recently upgraded from Aspose.Words 14.3.0.0 to 18.8.0.0.

We noticed that the older version mostly ignored embedded CSS. Since upgrading to the new version, the embedded CSS is applied during the conversion. The documents we convert are typically very old HTML documents, and we are not really concerned with the CSS; the text content of the document is what matters.

Since the upgrade, many of the HTML documents render using their embedded styles, which in our case make the fonts larger. This forces some of the text in textarea controls to wrap onto lines that fall outside the textarea's visible area, so after the conversion that text is hidden in the final PDF.

This leads me to two questions:

  1. Is there an option that tells Aspose.Words to ignore embedded CSS when converting an HTML document?

  2. Is there an option in Aspose.Words that will resize textareas so that all of the text inside of the textarea is visible after the conversion?

I could not find either of these options and need to be sure they don't exist before I start exploring other solutions. Thanks.

@puremass

Thanks for your inquiry. To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input HTML document.
  • The output Word file that shows the undesired behavior.
  • The expected output Word file that shows the desired behavior.

As soon as you have these ready, we will start investigating your issue and provide more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

The output is actually saved as PDF, not as a Word document. We are using Aspose.Words SaveOptions to save it as PDF.

The uploaded zip file contains the PDF that was generated using Aspose.Words 14.3, the PDF that was generated using Aspose.Words 18.8, and the source HTML file.

ResourceFiles.zip (146.7 KB)

@puremass

You cannot ignore the CSS while importing HTML into the Aspose.Words DOM. However, you can reset each paragraph to the default paragraph formatting using the ParagraphFormat.ClearFormatting method. The default paragraph formatting is Normal style, left aligned, with no indentation, no spacing, no borders and no shading. You can also format the text according to your requirements. Please read the following article:
Specifying Formatting
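Below is a minimal sketch of the ClearFormatting approach suggested above, assuming Aspose.Words for .NET and a C# 9+ project with top-level statements. Because the problem in this thread is enlarged font sizes, which ParagraphFormat.ClearFormatting does not touch, the sketch also calls Font.ClearFormatting on every run. That step is our own addition and assumes the CSS font sizes end up as direct run formatting after the HTML import, which may not hold for every document. The DocumentBuilder text stands in for an imported HTML file; ResetFormatting and Reset.pdf are hypothetical names.

```csharp
using Aspose.Words;

// Stand-in for a document imported from HTML: the oversized, centered text plays the
// role of the formatting that embedded CSS applied during import.
Document doc = new Document();
DocumentBuilder builder = new DocumentBuilder(doc);
builder.ParagraphFormat.Alignment = ParagraphAlignment.Center;
builder.Font.Size = 28;
builder.Writeln("Text that the embedded CSS made too large.");

ResetFormatting(doc);
doc.Save("Reset.pdf"); // hypothetical output path

// Hypothetical helper: removes direct paragraph and character formatting everywhere.
static void ResetFormatting(Document document)
{
    // Paragraph level: Normal style, left aligned, no indentation, spacing, borders or shading.
    foreach (Paragraph para in document.GetChildNodes(NodeType.Paragraph, true))
        para.ParagraphFormat.ClearFormatting();

    // Character level (our assumption about where the CSS font sizes land):
    // drop direct font settings such as the enlarged size.
    foreach (Run run in document.GetChildNodes(NodeType.Run, true))
        run.Font.ClearFormatting();
}
```

In the real workflow you would call the helper on the Document loaded from the HTML file and then save to PDF as before. Note that this is a coarse reset: it also removes formatting you may want to keep, such as bold or italic text.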

For the hidden textarea text, we suggest using the TextBox.FitShapeToText property, as shown below, to get the desired output. Please check the attached output PDF: 19.3.pdf (66.8 KB)

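```csharp
using Aspose.Words;
using Aspose.Words.Drawing;

// Placeholder folder; point it at the folder that holds the input HTML file.
string MyDir = @"C:\Temp\";
Document doc = new Document(MyDir + "InputHtmlFile.html");
// Skip image shapes; let every other shape (e.g. the imported textareas) grow to fit its text.
foreach (Shape shape in doc.GetChildNodes(NodeType.Shape, true))
{
    if (!shape.HasImage)
        shape.TextBox.FitShapeToText = true;
}
doc.Save(MyDir + "19.3.pdf");
```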