Incorrect conversion of a message with an HTML table to TXT

Hello
We’ve encountered a problem with the plain-text body of an incoming message whose HTML body contains a table. The problem occurs with Aspose.Email for .NET 22.10.0.
If the HTML is indented like this:

<table>
    <tbody>
                    <tr>
                        <td>
                            <b>ФИО</b>:
                        </td>
                        <td>
                            <span class="name">XXXXX X.X.</span>
                        </td>
                    </tr>

then the whitespace used to indent the HTML ends up in the text of the message.Body string.
I’m attaching HTML_tables_formatting_problem.zip (2.1 KB)
It contains the HTML file we used to reproduce the problem, a txt file with the message.Body output we actually got, and a txt file with the expected message.Body output. The expected text can also be produced by opening the HTML in a web browser (we used Chrome) and copying the text into Notepad by hand.

To be clear, if the HTML file contains no indenting spaces or tabs, the text output is correct, so

<table>
<tbody>
<tr>
<td>
<b>ФИО</b>:
</td>
<td>
<span class="name">XXXXX X.X.</span>
</td>
</tr>

results in the correct text output.
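A possible client-side stopgap, sketched here as an assumption rather than a confirmed fix: post-process the plain text by trimming each line and collapsing runs of spaces and tabs, which roughly mimics how a browser ignores source indentation. The BodyText class and its NormalizeWhitespace method are hypothetical helpers, not part of Aspose.Email. The sketch assumes the problem shows up as leading indentation and repeated spaces in message.Body; it does not restore table layout, and it also flattens intentional indentation (for example, text that came from a pre element).

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical post-processing helper (NOT part of Aspose.Email).
// Trims every line and collapses runs of spaces/tabs to a single space,
// roughly mimicking how a browser renders indented HTML source.
// Caveat: intentional indentation (e.g. text from <pre>) is flattened too.
static class BodyText
{
    private static readonly Regex HorizontalWhitespace = new Regex(@"[ \t]+");

    public static string NormalizeWhitespace(string text)
    {
        // Normalize Windows line endings to "\n" so each line is handled independently.
        string[] lines = text.Replace("\r\n", "\n").Split('\n');

        // Collapse inner runs of spaces/tabs, then strip leading/trailing whitespace.
        var cleaned = lines.Select(line => HorizontalWhitespace.Replace(line, " ").Trim());

        // Lines are re-joined with "\n"; blank lines are kept as-is.
        return string.Join("\n", cleaned);
    }
}

class Program
{
    static void Main()
    {
        // Sample text shaped like indented table output (illustrative, not the real attachment).
        string body = "    ФИО:\r\n        XXXXX   X.X.\t\r\n";
        Console.WriteLine(BodyText.NormalizeWhitespace(body));
    }
}
```

In a real application the input would be the body of a message loaded with MailMessage.Load, e.g. BodyText.NormalizeWhitespace(message.Body).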

@directum

To ensure a timely and accurate response, please attach the following resources here for testing:

  • Your input message file.
  • Please create a standalone console application (source code without compilation errors) that helps us reproduce your problem on our end and attach it here for testing.

As soon as you have these ready, we will start investigating your issue and get back to you with more information. Thanks for your cooperation.

PS: To attach these resources, please zip and upload them.

Here is a zip with the HTML used in the input message and a console app that reproduces the problem.
AsposeEmail.zip (4.2 KB)

@directum

We suggest reading the following article, which covers your requirement:
Export EMAIL to TEXT via .NET

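The following code example shows how to use Aspose.Email to convert EML to HTML and then Aspose.Words to convert the HTML to TXT.

using Aspose.Email;
using Aspose.Words;
using Aspose.Words.Saving;

// MyDir is the working folder that holds the input message.
MailMessage message = MailMessage.Load(MyDir + "testMessage.eml");
// Aspose.Email: save the message as HTML.
message.Save(MyDir + "HtmlOutput.html", Aspose.Email.SaveOptions.DefaultHtml);

// Aspose.Words: load the HTML and export plain text, trying to preserve the table layout.
Document document = new Document(MyDir + "HtmlOutput.html");
TxtSaveOptions txtSaveOptions = new TxtSaveOptions();
txtSaveOptions.PreserveTableLayout = true;
document.Save(MyDir + "output.txt", txtSaveOptions);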

Hello, we’ve tried this solution, but it does not work for us: the table we get from Aspose.Words doesn’t have the formatting we expect.
We have also tried Aspose.Cells and were able to get the table format we want with the following settings:

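using Aspose.Cells;

// testHTMLpath is the input HTML file; MyDir is the output folder.
Aspose.Cells.TxtSaveOptions txtSaveOptionsCells = new Aspose.Cells.TxtSaveOptions
{
  FormatStrategy = CellValueFormatStrategy.CellStyle, // write values formatted by the cell style
  QuoteType = TxtValueQuoteType.Never,                // never wrap values in quotes
  Separator = '\t'                                    // tab-separated columns
};
Aspose.Cells.HtmlLoadOptions htmlLoadOptionsCells = new Aspose.Cells.HtmlLoadOptions
{
  ConvertNumericData = false, // keep numeric-looking strings as text
  DeleteRedundantSpaces = true // remove redundant spaces from the loaded text
};
var workbook = new Workbook(testHTMLpath, htmlLoadOptionsCells);
workbook.Save(MyDir + "outputCells.txt", txtSaveOptionsCells);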

However, this method causes another problem: although the table comes out fine, the text outside the table gets messed up. We could probably combine Aspose.Cells and Aspose.Words, but that is not ideal. Is there another solution that gives table output similar to Aspose.Cells while converting the rest of the text correctly?

Hello, @directum

It seems that the email body conversion is indeed incorrect: Aspose.Email carries over extra spaces from the HTML formatting that shouldn’t be there. We have created ticket EMAILNET-40874 to fix it.

Thanks, and sorry for the inconvenience.