Extra text added when converting EML to text

I am trying to extract text from an EML file using the Aspose.Words API. When I do, I get an extra line at the top of the text file that contains the From address. How can I remove this additional line from the top of the text file?
Here is the code I used:

Aspose.Email.MailMessage message = Aspose.Email.MailMessage.Load(emailPath);
message.Save(outputFilePath, SaveOptions.DefaultMhtml);

Aspose.Words.Document textDocument = new Aspose.Words.Document(outputFilePath);
textDocument.Save(textFilePath, Aspose.Words.SaveFormat.Text);

Attachments: Extra text highlighted image.png (38.3 KB), Aspose support.7z (52.0 KB)

@gmurugesan can you please attach an example of the input file that you are using?

Thanks for the response. I have attached a zip file that contains the needed files.

Thanks for the extra information.

@gmurugesan I cannot reproduce the problem; I am using version 23.2.0 of Aspose.Words and version 23.1.0 of Aspose.Email. I recommend upgrading to the latest API version, but if that is not an option, you can simply remove the first empty paragraph manually:

using System.IO;
using Aspose.Words;

Aspose.Email.MailMessage message = Aspose.Email.MailMessage.Load("C:\\Temp\\input.eml");
message.Save("C:\\Temp\\input.mht", Aspose.Email.SaveOptions.DefaultMhtml);

Document webDoc = new Document("C:\\Temp\\input.mht");
MemoryStream stream = new MemoryStream();

// Saving the document to a stream instead of to the disk to be able to clear the first empty line
webDoc.Save(stream, SaveFormat.Text);
// Rewind the stream so the saved text is read back from the beginning
stream.Position = 0;

Document textDoc = new Document(stream);

if (string.IsNullOrEmpty(textDoc.FirstSection.Body.FirstParagraph.Range.Text.Trim()))
{
    textDoc.FirstSection.Body.FirstParagraph.Remove();
}

textDoc.UpdatePageLayout();
textDoc.Save("C:\\Temp\\output.txt", SaveFormat.Text);

From your code, I understand that you are removing the empty first paragraph:
if (string.IsNullOrEmpty(textDoc.FirstSection.Body.FirstParagraph.Range.Text.Trim()))
{
    textDoc.FirstSection.Body.FirstParagraph.Remove();
}
but I want to remove the From mail address at the top.

Is there any way to remove a particular tag loaded into the Document?

@gmurugesan I see, sorry for the confusion. In that case, you don't need to use Aspose.Email at all; you can do the whole process with Aspose.Words:

Aspose.Words.Document webDoc = new Aspose.Words.Document("C:\\Temp\\input.eml");
webDoc.Save("C:\\Temp\\output.txt", Aspose.Words.SaveFormat.Text);

Output.zip (1.2 KB)

The output file you attached does not contain the From and To mail information at all.

Is there any way to remove a particular tag loaded into the Document?

@gmurugesan You could use the Replace method with a pattern that matches the email address format and set a callback to check whether the match is in the correct paragraph. However, since in your case you only want to remove a single email address occurrence, I recommend using the following code:

using System.IO;
using System.Text.RegularExpressions;
using Aspose.Words;

Aspose.Email.MailMessage message = Aspose.Email.MailMessage.Load("C:\\Temp\\input.eml");
message.Save("C:\\Temp\\input.mht", Aspose.Email.SaveOptions.DefaultMhtml);

Aspose.Words.Document webDoc = new Aspose.Words.Document("C:\\Temp\\input.mht");
MemoryStream stream = new MemoryStream();
webDoc.Save(stream, SaveFormat.Text);
// Rewind the stream so the saved text is read back from the beginning
stream.Position = 0;

Document textDoc = new Document(stream);

RemoveUnnecessaryFirstParagraphs(textDoc.FirstSection);

textDoc.UpdatePageLayout();
textDoc.Save("C:\\Temp\\output.txt", Aspose.Words.SaveFormat.Text);

// Removes leading paragraphs that are empty or contain only an email address
// (such as the From address written at the top of the output).
private static void RemoveUnnecessaryFirstParagraphs(Section section)
{
    var reg = new Regex(@"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$");
    Paragraph first = section.Body.FirstParagraph;
    // Stop when the body has no paragraphs left or the first one is real content
    while (first != null)
    {
        var text = first.Range.Text.Trim();
        if (!string.IsNullOrEmpty(text) && !reg.IsMatch(text))
        {
            break;
        }
        first.Remove();
        first = section.Body.FirstParagraph;
    }
}
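If it is more convenient to post-process the saved output.txt than to reload it as an Aspose.Words document, the same filtering can be applied to plain lines. This is a minimal sketch, not part of the Aspose API: it assumes the From address ends up alone on its own line at the top of the text file, it reuses the regex from the reply above, and the names `LeadingHeaderFilter` and `StripLeadingHeaderLines` are hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: plain-text equivalent of RemoveUnnecessaryFirstParagraphs.
public static class LeadingHeaderFilter
{
    // Same pattern as in the reply above: a line consisting only of an email address.
    private static readonly Regex EmailOnly =
        new Regex(@"^\w+([\.-]?\w+)*@\w+([\.-]?\w+)*(\.\w{2,3})+$");

    // Drops leading lines that are blank or contain only an email address,
    // then returns every remaining line unchanged.
    public static string[] StripLeadingHeaderLines(string[] lines)
    {
        return lines
            .SkipWhile(line =>
            {
                string text = line.Trim();
                return text.Length == 0 || EmailOnly.IsMatch(text);
            })
            .ToArray();
    }
}
```

For example, `File.WriteAllLines(path, LeadingHeaderFilter.StripLeadingHeaderLines(File.ReadAllLines(path)))` would rewrite the file in place. Because `SkipWhile` stops at the first line that is neither blank nor email-only, addresses that appear later in the text (for example in the body) are kept.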

output.zip (868 Bytes)

Thanks for the support.
