Received emails have a "?" character at the beginning

Hi,

In our company we use Aspose.Email for Java to send emails from our app.

Lately we have had a problem with (most likely) the SmtpClient and/or MailMessage classes.
Basically, I compose an email from generated HTML (using Aspose.Words) and then send it.
The problem is that although the generated HTML shows no signs of issues and is clean, the received email contains a “?” sign at the very beginning, right before the … tag, like this:

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">?<html><head>

instead of

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252"><html><head>

Note that we force the UTF-8 charset when setting up the MailMessage, using

content.setSubjectEncoding(UTF8);
content.setBodyEncoding(UTF8);
content.setPreferredTextEncoding(UTF8);

Yet the email arrives with the Windows-1252 charset.

I’ve debugged the code and never saw the “?” on any line where the email body is set or manipulated. Since I (obviously) cannot debug the Aspose classes, I assume the issue is somewhere inside them.

public void sendMessage(EmailMessage message) throws DAFException {
    SmtpClient client = new SmtpClient();

    try {
        client.setHost(getGridProperties().getEmailSmtpHost());
        client.setPort(getGridProperties().getEmailSmtpPort());
        client.setUsername(getGridProperties().getEmailSmtpUsername());
        client.setPassword(getGridProperties().getEmailSmtpPassword());
    } catch (Exception fail) {
        // logging removed for brevity
    }
    try {
        MailMessage content = new MailMessage();

        // Use the caller's sender address only if overriding is allowed and it looks like an address
        if (getGridProperties().isEmailFromAllowOverride() && message.getFrom() != null && message.getFrom().contains("@")) {
            content.setFrom(getMailAddress(message.getFrom(), message.getFromName()));
        } else {
            content.setFrom(getMailAddress(getGridProperties().getEmailFromAddress(), getGridProperties().getEmailFromDisplayName()));
        }
        content.setTo(
                new MailAddressCollection() {{
                    add(message.getTo());
                }}
        );
        // Strip tags from the subject and the plain-text body
        content.setSubject(message.getSubject().replaceAll("\\<.*?>", ""));
        content.setBody(message.getBody().replaceAll("\\<.*?>", ""));
        content.setHtmlBody(message.getHtmlBody());

        content.setSubjectEncoding(java.nio.charset.StandardCharsets.UTF_8);
        content.setBodyEncoding(java.nio.charset.StandardCharsets.UTF_8);
        content.setPreferredTextEncoding(java.nio.charset.StandardCharsets.UTF_8);

        client.send(content);
    } catch (Exception fail) {
        // logging removed for brevity
    } finally {
        // client is never null here, so it can be disposed unconditionally
        client.dispose();
    }
}

I’ve removed some code unrelated to the email handling, such as loggers and validation checks, and replaced constants with literal strings.
The getGridProperties() call just fetches the proper configuration data from our application server.

I forgot one important thing: I’ve also tested passing the HTML directly instead of generating it, and the “?” was still there when the email was received.

@Jan89,

I have observed the issue you shared. Please first try the latest Aspose.Email for Java 19.7 on your end. If the issue still persists, please share a working sample along with the source file (if any) that reproduces the issue, so that we can try to reproduce it on our end.

Thank you for your response. I’ve tested it with Aspose.Email 19.7, but the issue is still there.

Can you tell me what exactly you need for your investigation? I’ve posted the code we are using in my original post, so do you need something else?

@Jan89,

Can you please share the source email file you used on your end, so that we can investigate further and help you out?

beforeAndAfterFiles.zip (1.6 MB)

I’ve attached a .zip file with one .txt and one .html file. They contain the data we pass to content.setHtmlBody(message.getHtmlBody()), where content is created with
MailMessage content = new MailMessage();

The “message” is just a POJO whose getHtmlBody() returns a String, and that String is exactly what is in those two files. No ‘?’ is added to it anywhere in the method up to
client.send(content); (where client is SmtpClient client = new SmtpClient();),
which I obviously cannot debug.

I’ve also attached the email as it arrives, already with the broken HTML.
We use Aspose.Words for file conversion, but it does not seem to cause the problem: the HTML I sent is generated before the POJO’s htmlBody field even receives it, and it is correct.

@Jan89,

I have worked with the sample code you shared. Can you please share a complete working sample project? The code snippet you shared includes some undeclared variables, so please share SSCCE code that reproduces the issue, so that we can try to reproduce and investigate it in our environment.

Here is a pastebin with the code: https://pastebin.com/uF0wUR5M

I’ve noticed one thing, though: SmtpClient has a deprecated method, setEncoding().
When I call it like this:

.setEncoding(java.nio.charset.StandardCharsets.UTF_8);

together with the hardcoded HTML, the “?” no longer appears.

However, the “?” still appears when I pass in the generated HTML (identical to the one I sent before, or to the one in the pastebin code) through the getter of the EmailMessage class, which just returns a String and does nothing else.

@Jan89,

I have worked with your sample code. For further investigation, can you please temporarily share the test account credentials so that we can help you out?

Unfortunately, I cannot give you our test email server credentials: the server is accessible only over VPN, and I am not allowed to share them anyway.

@Jan89,

I understand the limitation on your end. We have created an issue with ID EMAILJAVA-34573 to investigate, reproduce, and possibly resolve the problem on our end. This thread has been linked to the issue, so you will be notified once it is fixed.


Hi there!
Like Jan89, I have been tasked with this problem (an unknown symbol that appears in the body of an email message).
Here is what I managed to find out. The class com.aspose.words.Document has a
Document.save(ByteArrayOutputStream output, SaveOptions options) method. If options.type = “text/html”, three bytes with negative values [-17, -69, -65, …] appear at the beginning of the output array. Since each of these bytes is supposed to be converted into a character code, the negative values produce a nonexistent symbol, which appears at the very beginning of the outgoing message. To eliminate this problem I had to write a patch that removes these bytes from the array. Perhaps this will tell you where to look for a solution.
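A minimal sketch of the kind of patch described above, assuming the HTML was saved as UTF-8 into a byte array. The class and method names are hypothetical, and the input array is a hand-made stand-in for the real Document.save output, not data from the attached files.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StripUtf8Bom {

    // The three leading bytes reported above: -17, -69, -65 as (signed) Java bytes,
    // i.e. 0xEF, 0xBB, 0xBF
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Hypothetical helper: returns a copy without a leading UTF-8 BOM,
    // or the same array unchanged if it does not start with one
    public static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOfRange(data, 0, UTF8_BOM.length), UTF8_BOM)) {
            return Arrays.copyOfRange(data, UTF8_BOM.length, data.length);
        }
        return data;
    }

    public static void main(String[] args) {
        // Stand-in for output.toByteArray() after saving the document as UTF-8 HTML
        byte[] saved = {-17, -69, -65, '<', 'h', 't', 'm', 'l', '>'};
        byte[] clean = stripUtf8Bom(saved);
        System.out.println(new String(clean, StandardCharsets.UTF_8)); // prints <html>
    }
}
```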

@Jan89,

We have investigated the issue on our end. It turns out that BodyEncoding must be set before the body’s value, for example:

MailMessage content = new MailMessage();

// !!!Set Message encoding settings!!!
content.setSubjectEncoding(java.nio.charset.StandardCharsets.UTF_8);
content.setBodyEncoding(java.nio.charset.StandardCharsets.UTF_8);
content.setPreferredTextEncoding(java.nio.charset.StandardCharsets.UTF_8);

content.setFrom(getMailAddress(message.getFrom(), message.getFromName()));
content.setTo(
    new MailAddressCollection() {{
        add("jan.galis@infor.com");
    }}
);

content.setSubject("email setSubject test".replaceAll("\\<.*?>", ""));
content.setBody("email setBody test".replaceAll("\\<.*?>", ""));
content.setHtmlBody("<html><head><meta http-equiv=\"Content-Type\" content=\"text...");

client.send(content);

I hope the shared information will be helpful.

@Arty,

Can you please also try the above suggestions on your end?

I did what you suggested. On the one hand, it really works: the message body is now encoded correctly and no unreadable characters are displayed. On the other hand, this change does not eliminate the negative values in the output array; there are still three negative elements at the beginning. So when I set a different encoding in content.setBodyEncoding(), e.g. ISO_8859_1, the email still arrives with wrong symbols in the HTML code.
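One plausible explanation for the ISO_8859_1 case, sketched in plain Java: once the UTF-8 bytes are decoded, the BOM survives as the invisible character U+FEFF, and a single-byte charset such as ISO-8859-1 (or Windows-1252) has no code for it, so Java’s String.getBytes substitutes '?'. That Aspose.Email encodes the body this way internally is an assumption here, not something confirmed in this thread; the byte array is a hand-made example.

```java
import java.nio.charset.StandardCharsets;

public class BomInSingleByteCharset {
    public static void main(String[] args) {
        // HTML saved as UTF-8 with a BOM (hand-made stand-in for the saved byte array)
        byte[] saved = {-17, -69, -65, '<', 'h', 't', 'm', 'l', '>'};

        // Decoding with the matching charset keeps the BOM as one invisible character, U+FEFF
        String html = new String(saved, StandardCharsets.UTF_8);
        System.out.println(Integer.toHexString(html.charAt(0))); // feff

        // ISO-8859-1 cannot represent U+FEFF, so String.getBytes substitutes '?'
        byte[] latin1 = html.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(latin1, StandardCharsets.ISO_8859_1)); // ?<html>
    }
}
```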

@Arty,

We have noted your comments. Please share the source file, working sample code, and a screenshot of the issue occurring on your end, so that we can investigate it further.

Hi!
Sorry for the delay. I finally found a chance to sketch out a JUnit test in which you can track the appearance of negative bytes in the output array. Please find the archive attached.
As you can see, I use a very simple document as a template. When the encoding setting is UTF_8 or UTF_16, negative bytes appear at the beginning of the output array. Why? Whatever encoding is set in HtmlSaveOptions, there should not be any symbols before the tag in the result, right?
Please explain how to get Aspose.Words to convert doc or docx files correctly and produce clean HTML code without any weird symbols.

AsposeWordTest.zip (197.4 KB)

@Arty,

We have tested the scenario and managed to reproduce the same issue on our side. We have logged this problem in our issue tracking system as WORDSJAVA-2193. You will be notified via this forum thread once it is resolved.

We apologize for your inconvenience.

@Arty,

Thanks for your patience. We’d like to inform you that the issue you are facing is actually not a bug in Aspose.Words, so we have closed WORDSJAVA-2193 as ‘Not a Bug’.

The weird symbols are simply the Unicode Byte Order Mark, which the Unicode specification permits. Please see the details below.

The negative bytes are the BOM (Byte Order Mark).

If you specify UTF-8 before saving the document (e.g. htmlOptions.setEncoding(StandardCharsets.UTF_8)), you’ll see the bytes [-17, -69, -65] at the start of the resultant array (e.g. outputArr = output.toByteArray()). These three bytes are [0xEF, 0xBB, 0xBF] in hex. Note that Java has no unsigned byte type, so to translate -17 to 0xEF you need to add 256: -17 + 256 = 239 = 0xEF.
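The “add 256” conversion above can be done in Java by masking with 0xFF, which gives the same result as Byte.toUnsignedInt (Java 8+). A small sketch; the class and method names are illustrative only.

```java
public class SignedBytesToHex {
    // Same result as Byte.toUnsignedInt(b): maps -128..-1 to 128..255, leaves 0..127 as is
    public static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // Leading bytes of output.toByteArray() when the HTML is saved as UTF-8
        byte[] bom = {-17, -69, -65};
        for (byte b : bom) {
            int unsigned = toUnsigned(b);
            // Prints "-17 -> 239 -> 0xEF", then "-69 -> 187 -> 0xBB" and "-65 -> 191 -> 0xBF"
            System.out.printf("%d -> %d -> 0x%02X%n", b, unsigned, unsigned);
        }
    }
}
```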

If you save the HTML in UTF-16 Big Endian, you’ll get two leading negative bytes, [-2, -1], which are [0xFE, 0xFF] in hex. In the case of UTF-16 Little Endian you’ll see [0xFF, 0xFE], i.e. [-1, -2], and so on.
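The UTF-16 BOM bytes can be checked with Java’s own encoders, independently of Aspose.Words. This sketch encodes the BOM character U+FEFF explicitly in each byte order; it assumes, but does not show, that Aspose.Words writes these same BOM forms. Note that Java’s generic UTF-16 charset adds a big-endian BOM on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16Boms {
    public static void main(String[] args) {
        String bom = "\uFEFF"; // U+FEFF, the byte order mark character

        // Big-endian: FE FF, printed as signed bytes [-2, -1]
        System.out.println(Arrays.toString(bom.getBytes(StandardCharsets.UTF_16BE)));
        // Little-endian: FF FE, printed as signed bytes [-1, -2]
        System.out.println(Arrays.toString(bom.getBytes(StandardCharsets.UTF_16LE)));
        // The generic UTF-16 encoder writes a big-endian BOM by itself: [-2, -1, 0, 60]
        System.out.println(Arrays.toString("<".getBytes(StandardCharsets.UTF_16)));
    }
}
```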

By contrast, if you specify htmlOptions.setEncoding(StandardCharsets.ISO_8859_1), you will not get negative bytes at the beginning of the output array.

Aspose.Words for Java tries to provide as much information as possible about the encoding when saving text in Unicode. This is why the library writes a BOM for every Unicode encoding but omits it for other encodings.

In short, it depends on the target encoding.

As already said, extra characters do not appear if you do not specify a Unicode encoding.

If you want to save the HTML in Unicode but get rid of the BOM, you can remove the extra bytes manually. The BOM forms are listed in the table under the “When a BOM is used, is it only in 16-bit Unicode text?” section of the official Unicode documentation. You can simply skip the leading BOM bytes, i.e. copy the rest of the array without them.
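A minimal sketch of skipping the leading BOM bytes, using the BOM forms from the Unicode table mentioned above. The class name is hypothetical. Because the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE), the longer forms are checked first; this guess can misfire for UTF-16 LE text whose first character is U+0000, which should not happen for HTML.

```java
import java.util.Arrays;

public class BomStripper {
    // BOM forms from the Unicode FAQ table; longer UTF-32 forms first (see lead-in)
    private static final byte[][] BOMS = {
        {0x00, 0x00, (byte) 0xFE, (byte) 0xFF},  // UTF-32 BE
        {(byte) 0xFF, (byte) 0xFE, 0x00, 0x00},  // UTF-32 LE
        {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF}, // UTF-8
        {(byte) 0xFE, (byte) 0xFF},              // UTF-16 BE
        {(byte) 0xFF, (byte) 0xFE},              // UTF-16 LE
    };

    // Returns a copy of data without its leading BOM, or data itself if there is none
    public static byte[] stripBom(byte[] data) {
        for (byte[] bom : BOMS) {
            if (startsWith(data, bom)) {
                return Arrays.copyOfRange(data, bom.length, data.length);
            }
        }
        return data;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // UTF-8 HTML saved with a BOM (hand-made stand-in for output.toByteArray())
        byte[] utf8 = {-17, -69, -65, '<', 'p', '>'};
        System.out.println(Arrays.toString(stripBom(utf8))); // [60, 112, 62]
    }
}
```

If you always know which encoding was set in HtmlSaveOptions, checking only that encoding’s BOM is simpler and avoids the UTF-16/UTF-32 ambiguity entirely.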

Alternatively, you can use the java.lang.String#String(byte[], java.nio.charset.Charset) constructor.

If you save the HTML in UTF-8, e.g. htmlOptions.setEncoding(StandardCharsets.UTF_8);, then you should read the byte array as UTF-8, e.g. String outputString = new String(outputArr, StandardCharsets.UTF_8), not String outputString = new String(outputArr, StandardCharsets.ISO_8859_1). Unfortunately, in your JUnit test you did:

htmlOptions.setEncoding(StandardCharsets.UTF_8);
…
String outputString = new String(outputArr, "ISO-8859-1");

So, because of the mismatch between the encodings, your output string contained extra characters before <html>.
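A sketch contrasting the mismatched and matching decodings of the same bytes. One detail worth knowing: Java’s UTF-8 decoder does not drop the BOM; it keeps it as the single invisible character U+FEFF, so the sketch removes that character explicitly. The class and method names are hypothetical, and the byte array is a hand-made example rather than the output of the attached test.

```java
import java.nio.charset.StandardCharsets;

public class DecodeSavedHtml {
    // Decodes HTML bytes saved as UTF-8 and drops a leading BOM character if present
    public static String decodeUtf8Html(byte[] outputArr) {
        String s = new String(outputArr, StandardCharsets.UTF_8);
        // The UTF-8 decoder keeps the BOM as U+FEFF, so remove it explicitly
        return s.startsWith("\uFEFF") ? s.substring(1) : s;
    }

    public static void main(String[] args) {
        byte[] outputArr = {-17, -69, -65, '<', 'h', 't', 'm', 'l', '>'};

        // Mismatched decoding: each of the three BOM bytes becomes a separate Latin-1 character
        String wrong = new String(outputArr, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 9: three extra characters plus "<html>"

        // Matching decoding, with the BOM character removed
        System.out.println(decodeUtf8Html(outputArr)); // <html>
    }
}
```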