Converting DOC to PDF: unexpected question marks appear

Hello,
After converting a DOC file to PDF, I found many “?” characters in the PDF that are not in the DOC.
Can anyone please help me resolve this?

source file:

pdf:

Here is my code:

if (suffix == ".docx" || suffix == ".doc")
{
    MemoryStream outputStream = new MemoryStream();
    Aspose.Words.Document doc = new Aspose.Words.Document(instream);

    // Apply the requested orientation to every section (0 = portrait, otherwise landscape).
    foreach (Aspose.Words.Section section in doc.Sections)
    {
        section.PageSetup.Orientation = orientation == 0
            ? Aspose.Words.Orientation.Portrait
            : Aspose.Words.Orientation.Landscape;
    }

    // Use the Aspose.Words save options (not Aspose.Pdf) and pass them to Save.
    Aspose.Words.Saving.PdfSaveOptions saveOptions = new Aspose.Words.Saving.PdfSaveOptions();
    doc.Save(outputStream, saveOptions);

    // Rewind in case the caller reads the PDF from the stream position.
    outputStream.Position = 0;
    ret = outputStream;
}

@Ykworm

It seems your inquiry is related to Aspose.Words. We have moved it to the respective category, where you will be assisted accordingly.

@Ykworm Could you please attach your input and output documents here for testing? We will check the issue and provide you with more information. Unfortunately, it is impossible to analyze the problem using screenshots.

Hi,
this is another file with a similar problem; the file format is RTF.

It is also converted to PDF.

The resulting PDF:

An excerpt from the RTF:

{\fldrslt {\rtlch\fcs1 \af0 \ltrch\fcs0 \lang1024\langfe1024\kerning2\noproof\insrsid3932176 \loch\af0\dbch\af18\hich\f0 \hich\af0\dbch\af18\loch\f0 MR.  LEE, TUNG}}}\sectd \ltrsect\linex0\headery284\footery284\colsx425\endnhere\pgbrdropt32\sectlinegrid360\sectspecifyl\sftnbj {\rtlch\fcs1 \af0 \ltrch\fcs0 \kerning2\insrsid3932176 \hich\af0\dbch\af18\loch\f0  }{\field{\*\fldinst {\rtlch\fcs1 \af0 
\ltrch\fcs0 \kerning2\insrsid3932176 \hich\af0\dbch\af18\loch\f0  MERGEFIELD F_TP_CHN_NAME }}{\fldrslt {\rtlch\fcs1 \af0 \ltrch\fcs0 \lang1024\langfe1024\kerning2\noproof\insrsid3932176 \loch\af0\dbch\af18\hich\f0 \loch\af18\hich\af18\dbch\f18 \uc1\u26446\?\hich\af0\dbch\af18\loch\f0  \loch\af18\hich\af18\dbch\f18 \u26481\?}
}

\u26446 is the decimal Unicode code point of 李 (U+674E)
\u26481 is the decimal Unicode code point of 東 (U+6771)

You can see that a literal ‘?’ follows each of the two Unicode characters. It seems that, in the RTF standard, this is a fallback character for readers that do not understand \u: for \u26446\?, a program that cannot handle \u26446 displays ‘?’ instead, while a Unicode-aware program should skip the ‘?’ (the preceding \uc1 says the fallback is one character long).
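
To make the fallback rule concrete, here is a minimal C# sketch of how a Unicode-aware reader might treat the excerpt: it decodes each \uN and skips the fallback that follows it. DecodeUnicodeEscapes is a hypothetical helper written for this post, not an Aspose.Words API. It assumes \uc1 (one fallback character) as in the excerpt, and it only recognizes a fallback written as a plain character, a \'xx hex escape, or a two-character control symbol such as \?.

```csharp
using System;
using System.Text;

static class RtfUnicodeSketch
{
    static bool IsAsciiDigit(char c) => c >= '0' && c <= '9';

    // Hypothetical helper: decodes \uN escapes and drops the fallback after each one.
    // fallbackLength plays the role of the most recent \ucN value (assumed 1 here).
    public static string DecodeUnicodeEscapes(string rtf, int fallbackLength = 1)
    {
        var sb = new StringBuilder();
        int i = 0;
        while (i < rtf.Length)
        {
            bool looksLikeEscape = rtf[i] == '\\' && i + 2 < rtf.Length && rtf[i + 1] == 'u'
                && (IsAsciiDigit(rtf[i + 2]) || rtf[i + 2] == '-');
            if (!looksLikeEscape) { sb.Append(rtf[i]); i++; continue; }

            int j = i + 2;
            bool negative = rtf[j] == '-';
            if (negative) j++;
            int digitsStart = j;
            int value = 0;
            while (j < rtf.Length && IsAsciiDigit(rtf[j])) { value = value * 10 + (rtf[j] - '0'); j++; }
            if (j == digitsStart) { sb.Append(rtf[i]); i++; continue; } // "\u-" with no digits
            if (negative) value = -value;

            // A single space after a control word is a delimiter, not text.
            if (j < rtf.Length && rtf[j] == ' ') j++;

            // Skip the fallback representation that old readers would display.
            for (int k = 0; k < fallbackLength && j < rtf.Length; k++)
            {
                if (rtf[j] != '\\') j++;                                     // plain character, e.g. ?
                else if (j + 1 < rtf.Length && rtf[j + 1] == '\'') j = Math.Min(j + 4, rtf.Length); // \'xx
                else j = Math.Min(j + 2, rtf.Length);                        // control symbol, e.g. \?
            }

            // RTF stores N as a signed 16-bit number, so negative values wrap to 0x8000..0xFFFF.
            sb.Append((char)(value & 0xFFFF));
            i = j;
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8;
        Console.WriteLine(DecodeUnicodeEscapes(@"\u26446\? \u26481\?")); // prints "李 東"
    }
}
```

If a converter shows both the character and the ‘?’, or only the ‘?’, one thing worth checking is whether the fallback is being skipped the way this sketch does it. That is only a guess about where to look, not a diagnosis.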

Does anyone know how to resolve it?

@Ykworm Could you please zip and attach the whole problematic RTF document so we can test with it?

#1 I can’t upload the test file, for these two reasons:

  1. Because of a confidentiality agreement.
  2. After the file is modified in MS Word, the problem fixes itself automatically (the “?” disappears, and the text that was not displayed is displayed when the new file is converted). That means I can’t simply delete the confidential content and send you the rest.

#2 I tried converting the RTF to PDF, and the “𧙗” disappears:

Source RTF:

Generated file:

But if I do it in two steps:
step 1: RTF → PNG
step 2: PNG → PDF

it succeeds!

Does anyone know why?

@Ykworm Unfortunately, it is impossible to tell what might cause the problem on your side without the ability to reproduce it on our side. And we cannot reproduce the problem without the problematic documents.

I see. By the way,
may I ask: can the Aspose DOC-to-PDF library handle 4-byte characters?

@Ykworm Do you mean surrogate pair characters? Yes, Aspose.Words supports them. If possible, please attach a sample document with problematic characters here for testing. This will help us analyze and, if required, resolve the problem.
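
For readers unsure what "surrogate pair" means here, the following is a small C# sketch that inspects the character from this thread. It assumes “𧙗” lies outside the Basic Multilingual Plane, as the "4-byte" description suggests. ToRtfEscapes is a hypothetical helper (not an Aspose.Words API) that shows how such a character would typically be written in RTF: one signed 16-bit \uN per UTF-16 code unit, each followed by a ‘?’ fallback. It does not escape RTF special characters such as \, { and }.

```csharp
using System;
using System.Globalization;
using System.Text;

static class SurrogatePairSketch
{
    // Hypothetical helper: writes every non-ASCII UTF-16 code unit as \uN with a "?" fallback.
    public static string ToRtfEscapes(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            if (c < 128) { sb.Append(c); continue; }
            // RTF expects a signed 16-bit value, so units >= 0x8000 become negative.
            short signedUnit = unchecked((short)c);
            sb.Append("\\u").Append(signedUnit.ToString(CultureInfo.InvariantCulture)).Append('?');
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string s = "𧙗";
        Console.WriteLine($"UTF-16 code units: {s.Length}");
        Console.WriteLine($"Is surrogate pair: {char.IsSurrogatePair(s, 0)}");
        Console.WriteLine($"UTF-8 bytes: {Encoding.UTF8.GetByteCount(s)}");
        Console.WriteLine($"Code point: U+{char.ConvertToUtf32(s, 0):X}");
        Console.WriteLine($"RTF form: {ToRtfEscapes(s)}");
    }
}
```

A character like this occupies two `char` values in a .NET string, so any code that processes text one `char` at a time has to keep the two halves together.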

I finally put together a file that reproduces the problem:
b-new.zip (4.4 KB)

The character “𧙗” is displayed when the file is opened in MS Word (it seems that only Word displays it correctly).

After converting to PDF, the “𧙗” is lost.

Please help check this. Thanks a lot.

@Ykworm
We have opened the following new ticket(s) in our internal issue tracking system and will deliver their fixes according to the terms mentioned in Free Support Policies.

Issue ID(s): WORDSNET-27050

You can obtain Paid Support Services if you need support on a priority basis, along with the direct access to our Paid Support management team.

Hi @alexey.noskov, any news?

@Ykworm The issue is currently in the queue for analysis. Once the analysis is done, we will be able to provide you with more information.

The issues you have found earlier (filed as WORDSNET-27050) have been fixed in this Aspose.Words for .NET 24.8 update also available on NuGet.