Smart quotes converted to question marks when extracting content from a Word document

Hi,

I am extracting nodes from an uploaded document, creating a new document from the extracted nodes, and converting it to HTML using the code below:

ByteArrayOutputStream docStream = new ByteArrayOutputStream();
dstDocument.save(docStream, saveOptions);
String dstHtml = docStream.toString();

I am deploying our WAR file, including the Aspose JARs, in a Docker container, and the converted HTML comes out wrong there.

If the uploaded document contains smart single or double quotes, they are converted to question marks.

This happens when the application runs inside Docker, whereas it works fine when the application runs directly on Tomcat on Ubuntu (version 18).

I have attached a sample input, input.docx (15.9 KB), and the resulting output, output.docx (16.2 KB).

Is there any setting I need to configure when using Aspose in Docker?

Please help me figure out this issue.

Thank you

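@Gptrnt The problem is not in Aspose.Words but in the conversion of the ByteArrayOutputStream to a string. Calling docStream.toString() with no arguments decodes the bytes using the platform default charset, which can differ between environments. I would suggest storing the HTML or the extracted document as a byte array to avoid such problems in the future. For example, see the following code: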

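import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Load the source document.
Document dstDocument = new Document("/temp/in.docx");

// Save as HTML into memory, embedding images as Base64.
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setExportImagesAsBase64(true);
ByteArrayOutputStream docStream = new ByteArrayOutputStream();
dstDocument.save(docStream, saveOptions);
// Keep the raw bytes instead of converting them to a String.
byte[] dstHtmlBytes = docStream.toByteArray();

// Load the HTML back from the bytes and save it as DOCX.
ByteArrayInputStream bais = new ByteArrayInputStream(dstHtmlBytes);
Document tempDoc = new Document(bais);
tempDoc.save("/temp/out.docx");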
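
If a String is still needed, one option is to decode the bytes with an explicit charset instead of the platform default. The following is a minimal sketch using only the JDK, assuming the HTML was written as UTF-8; the class and method names (ExplicitCharsetDemo, decodeUtf8, decodeAscii) are hypothetical, and the sample bytes stand in for the HTML that would come from dstDocument.save.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Decodes the stream contents as UTF-8, independent of the platform default charset.
    public static String decodeUtf8(ByteArrayOutputStream stream) {
        return new String(stream.toByteArray(), StandardCharsets.UTF_8);
    }

    // Mimics what happens when the platform default charset is US-ASCII:
    // every non-ASCII byte is replaced with U+FFFD during decoding.
    public static String decodeAscii(ByteArrayOutputStream stream) {
        return new String(stream.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Stand-in for the HTML bytes produced by the save call (assumed UTF-8).
        String original = "<p>\u201Cquoted\u201D</p>";
        byte[] html = original.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream docStream = new ByteArrayOutputStream();
        docStream.write(html, 0, html.length);

        // The explicit UTF-8 decode round-trips the smart quotes.
        System.out.println(decodeUtf8(docStream).equals(original));
        // The ASCII decode loses them and leaves replacement characters behind.
        System.out.println(decodeAscii(docStream).contains("\uFFFD"));
    }
}
```

The replacement characters left by the wrong decode typically show up as question marks once the string is written out again, which would match the symptom described above.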

Hi,

I am facing the same issue in my Docker container when simply reading the text of the extracted nodes. I get the text with the code below:

extractedNodes.stream()
    .filter(n -> n != null && (n.getNodeType() == NodeType.PARAGRAPH || n.getNodeType() == NodeType.RUN))
    .map(Node::getText).reduce(String::concat).orElse("");

Please help me fix this as well.

Thank you

@Gptrnt It looks like this is a general Java problem on Linux rather than an Aspose.Words issue. For example, you can observe the same problem by running the following simple code in a Linux Docker container:

String test = "“jtest” ,ckdf ,’sdhfjsdhf’";
System.out.println(test);

Exactly the same code prints the correct output when run on Windows.
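
To check whether the default charset is the cause, it may help to print it inside the container and compare how the same string is encoded under different charsets. The following is a minimal sketch using only the JDK; the class and method names (ConsoleEncodingDemo, printWith) are hypothetical, and the assumption that the container falls back to an ASCII-only default charset has not been verified against the poster's image.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {
    // Writes the text through a PrintStream that uses the given charset
    // and returns the bytes that stream actually produced.
    public static byte[] printWith(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, charset.name());
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // Not expected: the name comes from an existing Charset instance.
            throw new IllegalArgumentException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String test = "\u201Cjtest\u201D";

        // The charset the JVM uses when none is given explicitly.
        System.out.println("Default charset: " + Charset.defaultCharset());

        // US-ASCII has no mapping for smart quotes, so the encoder substitutes '?'.
        byte[] ascii = printWith(test, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // UTF-8 can represent the quotes, so the text survives the round trip.
        byte[] utf8 = printWith(test, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(test));
    }
}
```

If the reported default charset is not UTF-8, commonly used remedies are starting the JVM with -Dfile.encoding=UTF-8 or configuring a UTF-8 locale in the container image; which one applies depends on the base image and Java version.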