RTF To PDF conversion leads to junk output

vedjaipraful · May 17, 2022, 8:48am

Hello.
We are using Aspose.Words for Java 20.12 to convert various office file formats to PDF.
Problem: for RTF files, the output file sometimes consists entirely of junk, data that appears to be the internal representation of the RTF format.
This happens randomly when we run the code in a container on Linux. We have not observed it when running the same code as a standalone program on Windows.
When creating the new document, we use the two-parameter constructor so that we can pass load options. The load options are used to skip external URL resources (an implementation of com.aspose.words.IResourceLoadingCallback).

Is there anything specific that needs to be done for RTF?

Attached are the original RTF file and its PDF conversion.

Again, this happens randomly in our Linux environment; at other times the conversion works.
RTFToPDF.zip (1.6 MB)

alexey.noskov · May 17, 2022, 1:15pm

@vedjaipraful I have tested the conversion on my side and the RTF is converted properly. I used the following simple code for testing:

FileFormatInfo info = FileFormatUtil.detectFileFormat("/temp/RTF File.rtf");
System.out.println(info.getLoadFormat());

Document doc = new Document("/temp/RTF File.rtf");
doc.save("/temp/out.pdf");

And a simple Dockerfile:

FROM openjdk:17
COPY ./out/artifacts/TestJava_jar/ /tmp
WORKDIR /tmp
ENTRYPOINT ["java","-jar","TestJava.jar"]

I have analyzed the PDF document you attached and noticed that its content is part of the RTF internal representation. The problem might occur because you load your document from a stream and the stream position is shifted from the beginning of the document. In this case Aspose.Words detects the load format of the document as TXT, and you see the internal RTF representation (part of it) in your output PDF.
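
To illustrate the stream-position point: an RTF file begins with the signature `{\rtf`, so a reader that starts a few bytes past the beginning no longer sees it and the remaining content looks like plain text. The following is a minimal sketch using only the JDK. It is not Aspose.Words' actual detection logic, and the names `RtfSignatureCheck` and `looksLikeRtf` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RtfSignatureCheck {
    // RTF documents start with this signature.
    private static final byte[] RTF_SIGNATURE =
            "{\\rtf".getBytes(StandardCharsets.US_ASCII);

    /** Returns true if the next bytes of the stream are the RTF signature. */
    public static boolean looksLikeRtf(InputStream in) throws IOException {
        byte[] head = in.readNBytes(RTF_SIGNATURE.length);
        return Arrays.equals(head, RTF_SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        byte[] rtf = "{\\rtf1\\ansi Hello}".getBytes(StandardCharsets.US_ASCII);

        // Stream positioned at the start: the signature is visible.
        System.out.println(looksLikeRtf(new ByteArrayInputStream(rtf))); // prints true

        // Stream already advanced by a few bytes: the signature is lost.
        InputStream shifted = new ByteArrayInputStream(rtf);
        shifted.skip(3);
        System.out.println(looksLikeRtf(shifted)); // prints false
    }
}
```

A check like this on the source file can help tell apart a file that is damaged or truncated on disk from one that is intact but detected incorrectly.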

vedjaipraful · May 17, 2022, 2:00pm

Hello @alexey.noskov
Thanks for the response.
We do not load the document from a stream. We load it from a path, passing the path where the source file is located.

Here is the code snippet; sourceFilePath is the absolute path of the source RTF file:

doc = new Document(sourceFilePath, AsposeConfig.getWordsLoadOptions());
doc.setFontSettings(AsposeConfig.getWordsFontSettings());

warningCallbackWords = new WarningCallbackWords();
doc.setWarningCallback(warningCallbackWords);

targetFilePath = (outputDir + fileNameWithoutExt + ".pdf");
doc.save(targetFilePath);
LOGGER.debug("DOC Conversion - preview save done");

alexey.noskov · May 17, 2022, 2:06pm

@vedjaipraful Could you please also share the source code of AsposeConfig.getWordsLoadOptions()? Also, please check what load format FileFormatUtil detects on your side.

vedjaipraful · May 17, 2022, 2:14pm

@alexey.noskov
I am not sure I completely understood "FileFormatUtil detects on your side". Do you want me to run a test to check this?

Once again, this happens only randomly. But once it starts happening, it keeps producing this result. When I restart the container, everything is OK again and the conversion is correct.

The code for AsposeConfig.getWordsLoadOptions():

wordsLoadOptions = new com.aspose.words.LoadOptions();
wordsLoadOptions.setResourceLoadingCallback(new HandleResourceLoadingWords());

    public static com.aspose.words.LoadOptions getWordsLoadOptions()
    {
        return wordsLoadOptions;
    }

alexey.noskov · May 17, 2022, 3:25pm

@vedjaipraful

Yes, the idea was to check what load format is used by Aspose.Words to load the document.

Unfortunately, it is difficult to say what is going wrong without being able to reproduce the problem. I can only guess at the cause. You could probably use FileFormatUtil to detect when the problem occurs and isolate it.
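
One way to isolate the failure is to check the detected load format before each conversion and record every mismatch. The following is only a sketch under stated assumptions: `LoadFormatGuard` and `FormatDetector` are hypothetical names, the detector is a stand-in so that the example runs without the library, and it assumes that `getLoadFormat()` returns an `int` in Aspose.Words for Java.

```java
import java.util.ArrayList;
import java.util.List;

public class LoadFormatGuard {
    /** Hypothetical wrapper around the real detection call, which may throw. */
    public interface FormatDetector {
        int detect(String path) throws Exception;
    }

    private final FormatDetector detector;
    private final int expectedFormat;
    private final List<String> mismatches = new ArrayList<>();

    public LoadFormatGuard(FormatDetector detector, int expectedFormat) {
        this.detector = detector;
        this.expectedFormat = expectedFormat;
    }

    /** Returns true when the detected format matches; records the path otherwise. */
    public boolean check(String path) throws Exception {
        int detected = detector.detect(path);
        if (detected == expectedFormat) {
            return true;
        }
        mismatches.add(path + " detected as " + detected);
        return false;
    }

    public List<String> getMismatches() {
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder values for demonstration, not Aspose constants.
        int rtf = 1;
        int txt = 2;
        // Stand-in detector; real code would instead pass something like
        // path -> FileFormatUtil.detectFileFormat(path).getLoadFormat()
        LoadFormatGuard guard =
                new LoadFormatGuard(p -> p.contains("bad") ? txt : rtf, rtf);

        System.out.println(guard.check("/data/good.rtf"));
        System.out.println(guard.check("/data/bad.rtf"));
        System.out.println(guard.getMismatches());
    }
}
```

Logging the mismatches together with the time they occurred would show whether the wrong detection starts at a particular moment in the container's lifetime.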

vedjaipraful · May 17, 2022, 4:05pm

@alexey.noskov
Can you please help me understand more about FileFormatUtil?
I am not sure whether it is a tool to be run, or a code check to be done to identify something.
I can still reproduce the issue in one of my sandboxes.

Could you share a documentation link for it?

alexey.noskov · May 17, 2022, 4:53pm

@vedjaipraful FileFormatUtil is an Aspose.Words class that allows you to detect the original file format. You can use the following simple code to detect the format of a file:

FileFormatInfo info = FileFormatUtil.detectFileFormat("/temp/RTF File.rtf");
System.out.println(info.getLoadFormat());