TEXT file to PDF in memory - getting empty results

BryceKK · November 27, 2018, 12:27am

Hey, I’m trying to convert a text file to PDF, and no matter what I do, I get an empty file as the result. However, this implementation does work for .docx files. Code below (in Scala):

// docInputStream is an inputStream type representation of the file I want to convert
        val fileFormat = FileFormatUtil.detectFileFormat(docInputStream).getLoadFormat
        val loadOptions = new LoadOptions()
        val charSet = Charset.forName("GB2312")
        loadOptions.setLoadFormat(fileFormat) //I have verified that I get 62 for text files here
        loadOptions.setEncoding(charSet)
        val doc = new Document(docInputStream, loadOptions)
        val pdfStream = new ByteArrayOutputStream()
        doc.save(pdfStream, SaveFormat.PDF)

Previously I was not using LoadOptions at all and it worked for docx, but I’ve added it in an attempt to make this work with text files. I’m using aspose-words-18.10-jdk16.jar

I’ve attached the file that I’m attempting to convert here: 2B640C97-901E-455B-9CDA-6A3DDC62BF94.zip (829 Bytes)

and here is the output that I’m getting: 2B640C97-901E-455B-9CDA-6A3DDC62BF94 (1).zip (1.2 KB)

mannanfazil · November 27, 2018, 12:04pm

@BryceKK
Thanks for your inquiry. Please use the following modified code to get the desired output. Hope this helps you.

val loadOptions = new LoadOptions();
val charSet = Charset.forName("GB2312");
loadOptions.setLoadFormat(LoadFormat.TEXT);
loadOptions.setEncoding(charSet);

val doc = new Document(docInputStream, loadOptions);
val pdfStream = new ByteArrayOutputStream();
// Save document to stream
doc.save(pdfStream, SaveFormat.TEXT);

// Get document bytes
val docBytes: Array[Byte] = pdfStream.toByteArray();
// Now recreate document from byte array
val docInStream = new ByteArrayInputStream(docBytes);
// Create document from stream
val outDoc = new Document(docInStream);
// Save document
outDoc.save("D:\\Temp\\output.pdf");

BryceKK · November 27, 2018, 1:05pm

Hey, thanks for responding @mannanfazil .

Unfortunately, this doesn’t resolve my problem. As I said, I need to do this all in memory, with no local storage of files, because this operation runs on AWS Lambda.

The next lines of code involve writing to s3, which requires me to have the PDF version of the document as a byte array (I used pdfStream.toByteArray)

So I need a way to convert using Document.save(OutputStream, SaveOptions) or Document.save(OutputStream, SaveFormat)

Is there a way to convert .txt to .pdf in memory (without saving any local files)? It seems Document.save should be able to do it with an OutputStream and the correct second parameter, but I’m only getting blank .txt files back.

mannanfazil · November 27, 2018, 11:49pm

@BryceKK

Thanks for your inquiry. We are working on your query and will get back to you soon.

mannanfazil · November 28, 2018, 7:02am

@BryceKK

Thanks for your patience. Please check the following code. It is working properly and returning Stream Size: 46282.

InputStream docInputStream = new FileInputStream("D:\\Temp\\2B640C97-901E-455B-9CDA-6A3DDC62BF94");

LoadOptions loadOptions = new LoadOptions();
Charset charSet = Charset.forName("GB2312");
loadOptions.setLoadFormat(LoadFormat.TEXT); // Set the load format explicitly instead of detecting it from the stream
loadOptions.setEncoding(charSet);

Document doc = new Document(docInputStream, loadOptions);

ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
// Save document to stream
doc.save(pdfStream, SaveFormat.PDF);
System.out.println("Stream Size: " + pdfStream.size());

// Just to make sure that the pdfStream populated by Aspose.Words is actually not empty
OutputStream outputStream = new FileOutputStream("D:\\Temp\\awjava-18.11.pdf");
pdfStream.writeTo(outputStream);
// Close the file stream so the bytes are flushed to disk
outputStream.close();

BryceKK · December 6, 2018, 11:28pm

@mannanfazil

Thanks for the help! It turns out there was no issue with the conversion itself, which is why your code looks about the same as mine. The key difference is that I was detecting the file type first, and your solution doesn’t. Detecting the file type with FileFormatUtil read the stream, so by the time I got to the conversion, I was converting an already-consumed (empty) stream.
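To illustrate the cause with plain java.io only (no Aspose calls), here is a minimal sketch. The `sniffAll` method is a hypothetical stand-in for any step that reads an InputStream to its end. It is not how FileFormatUtil is implemented; it only shows that a second reader of the same stream finds nothing left.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamConsumedDemo {

    // Hypothetical stand-in for a detection step: reads the stream to its end
    // and returns how many bytes it saw.
    static int sniffAll(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the SAME stream twice; returns {bytes seen first, bytes seen second}.
    static int[] readTwice(byte[] data) {
        InputStream in = new ByteArrayInputStream(data);
        int first = sniffAll(in);   // consumes everything
        int second = sniffAll(in);  // nothing left to read
        return new int[] { first, second };
    }

    public static void main(String[] args) {
        int[] counts = readTwice("hello".getBytes(StandardCharsets.UTF_8));
        // prints "first pass: 5, second pass: 0"
        System.out.println("first pass: " + counts[0] + ", second pass: " + counts[1]);
    }
}
```

The second pass sees zero bytes, which matches the symptom in this thread: the document was built from a stream that had already been read.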

So probably this won’t be a useful topic for others, but if anyone is getting empty files, my advice is to check that they’re only reading the stream once. My eventual solution was to fetch the first few bytes of the stream, determine the file type, and then if it was valid, fetch the whole stream separately and convert.
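One possible in-memory variant of this advice, sketched under assumptions: instead of fetching the source twice, drain the original stream once into a byte array and hand a fresh stream to each consumer. `BufferedSource`, `newStream`, and `drain` are hypothetical names made up for this illustration, and the sketch uses only java.io.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class BufferedSource {
    private final byte[] bytes;

    private BufferedSource(byte[] bytes) {
        this.bytes = bytes;
    }

    // Read the original stream exactly once and keep its bytes in memory.
    static BufferedSource from(InputStream in) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new BufferedSource(buf.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Each call returns an independent stream positioned at the first byte.
    InputStream newStream() {
        return new ByteArrayInputStream(bytes);
    }

    int size() {
        return bytes.length;
    }

    // Helper for the illustration: counts the bytes remaining in a stream.
    static int drain(InputStream in) {
        try {
            int count = 0;
            while (in.read() != -1) {
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With something like this, the code from the first post could pass one `newStream()` to FileFormatUtil.detectFileFormat and a second `newStream()` to the Document constructor, so neither step sees a consumed stream. The trade-off is that the whole file is held in memory, which is probably fine for small text files but worth checking against the Lambda memory limit for larger inputs.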

Thanks again!

Bryce

mannanfazil · December 7, 2018, 9:53am

@BryceKK

Thanks for your feedback. It is great that you were able to find what you were looking for. Please also refer to the following article about using the FileFormatUtil class. Hope this helps.

FileFormatUtil Class
Please let us know any time you have any further queries.