Text encoding is wrong after CSV to PDF conversion

@tahir.manzoor,

I think we have a misunderstanding here.

1.) For TXT file conversion, we used Aspose.Words for Java 21.5.

// gzipInputStream is assumed to be a java.util.zip.GZIPInputStream over the compressed TXT file
com.aspose.words.LoadOptions options = new com.aspose.words.LoadOptions();
options.setLoadFormat(com.aspose.words.LoadFormat.TEXT);
Document document = new Document(gzipInputStream, options);
document.save(outputStream, SaveFormat.PDF);

For text file conversion with Aspose.Words, we see the issue described above: for small files, Aspose does not auto-detect the encoding of the file.

2.) For CSV files, we used Aspose.Cells for Java 21.4 because we need the grid (column and row) structure. Here, the encoding of the shared files is not auto-detected.

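// gzipInputStream is assumed to be a java.util.zip.GZIPInputStream over the compressed CSV file
TxtLoadOptions options = new TxtLoadOptions(LoadFormat.CSV);
Workbook csvworkbook = new Workbook(gzipInputStream, options);
// pdfopts is assumed to be a PdfSaveOptions instance created elsewhere
csvworkbook.save(saveLocation + fileName, pdfopts);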

Kindly help us resolve this issue.
utf-8-encode.zip (766 Bytes)

Regards,
Raz

@dev.raz

Your query is related to Aspose.Cells, so we have moved it to the Aspose.Cells forum, where you will be guided appropriately.

@dev.raz,

You need to specify the relevant encoding type while loading the CSV file. See the following sample code for your reference:

TxtLoadOptions options = new TxtLoadOptions(LoadFormat.CSV);
//options.setEncoding(Encoding.getUTF8());
//options.setEncoding(Encoding.getUnicode());
options.setEncoding(Encoding.getEncoding("Shift_JIS"));
 
//Workbook csvworkbook = new Workbook("f:\\files\\utf-8-encode.csv", options);
//Workbook csvworkbook = new Workbook("f:\\files\\unicode-encoding.csv", options);
Workbook csvworkbook = new Workbook("f:\\files\\sample-shift-jis.csv", options);
csvworkbook.save("f:\\files\\out1.pdf"); 

Let us know if you still find any issue.

Hi @Amjad_Sahi

If I set the encoding, the files are converted correctly. But my CSV files can have many different encodings. Is it possible for Aspose.Cells to convert those files to PDF without specifying the encoding type in code?

Regards,
Raz

@dev.raz,

I am afraid this is not possible. CSV is just a plain text format and does not follow any standard that records which encoding was used, so the encoding cannot be worked out reliably from the file itself. In short, you have to specify the encoding yourself: use the TxtLoadOptions class and set the encoding that matches your CSV file.
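
One partial exception is a file that starts with a byte order mark (BOM), since the BOM identifies the UTF-8 or UTF-16 variant. The following is a minimal sketch using only the Java standard library, not an Aspose feature; the class name BomSniffer and the method detectBom are hypothetical names chosen for illustration. It returns null for files without a BOM (for example Shift_JIS files), so the encoding still has to be supplied for those.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Aspose): inspects the first bytes of a stream for a BOM.
final class BomSniffer {

    /**
     * Returns the charset name implied by a byte order mark at the start of the
     * stream, or null when no BOM is found. The stream must support mark/reset
     * (for example a BufferedInputStream); it is rewound afterwards, so the BOM
     * bytes are still present when the caller reads the stream.
     * Limitation: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
     */
    public static String detectBom(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] head = new byte[3];
        in.mark(head.length);

        // Read up to three bytes; a single read() call may return fewer.
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        in.reset();

        if (n >= 3 && head[0] == (byte) 0xEF && head[1] == (byte) 0xBB && head[2] == (byte) 0xBF) {
            return "UTF-8";
        }
        if (n >= 2 && head[0] == (byte) 0xFF && head[1] == (byte) 0xFE) {
            return "UTF-16LE";
        }
        if (n >= 2 && head[0] == (byte) 0xFE && head[1] == (byte) 0xFF) {
            return "UTF-16BE";
        }
        return null; // no BOM: the encoding cannot be inferred this way
    }
}
```

When a name is returned, it could be passed to options.setEncoding(Encoding.getEncoding(name)) in the same way as the "Shift_JIS" example earlier in this thread. A GZIPInputStream does not support mark/reset on its own, so it would need to be wrapped in a BufferedInputStream first.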

Hi, @Amjad_Sahi

I understand that we need to set encoding for CSV files before converting them to PDF.

As mentioned in this link, Aspose.Words can automatically recognize the encoding of a file.

  1. Is it possible for Aspose.Cells to do the same?
  2. Is there any other API in Aspose that could expose the encoding type of a file, so that we don’t need to set the encoding type for every file?

Regards,
Raz

@dev.raz,

We have already evaluated this. A CSV file is just a plain text file, and it can be created in any way and with any encoding, so it might not be possible to provide a solution that handles all kinds of template files. You may specify the encoding for your files via TxtLoadOptions.Encoding. Otherwise, the encoding used depends entirely on the system, just as when you create a StreamReader from a Stream without specifying the encoding.
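
To illustrate why the same bytes can be read differently depending on the charset applied, here is a small sketch using only the Java standard library. It is a heuristic and not an Aspose API; the class CharsetProbe and its methods are hypothetical names. It tries each candidate charset with strict decoding and picks the first one that reports no malformed input. Note that single-byte charsets such as ISO-8859-1 accept any byte sequence, so a clean decode does not prove the guess is right.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.List;

// Hypothetical helper (not part of Aspose): probes candidate charsets by strict decoding.
final class CharsetProbe {

    /** Returns true when the bytes decode under the charset without malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns the first candidate that decodes the bytes cleanly, or the fallback if none does. */
    static Charset firstCleanCharset(byte[] data, List<Charset> candidates, Charset fallback) {
        for (Charset cs : candidates) {
            if (decodesCleanly(data, cs)) {
                return cs;
            }
        }
        return fallback;
    }
}
```

Putting UTF-8 first in the candidate list is a common choice, because text in most legacy encodings that contains non-ASCII characters is rarely valid UTF-8. The name of the chosen charset could then be passed to the load options, as in the setEncoding example earlier in this thread.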

Anyway, we will evaluate this further, discuss it internally with the team, and post an update here once one is available.