Broken PDFs from IOStreams

Hello. I am using Aspose.PDF as part of an automatic Word-to-PDF and Excel-to-PDF conversion web app that is used for internal memos and the like.

I am using the following code to create PDF files from those sources. I do not want to save copies of the files to disk, so I am using IO.Stream objects to move the data around instead.

Dim FileID As Guid = Guid.NewGuid()

Dim PostedFile As HttpPostedFile = Me.FI_FileInput.PostedFile

Dim FileData() As Byte

Dim PDFfile As New Aspose.Pdf.Pdf

Dim filestream As New System.IO.MemoryStream

If PostedFile.FileName.EndsWith(".doc") Then

Dim docfile As New Aspose.Words.Document(PostedFile.InputStream)

Dim XMLStream As New System.IO.MemoryStream

docfile.Save(XMLStream, SaveFormat.AsposePdf)

PDFfile.BindXML(XMLStream, Nothing)

PDFfile.Save(filestream)

End If

If PostedFile.FileName.EndsWith(".xls") Then

Dim xlsfile As New Aspose.Cells.Workbook

xlsfile.Open(PostedFile.InputStream)

Dim XMLStream As New System.IO.MemoryStream

xlsfile.Save(XMLStream, SaveFormat.AsposePdf)

PDFfile.BindXML(XMLStream, Nothing)

PDFfile.Save(filestream)

End If

The difficulty is that the filestream generated by this code gives an unusable PDF for the Word conversion, and an error for the Excel conversion.

Converting a Word file gives me a PDF, but Acrobat Reader claims the file is damaged and won't open it.

Converting an Excel file throws an exception: There is an invalid character in the given encoding. Line 1, position 1.

Do you have any ideas on how to fix this?

Hi,

Can you please provide us with the Word and Excel documents that you are using for testing, so that we can determine the cause of the problem more accurately?

Thanks.

Sure. The word document is attached, and the Excel spreadsheet will be in the next post.

Excel Document attached.

Hi,

I have tested the documents that you provided using your code and didn't run into any problem with the Word to PDF conversion. For the Excel to PDF conversion there is an error in your Save statement. Please change it to:

xlsfile.Save(XMLStream, FileFormatType.AsposePdf)

Please note that SaveFormat is an Aspose.Words enumeration and cannot be used with Aspose.Cells.

I tested with Aspose.Pdf v3.6.0.0, Aspose.Words v4.4.0.0 and Aspose.Cells v4.4.0.0.

Thanks.

Well, I'm glad to get a response, but I am unable to replicate your success.

I downloaded the new versions of the packages and ran the code above on the files above. A PDF is now generated for the Excel file (after your fix), but both PDFs read as damaged. I have attached a copy of the PDF generated from the Excel spreadsheet as an example.

No matter what I do with the stream after the code I posted, I get errors. This includes saving the file to disk as well as saving to a SQL BLOB.

Any other ideas on how to fix this issue?

Hi,

Can you please tell me what code you used to write the memory stream to file?

Thanks.

Sure, but keep in mind that the ultimate goal here is to save to a SQL database, not a file. The SQL saving code is included after the file saving code.

'File writing to disk, for testing purposes.

Dim drr As New IO.FileStream("C:\test\" + PostedFile.FileName.Substring(PostedFile.FileName.LastIndexOf("\") + 1) + ".pdf", IO.FileMode.OpenOrCreate)

drr.Write(FileData, 0, FileData.Length)

drr.Close()

Also worth including is the code for saving to a SQL database, as follows.

'File writing to SQL Database.

'Get a row schema for t_filestore

Dim FSconn As New SqlConnection(ConnectionString)

Dim FScomm As New SqlCommand

FScomm.Connection = FSconn

FScomm.CommandText = "Select top 1 * from t_FileStore"

Dim FSda As New SqlDataAdapter(FScomm)

Dim FSdt As New DataTable

FSda.Fill(FSdt)

Dim FSdr As DataRow = FSdt.NewRow()

'Get a row schema for t_FileReference

Dim FRconn As New SqlConnection(ConnectionString)

Dim FRcomm As New SqlCommand

FRcomm.Connection = FRconn

FRcomm.CommandText = "Select top 1 * from t_FileReference"

Dim FRda As New SqlDataAdapter(FRcomm)

Dim FRdt As New DataTable

FRda.Fill(FRdt)

Dim FRdr As DataRow = FRdt.NewRow()

'Add appropriate values.

FSdr("FileID") = FileID

FSdr("FileTitle") = Me.tb_AddTitle.Text

FSdr("FileContent") = FileData

FSdr("FileName") = PostedFile.FileName.Substring(PostedFile.FileName.LastIndexOf("\") + 1) + ".pdf"

FSdr("FileType") = "application/pdf"

'Add appropriate values.

FRdr("FileID") = FileID

FRdr("Summary") = Me.tb_AddSummary.Text

FRdr("CategoryCode") = Me.ddl_AddCategory.SelectedValue

FRdr("FileDate") = DateTime.Parse(Me.tb_AddDate.Text.Trim)

FRdr("Topic") = Me.tb_AddTopic.Text

FRdr("DocumentNum") = Me.tb_AddDocNumber.Text

'Update t_fileStore.

Dim FSUpdate As New SqlCommand("insert into t_FileStore values(@FileID, @FileTitle, @FileContent, @FileName, @FileType)", FSconn)

FSUpdate.Parameters.Add("@FileID", FSdr("FileID"))

FSUpdate.Parameters.Add("@FileTitle", FSdr("FileTitle"))

FSUpdate.Parameters.Add("@FileContent", FSdr("FileContent"))

FSUpdate.Parameters.Add("@FileName", FSdr("FileName"))

FSUpdate.Parameters.Add("@FileType", FSdr("FileType"))

FSconn.Open()

FSUpdate.ExecuteNonQuery()

FSconn.Close()

'Update t_fileReference.

Dim FRUpdate As New SqlCommand("insert into t_FileReference values(@FileID, @Summary, @CategoryCode, @FileDate, @Topic, @DocumentNum)", FRconn)

FRUpdate.Parameters.Add("@FileID", FRdr("FileID"))

FRUpdate.Parameters.Add("@Summary", FRdr("Summary"))

FRUpdate.Parameters.Add("@CategoryCode", FRdr("CategoryCode"))

FRUpdate.Parameters.Add("@FileDate", FRdr("FileDate"))

FRUpdate.Parameters.Add("@Topic", FRdr("Topic"))

FRUpdate.Parameters.Add("@DocumentNum", FRdr("DocumentNum"))

FRconn.Open()

FRUpdate.ExecuteNonQuery()

FRconn.Close()

I tested with the following code and it works:

Dim FileData() As Byte
Dim PDFfile As New Aspose.Pdf.Pdf
Dim filestream As New System.IO.MemoryStream

Dim docfile As New Aspose.Words.Document("D:\test\testword\TestDoc.doc")
Dim XMLStream As New System.IO.MemoryStream
docfile.Save(XMLStream, SaveFormat.AsposePdf)

PDFfile.BindXML(XMLStream, Nothing)
PDFfile.Save(filestream)

filestream.Seek(0, SeekOrigin.Begin)
'VB array bounds are inclusive, so the upper bound is Length - 1
FileData = New Byte(filestream.Length - 1) {}
filestream.Read(FileData, 0, filestream.Length)
Dim drr As New IO.FileStream("d:\test\test.pdf", IO.FileMode.OpenOrCreate)

drr.Write(FileData, 0, FileData.Length)
drr.Close()
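
A side note on the file-writing step, in case it helps someone else testing this way: FileMode.OpenOrCreate does not truncate a file that already exists, so rerunning a test against an older, longer test.pdf can leave stale bytes at the end of the file. Below is a minimal sketch of one alternative, using only plain System.IO and no Aspose calls. The module name StreamToFileDemo and the helper SaveStreamToFile are made up for this example, the five sample bytes merely stand in for real PDF data, and the Using statement assumes VB 2005 or later.

```vbnet
Imports System
Imports System.IO

Module StreamToFileDemo

    ' Hypothetical helper (not part of Aspose or the original posts):
    ' writes the whole MemoryStream to targetPath, replacing any existing file.
    Sub SaveStreamToFile(ByVal source As MemoryStream, ByVal targetPath As String)
        ' FileMode.Create truncates an existing file; OpenOrCreate would keep its old length.
        Using target As New FileStream(targetPath, FileMode.Create, FileAccess.Write)
            ' WriteTo copies the entire stream contents, regardless of source.Position.
            source.WriteTo(target)
        End Using
    End Sub

    Sub Main()
        Dim tempPath As String = Path.GetTempFileName()

        ' Pre-fill the file with 10 bytes to stand in for an older, longer PDF.
        File.WriteAllBytes(tempPath, New Byte(9) {})

        ' Sample data only: the bytes of "%PDF-".
        Dim pdfStream As New MemoryStream(New Byte() {37, 80, 68, 70, 45})
        pdfStream.Seek(0, SeekOrigin.End) ' position deliberately left at the end

        SaveStreamToFile(pdfStream, tempPath)
        Console.WriteLine(File.ReadAllBytes(tempPath).Length) ' prints 5

        File.Delete(tempPath)
    End Sub

End Module
```

Because WriteTo ignores the current position, this route also avoids the intermediate byte array when the only goal is a test file on disk.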

Hi,

I see that you are using a byte array to store the file data when writing to file or into the database. I have checked the PDF file, and it is corrupted because every byte in it is zero. So the problem is that the data is not being copied properly from the memory stream to the byte array. The most probable cause is that you are using the Read method of the memory stream to copy its contents into the byte array, but you haven't reset the stream pointer to the beginning of the stream first. After PDFfile.Save(filestream), the pointer sits at the end of the stream, so Read has nothing left to copy and the array keeps its initial zeros. If you use code like this:

FileData = New Byte(filestream.Length - 1) {}

filestream.Seek(0, IO.SeekOrigin.Begin)

filestream.Read(FileData, 0, filestream.Length)

then this should solve your problem. However, if you are using some other method of copying data from the stream to the byte array, please tell me and I will look into it.
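
For anyone who wants to see the pointer behaviour in isolation, it can be reproduced without Aspose at all. The following is only an illustrative sketch using a plain System.IO.MemoryStream. The module and function names (StreamPositionDemo, ReadWithoutSeek, ReadAfterSeek, CopyWithToArray) are made up for this example, and the five sample bytes stand in for the PDF data that PDFfile.Save would write.

```vbnet
Imports System
Imports System.IO

Module StreamPositionDemo

    ' Hypothetical helper: writes the data, then reads WITHOUT rewinding.
    Function ReadWithoutSeek(ByVal source() As Byte) As Byte()
        Dim ms As New MemoryStream()
        ms.Write(source, 0, source.Length)
        ' Position is now at the end of the stream, so Read copies nothing
        ' and the buffer keeps its initial zeros.
        Dim buffer(source.Length - 1) As Byte
        ms.Read(buffer, 0, buffer.Length)
        Return buffer
    End Function

    ' Hypothetical helper: same as above, but rewinds before reading.
    Function ReadAfterSeek(ByVal source() As Byte) As Byte()
        Dim ms As New MemoryStream()
        ms.Write(source, 0, source.Length)
        ms.Seek(0, SeekOrigin.Begin)
        Dim buffer(source.Length - 1) As Byte
        ms.Read(buffer, 0, buffer.Length)
        Return buffer
    End Function

    ' Hypothetical helper: ToArray copies the whole stream and ignores Position.
    Function CopyWithToArray(ByVal source() As Byte) As Byte()
        Dim ms As New MemoryStream()
        ms.Write(source, 0, source.Length)
        Return ms.ToArray()
    End Function

    Sub Main()
        ' Sample data only: the bytes of "%PDF-".
        Dim sample() As Byte = {37, 80, 68, 70, 45}
        Console.WriteLine(BitConverter.ToString(ReadWithoutSeek(sample))) ' 00-00-00-00-00
        Console.WriteLine(BitConverter.ToString(ReadAfterSeek(sample)))   ' 25-50-44-46-2D
        Console.WriteLine(BitConverter.ToString(CopyWithToArray(sample))) ' 25-50-44-46-2D
    End Sub

End Module
```

If the stream in question is a MemoryStream, as it is in the code posted above, FileData = filestream.ToArray() is probably the shortest route, since it needs neither the Seek nor the manual array sizing.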

Thanks.

Between those last two posts, a solution was hit upon.

Using the code from the final post, slightly edited, the issue has been resolved. Thanks to both of you for your prompt and useful help.