Free Support Forum - aspose.com

Multiple issues combining various file types into PDF. Please help!

I am working on a PDF Combine process that consolidates different types of files into a single PDF file. It works for 90% of the files, but some files cause strange errors when being converted to PDF.

We are experiencing the following types of errors:

Value cannot be null. Parameter name: name

Unknown file format.

Object reference not set to an instance of an object.

I have included a comprehensive sample to reproduce the issues for you. The erroneous files have been included.

Any help would be greatly appreciated! Thanks,

Hi Jason,


Thanks for contacting support.

I have tested the scenario by running the sample project you shared, and I can see errors in the log file that the application creates. However, I still do not see any resulting file in the C:\Data\Folio Automation\Watch\PDF Combine\Running folder, while many files are present in the C:\Data\Folio Automation\Watch\PDF Combine\Error folder.

As I understand it, you read input files from the source folder, convert them to PDF format, and concatenate each newly created PDF file with the main PDF document inside the combinePDFWithAssociatedFiles(…) method.

Furthermore, in the log file generated under the C:\Data\Folio Automation\Logs\Folio Automation folder, I noticed an Error tag against some PDF files, but when I tried loading these files into an Aspose.Pdf.Document object, I did not encounter any issue:

  • GCU_SAN001_85694 (2013-03-21 01.06.23).pdf
  • HWT_ATL009_52600 (2013-03-21 01.06.25).pdf
  • HWT_CHA013_53261 (2013-03-21 01.06.27).pdf
  • HWT_CHA013_53301 (2013-03-21 01.06.28).pdf
I am still not sure how to determine whether a component (Aspose.Pdf or Aspose.Words) is causing the issues during PDF file generation, or whether the problem occurs while concatenating the documents. Could you please identify the problematic input files and the steps to quickly replicate the issues at our end? We are sorry for this inconvenience.

Hi, thanks for the reply.

You are correct about the functionality of the process. It runs as a SharePoint Timer Job and monitors the "watch" folder for files to combine. As a file is processed, it is copied to the "Running" folder. If the combine succeeds, the file is copied to the "Completed" folder. If any error occurs during the combine process, it is copied to the "Error" folder and the error description is written to the logs.
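The folder routing described above can be sketched as follows. This is a minimal Python illustration, not the actual SharePoint Timer Job: `process_watch_folder` is a hypothetical name, `combine` stands in for the real combine step (such as combinePDFWithAssociatedFiles), and the sketch moves files rather than copying them so the watch folder is not re-scanned, which is an assumption about the intended behavior. Only the Running, Completed and Error folder names come from the thread.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_watch_folder(
    watch: Path,
    combine: Callable[[Path], None],
    log: Callable[[str], None] = print,
) -> dict[str, list[str]]:
    """Route each file in `watch` through Running, then Completed or Error.

    `combine` is a hypothetical stand-in for the real combine step; any
    exception it raises sends the file to the Error folder.
    """
    running = watch / "Running"
    completed = watch / "Completed"
    error = watch / "Error"
    for folder in (running, completed, error):
        folder.mkdir(parents=True, exist_ok=True)

    outcome: dict[str, list[str]] = {"completed": [], "error": []}
    # Only plain files directly in the watch folder; the subfolders are skipped.
    for src in sorted(p for p in watch.iterdir() if p.is_file()):
        staged = running / src.name
        shutil.move(str(src), str(staged))  # claim the file before processing
        try:
            combine(staged)
        except Exception as exc:
            # Record the failure and continue with the next file.
            shutil.move(str(staged), str(error / src.name))
            log(f"{src.name}: {exc}")
            outcome["error"].append(src.name)
        else:
            shutil.move(str(staged), str(completed / src.name))
            outcome["completed"].append(src.name)
    return outcome
```

Because each file is handled in its own try block, one bad input ends up in the Error folder with its message logged, without stopping the rest of the batch.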

To get the full stack trace of the errors, you can modify the NLog configuration settings in the sample's app.config file. Change the following line:


To:

This should indicate where the errors are occurring.

Please read the _readme.txt file to get the sample working. You can replicate the issues by copying the attached sample files to the "watch" folder and running the application. All files should be processed and moved to the "Error" folder, with the full tracing/error information in the log files. If you put a breakpoint in the "combinePDFWithAssociatedFiles" method, you can step through it and see where each file combine fails. Depending on the file type, either Aspose.PDF or Aspose.Words causes the errors.

The problem could be that some of the input files are simply corrupt. I can confirm that one of the attached .rtf files is corrupt and is not a valid .rtf file. But the other HTML/PDF/TIF files appear to be fine and I cannot see any problem with them.

Thanks,

jasondicker:
You are correct about the functionality of the process. It runs as a SharePoint Timer Job and monitors the "watch" folder for files to combine. As a file is processed, it is copied to the "Running" folder. If the combine succeeds, the file is copied to the "Completed" folder. If any error occurs during the combine process, it is copied to the "Error" folder and the error description is written to the logs.

Please read the _readme.txt file to get the sample working. You can replicate the issues by copying the attached sample files to the "watch" folder and running the application. All files should be processed and moved to the "Error" folder, with the full tracing/error information in the log files.

Hi,

Thanks for sharing the details. In my earlier attempt, I ran the application by following the instructions in the _readme.txt file and noticed that all the files are copied to the Error folder, which indicates that the resulting concatenated PDF file is not generated.

jasondicker:
If you put a breakpoint in the "combinePDFWithAssociatedFiles" method, you can step through it and see where each file combine fails. Depending on the file type, either Aspose.PDF or Aspose.Words causes the errors.

As mentioned in my earlier post, I checked the error log and identified the PDF documents marked with the Error tag. However, when I tried reading those documents using Aspose.Pdf for .NET, I did not notice any issue.

jasondicker:
The problem could be that some of the input files are simply corrupt. I can confirm that one of the attached .rtf files is corrupt and is not a valid .rtf file. But the other HTML/PDF/TIF files appear to be fine and I cannot see any problem with them.

I am not entirely certain whether the problem occurs during RTF to PDF conversion, but I have asked a colleague to look into it.


We are really sorry for this inconvenience.

Hi Jason,

Thanks for your inquiry.
Jason:

The problem could be that some of the input files are simply corrupt. I can confirm that one of the attached .rtf files is corrupt and is not a valid .rtf file. But the other HTML/PDF/TIF files appear to be fine and I cannot see any problem with them.

There are two RTF documents in your attached SampleData.zip archive. Aspose.Words detects one of them, 'DEW_YOU005_78341_284022~$rpefelt Mr.s P x 4 (2013-03-20 10.44.25).rtf', as an unrecognized format and cannot load it. The latest version of Aspose.Words (13.2.0) throws an UnsupportedFileFormatException when loading it into memory. I have logged this issue in our bug tracking system as WORDSNET-7993. Our developers will look further into the details of this problem. Your request has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.

Best regards,

Thanks. Can you please confirm whether you can reproduce the issues with the other input files? From the earlier discussion, it sounds like the errors can be reproduced with the sample application, but the files work fine in your own tests. Just to reiterate, the PDF Combine process works for hundreds of thousands of documents, but the attached input files cause the errors mentioned above.

Any help will be greatly appreciated.

Thanks,

Hi Jason,


As mentioned in my earlier post, during my testing I noticed the Error tag next to the names of some PDF files, but I did not encounter any issue while reading these documents with the Aspose.Pdf.Document class. Maybe the concatenate method is causing the issue, because for some RTF files Aspose.Words generates incorrect PDF files. Once these issues are resolved, we can run the example again and observe the application's behavior.

We are sorry for this inconvenience.

Are you saying that errors are due to bugs in the sample application and not in the Aspose libraries? What do you mean by "I did not notice any issue while reading these documents with Aspose.Pdf.Document class"?

Have you made the app.config modifications to write out the full stack trace of the errors? This will tell you exactly where the errors are occurring. I have done this and can confirm that the errors occur in the Aspose libraries. As mentioned earlier, this sample application works perfectly for hundreds of thousands of documents; only the attached sample files are causing the problems.

If you say the problem is in the sample application and not in the Aspose libraries, then please be more specific and advise how to resolve it using your libraries.

Thanks

Hi Jason,


I will test this scenario further and share my findings. Please be patient and give us a little time.

Hi Jason,


Thanks for your patience. Regarding WORDSNET-7993, after further investigation our development team has concluded that they will not be able to implement a fix for your issue. Most likely, it will be closed with a "Won't Fix" resolution. You are right: the input RTF file in question is simply corrupted and is not valid. Aspose.Words cannot read it, and Microsoft Word cannot read it either. Please let me know if I can be of any further assistance.

Best regards,

Hi, thanks for the response.

Please explain why it won't be fixed. Can you confirm whether it is a problem in the Aspose libraries, or whether there is a workaround I can implement to avoid these issues?

Thanks,

Hi Jason,


Thanks for your inquiry. If you open this RTF document with Microsoft Word, it asks you to select an encoding to make the document readable; unfortunately, none of the encodings in the list produces readable results. Some Asian encodings give partially readable output, but the last part of the document remains unreadable. The content of this document is in an unknown format, therefore we think that it is either a corrupted document or an 'Unknown file format'. Moreover, please note that in your case Aspose.Words does the right thing and there is no issue with Aspose.Words.
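On the workaround question: since Aspose.Words already reports this file as an unrecognized format, one option on the application side is a quick signature check before conversion, so that files with an .rtf extension that are not actually RTF go straight to the Error folder with a clear message. A genuine RTF document starts with the `{\rtf` group. The helper name `looks_like_rtf` is hypothetical and this is only a sketch; Word prompting for an encoding suggests it treated the file as plain text, so the file may well lack that header, but that is an inference and not confirmed in this thread.

```python
from pathlib import Path

RTF_SIGNATURE = b"{\\rtf"


def looks_like_rtf(path: Path) -> bool:
    """Return True if the file starts with the RTF header '{\\rtf'.

    Only the first bytes are inspected, so a file that passes can still
    be damaged further in.
    """
    with open(path, "rb") as f:
        head = f.read(len(RTF_SIGNATURE))
    return head == RTF_SIGNATURE
```

A file that fails this check could be logged as "not an RTF document" instead of surfacing the generic "Unknown file format." error.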

Best regards,

Hi Jason,


Thanks for your patience.

I have tested the scenario in detail. First, I placed the .txt files in the ‘c:\Data\Folio Automation\Watch\PDF Combine’ folder and ran the application, and I did not notice any issue (no files were copied to the ‘c:\Data\Folio Automation\Watch\PDF Combine\Error’ folder and nothing was written to the log file). Then I placed the .HTM and .TIFF files in the PDF Combine folder and again did not notice any problem.

However, when I placed the PDF files in the respective folder, the problem started appearing. I then tried loading the PDF files into an Aspose.Pdf.Document object and started getting an InvalidPdfFileFormatException: Startxref not found. So the problem seems to be related to the PDF files. For the sake of correction, I have logged this issue as PDFNEWNET-35116 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and give us a little time.

We are sorry for this inconvenience.
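The "Startxref not found" message refers to the `startxref` keyword that the PDF format places near the end of a file, pointing at the cross-reference table and followed by the `%%EOF` marker; a truncated file typically lacks it. A cheap pre-check along these lines, sketched here with a hypothetical `has_pdf_trailer` helper (it is not part of Aspose.Pdf), could route such files to the Error folder before they reach the combine step. It is a heuristic: it only looks for the `%PDF-` header at byte 0 and the trailer within the last 1 KB, so a file can pass and still be damaged elsewhere.

```python
import os
from pathlib import Path


def has_pdf_trailer(path: Path, tail_size: int = 1024) -> bool:
    """Heuristic check that a file looks like a complete PDF.

    Checks for the '%PDF-' header at the start of the file and for a
    'startxref' keyword followed by '%%EOF' within the last `tail_size` bytes.
    """
    with open(path, "rb") as f:
        header = f.read(5)
        if header != b"%PDF-":
            return False
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - tail_size))
        tail = f.read()
    # Use the last 'startxref' so files with incremental updates are handled.
    xref_pos = tail.rfind(b"startxref")
    return xref_pos != -1 and tail.find(b"%%EOF", xref_pos) != -1
```

Reading only the header and the tail keeps the check cheap even for large inputs.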

Hi Jason,


We have further investigated the issue in which some PDF files cause errors when they are read and concatenated into a single output file. Based on our observations, the following files appear to be corrupted, as all of them except the last one listed are only 272 bytes in size. I cannot even view these files using Adobe Reader 11.0.3. However, the Aspose.Pdf.Document class can open the rest of the PDF files and retrieve their information.

  1. JHI_AFR003_244986 (2013-03-20 10.49.48).pdf
  2. JHI_CON016_247443 (2013-03-20 10.49.50).pdf
  3. JHI_FLI002_244401 (2013-03-20 10.49.52).pdf
  4. JHI_FRO001_246746 (2013-03-20 10.49.54).pdf
  5. JHI_GEN054_247741 (2013-03-20 10.51.27).pdf
  6. JHI_HOT000_240582 (2013-03-20 10.51.29).pdf
  7. JHI_INT036_246493 (2013-03-20 10.51.31).pdf
  8. JHI_KLM001_243652 (2013-03-20 10.51.33).pdf
  9. JHI_SAA164_239943 (2013-03-20 10.51.35).pdf
  10. JHI_SWI001_241589 (2013-03-20 10.51.37).pdf
  11. JHI_SWI001_246464 (2013-03-20 10.51.39).pdf
  12. JHI_TRA227_246944 (2013-03-20 10.51.41).pdf
  13. JHI_UTA001_247141 (2013-03-20 10.51.50).pdf
  14. JHI_WIN000_238159 (2013-03-20 10.52.02).pdf
  15. JHI_TWF039_242800 (2013-03-20 10.51.48).pdf (only this file is 432 KB)

These PDF files appear to be invalid and corrupted. Could you please check them at your end?

Hi,

Upon further investigation, the files you reported are indeed corrupt and that is something we can live with. However, a lot of the files seem to be fine and open correctly in their respective client programs but are still causing problems in the Aspose libraries. For example, these files are not corrupt and the PDF Combine process fails:

HWT_ATL009_52600 (2013-03-20 10.48.31).pdf
HWT_ATL009_52600_2017809 (2013-03-20 10.48.31).txt
HWT_ATL009_52600_2017872 (2013-03-20 10.48.31).txt
HWT_ATL009_52600_63782Reg card and voucher - Mulaudzi 28-02-13 (2013-03-20 10.48.31).tif

An "Object reference not set to an instance of an object" error is thrown when combining the associated TIF file. Put a break point in the "appendImageFileToPDFDocument" method and you will see that the error occurs in the Aspose library.

Thanks,

Hi Jason,


I am working on this scenario and will get back to you soon.

Hi Jason,


Thanks for your patience.


I have tested the scenario and can reproduce the same problem. For the sake of correction, I have logged it separately as PDFNEWNET-35185 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the correction. Please be patient and give us a little time.

We are sorry for this inconvenience.