Issue with reading PDF to XML in Excel file


#1

Hi, we are using Aspose.PDF for .NET to convert PDF files to Excel XML. Unfortunately, the product does not read the last page correctly. I have attached sample files for reference. Please note that only the first PDF sample is read correctly. PDF file reading issue.pdf (106.4 KB)
I would appreciate it if the support team could help solve the problem. Please feel free to let me know if you have any questions about the issue.


#2

@helen.xu

Thank you for contacting support.

Would you please elaborate on the problem a little more and share the source and generated files in a ZIP archive, along with SSCCE code, so that we can try to reproduce and investigate it in our environment? Before sharing the requested data, please make sure you are using the latest version of the API.

Moreover, we recommend using the API with a valid license. If you do not have one, you can get a free 30-day temporary license from our website.


#3

data source.zip (184.5 KB)
@Farhan.Raza, thanks very much for replying to my technical support request. We did purchase a license for the product. Currently our only concern is reading the ‘Unrecorded Meal Breaks’ records, which usually appear on the last page of the file. Depending on which records are on the last page, the conversion may misplace the data in the XML file. I have zipped all the information you may need, including the 3 XML files generated from the corresponding PDF files. Compared with the PDF files, the ‘Unrecorded Meal Breaks’ records on the last page of the XML files are misplaced for Sample2.pdf and Sample3.pdf. Please feel free to let me know what else you need to solve the problem. Thanks very much.


#4

@helen.xu

Thank you for sharing the requested data and explaining the issue further.

We used the code below to convert the PDF file to XML and then compared the generated file with the source file.

using Aspose.Pdf;

// Load the source PDF and save it as a spreadsheet using the default ExcelSaveOptions.
Document pdfDocument = new Document(dataSourceFile);
ExcelSaveOptions excelSave = new ExcelSaveOptions();
pdfDocument.Save(outPutFile, excelSave);

However, the data appears identical when the XML file is viewed in MS Excel, as shown in the attached screenshot. Would you please share screenshots and elaborate a little more so that we can proceed further? Comparison.PNG


#5

@Farhan.Raza Thank you for your message. However, I don’t have anything more to add on this issue besides the zip file I sent with my last message. My understanding is that the result depends on the pattern at the beginning of the last page: if there are 4 columns in each section, the tool reads the page well; otherwise the data in the output XML ends up in the wrong positions. Since the data in this XML file will be loaded into a database staging table, it is hard to fix the problem using T-SQL because each file might be different. We do need this fixed as soon as possible. Thank you very much for your support.
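
To illustrate the pattern described above, here is a minimal sketch of how rows that break the 4-column layout could be flagged before the file is loaded into the staging table. It assumes the output is in the Excel 2003 XML Spreadsheet format (the `urn:schemas-microsoft-com:office:spreadsheet` namespace) and that a well-formed record has exactly four cells. The class name `RowShapeCheck` and the method `FindIrregularRows` are hypothetical and not part of Aspose.PDF. It only detects suspect rows; it does not repair them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RowShapeCheck
{
    // Namespace used by the Excel 2003 XML Spreadsheet format.
    private static readonly XNamespace Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Returns the position (1-based, counted over all Row elements in the document)
    // and cell count of every non-empty row whose cell count differs from expectedCells.
    // Note: cells skipped via the ss:Index attribute are not counted here.
    public static List<(int Row, int Cells)> FindIrregularRows(XDocument workbook, int expectedCells)
    {
        return workbook.Descendants(Ss + "Row")
            .Select((row, index) => (Row: index + 1, Cells: row.Elements(Ss + "Cell").Count()))
            .Where(r => r.Cells != 0 && r.Cells != expectedCells)
            .ToList();
    }
}
```

For example, calling `RowShapeCheck.FindIrregularRows(XDocument.Load("Sample2.xml"), 4)` (the file name is an assumption) would list the rows worth comparing against the PDF by hand.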


#6

@helen.xu

Thank you for the feedback.

We have logged a ticket with ID PDFNET-46804 in our issue management system for further investigation. We will let you know as soon as an update is available.


#7

@Farhan.Raza Thank you for taking care of this request. I look forward to hearing from you soon about the solution.


#8

@helen.xu

We will get back to you as soon as any further update is available regarding this ticket.


#9

Hi Farhan,

How are things going? Have you made any progress on this ticket? I hope to get this issue resolved as soon as possible, as we need this data deployed to production this week. Thanks very much for your support.

Thanks, Helen


#10

@helen.xu

We are afraid it has not been investigated yet. Please note that it has been logged under the free support model and will be resolved on a first-come, first-served basis, which may take some months.

Moreover, we also offer a paid support model, under which issues are resolved on an urgent basis and take priority over issues logged under the free support model. You may check our Paid Support options for reference.