Hi,
I use Aspose.Pdf.dll to parse data from PDF files. It fits our needs well, but today I ran into an issue: the data is shifted after parsing. Below is an example of the data and the parsing result:
image (2).png (135.6 KB)
This is the expected result of parsing:
image (3).png (46.0 KB)
This issue happens when the header and the data are placed on different pages.
Below is the code that I use for PDF parsing:
{
    string dataSourceFile = Variables.ScoreCardFullPath.ToString();
    string outPutFile = Path.ChangeExtension(dataSourceFile, ".xml");

    // Set license
    Aspose.Pdf.License license = new Aspose.Pdf.License();
    license.SetLicense(System.IO.Path.Combine(asposePath, "Aspose.Pdf.lic"));

    // Load the PDF and save it using an ExcelSaveOptions object
    Document pdfDocument = new Document(dataSourceFile);
    Aspose.Pdf.ExcelSaveOptions excelsave = new ExcelSaveOptions();
    pdfDocument.Save(outPutFile, excelsave);

    // Read the generated XML back and process every spreadsheet row
    using (var stream = File.Open(outPutFile, FileMode.Open))
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(stream);
        XmlNode root = doc.DocumentElement;
        XmlNamespaceManager nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("ss", "urn:schemas-microsoft-com:office:spreadsheet");
        GetData(root, "//ss:Row", nsmgr);
    }
}
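GetData is not shown in full here. For context, a minimal sketch of such a helper is below; the class name RowReader, the return type and the whole body are hypothetical and only illustrate the idea. It assumes the output is SpreadsheetML 2003 (the namespace registered above), where an empty cell can be omitted and the next cell then carries a 1-based ss:Index attribute. Whether the file produced for this PDF actually uses ss:Index has not been verified, so this may not be the cause of the shift.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

static class RowReader
{
    const string Ss = "urn:schemas-microsoft-com:office:spreadsheet";

    // Hypothetical helper: returns every matched row as a list of cell texts,
    // with each cell placed at its real column position.
    public static List<List<string>> GetData(XmlNode root, string xpath, XmlNamespaceManager nsmgr)
    {
        var rows = new List<List<string>>();
        foreach (XmlNode row in root.SelectNodes(xpath, nsmgr))
        {
            var cells = new List<string>();
            foreach (XmlNode cell in row.SelectNodes("ss:Cell", nsmgr))
            {
                // If the cell declares ss:Index, pad the skipped columns with
                // empty strings so later values do not shift to the left.
                XmlAttribute index = cell.Attributes?["Index", Ss];
                if (index != null)
                {
                    int target = int.Parse(index.Value) - 1; // ss:Index is 1-based
                    while (cells.Count < target) cells.Add(string.Empty);
                }

                XmlNode data = cell.SelectSingleNode("ss:Data", nsmgr);
                cells.Add(data?.InnerText ?? string.Empty);
            }
            rows.Add(cells);
        }
        return rows;
    }
}
```

A reader that ignores ss:Index and simply counts Cell elements would see values move into the wrong columns, which is one possible way a shift could appear on the parsing side rather than in the conversion itself.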
How can we fix this issue?