Word and PDF Files.zip (352.0 KB)
Attached are the files. Below is the sample code:
Import Word doc and save it as PDF using Aspose.Words:
var filename = @"C:\beanstalk\pdfGeneratorTestFiles\Original Word Doc with Tables";
var doc = new Document(filename + ".docx");
doc.Save(filename + " Saved with Aspose Words.pdf", SaveFormat.Pdf);
Import resulting PDF doc and attempt to access tables using Aspose.PDF:
Document pdfDocument = new Document(@"C:\beanstalk\pdfGeneratorTestFiles\Original Word Doc with Tables Saved with Aspose Words.pdf");
TableAbsorber absorber = new TableAbsorber();
//table count is 0
var tableCount = absorber.TableList.Count;
If I import the attached file “Original Word Doc with Tables Saved Manually From Word.pdf” instead, the table count is 8, which is also confusing since there should only be 3 tables.
To give you a little background, we have a requirement to concatenate many PDF files into one PDF file and provide a linked TOC. The TOC has to be in a specific format, and we would like to allow the TOC to be created as a “template” in a Word document with various placeholders or tokens for data. From there I would need to insert data into the table to create the TOC for the file (using document names and page numbers). I would then need to convert the Word document to a PDF and prepend this PDF as the TOC. However, I also need a way to add hyperlinks on the TOC that point to the appropriate page numbers, so I need a way to accurately access the text data in the PDF file in order to put the correct page number on each hyperlink.
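To make the page-number part concrete, here is a rough sketch of how the target page for each TOC entry could be worked out once the page counts are known. It is plain C# with no Aspose calls; `TocPlanner` and `StartPages` are names made up for illustration, the page counts are invented, and it assumes the TOC is prepended and the documents are concatenated in list order.

```csharp
using System;
using System.Collections.Generic;

public static class TocPlanner
{
    // Hypothetical helper (not part of any Aspose API): given the number of
    // pages in the TOC and the page count of each document in concatenation
    // order, return the 1-based page on which each document starts in the
    // merged file.
    public static List<int> StartPages(int tocPageCount, IReadOnlyList<int> documentPageCounts)
    {
        var startPages = new List<int>(documentPageCounts.Count);
        var nextPage = tocPageCount + 1; // first page after the prepended TOC

        foreach (var pageCount in documentPageCounts)
        {
            startPages.Add(nextPage);
            nextPage += pageCount;
        }

        return startPages;
    }

    public static void Main()
    {
        // Made-up page counts, purely for illustration.
        var starts = StartPages(tocPageCount: 1, documentPageCounts: new[] { 3, 5, 2 });
        Console.WriteLine(string.Join(", ", starts)); // prints "2, 5, 10"
    }
}
```

These start pages are what each TOC row and its hyperlink would need to target; the open question is still how to locate the matching row text in the converted PDF.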
Also, if you can think of another way to meet this requirement I am open to suggestions. Thanks!