Error using TableAbsorber to iterate through tables

ksubramaniyam · March 16, 2016, 11:29am

I am using the example specified at the following URL to iterate through tables:

http://www.aspose.com/docs/display/pdfnet/Manipulate+tables+in+existing+PDF

First I convert HTML to PDF using the following HTML string:

test

test2

Then I run the code below on the converted PDF (which has 1 page).

Dim _absorber = New Aspose.Pdf.Text.TableAbsorber()
Dim p As Integer

' Page indexes in Aspose.Pdf are 1-based
For p = 1 To pdf.Pages.Count
    Dim _page = pdf.Pages(p)
    ' The exception is thrown on the next line
    _absorber.Visit(_page)
Next

An error is thrown on the line "_absorber.Visit(_page)" that says "Index was out of range. Must be non-negative and less than the size of the collection." The page object is valid, and when the document is saved as PDF, the file is an exact representation of what I would expect.

This seems like a simple setup to use the example you have at the above URL so I do not know what is wrong. If you can help me with it that would be great.

tilal.ahmad · March 17, 2016, 10:57am

Hi Kuna,

Thanks for your inquiry. I have tested your scenario with the shared HTML using Aspose.Pdf for .NET 11.4.0 and was able to reproduce the reported exception. For further investigation, I have logged an issue in our issue tracking system as PDFNEWNET-40426 and linked your request to it. We will keep you updated on the issue status via this thread.

We are sorry for the inconvenience caused.

Best Regards,

ksubramaniyam · March 17, 2016, 11:11am

Thanks for your reply!

But bugs like this are shaking our confidence in recommending your tools for development, especially for something that is direct from your documentation. Do you guys not do unit testing etc? Looking at other issues I see on the forum and our own, we have no time estimates for any of the fixes - which makes it hard to use your tools. Can you at least propose a workaround so we can use it in the meantime?

How do you propose we use TableAbsorber to visit tables without running into this issue? Is there any other way to go through the tables?

I am tasked with evaluating Aspose.PDF for our needs and I have already run across so many bugs (reported separately in different threads) and am left with no idea when these bugs will be fixed. But the bigger question I have is, why am I running across so many issues when I thought this product was very mature?

We have been using Aspose.Words and Aspose.Slides for years and I do not believe we have had this many bugs getting started with any of them.

I am just trying to understand why so many simple bugs are cropping up in this particular tool so our company can make a better decision.

Thanks

Kuna

codewarior · March 20, 2016, 8:24am

ksubramaniyam@dymaxium.com:
But bugs like this are shaking our confidence in recommending your tools for development, especially for something that is direct from your documentation. Do you guys not do unit testing etc? Looking at other issues I see on the forum and our own, we have no time estimates for any of the fixes - which makes it hard to use your tools. Can you at least propose a workaround so we can use it in the meantime?

Hi Kuna,

Thanks for contacting support.

Before every new release, we carry out thorough unit testing, and a version is not released unless the API has passed certain tests. However, issues are resolved on a first-come, first-served basis, and the time taken to resolve any particular issue depends on the complexity and structure of the source file.

ksubramaniyam@dymaxium.com:
How do you propose we use TableAbsorber to visit tables without running into this issue? Is there any other way to go through the tables?

TableAbsorber is used to manipulate tables in an existing PDF file. However, the product team will investigate the problem and determine the cause of the reported issue.

ksubramaniyam@dymaxium.com:
I am tasked with evaluating Aspose.PDF for our needs and I have already run across so many bugs (reported separately in different threads) and am left with no idea when these bugs will be fixed. But the bigger question I have is, why am I running across so many issues when I thought this product was very mature?

Can you please share the other issue IDs so that I can get the latest status and share the results? We are sorry for the inconvenience.

ksubramaniyam · March 21, 2016, 9:25am

Thanks for your reply.

I believe you can click on my name and see my recent posts to see the issues I have posted.

Given the issues that we are facing in converting our HTML to PDF and, based on our testing, the success we are having with doing the same conversion using Aspose.Words (using the save-as-PDF option), we would like to know what the critical differences are between Aspose.PDF and Aspose.Words when used to convert HTML to PDF.

I have asked this in another thread as well and have not gotten any direct answer. Since they are both your products and doing the same thing using one seems to give us fewer issues (Aspose.Words) than the other (Aspose.PDF), it’s best for you to direct us to a page or provide this information. It would help us decide better on how to tackle our conversion problems.

And since we already have a license for Aspose.Words, and based on your answer, if there is not much difference between the two, we can work with Aspose.Words, as it also tends to support some features that Aspose.PDF does not (like internal named anchor links, etc).

Please provide a direct comparison of Aspose.Words and Aspose.PDF when converting HTML to PDF.

Edit: The other post where I posted this question mentioned the advantage of using Aspose.PDF to do post processing (after HTML is imported), but that’s what we are finding buggy with Aspose.PDF - besides, Aspose.Words’ post processing seems more solid (page resizing and reflow of content, for one). At this point, after many hours of evaluating Aspose.PDF and trying to work around issues, we would really appreciate a full comparison of Aspose.Words vs Aspose.PDF when converting HTML to PDF. We are especially interested in the following areas:

1) Reproduction of the look and feel of the original HTML (Formatting, styles, internal links, table sizing, etc)

2) Ability to reflow content when a page is resized. If a table is too large for a page, and we programmatically change the page size to a larger one (A1, etc), we require the content to reflow to the new page size. (Seems to work on Aspose.Words.)

3) Ability to control page breaks from HTML through “page-break-after” CSS (crashes Aspose.PDF and works on Aspose.Words)

4) Advanced features like tooltip conversion, internal anchor-based links, cleaning up of invalid links (javascript etc) (none of which seem to work on Aspose.PDF)

5) Conversion speeds. With our limited testing, we are already running into speed issues with small dossiers. In production, we need to convert large dossiers which could span 300+ pages. Our testing indicates Aspose.Words is much quicker at the conversion than Aspose.PDF. We would like you to outline the differences here more specifically, as I assume you have more in-depth test data.

Time is of the essence and we need to make a decision shortly. We have spent the last 3 weeks on this and have kept running across new issues when we tried to find workarounds to existing ones. Without any known workarounds for existing issues, or an ETA on fixes, it is very difficult for my team to continue our evaluation of Aspose.PDF or recommend it.

Any information that helps us understand the Aspose.Words vs Aspose.PDF differences will help us choose between the two. Otherwise, we will need to look at other PDF components.

Thanks for your time

Kuna

codewarior · March 22, 2016, 1:46pm

Hi Kuna,

Thanks for sharing the details.

Aspose.Pdf and Aspose.Words are two separate APIs, and each has its own Document Object Model for file rendering and manipulation. However, from your details above, it appears that Aspose.Words is producing better results when rendering HTML files to PDF, and as long as it fulfills your requirements, you may consider using Aspose.Words for HTML to PDF generation.

Please note that Aspose.Pdf can create new PDF files and manipulate existing ones. It can also convert various file formats to PDF, and convert PDF files to the other file formats it supports.

Similarly, Aspose.Words can create new MS Word files and manipulate existing ones. It can also load various file formats and render the output as MS Word or other file formats (including PDF).

ksubramaniyam · March 22, 2016, 2:46pm

Thanks for your reply.

I think you have not answered my questions. I am not interested in what Aspose.PDF does GENERALLY; I am interested in what your team has found in terms of HTML to PDF conversion differences between Aspose.Words and Aspose.PDF. Please restrict your answer to only this functionality, as our needs are based on this for now.

For the issues we have come across with Aspose.PDF, Aspose.Words seems to handle them better, but that doesn’t mean Aspose.Words is better overall in this case, as we do not have the inner knowledge of either product to make a judgement call. I am sure Aspose.Words handles other areas of HTML conversion worse than Aspose.PDF - it’s just that we don’t have the resources to test everything during this evaluation phase.

What we need is for your team to provide information, which you probably already have, on the areas where Aspose.PDF is better than Aspose.Words at converting HTML to PDF. Given that both products are created by you and the functionality in question is the same, I would assume your team has the expertise to outline this for us so we can make the decision.

If we do go with Aspose.PDF, it will be a significant purchase on our end in terms of dollar amount, so we would appreciate specific information.

Lacking any detailed comparison (based on the questions in my earlier posts), if you can at least provide direction on which HTML tags and CSS styles are currently supported by Aspose.PDF, it would be helpful. Or a list of things that are not supported or are in the works. Any detailed directions beyond generic responses would really help us.

What can we do on our end to get better support to help us answer some of these questions? Or elevate priority of the items we have raised? Does getting a subscription help with either? Or do we need to go with other support packages? Please provide details, as we want to make a decision soon.

Thanks

Kuna

codewarior · March 25, 2016, 11:58am

ksubramaniyam@dymaxium.com:
I think you have not answered my questions. I am not interested in what Aspose.PDF does GENERALLY; I am interested in what your team has found in terms of HTML to PDF conversion differences between Aspose.Words and Aspose.PDF. Please restrict your answer to only this functionality, as our needs are based on this for now.

ksubramaniyam@dymaxium.com:
For the issues we have come across with Aspose.PDF, Aspose.Words seems to handle them better, but that doesn’t mean Aspose.Words is better overall in this case, as we do not have the inner knowledge of either product to make a judgement call. I am sure Aspose.Words handles other areas of HTML conversion worse than Aspose.PDF - it’s just that we don’t have the resources to test everything during this evaluation phase.

ksubramaniyam@dymaxium.com:
What we need is for your team to provide information, which you probably already have, on the areas where Aspose.PDF is better than Aspose.Words at converting HTML to PDF. Given that both products are created by you and the functionality in question is the same, I would assume your team has the expertise to outline this for us so we can make the decision.

Hi Kuna,

Thanks for sharing the details and sorry for the delayed response.

Aspose.Pdf for .NET is widely used by many customers for HTML to PDF conversion, and there have been a few cases where customers have reported that the API does not honor some HTML tags and CSS properties during conversion. Many of these scenarios have been investigated and resolved; some are still under investigation and will hopefully be resolved soon. Now, concerning the supported features of Aspose.Pdf for .NET, please visit

Frankly speaking, we do not have any document comparing the features of Aspose.Pdf and Aspose.Words, but I have asked a colleague from the Aspose.Words team to share the required information about HTML to PDF conversion using Aspose.Words.

ksubramaniyam@dymaxium.com
What can we do on our end to get better support to help us answer some of these questions? Or elevate priority of the items we have raised? Does getting a subscription help with either? Or do we need to go with other support packages? Please provide details, as we want to make a decision soon.

All the issues reported by our customers are equally important to us, and the team will look into your reported issues as per their schedule. As soon as we have further updates, we will let you know.

ksubramaniyam · March 29, 2016, 3:27pm

Thanks for your post.

All the links you posted are for the PDF.Generator based conversion which you no longer recommend.

I have been told in other posts that Generator has been discontinued and that I should use Document based conversion. I am sure what you posted doesn’t apply as a whole to the Document based approach, as we ourselves have come across differences between Generator and Document (mostly with tag support and formatting). I do not want to open another can of worms with that comparison here again.

I guess the answer is you do not currently have any relevant/updated documents for Document based conversion? If not, I would highly recommend that you create the necessary documents, as it would help people like us to evaluate your products better and more easily. If you do, please share what you can.

thanks

Kuna

codewarior · March 30, 2016, 10:57pm

Hi Kuna,

The links shared earlier relate to Aspose.Pdf.Generator, and the purpose of sharing them was to give information about the supported HTML to PDF conversion features. Although the links relate to the legacy Aspose.Pdf.Generator, most of these features are also supported by the new DOM approach. Nevertheless, I am coordinating with the product team to get a specific list of the HTML/CSS tags supported in the DOM approach. As soon as we have definite updates, we will let you know.

ksubramaniyam · March 31, 2016, 12:57pm

Thanks for the reply.

As referred to earlier in my post, we have ourselves found and reported things that work using the “Generator” way that don’t work using the “Document” way - so we can’t rely on old documents. Initially, we used Generator for our evaluation not realizing it was discontinued. You can refer to other posts I have made in the forum (as well as by others who have had issues with upgrading to Document).

In any event, we appreciate your help but want to make sure our decisions are based on accurate information. What we are interested in most is a list of issues that others have reported, or that you are aware of, with HTML to PDF conversion using the Document method - so that we know what we are getting into. Even if the list is not complete.

Thanks

Kuna

codewarior · April 3, 2016, 1:51am

Hi Kuna,

Thanks for sharing the details.

The HTML parser is fully based on the official specification (https://www.w3.org/TR/html5) and supports all the tags listed there (see section 4 of the specification), except several that are currently not supported by the PDF renderer, for instance: KEYGEN, OUTPUT, PROGRESS, METER, RUBY; some states of the INPUT element, such as color, password, date and time; and embedded content such as TRACK, SOURCE and AUDIO.
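As a rough pre-flight aid (not part of Aspose.Pdf), the list above can be turned into a small check that flags these constructs in an HTML string before conversion. This is only a sketch in Python using the standard library: the names UNSUPPORTED_TAGS, UNSUPPORTED_INPUT_TYPES, UnsupportedScanner and find_unsupported are hypothetical, and the lists simply transcribe the examples named in this post, so they may be incomplete or out of date for other versions.

```python
from html.parser import HTMLParser

# Elements and INPUT states named as unsupported in the post above.
# Assumption: transcribed from that post only; not an official or complete list.
UNSUPPORTED_TAGS = {"keygen", "output", "progress", "meter", "ruby",
                    "track", "source", "audio"}
UNSUPPORTED_INPUT_TYPES = {"color", "password", "date", "time"}


class UnsupportedScanner(HTMLParser):
    """Collects (line, description) pairs for constructs on the lists above."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        line, _ = self.getpos()
        if tag in UNSUPPORTED_TAGS:
            self.findings.append((line, f"<{tag}>"))
        elif tag == "input":
            input_type = (dict(attrs).get("type") or "").lower()
            if input_type in UNSUPPORTED_INPUT_TYPES:
                self.findings.append((line, f'<input type="{input_type}">'))


def find_unsupported(html):
    """Return a list of (line number, construct) found in the HTML string."""
    scanner = UnsupportedScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings


if __name__ == "__main__":
    sample = '<p>ok</p>\n<meter value="0.5"></meter>\n<input type="date">'
    for line, what in find_unsupported(sample):
        print(f"line {line}: {what} may not be rendered")
```

An empty result only means that none of the listed constructs were found; it says nothing about CSS or JavaScript handling.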

Furthermore, the CSS support is also based on the official specification. We first implemented the CSS 2.1 specification and its list of supported properties (https://www.w3.org/TR/CSS2/propidx.html), except the ‘aural’ group (speech-rate, voice-family, etc.). Now we are focused on implementing new features and rendering behaviors from the CSS3 specification (https://www.w3.org/Style/CSS/current-work). Unfortunately, this part of the W3C work is still under development: most of the CSS properties are experimental, their behavior is not properly defined, and they keep changing very quickly, but we are trying our best to keep these properties up to date.

Now, concerning the list of problems with the new DOM approach during HTML to PDF conversion: there have been some scenarios where customers have reported problems, and we keep fixing them as they come in. If you are facing problems in this area, please share the details and we will be more than happy to help you out.

codewarior · April 5, 2016, 12:40pm

Hi Kuna,

We have further investigated the earlier reported issue and, as per my observation, it’s a bit difficult to prepare a specific list, because we’re working with a specification that is still under development. For instance, one of the basic parts of the specification, the ‘Table Module’, was sent back for rework, and we have been waiting a year for updates from the W3C. If this module changes too much, it will be marked as a problematic area, but for now it works. The situation is similar for other parts of the CSS specifications: they work now, but that may change after significant revisions to the official documentation. Fortunately, our team is always ready to make changes to the CSS rendering core, and we try to do so as soon as possible.

On the other hand, there are a few features that we really don’t support, such as ‘css3-animations’, because the output (PDF) format is static and can’t support animation in the same way as browsers, or ‘css3-speech’, which is responsible for aural presentation of information.

Additionally, the main direction we are working on is improving the JavaScript engine. The current implementation works fine with common JS scripts, but to execute more complicated scripts such as jQuery, we need to improve our implementation.