I’ve been searching the internet for .NET tools to convert Word to PDF for a long time. So far, Aspose.Words + Aspose.Pdf is the best option I have found.
I’ve tested a number of docs with tables, text, images, etc. and it works nicely. However, I’ve found one problem, and it’s a big one for my project requirements: if the Word document contains autoshapes AND text, the autoshapes in the converted PDF are positioned incorrectly. The shape objects overlap the text!
However, if the document contains only autoshapes and no text, the output looks fine, with no positioning problem.
autoshapes_text.doc (will have problem)
shape_only.doc (has no problem)
I really appreciate the great work on Aspose.Words. However, my project requires converting DOC to PDF, so this will be a BIG problem…
If anyone could offer a suggestion, I’d be very, very grateful, and I’d surely purchase it!!
Thank you for considering Aspose and for your kind words.
In the referenced documents I can see two problems:
- Document contents after the shapes overlap them. This is caused by improper handling of the inline Canvas on which the shapes are drawn. This issue is known as #3229. It has been investigated and needs cooperation with the Aspose.Pdf team to be fixed. I have already requested their help and am waiting for the fix.
If changing the documents manually is acceptable for you, then I can give you a workaround for this issue. You can make the Canvas floating (non-inline) and insert several empty paragraphs where it should be, to allocate enough vertical space. Anchor the Canvas to the first of these empty paragraphs. This approach may also require an extra page break. But it really works.
- Line coordinates are mixed up. For instance, a line should go from (x1,y1) to (x2,y2). In the Aspose.Pdf XML intermediate file it is exported as going from (x1,y2) to (x2,y1). This issue is known as #3232. It has lower priority, but I’ll get back to it now.
As a workaround, re-inserting the line starting from the other end could help. I would try that.
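To illustrate the coordinate mix-up described above, here is a minimal Python sketch. It only models the swap as stated in this post (the y-coordinates of the two endpoints exchanged); the function names are hypothetical, and it says nothing about how Aspose.Words or Aspose.Pdf store lines internally.

```python
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def swapped_export(start: Point, end: Point) -> Segment:
    """Model the reported mix-up: the endpoints keep x but exchange y."""
    (x1, y1), (x2, y2) = start, end
    return (x1, y2), (x2, y1)


def same_segment(a: Segment, b: Segment) -> bool:
    """True if both pairs describe the same segment, ignoring direction."""
    return set(a) == set(b)


# Hypothetical sample lines, coordinates in points.
diagonal = ((10.0, 20.0), (110.0, 70.0))
horizontal = ((10.0, 20.0), (110.0, 20.0))
vertical = ((10.0, 20.0), (10.0, 70.0))

for name, line in [("diagonal", diagonal),
                   ("horizontal", horizontal),
                   ("vertical", vertical)]:
    exported = swapped_export(*line)
    status = "unchanged" if same_segment(line, exported) else "mirrored"
    print(name, line, "->", exported, status)
```

Under this simple model, horizontal and vertical lines come out as the same segment, and only diagonal lines end up mirrored within their bounding box.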
Please let me know whether these workarounds are suitable for you. We’ll try to do something. In any case, I’ll remind the Aspose.Pdf team about the first issue and look at the second myself.
Many thanks for your kind help! I’m building an application that needs to convert MS Word documents (which in a number of cases contain diagrams, flowcharts, boxes, etc.) to PDF, so the final PDF has to look right.
It’s great to hear your quick response and your co-operation with the Aspose.Pdf team! At the same time, I’ll test some more documents containing autoshapes. I look forward to your news!
Thank you for your kind words. We are glad to help you!
The Aspose.Pdf team leader has responded regarding #3229. (Actually, they have their own issues and numbering.) The priority has been increased, so I hope they will fix this relatively soon. After that I’ll make some changes on my side and perform testing. All of this affects only inline Canvas objects. There shouldn’t be problems with diagrams and flowcharts, assuming that they are embedded OLE objects or just images.
#3232 is now under investigation.
Please let me know if you can find anything strange or buggy with converting other documents. Any other feedback is also much appreciated.
You said that:
You can make the Canvas floating (non-inline) and insert several empty paragraphs where it should be, to allocate enough vertical space. Anchor the Canvas to the first of these empty paragraphs. This approach may also require an extra page break. But it really works.
Um… what do “floating” and “non-inline” mean exactly? Do you mean changing the Canvas properties by right-clicking the Canvas and choosing to format it? If so, I guess the approach would be to use the Interop.Word API to make that change (as I want everything to be done invisibly to the user).
I also found another problem with autoshapes (it seems to have appeared in another post that I read before…). More complex autoshapes (such as the cylinder, star, folder, etc.) all become rectangles in the converted PDF. Most likely Aspose.Words is not able to draw all kinds of shapes, and I suppose that’s due to the complexity involved… Perhaps it would take heavy engineering, but it would be nice to see the autoshapes rendered correctly in the PDF, as input Word documents commonly contain diagrams drawn with various kinds of autoshapes…
I’m not familiar with the engineering behind it, but would it be possible to have another mechanism for handling autoshapes, so that they are rendered properly?
Thank you for your notes.
About the Canvas. I have refactored your document and attached it to the post. I performed the following:
- Changed Canvas placement to “Behind text”;
- Added several empty paragraphs (you can see them after you push the button with this symbol in MS Word: ¶);
- Moved the Canvas to the place it should occupy.
Things will be done invisibly after we fix #3229. But for now we are waiting for the corresponding fix from the Aspose.Pdf team, as I already wrote.
Note that #3232 is already fixed in the current codebase, so the Line gets converted correctly on my side. (Is it intentionally drawn outside the canvas as a separate floating shape?) This fix will be available with the next hotfix.
About the autoshapes.
You probably mean this thread:
Thanks for your quick and great help! I used the refactored document you attached, inserted 5 more line breaks, and the PDF now looks pretty nice!
I have the following observation: the number of lines to be inserted should be proportional to the height of the canvas. And I think a few more lines should be inserted for stability, as I found that I still had to insert 5 more lines in the refactored document (did you have to do so on your machine?) to avoid the autoshapes overlapping the text. I’m using Office XP.
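The proportionality noted above can be sketched as simple arithmetic. This is only a rough estimate under stated assumptions: the function name, the 12 pt line height and the two-line safety margin are all hypothetical, and the real line height depends on the font, paragraph spacing and environment.

```python
import math


def empty_paragraphs_needed(canvas_height_pt: float,
                            line_height_pt: float = 12.0,
                            safety_lines: int = 2) -> int:
    """Estimate how many empty paragraphs reserve room for a floating canvas.

    canvas_height_pt: canvas height in points.
    line_height_pt:   assumed height of one empty paragraph (hypothetical default).
    safety_lines:     extra lines added as a margin against overlap (hypothetical default).
    """
    if canvas_height_pt < 0 or line_height_pt <= 0 or safety_lines < 0:
        raise ValueError("heights must be positive and the margin non-negative")
    # Round up so the reserved space is never shorter than the canvas.
    return math.ceil(canvas_height_pt / line_height_pt) + safety_lines


# A 200 pt tall canvas with the assumed defaults.
print(empty_paragraphs_needed(200.0))
# The same canvas with a larger margin, as suggested for other environments.
print(empty_paragraphs_needed(200.0, safety_lines=5))
```

A larger margin leaves some extra blank space below the canvas, which is the trade-off against a slight overlap with the text.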
Anyway, thanks again for your effort on fixing #3232, though I can’t see its effect right now.
Regarding the autoshape rendering: yes, it would be an expensive effort to draw all 200 shapes. I think basic shapes like rectangle, circle, triangle and line would be enough for the moment.
Please let me know if there’s further news and I’d purchase it!!
It’s a matter of taste how many line breaks to insert. I chose a number that gives approximately the same visual layout as the original document. Unfortunately, it can depend on the environment. Again, sorry that such workarounds are needed.
Rectangle, circle, triangle and line are already implemented, since they are the most “popular” shapes. I’m attaching the test document which we use to check our progress in drawing autoshapes. There are fewer than 200 because the others are very specialized. These are the probable candidates to be rendered in the future. The arrows I wrote about are also rendered in the PDF, since I tested on the current codebase.
Thanks again for the follow-up on #3229 and #3232. Great to hear that those candidates will probably be rendered in the future! On the question of the number of line breaks, perhaps it would be “safer” to have more of them. Maybe that would leave more space below the canvas in some environments, but that’s better than overlapping the text a bit in other environments. (Just my little supposition…)
I look forward to hearing about the release of the next hotfixes!
We have released a new version of Aspose.Words that contains a fix for one of your issues.
Issue # 3232 - Arrows’ coordinates don’t correspond to the shapes they are intended to connect.
The new version of Aspose.Words is available for download from here.
Oh yeah, thanks Aspose Team for your great and quick effort.
Hoping the pending #3229 gets fixed soon.
The issues you have found earlier (filed as 3229) have been fixed in this update.