Failed to open PDF file after extraction from Word document

masterkiller · May 30, 2008, 3:15am

Hi,
I tried to extract a PDF document embedded in a Word document using the Aspose.Words component (versions 5.1 and 5.2). The original PDF can be opened in Adobe Reader 7.0, but after extraction the PDF can only be opened in Adobe Reader 8.1, not in Adobe Reader 7.0. When I open the extracted PDF in Adobe Reader 7.0, it throws the error “The file is damaged and could not be repaired”.
Is there any way to solve this? Does Aspose do any conversion on the extracted PDF? Kindly advise.
I have enclosed a sample document with the PDF.
Thanks
Andy

alexey.noskov · May 30, 2008, 4:29am

Hi
Thanks for reporting this to us. I managed to reproduce this problem and have created new issue #5238 in our defect database. I will notify you as soon as it is fixed.
Best regards.

Senlee · June 17, 2008, 8:01pm

hi
Any updates on this issue?
Has the fix or solution been released yet?
I would appreciate it if you could advise the release schedule for this fix.
Thanks

alexey.noskov · June 18, 2008, 3:26am

Hi
Thanks for your request. Unfortunately this issue is still unresolved, and at the moment I can’t provide any estimate. Please expect a reply before the next hotfix (within 3-4 weeks).
Best regards.

Senlee · June 18, 2008, 8:04pm

As this issue adversely impacts our program’s functionality and users are unable to use the function, we would appreciate it if you could ask your development team to treat it as a high-priority issue and resolve it at the earliest possible date.
Hope to hear good news from you.
regards
Sen Lee

alexey.noskov · June 19, 2008, 2:10am

Hi
Thanks for your request. This issue has “Show stopper” priority in our defect database. We will try to resolve it in the next release, but please note that there are many other urgent issues we have to work on.
Best regards.

Senlee · June 19, 2008, 2:54am

Yes, I understand that your team has many other items to work on, but I would like you to put yourself in my shoes.
I have customers/users who are really, really unhappy that this issue is taking so long to resolve, as it affects their business and operations.
Also, with every passing day that this issue remains unsolved, confidence in Aspose erodes. This will in turn impact your good name and brand, in which I believe your company has invested substantial resources, money, etc., and you would definitely not want all these investments to go to waste.
Therefore, I hope you can really push your team to come up with a fix ASAP (perhaps next week) so that our confidence and faith in Aspose can be restored and we remain satisfied customers of your product.
Regards
Sen Lee

alexey.noskov · June 19, 2008, 4:37am

Hi
I will ask our developer to fix this issue asap.
Best regards.

Senlee · June 19, 2008, 8:06pm

Hi Alexey
Thank you very much, I appreciate your help tremendously.
Once the fix is done and ready for release, please let us know.
Best Regards
Sen Lee

Senlee · June 25, 2008, 6:19am

hi alexey
Please advise the status of the resolution of this issue.
Is the fix ready to be released?
Please advise ASAP.
Thanks
Sen Lee

alexey.noskov · June 25, 2008, 6:43am

Hi
Unfortunately this issue is still unresolved. It may not be so easy to resolve this problem.
Best regards.

Senlee · June 25, 2008, 8:12pm

hi
Can you then advise what progress has been made so far and what problems or difficulties your team is encountering? It would also be extremely helpful if you could give a firm date by which you will fix the problem.
Also, if the problem is as complex as you said, has your team considered any workaround that can temporarily get it to work?
Please advise, we are all on edge here.
Regards
Sen Lee

alexey.noskov · June 26, 2008, 3:51am

Hi
I spent some time on this inquiry and found a workaround for you. Please see the following code:

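using System.IO;
using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"Test208\in.docx");
NodeCollection shapes = doc.GetChildNodes(NodeType.Shape, true);
int i = 0;
foreach (Shape shape in shapes)
{
    if (shape.OleFormat != null)
    {
        // Use a fresh stream per shape so objects are not appended to each other
        using (MemoryStream pdfStream = new MemoryStream())
        {
            shape.OleFormat.Save(pdfStream);
            // Rewind so Aspose.Pdf.Kit reads the saved data from the beginning
            pdfStream.Position = 0;
            // Open the saved PDF with Aspose.Pdf.Kit and save it again
            Aspose.Pdf.Kit.PdfContentEditor edit = new Aspose.Pdf.Kit.PdfContentEditor();
            edit.BindPdf(pdfStream);
            edit.Save(@"Test208\obj_" + i.ToString() + ".pdf");
            i++;
        }
    }
}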

I simply open and re-save the output PDF using Aspose.Pdf.Kit.
Best regards.

Senlee · June 26, 2008, 9:42pm

hi Alexey
Thank you very much.
We will try it out and get back to you.
best Regards 8)
Sen Lee

romank · July 2, 2008, 6:12am

I looked at the issue in code and here is what I can tell:
The embedded object in this OOXML document is not a ready-to-use PDF document. It is in fact an OLE object that represents a PDF document; in other words, it is a PDF document wrapped in an OLE object wrapper.
When you ask Aspose.Words to extract an OLE object, it does just that: it extracts the OLE object, including the outer OLE wrapper with your PDF document inside. As a result, the output file is not immediately a valid PDF document, and you might not be able to open it in applications that expect a pure PDF and don’t handle the OLE wrapper.
The design idea in Aspose.Words was that it is impossible to know how to automatically unwrap every type of OLE wrapper and extract the pure embedded document, so the extracted OLE object was left for users to process on their own. As far as the code is concerned, there is no problem in Aspose.Words.
But from user requests we now see that we need to add unwrapping to our functionality. In an earlier version we added automatic unwrapping of Outlook email messages. This time it looks like we are going to add automatic unwrapping of PDF documents. We will also look at other embedded document types that could be unwrapped.
A note for future requests: if you have a similar issue, just report it to us and attach your document. We will add automatic unwrapping to Aspose.Words so that what you extract is the pure document without the OLE wrapper data.
We will release PDF unwrapping later today.
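To tell whether an extracted file is a bare PDF or still carries wrapper data, one rough check is where the `%PDF-` signature appears: a PDF file's header line normally begins at byte 0. The sketch below is a hypothetical helper (`PdfHeaderCheck` is not part of Aspose.Words or Aspose.Pdf.Kit) using only standard .NET types. It is a diagnostic only: finding the signature at a non-zero offset suggests extra leading bytes, but slicing the file from that offset is not guaranteed to yield a valid PDF, because an OLE compound file stores streams in sectors that need not be contiguous.

```csharp
using System.IO;
using System.Text;

// Hypothetical diagnostic helper, not an Aspose API.
public static class PdfHeaderCheck
{
    private static readonly byte[] Signature = Encoding.ASCII.GetBytes("%PDF-");

    // Returns the offset of the first "%PDF-" signature, or -1 if none is found.
    public static int FindPdfHeader(byte[] data)
    {
        for (int i = 0; i + Signature.Length <= data.Length; i++)
        {
            int j = 0;
            while (j < Signature.Length && data[i + j] == Signature[j])
                j++;
            if (j == Signature.Length)
                return i;
        }
        return -1;
    }

    // Strict check: true only when the signature starts at offset 0.
    public static bool StartsAsBarePdf(byte[] data)
    {
        return FindPdfHeader(data) == 0;
    }

    // Convenience overload for a file on disk, e.g. one written by OleFormat.Save.
    public static bool StartsAsBarePdf(string path)
    {
        return StartsAsBarePdf(File.ReadAllBytes(path));
    }
}
```

For example, `PdfHeaderCheck.StartsAsBarePdf(@"Test208\obj_0.pdf")` would be expected to return false for a file that still begins with OLE wrapper bytes.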

alexey.noskov · July 2, 2008, 8:25am

The issues you found earlier (filed as 5238) have been fixed in this update.

Senlee · July 2, 2008, 10:13pm

hi Romank
thanks for your update on this.
We have actually tested Alexey’s workaround and found that it works, but it requires another product called Aspose.Pdf.Kit.
Romank, you mentioned that you would be able to release the fix yesterday. May we know whether the fix has been released and how we can get it? If it is still not immediately available, may I request that you give us a license to use Aspose.Pdf.Kit until you can fix the problem in Aspose.Words? Let us know soonest. I think this is a fair request, since this issue has been hanging for quite a long time already.
One more query regarding your input: based on what you mentioned, how come it works in PDF viewer version 8 and not in version 7? If there is no automatic unwrapping, shouldn’t it also fail in version 8?
regards
sen Lee

romank · July 2, 2008, 10:53pm

See Alexey’s previous post - it was fixed and released yesterday.
Regarding Acrobat 8: Adobe probably made it able to read a PDF document wrapped in an OLE object because they saw people running into this from time to time.

Senlee · July 3, 2008, 8:32pm

Hi romank
We have done a test on the new fix that you provided. So far the test is OK.
Thank you very much for your support.
regards
sen lee