Converting PDF to Aspose.Words - Unknown file format error

Hi,

I need to merge a number of PDFs into one document, so I need to convert my PDFs to Aspose.Words in order to do the merge. I'm currently having problems converting one of my PDFs into Aspose.Words. I am using version 11.3.0.0 of Aspose.Words. This is my code:

Aspose.Words.Document doc = new Aspose.Words.Document("C:\\alicia\\" + "MeansTest.pdf");

I have attached my PDF. As you can see, I get an unknown file format error.

I have tried converting my PDF to a Word document with another third-party tool and then loading that with the following code:

Aspose.Words.Document doc = new Aspose.Words.Document("C:\\alicia\\" + "MeansTest.doc");

But this comes into Aspose.Words corrupt, with the text illegible.

Can you help please? Thank you. I have attached both documents.

Ali

Hi Ali,

Thanks for your query. It is more related to the Aspose.Pdf product. I am moving this thread to the Aspose.Pdf forum, where one of my colleagues will reply to you shortly.

Hi Ali,

Thanks for using our products and sharing the sample source code and template documents with us.

AliG:

I need to merge a number of PDFs into one document, so I need to convert my PDFs to Aspose.Words in order to do the merge.

You can merge multiple PDFs into one PDF using Aspose.Pdf for .NET. Please visit the following documentation links for more details about merging PDF documents.

AliG:

I'm currently having problems converting one of my PDFs into Aspose.Words. I am using version 11.3.0.0 of Aspose.Words.

You can also convert the merged PDF into DOC format using Aspose.Pdf for .NET. Please visit the following documentation link for the answer to this query.

Please feel free to contact support in case you need any further assistance.

Thanks & Regards,

Hi Ali,


Thanks for your interest in our products.

Adding to Rashid's comments, please note that Aspose.Words for .NET can create, edit and manipulate MS Word files and can also convert Word documents into PDF format. We have a separate product, Aspose.Pdf for .NET, which can create, manipulate and edit PDF documents and also supports transforming PDF files into DOC format. For your requirement you have two options: either concatenate all the PDF files into a single document using Aspose.Pdf for .NET and transform the resultant file into DOC format, or transform the individual PDF files into DOC format and then merge them using Aspose.Words for .NET.

The reason you are getting the error is that you are trying to open a PDF file using Aspose.Words. If you have any further queries, please feel free to contact us.
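
As a rough illustration of the first option described above (concatenate with Aspose.Pdf for .NET, then convert the result to DOC), a minimal sketch might look like the following. The file names are hypothetical, and the calls (`Pages.Add`, `Save` with `SaveFormat.Doc`) are assumed to be available in the installed Aspose.Pdf for .NET release, so please verify them against your version rather than treating this as the official sample.

```csharp
using Aspose.Pdf;

class MergePdfsThenConvert
{
    static void Main()
    {
        // Hypothetical input files; replace with the PDFs to be merged.
        string[] inputs =
        {
            @"C:\alicia\First.pdf",
            @"C:\alicia\Second.pdf"
        };

        // Start from an empty PDF and append the pages of each source file in order.
        Document merged = new Document();
        foreach (string path in inputs)
        {
            Document source = new Document(path);
            merged.Pages.Add(source.Pages);
        }

        // Save the concatenated PDF, then save the same document in DOC format.
        merged.Save(@"C:\alicia\Merged.pdf");
        merged.Save(@"C:\alicia\Merged.doc", SaveFormat.Doc);
    }
}
```

For the second option, each PDF would instead be saved to DOC individually and the resulting files joined in Aspose.Words for .NET, for example with `Document.AppendDocument`. Either way, the quality of the final document depends on how well the PDF to DOC conversion handles each source file.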

Hi Nayyer & Rashid,

Thank you very much for all your help, I really appreciate it. I am now successfully converting all my Word documents and PDFs into Aspose.Words to be merged into one document, complete with a table of contents, and it is all working very nicely. Thank you!

However, I still have one outstanding issue that maybe you can help me with: converting the attached PDF to Aspose.Words format so I can also merge it with my other documents. This PDF behaves differently from the other PDFs (which are working fine) because it has large images in it. When I attempt to convert it, I get a scrambled and distorted document.

Please find the PDF I'm having issues with attached. Thank you again,

Ali

Hello Ali,


I have tested the scenario and I am able to reproduce the problem: the images in the resultant DOC are corrupted. I have logged this problem as PDFNEWNET-33786 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and allow us a little time. We are sorry for the inconvenience.

As your source PDF files contain images, as a workaround you may consider extracting the images from the PDF file using Aspose.Pdf for .NET and then creating a DOC file with these images using Aspose.Words for .NET. Please visit the following links for further details.
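
To make the workaround above more concrete, here is a minimal sketch of extracting the embedded images with Aspose.Pdf for .NET and placing them into a new document with Aspose.Words for .NET. The paths are hypothetical, and the members used (`Page.Resources.Images`, `XImage.Save`, `DocumentBuilder.InsertImage`) are assumed to exist in the installed versions. Note that this only carries over the images, not any text in the PDF.

```csharp
using System.IO;

class PdfImagesToDoc
{
    static void Main()
    {
        // Hypothetical source PDF containing embedded images.
        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(@"C:\alicia\Source.pdf");

        // New, empty Word document that will receive the extracted images.
        Aspose.Words.Document doc = new Aspose.Words.Document();
        Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(doc);

        foreach (Aspose.Pdf.Page page in pdf.Pages)
        {
            foreach (Aspose.Pdf.XImage image in page.Resources.Images)
            {
                using (MemoryStream stream = new MemoryStream())
                {
                    // Write the image data to memory, rewind, and insert it into the document.
                    image.Save(stream);
                    stream.Position = 0;
                    builder.InsertImage(stream);

                    // Start a new paragraph so the next image is placed below this one.
                    builder.InsertParagraph();
                }
            }
        }

        doc.Save(@"C:\alicia\Images.doc");
    }
}
```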


Hi,

Is there any update on this issue? We are still having problems collating PDFs, even with the workaround you suggested. It now seems to work for some documents that are made up of images, but when I attempt to import PDFs that don't contain images, the document is corrupted. At the moment I am trying to collate this PDF, and it comes into my Aspose.Words document corrupted.

Please find PDF attached:

Hello Ali,


Thanks for sharing the resource file.

I have tested the scenario using Aspose.Pdf for .NET 7.1.0, converting the PDF file into DOC format, and as per my observations the resultant DOC file is created properly. I am able to open and view the file in MS Word 2007. However, when I loaded the same file using Aspose.Words and converted it into XPS or TIFF format, the output generated by Aspose.Words appeared to be corrupted. So the issue seems to be related to Aspose.Words. I am moving this thread to the respective forum, where my colleagues taking care of Aspose.Words will be in a better position to comment further on this issue. We are sorry for this inconvenience.

I have also tried converting the PDF file into DOC format using the upcoming release of Aspose.Pdf for .NET, 7.2.0, and the formatting of the resultant .DOC file is considerably better. However, I have observed one issue: all the characters in form fields appear in lower case rather than upper case. I have separately logged this issue as PDFNEWNET-34053 in our issue tracking system. We will look further into the details of this problem and will keep you updated on the status of the correction. Please be patient and allow us a little time. For your reference, I have also attached the resultant PDF which I have generated with v7.2.0.

Hi,

I have both Aspose.Words and Aspose.Pdf, so if this works with Aspose.Pdf then that is fine and I won't use Aspose.Words to convert a PDF into a Word document.

Quote:

"I have tested the scenario using Aspose.Pdf for .NET 7.1.0 where I have tried converting the PDF file into DOC format and as per my observations, the resultant DOC file is properly being created. I am able to open and view the file in MS Word 2007"

That's great. Can you send me the code you are using? I have tried it here and I don't get good results.

I have tried the following code using the latest version, 7.1, of Aspose.Pdf:

Aspose.Pdf.Document document = new Aspose.Pdf.Document("c:\\alicia\\Appform.pdf");

document.Save("c:\\alicia\\Appform.doc", Aspose.Pdf.SaveFormat.Doc);

But my resulting document is corrupt when I open it in Word 2007.

This issue is urgent now and I need to get it resolved as soon as possible. I logged this and other related bugs over 2 months ago and my client is not happy. We have paid for the Aspose licences and we are now thinking we will have to buy another tool, as Aspose does not appear to be able to deliver our requirements. So please, any help would be appreciated.

Ultimately, we need to merge a number of files of different types into one merged document, with a table of contents and page numbers. Currently, we can successfully merge DOC, RTF and TXT files and generate the table of contents. But we are having huge issues with our PDFs.

We need to be able to merge the 6 sample PDFs attached. If we cannot do this with Aspose, we are going to have to try and find another tool.

Please find the 6 sample PDFs we need to be able to merge attached:

Thank you for your help ,

Regards,

Ali

Hi Ali,

Thanks for contacting support.

I have again tested the scenario using the following code snippet with Aspose.Pdf for .NET 7.1.0 and, as per my observations, the resultant document opens properly in MS Word 2007. I have also attached the resultant .doc file which I have generated. Please take a look.

AliG:
This issue is urgent now and I need to get it resolved as soon as possible. I logged this and other related bugs over 2 months ago and my client is not happy. We have paid for the Aspose licences and we are now thinking we will have to buy another tool, as Aspose does not appear to be able to deliver our requirements. So please, any help would be appreciated.

Our development team has further investigated the earlier reported issue PDFNEWNET-33786 and we plan to get it resolved in August 2012. The newly reported issue, PDFNEWNET-34053, has only just been noticed, so we need a little time to investigate it, and we will also try to resolve it in the upcoming month. Furthermore, if you need the issues resolved on a priority basis, you may consider subscribing to Priority Support or Enterprise Support. Issues logged under Priority Support or Enterprise Support have higher precedence for resolution compared to normally logged issues. Please visit the following link for more information on the different support options.

AliG:
Ultimately, we need to merge a number of files of different types into one merged document, with a table of contents and page numbers. Currently, we can successfully merge DOC, RTF and TXT files and generate the table of contents. But we are having huge issues with our PDFs.

We need to be able to merge the 6 sample PDFs attached. If we cannot do this with Aspose, we are going to have to try and find another tool.

Please find the 6 sample PDFs we need to be able to merge attached:

I have again tested the scenario using the 6 attached documents and I am able to observe the formatting issue. This information has also been added to the existing PDFNEWNET-33786. As soon as we have made significant progress towards the resolution of the above issues, we will be more than happy to update you on the status of the correction. Please be patient and allow us a little time. We are really sorry for the inconvenience.

Hi Ali,

Thanks for your interest in Aspose.Words.

Regarding the DOC to XPS/TIFF conversion issue mentioned by my colleague, I have managed to reproduce the same issue on my side. I have logged this issue as WORDSNET-6672 in our issue tracking system. You will be notified via this forum thread once this issue is resolved.

We apologize for your inconvenience.

Hi Ali,

I have again tested the conversion of individual PDF files which you have shared earlier and have found issues specific to each document. I have separately created a unique issue ID against each document test and the details are specified below.

FamilyLawform.pdf
During testing, I observed that the formatting is disturbed and text appears on top of other text, for example the Date of birth field on page 1 and especially the fields on page 3. It has also been noticed that the DOC contents appear in Times New Roman, whereas the source PDF has contents in the ArialMT, Arial-BoldMT, Calibri, Calibri-Bold, Marlett and Times New Roman fonts. All these issues are logged as PDFNEWNET-34077.

MeansTest.pdf
During the conversion of this file, I have observed the following issues.

  1. Text appears in Times New Roman instead of Calibri, Microsoft SansSerif and Marlett.
  2. Text inside table cells appears on top of the cell border. Notice “Enter Weekly Amount” on page 1.
  3. The light green color is missing from the background of table cells.
  4. The yellow background color is missing for row 8, “Spouse / Partner Income”, on page 1 and for two rows on page 2.
  5. The table heading and header row title appear as normal text rather than bold. Notice “STATEMENT OF CAPITAL” at the end of page 2.
  6. Formatting is completely lost on page 3, where row headings appear on top of cell borders and the horizontal green line is missing.
  7. Controls are missing on page 4 and the formatting is totally lost.
All the above issues related to this document are logged as PDFNEWNET-34078.

Receipt Mark Delaney.pdf
During the testing, it’s been observed that the text in DOC file is garbled and appearing on top of table borders. I have logged them as PDFNEWNET-34079.

IntlPIguide.pdf
During the conversion of this file to DOC format, I have noticed the following issues.
  1. Extra space appears between characters and words. Notice the Table of Contents string and other headings in the table of contents.
  2. The heading is not bold, and the blue underline is missing for headings in the table of contents.
  3. An incorrect font appears for the table of contents elements. All contents appear as plain Times New Roman.
  4. In the footer of the DOC file there is extra space between characters.
  5. Horizontal lines are missing on pages 11, 12, 17 and 22.
  6. Pages 14 and 15 contain strange garbled text in the header and footer area which is not present in the source PDF.
  7. There are many other formatting issues.
All these problems are logged as PDFNEWNET-34080.

Our development team is looking further into the details of these problems and we will try our best to resolve as many of them as possible. As soon as we have made further progress, we will be more than happy to update you on the status of the correction. Please be patient and allow us a little time. We are really sorry for the inconvenience.

Hi Codewarrior,

Thanks for your reply. Is there a workaround for this in the meantime? I am up against tight deadlines to get this merging functionality working and I need a solution as soon as possible.

thanks ,

Ali

Hi Ali,

Please note that the PDF to Word conversion feature was introduced in a recent release of Aspose.Pdf for .NET and is not yet fully mature. Nevertheless, we are working hard to make this feature stable and robust so that it generates Word documents identical to the PDF files.

I am not entirely certain whether this workaround will be usable, but you may consider converting the pages of the PDF file into image format and then creating a Word document based on these images. Please visit the following links for further details on:

  1. Convert all PDF pages to JPEG Images
  2. To place the images inside a Doc object, please try using Aspose.Words for .NET. I will ask my colleague from the respective team to share further details on how to create a Word document from image files.
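
As a hedged sketch of the two steps above, the following renders each PDF page to a JPEG in memory with Aspose.Pdf for .NET and inserts it on its own page of a new Aspose.Words document. The paths, the 150 dpi resolution and the quality value of 80 are illustrative assumptions, and the members used (`JpegDevice`, `Resolution`, `DocumentBuilder.InsertImage`, `DocumentBuilder.InsertBreak`) should be checked against the installed versions.

```csharp
using System.IO;
using Aspose.Pdf.Devices;

class PdfPagesToImagesDoc
{
    static void Main()
    {
        // Hypothetical source PDF.
        Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(@"C:\alicia\Source.pdf");

        // Assumed rendering settings: 150 dpi and JPEG quality 80.
        Resolution resolution = new Resolution(150);
        JpegDevice device = new JpegDevice(resolution, 80);

        Aspose.Words.Document doc = new Aspose.Words.Document();
        Aspose.Words.DocumentBuilder builder = new Aspose.Words.DocumentBuilder(doc);

        // Aspose.Pdf page indexes start at 1.
        for (int i = 1; i <= pdf.Pages.Count; i++)
        {
            using (MemoryStream stream = new MemoryStream())
            {
                // Render the page to JPEG, rewind the stream, and insert the image.
                device.Process(pdf.Pages[i], stream);
                stream.Position = 0;
                builder.InsertImage(stream);
            }

            // Put each PDF page on its own page, without a trailing empty page.
            if (i < pdf.Pages.Count)
            {
                builder.InsertBreak(Aspose.Words.BreakType.PageBreak);
            }
        }

        doc.Save(@"C:\alicia\SourceAsImages.doc");
    }
}
```

Rendering every page is relatively expensive, so lowering the resolution or quality may reduce processing time and file size at the cost of legibility. The resulting document contains pictures of the pages only, so the text is not editable or searchable.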

We are sorry for your inconvenience.

Hi Codewarrior,

Thanks for the tip. I have tried this and, although the images look better in the document than the converted PDFs, unfortunately the performance is terrible. Some of my PDFs could have 12 or more pages, and if the user selects 10 documents to merge, that's a lot of images! So this isn't an option for me, I'm afraid.

I have also found some code samples on your site for using field codes to include documents in another document. I have tried converting all my PDFs to Word documents and then inserting them into an Aspose master document using the INCLUDETEXT field. Unfortunately the results are similar to when I attempt to merge my documents from PDFs into Aspose.Words, and they still come out corrupted and garbled.

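// Apply the Aspose.Words license.
Aspose.Words.License license = new Aspose.Words.License();
license.SetLicense("Aspose.Words.lic");
string docsDir = @"c:\tmp\";
// Open the master document and move the builder to the "dati" bookmark.
Document aDoc = new Document(docsDir + "testMod.doc");
DocumentBuilder db = new DocumentBuilder(aDoc);
string objPath = docsDir + "dati.doc";
db.MoveToBookmark("dati");
// Insert an INCLUDETEXT field for the converted document (backslashes doubled for the field code) and update it.
db.InsertField("INCLUDETEXT \"" + objPath.Replace(@"\", @"\\") + "\"").Update();
aDoc.Save(docsDir + "test.doc");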

I am running out of time with this tool. Have you any other ideas I could try? What timeframe are we looking at for getting these issues fixed?

  • PDFNEWNET-34080 (Unresolved);
  • PDFNEWNET-33786 (Unresolved);
  • PDFNEWNET-34077 (Unresolved);
  • PDFNEWNET-34053 (Unresolved);
  • WORDSNET-6672 (Unresolved);
  • PDFNEWNET-34078 (Unresolved);
  • PDFNEWNET-34079 (Unresolved)

Thanks,
Ali

Hi Ali,


Converting these PDF files into Word format using Aspose.Pdf for .NET currently produces garbled documents, so using the above approach to reference or include one document inside another will produce the same corrupted result, because the individual Word files already have improperly formatted contents.

As we have only just noticed these issues, our development team needs a little time to investigate and work out their causes.

Concerning the timeframe in which these issues will be resolved, please note that all issues logged in our issue tracking system are resolved as per schedule and in the sequence in which they are logged. However, if you need them resolved on a priority basis, you may consider subscribing to Enterprise or Priority Support. Issues logged under these support models have higher precedence for resolution compared to issues logged under the normal support model. Please visit the following links for further details on Support Options.

Hi Codewarrior,

If we purchase Priority Support, what kind of turnaround time are we looking at? Will it be a week, or more? I can't see any guarantee of a fix time in the support contract.

Thanks ,

Ali

Hi Ali,


Please note that we cannot guarantee an exact time span in which a particular issue will be resolved, because every issue has its own complexity and a specific environment in which it needs to be replicated and then resolved. However, as I have shared earlier, issues logged under the ES/PS support models have higher precedence for resolution than other issues in our issue tracking system. The development team first tries to resolve ES/PS issues and then starts working on issues with normal priority.

In the event of any further query, please feel free to contact.

PS, Please note that from my above statement, I did not mean that the issues logged under normal support model have less or no importance for us but, whatever I have stated, it’s related to the priority and order of resolving all the issues. Every customer’s query and every issue logged in our issue tracking system is equally valuable and important for us.

Hi Ali,


Thanks for your patience.

I am pleased to share that the issue reported earlier concerning PDF to DOC conversion for Booklet.pdf has been resolved, and its resolution will be included in the upcoming release of Aspose.Pdf for .NET, 7.3.0 (which is currently in the testing phase and which we plan to release in the next few days). Concerning the issues related to the other PDF files, our development team is still working hard on resolving them, and as soon as we have made some progress we will be more than happy to update you on the status of the correction. Please be patient and allow us a little time. We are sorry for the inconvenience.

The issues you have found earlier (filed as PDFNEWNET-33786) have been fixed in Aspose.Pdf for .NET 7.3.0.

