Hello
I am trying to convert a DOC file to PDF.
The DOC file being tested has embedded attachments/objects such as an image, an Excel file, a JAR file and possibly other file types.
The converted PDF contains only the icon images of those files; the attachments themselves are missing.
How can I get those files and embed them in the PDF file?
Thanks
Rahul
I also tried the insertOleObject method of DocumentBuilder, but it embeds the file only when the document is saved in Word format. As soon as the document is saved as PDF, only the icon of the embedded content appears in the PDF, and it is not clickable.
@rahulgupta01,
To ensure a timely and accurate response, please ZIP and attach the following resources here for testing:
- Your input Word document with embedded objects
- Aspose.Words generated output document showing the undesired behavior
- Your expected document which shows the correct output. Please create this document by using Microsoft Word application.
As soon as you have this information ready, we will start investigating your issue and provide code to achieve the expected result. Thanks for your cooperation.
Please find attached a sample document containing embedded documents (input Word file in zipped format) along with the generated PDF file.
Sample_Pdf.pdf (216.5 KB)
Sample_Test_Doc.zip (1.8 MB)
@rahulgupta01,
Please also ZIP and attach your expected PDF document which shows the correct output. You can create this document by using Microsoft Word or any other suitable application. We will then investigate the structure of your expected document to see how you want the final output to be generated. Thanks for your cooperation.
The sample output file should have clickable icons or text that open the embedded file.
A sample with a clickable icon for the embedded file is attached.
Sample_Test_Doc.zip (1.8 MB)
@rahulgupta01,
Are you seeing wrong behavior in Aspose.Words generated PDF files? If yes, then the expected final output should be PDF, but you have again attached the Word file. Please ZIP and attach your expected PDF document which shows the desired output.
I am attaching the zipped version of the PDF file. Please refer to the bottom left of page 1 for an example of a clickable icon.
Sample_Doc_Out_Attach.zip (3.3 MB)
@rahulgupta01,
I understand what you are asking about. Please see these documents (Docs.zip (2.2 MB)).
However, when you convert this “Sample_Test_Doc.docx” to PDF by using MS Word 2016, it also does not preserve the embedded objects as clickable. Can you achieve the expected result by using MS Word? If yes, then please provide the steps to do so in MS Word.
I know that MS Word also does not preserve embedded documents when I save the DOC as PDF. But we have a requirement that embedded documents must also be available in the newly created PDF when we convert our DOC to PDF. So, kindly tell me how we can achieve this. You already have the expected output PDF.
@rahulgupta01,
We have logged this requirement in our issue tracking system. The ID of this issue is WORDSNET-17087. We will further look into the details of this problem and will keep you updated on the status of this issue.
@rahulgupta01,
Regarding WORDSNET-17087, unfortunately the implementation of the fix of this issue has been postponed till a later date (currently no estimates are available). We will inform you via this thread as soon as this issue is resolved or any estimates are available. We apologize for any inconvenience.
As a workaround, it may be possible to embed the attachments into the Aspose.Words generated PDF output with a third-party PDF editing tool (such as Aspose.PDF for Java). You could probably get the attachment data and position from the Aspose.Words DOM and then create a File Attachment Annotation in the PDF output with Aspose.PDF. If you are interested in such a workaround, we could probably investigate it further and provide a code sample.
The issue you found earlier (filed as WORDSNET-17087) has been fixed in the Aspose.Words for Java 22.11 update, which is also available on Maven.
The same issue occurs with C# as well. Is there any fix or code available for this? I am also not getting the embedded documents when converting Word to PDF. Below is my code. Can you please help me?
Aspose.Words.Saving.PdfSaveOptions opt = new Aspose.Words.Saving.PdfSaveOptions();
opt.ExportDocumentStructure = true;
opt.CustomPropertiesExport = Aspose.Words.Saving.PdfCustomPropertiesExport.Standard;
opt.Compliance = Aspose.Words.Saving.PdfCompliance.PdfA2u;
opt.SaveFormat = Aspose.Words.SaveFormat.Pdf;
opt.UseHighQualityRendering = true;
SaveOutputParameters _saveOut = doc.Save(filepath, opt);
@jithin.p You should use the PdfSaveOptions.EmbedAttachments property. Please note, however, that it is not supported when saving with PDF/A or PDF/UA compliance, so it will not work together with the PdfA2u compliance level set in your code.
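A minimal sketch of this suggestion, assuming Aspose.Words for .NET 22.11 or later is referenced in the project. The file names are placeholders, and the compliance level is deliberately left at a non-PDF/A value because EmbedAttachments is not supported with PDF/A and PDF/UA:

```csharp
using Aspose.Words;
using Aspose.Words.Saving;

// Placeholder paths, for illustration only.
Document doc = new Document("input.docx");

PdfSaveOptions opt = new PdfSaveOptions();

// Ask the PDF exporter to embed the document's attachments in the output.
opt.EmbedAttachments = true;

// Keep a regular PDF compliance level; do not combine EmbedAttachments
// with a PDF/A or PDF/UA level such as PdfCompliance.PdfA2u.
opt.Compliance = PdfCompliance.Pdf17;

doc.Save("output.pdf", opt);
```

If PDF/A output is a hard requirement, the two settings cannot be combined, so one of them has to be dropped.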
Hi,
Thank you very much. It worked. I have one more question. Is there any option to keep the embedded fonts from a PDF when appending the PDF document to a Word document?
We use the code below to convert a PDF to a Word document, and then append this Word document to another main Word document. Because the PDF contains embedded fonts, this causes formatting and alignment issues for this particular Word document only. Can you please help me? Below is my code.
//Convert PDF to Word
Aspose.Pdf.Document myDocPdf = GetDocumentByPathPDF(docPath);
string[] _fileNamArr = docPath.Split('.');
myDocPdf.Save(template_path + _fileNamArr[0] + ".docx", Aspose.Pdf.SaveFormat.DocX);
//Appending this converted Word to another Word document.
ImportFormatOptions importFormatOptions = new ImportFormatOptions();
importFormatOptions.KeepSourceNumbering = true;
importFormatOptions.IgnoreHeaderFooter = false;
temp.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
cloneBuildinPropeerties(temp, OutPut);
OutPut.AppendDocument(temp, ImportFormatMode.KeepSourceFormatting, importFormatOptions);
Thanks
Jithin V P
@jithin.p Could you please attach your input and output documents here for testing? We will check the issue and provide you more information.
Hi,
I have attached the input PDF and the output Word document. Due to data privacy, the output Word document contains only the data that came from the input PDF. The rest of the data appended from other documents has been deleted from it. Please guide me to fix this.
Output.docx (172.6 KB)
Input.pdf (129.9 KB)
Thanks
Jithin V P
@jithin.p To export fonts as embedded into the DOCX document, you should set the FontInfos.EmbedTrueTypeFonts property.
// Convert the PDF to DOCX with Aspose.PDF.
Aspose.Pdf.Document myDocPdf = new Aspose.Pdf.Document(@"C:\Temp\in.pdf");
myDocPdf.Save(@"C:\Temp\tmp.docx", Aspose.Pdf.SaveFormat.DocX);
// Append the converted Word document to another Word document.
ImportFormatOptions importFormatOptions = new ImportFormatOptions();
importFormatOptions.KeepSourceNumbering = true;
importFormatOptions.IgnoreHeaderFooter = false;
Document temp = new Document(@"C:\Temp\tmp.docx");
temp.FirstSection.PageSetup.SectionStart = SectionStart.NewPage;
Document doc = new Document();
// Embed TrueType fonts in the destination document when it is saved.
doc.FontInfos.EmbedTrueTypeFonts = true;
doc.AppendDocument(temp, ImportFormatMode.KeepSourceFormatting, importFormatOptions);
doc.Save(@"C:\Temp\out.docx");