tilal.ahmad: Hi there,
Thanks for your inquiry. You can find and replace any text in a PDF document using Aspose.Pdf. The TextFragmentAbsorber object allows you to find text matching a particular phrase in a PDF document: pass the absorber to the Accept method of the Pages collection, and its TextFragments property then returns a collection of TextFragment objects. You can loop through all the fragments and get their properties, such as Text, Position, FontName and FontSize. You can set or change the value of any of these properties as well. Please check the following documentation link; hopefully it will serve the purpose.
Please feel free to contact us for any further assistance.
Best Regards,
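A minimal C# sketch of the absorber workflow described above, assuming Aspose.Pdf for .NET is referenced. The input/output paths and the search and replacement phrases are hypothetical placeholders. In this API the font name and size are read through the fragment's TextState, and the position through Position.XIndent/YIndent.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class FindAndReplaceSketch
{
    static void Main()
    {
        // Hypothetical input path.
        Document pdfDocument = new Document(@"C:\DocumentSamples\input.pdf");

        // The absorber collects every occurrence of the (hypothetical) phrase.
        TextFragmentAbsorber absorber = new TextFragmentAbsorber("old phrase");

        // Accept() runs the search over all pages; results land in TextFragments.
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Read the properties mentioned above.
            Console.WriteLine("'{0}' at ({1}, {2}), font {3} {4}pt",
                fragment.Text,
                fragment.Position.XIndent,
                fragment.Position.YIndent,
                fragment.TextState.Font.FontName,
                fragment.TextState.FontSize);

            // Replace the text of this occurrence.
            fragment.Text = "new phrase";
        }

        // Hypothetical output path.
        pdfDocument.Save(@"C:\DocumentSamples\output.pdf");
    }
}
```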
kawenglou: 1. If there are multiple text properties (different font type, size, etc.), the TextFragment can only represent the first text property of the object. The other text properties are "lost". Correct?
Yes, you are correct: if you search for a phrase, it will return the text properties of the first word of the phrase. You can search for individual words instead of a phrase.
kawenglou: 2. This is regarding performance. Do you guys recommend text modification (such as translating a document) using search and replace within Aspose.PDF? If I want to go with converting the document from PDF to DOCX, would there be a big performance hit?
Hopefully there wouldn't be any performance issue. However, the processing time will depend upon the PDF document's contents/size and system resources.
kawenglou: 3. This is from my original post. When you convert a PDF to DOCX with Aspose.PDF, it seems to be missing some of the properties. Is it necessary to do a double conversion to get a "true" DOCX file?
I've noticed that the underline and strike-through lines are displaced. I've logged an investigation ticket for the issue as PDFNEWNET-35781 in our issue tracking system. We will update you as soon as it's resolved.
However, you can use the Textbox value of RecognitionMode to preserve the original look of the PDF, but the editability of the resulting document could be limited.
...
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Mode = DocSaveOptions.RecognitionMode.Textbox;
...
Best Regards,
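A sketch of the per-word search suggested above, assuming Aspose.Pdf for .NET. Passing TextSearchOptions(true) makes the absorber treat the search string as a regular expression; the pattern \w+ is a hypothetical choice that matches single words, so each word occurrence should come back as its own TextFragment with its own TextState. The input path is a placeholder.

```csharp
using System;
using Aspose.Pdf;
using Aspose.Pdf.Text;

class WordLevelSearchSketch
{
    static void Main()
    {
        // Hypothetical input path.
        Document pdfDocument = new Document(@"C:\DocumentSamples\input.pdf");

        // Regular-expression search; the pattern matches one word at a time.
        TextFragmentAbsorber absorber =
            new TextFragmentAbsorber(@"\w+", new TextSearchOptions(true));
        pdfDocument.Pages.Accept(absorber);

        foreach (TextFragment fragment in absorber.TextFragments)
        {
            // Each occurrence is a separate fragment, so its font name and size
            // are reported independently of the other occurrences.
            Console.WriteLine("{0}\t{1}\t{2}",
                fragment.Text,
                fragment.TextState.Font.FontName,
                fragment.TextState.FontSize);
        }
    }
}
```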
awais.hafeez: Hi Ka Weng,
Ka Weng: Thank you for the quick response. I updated the reference to the latest Aspose.Words DLL (13.8.0.0) and the resulting PDF file is still offset toward the top. I attached the following files in case you can take a look:
- code_snippet.cs: a snippet of how I generate the files below. I think this should be pretty much similar to what you have done.
- 1_pdftodocx.docx: PDF to DOCX with Aspose.PDF
- 2_docxtodocx.docx: DOCX to DOCX with Aspose.Words
- 3_docxtopdf.pdf: DOCX to PDF with Aspose.PDF
Thanks, and hopefully the Aspose.PDF guys will chime in too.
Thanks for the additional information. While using the latest version of Aspose.Words, i.e. 13.8.0, I managed to reproduce this issue on my side. I have logged it in our bug tracking system with the ID WORDSNET-8883. Your request has also been linked to this issue and you will be notified as soon as it is resolved. Sorry for the inconvenience.
Best regards,
tilal.ahmad: Hi there, Thanks for your feedback. Yes, you are correct: if you search for a phrase, it will return the text properties of the first word of the phrase. You can search for individual words instead of a phrase.
Understood. One concern I have with performing a word-for-word search and replace is that the last instance will overwrite the properties of all other instances, unless you keep track of the structure yourself. That leads me to the following question: will Aspose allow users to traverse the file structure of a PDF file? Or is this outside of the project scope?
tilal.ahmad: Hopefully there wouldn't be any performance issue. However, the processing time will depend upon the PDF document's contents/size and system resources.
Got it.
tilal.ahmad: I've noticed that the underline and strike-through lines are displaced. I've logged an investigation ticket for the issue as PDFNEWNET-35781 in our issue tracking system. We will update you as soon as it's resolved. However, you can use the Textbox value of RecognitionMode to preserve the original look of the PDF, but the editability of the resulting document could be limited.
...
DocSaveOptions saveOptions = new DocSaveOptions();
saveOptions.Mode = DocSaveOptions.RecognitionMode.Textbox;
...
Best Regards,
awais.hafeez: Hi,
Thanks for your interest in Aspose products. I will answer your questions related to Aspose.Words.
If Aspose.Words encounters a problem in a document that can be resolved on loading, it recovers that document silently. In your case, when loading your "2_PDFtoDOCX.docx" document into memory with the latest version of Aspose.Words (13.8.0) and then converting/rendering it to DOCX/PDF, I observed that Aspose.Words 13.8.0 correctly mimics the behaviour of Microsoft Word 2013. To confirm this, I have attached the following three files for your reference:
- out-aspose.words-13.8.0.docx: generated from "2_PDFtoDOCX.docx" using Aspose.Words 13.8.0.
- out-ms-word-2013.docx: generated from "2_PDFtoDOCX.docx" using Microsoft Word 2013. The layout of the document elements is the same as in the output generated by Aspose.Words.
- out-aspose.words-13.8.0.pdf: generated from "out-aspose.words-13.8.0.docx" using Aspose.Words 13.8.0.
Put simply, to fix the problems introduced during the bad-DOCX-to-good-DOCX conversion and the subsequent good-DOCX-to-PDF rendering, please upgrade to the latest Aspose.Words version from here. I hope this helps.
Regarding your Aspose.Pdf related query, my colleagues from the Aspose.Pdf component team will answer you shortly.
Best regards,
kawenglou: Understood. One concern I have with performing a word-for-word search and replace is that the last instance will overwrite the properties of all other instances, unless you keep track of the structure yourself. That leads me to the following question: will Aspose allow users to traverse the file structure of a PDF file? Or is this outside of the project scope?
I'm afraid traversing the PDF file structure is not supported at the moment. However, I've logged a feature request for it as PDFNEWNET-35795 in our issue tracking system.
kawenglou: I would like to clarify: I am talking about the actual structure of the DOCX files. In the attached screenshots, I am comparing the file structure of a PDFtoDOCX file converted by Aspose.PDF and a DOCXtoDOCX file converted by Aspose.Words. The PDFtoDOCX file is missing some properties and settings XML files. I need these properties and settings files in order to generate a DOCX in OpenXML. Looking at the documentation, the Save() method accepts a SaveOptions object and the ContentDisposition enum, but they don't seem to change the structure of the file itself. Is this the intended behavior? Or am I missing a SaveOptions/SaveFormat somewhere?
Thanks for sharing the additional information. I've logged your comparison details as PDFNEWNET-35796, an enhancement to PDF to DOC/DOCX conversion. Currently, there are no other settings/properties to overcome the highlighted structural difference.
Best Regards,
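Because a DOCX file is a ZIP package (Open Packaging Conventions), the structural difference described above can also be listed programmatically instead of compared via screenshots. A sketch using only the .NET System.IO.Compression API; the class and method names are hypothetical, and which parts actually differ (for example docProps/core.xml or word/settings.xml) depends on the two files being compared.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Compression;
using System.Linq;

static class DocxPartDiff
{
    // Returns the part names (e.g. "word/settings.xml") present in the
    // reference list but absent from the candidate list, sorted ordinally.
    public static List<string> MissingParts(IEnumerable<string> reference, IEnumerable<string> candidate)
    {
        var have = new HashSet<string>(candidate, StringComparer.OrdinalIgnoreCase);
        return reference.Where(p => !have.Contains(p))
                        .OrderBy(p => p, StringComparer.Ordinal)
                        .ToList();
    }

    // Lists every entry (package part) inside a .docx file.
    public static List<string> ListParts(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            return zip.Entries.Select(e => e.FullName).ToList();
        }
    }

    static void Main(string[] args)
    {
        // Hypothetical usage: args[0] = DOCX re-saved by Aspose.Words,
        // args[1] = DOCX produced by Aspose.Pdf.
        if (args.Length < 2)
        {
            Console.WriteLine("usage: DocxPartDiff <reference.docx> <candidate.docx>");
            return;
        }
        foreach (string part in MissingParts(ListParts(args[0]), ListParts(args[1])))
            Console.WriteLine("missing: " + part);
    }
}
```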
awais.hafeez: Hi Ka Weng,
Thanks for your inquiry. Considering the documents you attached to this post, please try running the following code to generate a good resultant PDF file.
using Aspose.Words;
//load the bad docx in Aspose.Words' DOM
Document badDocx = new Document(@"C:\DocumentSamples\2_PDFtoDOCX.docx");
//re-save with Aspose.Words to generate a good docx
badDocx.Save(@"C:\DocumentSamples\out-aspose.words-13.8.0.docx");
//now load the good docx in Aspose.Words' DOM
Document goodDocx = new Document(@"C:\DocumentSamples\out-aspose.words-13.8.0.docx");
//re-save with Aspose.Words to generate a resultant pdf
goodDocx.Save(@"C:\DocumentSamples\out-aspose.words-13.8.0.pdf");
Best regards,
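The same chain can also be kept in memory, without intermediate files. A sketch, assuming both Aspose.Pdf for .NET and Aspose.Words for .NET are referenced; the paths are hypothetical. Step 1 (PDF to DOCX with Aspose.Pdf) reproduces the first stage of the pipeline Ka Weng described, while steps 2 and 3 mirror the Aspose.Words code above. Fully qualified type names avoid the clash between Aspose.Pdf.Document and Aspose.Words.Document.

```csharp
using System.IO;

class InMemoryConversionSketch
{
    static void Main()
    {
        // Hypothetical paths.
        string inputPdf = @"C:\DocumentSamples\input.pdf";
        string outputPdf = @"C:\DocumentSamples\output.pdf";

        using (MemoryStream rawDocx = new MemoryStream())
        using (MemoryStream goodDocx = new MemoryStream())
        {
            // Step 1: PDF -> DOCX with Aspose.Pdf.
            Aspose.Pdf.Document pdf = new Aspose.Pdf.Document(inputPdf);
            Aspose.Pdf.DocSaveOptions options = new Aspose.Pdf.DocSaveOptions();
            options.Format = Aspose.Pdf.DocSaveOptions.DocFormat.DocX;
            pdf.Save(rawDocx, options);
            rawDocx.Position = 0; // rewind before reading

            // Step 2: load the "bad" DOCX in Aspose.Words and re-save it as DOCX.
            Aspose.Words.Document badDocx = new Aspose.Words.Document(rawDocx);
            badDocx.Save(goodDocx, Aspose.Words.SaveFormat.Docx);
            goodDocx.Position = 0; // rewind before reading

            // Step 3: reload the re-saved DOCX and render it to PDF
            // (the output format is taken from the .pdf extension).
            Aspose.Words.Document finalDoc = new Aspose.Words.Document(goodDocx);
            finalDoc.Save(outputPdf);
        }
    }
}
```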
Thank you for the help, and will definitely keep this bookmarked.
The issues you have found earlier (filed as WORDSNET-8883) have been fixed in the corresponding .NET and Java updates of Aspose.Words.
The issues you have found earlier (filed as PDFNEWNET-35781) have been fixed in Aspose.Pdf for .NET 8.9.0.
Hi there,
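using Aspose.Pdf;
// inFile: path of the source PDF
Document doc = new Document(inFile);
DocSaveOptions saveOptions = new DocSaveOptions();
// save as DOCX rather than DOC
saveOptions.Format = DocSaveOptions.DocFormat.DocX;
// Flow mode: full recognition, aimed at a maximally editable document
saveOptions.Mode = DocSaveOptions.RecognitionMode.Flow;
// output name: input name with ".pdf" replaced by "_35796.docx"
string outFileName = inFile.Replace(".pdf", "_35796.docx");
doc.Save(outFileName, saveOptions);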
Please feel free to contact us for any further assistance.
Best Regards,
Hi there,
Regarding your discussion in this thread: you want to access text properties, and that can be done successfully with the TextFragmentAbsorber + TextFragment approach.
Please provide more details; I'm probably missing something, and more details will help us analyze and implement your feature request.
Looking forward to your feedback.
Best Regards,