DOCX to PDF conversion issue with RTL layout using Java

I am currently evaluating Aspose.Words for Java.


I have a set of Word (DOCX and DOC) and PDF templates that are to be used in mail merge to produce a bar-coded PDF output document. As a first step of the evaluation, I took a DOCX file and produced PDF output. I now want to use mail merge (merging data with the DOCX file) to produce an output PDF file. As this will be wrapped as a service, the data set will need to be provided as an argument. The examples that I saw used DataTable, which was a wrapper for ResultSet. I get the data set as an argument to the service; it may be in the form of CSV, XML, JSON, etc., and is nested. Are there examples that show how to do this in Java?
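For readers following along, here is a minimal sketch of how nested data that has already been parsed (from CSV, XML, or JSON) could be exposed through a cursor-style data source. It assumes each record is a Map and each nested region (such as invoice lines) is a List of Maps stored under the region name. The class NestedRecordCursor and its method names are hypothetical; they only mirror the general shape of a custom data source such as IMailMergeDataSource (mentioned later in this thread) and do not implement it. The exact interface signatures should be checked against the Aspose.Words for Java API reference.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical cursor over parsed records. A nested region is a List of Maps
 * stored under the region name inside the parent record.
 */
class NestedRecordCursor {
    private final String tableName;
    private final Iterator<Map<String, Object>> rows;
    private Map<String, Object> current;

    NestedRecordCursor(String tableName, List<Map<String, Object>> records) {
        this.tableName = tableName;
        this.rows = records.iterator();
    }

    String getTableName() {
        return tableName;
    }

    /** Advances to the next record; returns false when the data is exhausted. */
    boolean moveNext() {
        if (!rows.hasNext()) {
            current = null;
            return false;
        }
        current = rows.next();
        return true;
    }

    /** Returns the named field of the current record, or null if absent. */
    Object getValue(String fieldName) {
        return current == null ? null : current.get(fieldName);
    }

    /** Returns a cursor over a nested list in the current record, or null. */
    @SuppressWarnings("unchecked")
    NestedRecordCursor getChildDataSource(String childName) {
        Object child = current == null ? null : current.get(childName);
        if (!(child instanceof List)) {
            return null;
        }
        return new NestedRecordCursor(childName, (List<Map<String, Object>>) child);
    }

    /** Builds one invoice with two line items, as a parser might produce. */
    static List<Map<String, Object>> sampleInvoices() {
        Map<String, Object> line1 = new LinkedHashMap<>();
        line1.put("Item", "Widget");
        line1.put("Qty", 2);
        Map<String, Object> line2 = new LinkedHashMap<>();
        line2.put("Item", "Gadget");
        line2.put("Qty", 1);
        List<Map<String, Object>> lines = new ArrayList<>();
        lines.add(line1);
        lines.add(line2);

        Map<String, Object> invoice = new LinkedHashMap<>();
        invoice.put("InvoiceNo", "INV-001");
        invoice.put("Lines", lines);
        List<Map<String, Object>> invoices = new ArrayList<>();
        invoices.add(invoice);
        return invoices;
    }
}
```

The parsing step is deliberately left out: any CSV, XML, or JSON parser that yields maps and lists could feed this cursor, so the service argument format stays independent of the merge code.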

For DOCX files, the mail merge can be complex, as it includes master / client details (such as an invoice), and the merge needs to handle page flow when a table extends beyond a page. While these documents are in English today, I would need to support other languages such as Arabic (including right-to-left layout). As these documents may be rendered online or batched up for overnight generation and delivery to a print house, the fidelity of the documents needs to be high.

Thanks

- viraf

Hi Viraf,

Thanks for your inquiry. You need to set the locale for the merge fields in your document using the FieldUpdateCultureSource property. This way you can set the appropriate language settings in the source document without being required to change the default locale for your application.

Please see this page for details: http://docs.aspose.com/display/wordsnet/How+to+Update+Fields

Thanks - I actually was asking a number of questions. Going through the forum I see that RTL in Arabic is an issue. I discovered the IMailMergeDataSource and am currently experimenting with it.

  • When I do a mail merge, I want each record to be saved as a separate output document, rather than having the output be a single document. Is there a way to do that?
  • I noticed that the performance is quite poor: it takes approx 2.2 seconds on my machine to read a Word document, mail merge (1 record) and save it as a PDF. So generating 100K documents will take approx 61 hours, when I have 1 to 2 hours for this. I am hoping that I am doing something wrong. Some benchmarks, if you have them, would be great.
  • I have noticed that fonts are being embedded in the output PDF, so the output file size is large (and possibly a contributor to the performance). I tried playing with PdfSaveOptions to no avail.
Thanks

Hi Viraf,

Thanks for sharing the additional information.

  1. RTL in Arabic is an issue. We have already logged this feature request as WORDSNET-1980 in our issue tracking system. I have linked this thread to the appropriate feature, and you will be notified via this forum thread once this feature is available.

  2. You can save a separate document for each record by using IMailMergeDataSource. Please loop through all records and pass each record as the data source to the ExecuteWithRegions method. Please read this post for your reference.

  3. Regarding your last two questions, please share your template document along with the data source (CSV/XML etc.) for investigation purposes.
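The loop described in point 2 can be sketched as follows. This is only an illustration of the "load the template once, produce one output per record" pattern, using a plain string with <<Field>> placeholders as a stand-in for the Word template. PerRecordMergeSketch, mergeOne, mergeEach, and the output naming scheme are all hypothetical; none of this is Aspose.Words API. With the real library, the per-record step would presumably be a fresh copy of the loaded template, a merge with one record as the data source, and a save to its own file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in: a plain-text template instead of a DOCX document. */
class PerRecordMergeSketch {

    /** Fills <<Field>> placeholders in a copy of the template with one record. */
    static String mergeOne(String template, Map<String, String> record) {
        String out = template; // Strings are immutable, so the template is never modified
        for (Map.Entry<String, String> e : record.entrySet()) {
            out = out.replace("<<" + e.getKey() + ">>", e.getValue());
        }
        return out;
    }

    /** Returns output name -> merged content, one entry per input record. */
    static Map<String, String> mergeEach(String template, List<Map<String, String>> records) {
        Map<String, String> outputs = new LinkedHashMap<>();
        int index = 1;
        for (Map<String, String> record : records) {
            String name = "output-" + index + ".txt"; // one output per record
            outputs.put(name, mergeOne(template, record));
            index++;
        }
        return outputs;
    }
}
```

The point of the structure is that the template is read and parsed once, outside the loop, so only the merge and save cost is paid per record.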

Thanks - I tried using mail merge option 2 in a loop, and the processing time for 1000 documents dropped to 32 ms each on average. I suspect the time in the first example was largely related to the initial load / parsing of the Word document. However, for single requests 2.2 seconds is still long. Will provide sample document code later.
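As a quick check of the figures quoted in this thread, the batch time can be estimated from the per-document time. The sketch below assumes single-threaded, sequential generation and ignores any one-off startup cost; BatchTimeEstimate is a hypothetical helper name.

```java
/** Hypothetical helper for estimating sequential batch generation time. */
class BatchTimeEstimate {

    /** Hours needed to produce the given number of documents at a fixed cost each. */
    static double hours(long documents, double millisPerDocument) {
        return documents * millisPerDocument / 3_600_000.0; // 3,600,000 ms per hour
    }
}
```

With 100,000 documents, 2.2 seconds each works out to about 61.1 hours, while 32 ms each works out to about 0.89 hours (roughly 53 minutes), which would fit inside the 1 to 2 hour window if the 32 ms average holds for the real templates.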


Given that I am creating PDF output, I was wondering whether it would be more efficient to convert the DOCX files to PDF (through an automated process) and then use the PDF template for mail merge and barcode insertion. (In the past I have found that filling forms and inserting barcodes on a PDF was very efficient.) If this is a viable option:

  • Does the DOCX to PDF conversion convert MERGEFIELDs to AcroForm fields?
  • Does the DOCX to PDF conversion support form fields in a DOCX document?
  • Is there an efficient means to convert a DOCX to a PDF Document (an in-memory representation rather than streaming and reloading)?
  • Does the DOCX or the PDF Document have a lighter memory footprint (in case I want to cache the template)?
  • What is the most efficient means to insert a barcode? I tried specifying a barcode font for a MERGEFIELD in the DOCX but it did not render correctly. Would it be more efficient to generate the barcode in the DOCX and then save to PDF, or to create an in-memory representation of the PDF and then insert the barcode?
I also just looked at the PDF Kit, and I don't see a way to clone a form template for multiple uses. For example, if I want to do a mail merge, I load a PDF form into memory, fill in fields for one record at a time, and store the result. Reading and processing the file X times could be expensive. Please let me know how to reuse a form template for multiple records.
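One library-independent way to avoid re-reading the template file for every record is to keep its bytes in memory and hand out a fresh stream per record. The sketch below assumes the PDF or DOCX library in use can load a document from an InputStream; TemplateCache is a hypothetical helper, not part of any Aspose product. Note that this only removes the repeated disk reads. Whether the library can also skip re-parsing (for example through an in-memory clone of a loaded document) would need to be confirmed in its documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that holds a template's bytes for repeated use. */
class TemplateCache {
    private final byte[] bytes;

    TemplateCache(byte[] bytes) {
        this.bytes = bytes.clone(); // defensive copy so callers cannot alter the cache
    }

    /** Reads the template file from disk exactly once. */
    static TemplateCache load(Path file) throws IOException {
        return new TemplateCache(Files.readAllBytes(file));
    }

    /** Returns a new, independent stream positioned at the start of the template. */
    ByteArrayInputStream open() {
        return new ByteArrayInputStream(bytes);
    }

    int size() {
        return bytes.length;
    }
}
```

Each record would then call open() and pass the resulting stream to the library's load call, so concurrent requests do not share a stream position.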


Hi Viraf,

Thanks for sharing the information. Regarding your first two questions (DOCX to PDF conversion), Aspose.Words mimics the same behavior as MS Word does. Please use Aspose.Pdf to work with form fields in PDF. You can post such queries on the Aspose.Pdf forum.

Please share some more information about your questions 3, 4, 5 and 6.

  1. Does the DOCX to PDF conversion convert MERGEFIELDs to AcroForm fields?
  2. Does the DOCX to PDF conversion support form fields in a DOCX document?
  3. Is there an efficient means to convert a DOCX to a PDF Document (an in-memory representation rather than streaming and reloading)?
  4. Does the DOCX or the PDF Document have a lighter memory footprint (in case I want to cache the template)?
  5. What is the most efficient means to insert a barcode? I tried specifying a barcode font for a MERGEFIELD in the DOCX but it did not render correctly. Would it be more efficient to generate the barcode in the DOCX and then save to PDF, or to create an in-memory representation of the PDF and then insert the barcode?
  6. Please let me know how to reuse a form template for multiple records.

The issues you have found earlier (filed as WORDSNET-1980) have been fixed in this .NET update and this Java update.


This message was posted using Notification2Forum from Downloads module by aspose.notifier.