Create Word from PDF

rolama · December 15, 2014, 3:43pm

When I create a Word document from a PDF, the text comes out with an invalid font name such as “HWJABL+Helvetica” or “RONPLM+Helvetica-Bold”, and it prints blank.

Code:

com.aspose.pdf.License license = new com.aspose.pdf.License();

// Load the license file through a FileInputStream

try {

license.setLicense(new java.io.FileInputStream("C:\\lib\\Aspose.Pdf.lic"));

} catch (FileNotFoundException e2) {

// TODO Auto-generated catch block

e2.printStackTrace();

}

com.aspose.pdf.Document document = new com.aspose.pdf.Document(filePath);

//Create DocSaveOptions object

com.aspose.pdf.DocSaveOptions saveOptions = new com.aspose.pdf.DocSaveOptions();

//Set the recognition mode as Flow

saveOptions.setMode(com.aspose.pdf.DocSaveOptions.RecognitionMode.Flow);

saveOptions.setFormat(com.aspose.pdf.DocSaveOptions.DocFormat.DocX);

//Set the Horizontal proximity as 2.5

saveOptions.setRelativeHorizontalProximity(2.5f);

//Enable the value to recognize bullets during conversion process

saveOptions.setRecognizeBullets(true);

document.save(fileOut, saveOptions);

tilal.ahmad · December 15, 2014, 9:51pm

Hi Mario,

Thanks for your inquiry. We would appreciate it if you could share your sample input document. We will test the scenario at our end and guide you accordingly.

We are sorry for the inconvenience caused.

Best Regards,

rolama · December 16, 2014, 9:30am

OK, here are the input document (PDF) and the output Word document.

rolama · December 16, 2014, 12:33pm

If you open the Word document with WinZip, you can see that these characters are being inserted into the typeface name…

Something goes wrong in your jar when it creates the document…

I am attaching the XML of this document…

tilal.ahmad · December 17, 2014, 7:54am

Hi Mario,

Thanks for your patience. Please note that this font naming behavior is by design. A PDF file can contain several fonts with the same name, so prefixes are added to guarantee that each font name is unique. I printed the generated DOC file without any issue. Could you please share some more details about the issue you are facing, so that we can guide you precisely?

Moreover, the prefixed names are actually font subsets embedded in the document. The fonts are subsets in the PDF, where they are also defined with prefixes. The prefixes differ because new font subsets, with new prefixes, are generated from the source fonts during document processing. This cannot be turned off.

The prefix generation algorithm is simple:
[six randomly generated characters][plus sign][font name].
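
As a rough illustration of that naming scheme, the sketch below recovers the base font name from a prefixed one using plain Java, with no Aspose calls. It assumes the prefix is always six uppercase ASCII letters followed by a plus sign, as in the two names quoted earlier in this thread. The class and method names (`SubsetFontNames`, `hasSubsetPrefix`, `stripSubsetPrefix`) are hypothetical helpers invented for this example, not part of any library.

```java
import java.util.regex.Pattern;

final class SubsetFontNames {
    // Assumption: the subset prefix is exactly six uppercase ASCII letters
    // followed by '+', as in "HWJABL+Helvetica".
    private static final Pattern SUBSET_PREFIX = Pattern.compile("^[A-Z]{6}\\+");

    // Returns true when the name starts with a subset prefix.
    static boolean hasSubsetPrefix(String fontName) {
        return SUBSET_PREFIX.matcher(fontName).find();
    }

    // Returns the name without its subset prefix; names without a prefix are returned unchanged.
    static String stripSubsetPrefix(String fontName) {
        return SUBSET_PREFIX.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripSubsetPrefix("HWJABL+Helvetica"));      // prints "Helvetica"
        System.out.println(stripSubsetPrefix("RONPLM+Helvetica-Bold")); // prints "Helvetica-Bold"
        System.out.println(stripSubsetPrefix("Helvetica"));             // prints "Helvetica"
    }
}
```

A helper like this is only useful for inspecting or logging the names found in a converted document; it does not change how the fonts are embedded.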

Please feel free to contact us for any further assistance.

Best Regards,

rolama · December 17, 2014, 4:12pm

Is there a way to set a default font that replaces the fonts coming from the original PDF?

tilal.ahmad · December 17, 2014, 10:53pm

Hi Mario,

Thanks for your inquiry. I am afraid there is no specific property for the whole PDF document, but you can replace the font of all TextFragments before conversion as follows. Hopefully it will help you accomplish the task.

com.aspose.pdf.Document document = new com.aspose.pdf.Document(filePath);

//setting default font

com.aspose.pdf.TextFragmentAbsorber tfa = new com.aspose.pdf.TextFragmentAbsorber();

//collect the text fragments from every page of the document

document.getPages().accept(tfa);

com.aspose.pdf.TextFragmentCollection tfc = tfa.getTextFragments();

for (com.aspose.pdf.TextFragment tf : (Iterable<com.aspose.pdf.TextFragment>) tfc)

tf.getTextState().setFont(com.aspose.pdf.FontRepository.findFont("MSGothic"));

//Create DocSaveOptions object

com.aspose.pdf.DocSaveOptions saveOptions = new com.aspose.pdf.DocSaveOptions();

//Set the recognition mode as Flow

saveOptions.setMode(com.aspose.pdf.DocSaveOptions.RecognitionMode.Flow);

saveOptions.setFormat(com.aspose.pdf.DocSaveOptions.DocFormat.DocX);

//Set the Horizontal proximity as 2.5

saveOptions.setRelativeHorizontalProximity(2.5f);

//Enable the value to recognize bullets during conversion process

saveOptions.setRecognizeBullets(true);

document.save(fileOut, saveOptions);

Please feel free to contact us for any further assistance.

Best Regards,

rolama · December 18, 2014, 11:33am

It works perfectly, with only one problem: the bold formatting in the document is no longer respected. Is there any way to keep the original bolding?

codewarior · December 21, 2014, 5:31am

Hi Mario,

Do you mean that the source/input PDF contains text in bold formatting but the resultant DOC file does not have that text in bold? If that is the case, then please share the resource file so that we can test the scenario at our end. We are sorry for this inconvenience.