Hi Aspose Team,
I am reading every node in a source Word document and merging two paragraphs, based on certain conditions, into a new Document object.
Both documents report the font as Times, 12 pt, but the text in the destination document appears larger, which causes an unnecessary page break.
Can you please help me resolve this issue urgently?
I have attached the source code and the source and destination documents for your reference.
On page 2 of the destination document you can see a blank page, caused by the difference in how the fonts are rendered in the source and destination documents.
Please help me resolve this issue, as it is very urgent.
I also tried simply copying the source formatting to the destination. The font size still shows as the same but is actually rendered differently. I tried the following code and still face the same issue:
import com.aspose.words.Document;
import com.aspose.words.ImportFormatMode;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.InputStream;

private static void copyFullDocument() throws Exception {
    // UNC path: each backslash must be escaped in a Java string literal.
    String MyDir = "\\\\lngdays-dev069\\Render\\Gayatri\\CC POC\\";
    byte[] buff = new byte[8000];
    ByteArrayOutputStream bao = new ByteArrayOutputStream();
    // try-with-resources closes the input stream even if reading fails.
    try (InputStream is = new FileInputStream(MyDir + "Test-source.docx")) {
        int bytesRead;
        while ((bytesRead = is.read(buff)) != -1) {
            bao.write(buff, 0, bytesRead);
        }
        byte[] data = bao.toByteArray();
        ByteArrayInputStream inStream = new ByteArrayInputStream(data);
        // String filePath = "\\lngdays-dev069\Render\Gayatri\CC
        // POC\Connie1_2016-Ohio-5124 pdf
        // 00500000B990MD-with-highlights.docx";
        // The document that the content will be appended to.
        Document dstDoc = new Document();
        dstDoc.removeAllChildren();
        // The document to append.
        Document srcDoc = new Document(inStream);
        // Append the source document to the destination document.
        // Pass format mode to retain the original formatting of the source
        // document when importing it.
        dstDoc.appendDocument(srcDoc, ImportFormatMode.KEEP_SOURCE_FORMATTING);
        // Save the document.
        dstDoc.save(MyDir + "Out3.docx");
    } catch (Exception e) {
        System.out.println("Exception is >>>>" + e.getMessage());
        e.printStackTrace();
    }
}
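The manual read loop in the code above can probably be written more compactly. The following is only a minimal sketch, assuming Java 9 or later (for `InputStream.readAllBytes`). The class name `StreamBytes` and the method `readFully` are hypothetical names used for illustration, and the demo reads an in-memory stream rather than the actual Test-source.docx.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class StreamBytes {
    // Reads the whole stream into a byte array and closes the stream afterwards.
    static byte[] readFully(InputStream in) {
        try (InputStream closing = in) {
            return closing.readAllBytes(); // available since Java 9
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to catch IOException.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = readFully(new ByteArrayInputStream(new byte[] {1, 2, 3}));
        System.out.println(data.length);
    }
}
```

The returned array could then be wrapped in a `ByteArrayInputStream` and passed to `new Document(inStream)` exactly as in the code above.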
Please use the source document attached in my previous post.
Is it Microsoft Word behaviour that the same font is rendered differently in the two documents? If so, how can I resolve it?
When I ran the same code on my PC, the page number “2” moved to the next page.
Any idea why this happens, and is it Word behaviour? If it is Word behaviour, how can I rectify it on my system?
You can apply formatting:
1) to Run nodes, by using Character Styles, e.g. a Glyph Style;
2) to the parent of those Run nodes, i.e. a Paragraph node (possibly via Paragraph Styles);
3) directly to Run nodes, by using Run attributes (Font). In this case the Run inherits the formatting of the Paragraph Style, then the Glyph Style, and then the direct formatting.
// Retrieve all paragraphs in the document.
NodeCollection paragraphs = doc.getChildNodes(NodeType.PARAGRAPH, true);
// Iterate through all paragraphs.
for (Paragraph para : (Iterable<Paragraph>) paragraphs) {
    // Check all runs in the paragraph for page breaks and remove them.
    for (Run run : para.getRuns()) {
        if (run.getText().contains(ControlChar.PAGE_BREAK))
            run.setText(run.getText().replace(ControlChar.PAGE_BREAK, ""));
    }
}
Hi Tahir,
The source document here is a document that I converted from PDF to DOCX using the code below:
String filesLocation = "C:\\Documents\\462887.481663.Decision.doc.pdf.00500000B1D1F6.pdf";
com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(filesLocation);
// Instantiate Doc SaveOptions instance
DocSaveOptions saveOptions = new DocSaveOptions();
// Set output file format as DOCX
saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
saveOptions.setMaxDistanceBetweenTextLines(3.5f);
saveOptions.setAddReturnToLineEnd(false);
// Save the file into Microsoft document format
pdfDocument.save("C:\\Documents\\462887.481663.Decision.doc.pdf.00500000B1D1F6_converted.docx",
        saveOptions);
I am now using that generated DOCX as input again and generating a new document from it. As you can see, the font in the generated document differs from the font in the new document.
As you suggested, I am already removing page breaks and copying all font styles as they are. You can check this in the code I attached previously.
I am attaching the PDF file along with the generated and new DOCX files.
Can you please help me figure out why the fonts differ in this case, and on what basis fonts are created in the DOCX file during PDF to DOCX conversion?
Hi Gayatri,
Hi fahadadeel,
Thank you for replying. I understand that the source document contains fonts that are not rendered in the new document I am copying into.
I know that some fonts in the PDF file are causing the problem. What I am now doing is extracting all the fonts from the PDF using FontCollection, saving them to a folder, and then converting the PDF to DOCX. I then install those TTF fonts and create a new DOCX document from the generated one, which gives me the correct output.
Here is the code I use to convert the PDF to DOCX and extract the fonts into a folder:
import com.aspose.pdf.DocSaveOptions;
import com.aspose.pdf.FontAbsorber;
import com.aspose.pdf.FontCollection;
import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;

private static void singleConvertPdfToDoc() {
    // UNC path: each backslash must be escaped in a Java string literal.
    String filesLocation = "\\\\lngdays-dev069\\Render\\Gayatri\\CC POC\\TestFonts\\462887.481663.Decision.doc.pdf.00500000B1D1F6.pdf";
    String fileName = filesLocation.substring(filesLocation.lastIndexOf('\\') + 1);
    // fileNameOnly and testOut are not declared in this snippet; they are
    // assumed to be String fields of the enclosing class.
    fileNameOnly = fileName.substring(0, fileName.lastIndexOf('.'));
    testOut = "\\\\lngdays-dev069\\Render\\Gayatri\\CC POC\\TestFonts\\out\\".concat(fileNameOnly);
    try {
        // createDirectories does nothing if the directory already exists,
        // so no separate Files.exists check is needed.
        Files.createDirectories(Paths.get(testOut));
        String fontCacheFolder = testOut + fileNameOnly + "_fonts_preSaved\\";
        String cacheFontFileTemplate = fontCacheFolder + "font%1$s.ttf";
        Path fontOutFolderPath = Paths.get(testOut, fileNameOnly, "_fonts\\");
        // Folder that will contain fonts as a result of the conversion
        // procedure
        String fontOutFolder = fontOutFolderPath.toAbsolutePath().toString();
        Files.createDirectories(Paths.get(fontCacheFolder));
        Files.createDirectories(Paths.get(fontOutFolder));
        com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(filesLocation);
        FontAbsorber fa = new FontAbsorber();
        fa.visit(pdfDocument);
        FontCollection fc = fa.getFonts();
        ArrayList<String> fontFiles = new ArrayList<>();
        // Save all the fonts in the cache folder
        int fontNum = 0;
        for (com.aspose.pdf.Font font : (Iterable<com.aspose.pdf.Font>) fc) {
            String cacheFontFile = String.format(cacheFontFileTemplate, Integer.toString(fontNum++));
            // try-with-resources closes each font file after it is written.
            try (FileOutputStream out = new FileOutputStream(cacheFontFile)) {
                font.save(out);
            }
            fontFiles.add(cacheFontFile);
        }
        // Instantiate Doc SaveOptions instance
        DocSaveOptions saveOptions = new DocSaveOptions();
        // Set output file format as DOCX
        saveOptions.setFormat(DocSaveOptions.DocFormat.DocX);
        saveOptions.setMode(DocSaveOptions.RecognitionMode.Flow);
        saveOptions.setMaxDistanceBetweenTextLines(3.5f);
        saveOptions.setAddReturnToLineEnd(false);
        // Save the file into Microsoft document format
        pdfDocument.save(
                "\\\\lngdays-dev069\\Render\\Gayatri\\CC POC\\TestFonts\\462887.481663.Decision.doc.pdf.00500000B1D1F6_latest.docx",
                saveOptions);
        System.out.println("Conversion from PDF to DOCX is done");
    } catch (Exception e) {
        System.out.println("Exception is >>>>" + e.getMessage());
        e.printStackTrace();
    }
}
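The folder names in the method above are built by string concatenation, which makes it easy to lose or double a path separator. As a minimal sketch only, the same kind of layout could be built with `Path.resolve`. The class `FontFolders`, its methods, and the `<base>/<base>_fonts` layout are hypothetical and chosen for illustration; they are not taken from the original code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; names and layout are assumptions.
final class FontFolders {
    // Strips only the last extension: "a.doc.pdf" becomes "a.doc".
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Builds <outRoot>/<base>/<base>_fonts and creates it if it is missing.
    static Path fontFolder(Path outRoot, String base) {
        Path dir = outRoot.resolve(base).resolve(base + "_fonts");
        try {
            // No-op if the directory already exists; creates parents as needed.
            return Files.createDirectories(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("fonts-demo");
        Path dir = fontFolder(root, baseName("Decision.doc.pdf"));
        System.out.println(dir);
    }
}
```

Because `resolve` inserts the separator itself, the result does not depend on whether a parent path ends with a backslash.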
My question is whether there is any way to use the fonts extracted from the PDF in the new document I am creating. That is, I want to read those fonts from the folder and create a new document based on them. I tried the following, but it did not work:
Document newDoc = new Document();
newDoc.removeAllChildren();
FontSettings f = new FontSettings();
f.setFontsFolder("fontsFolder", true);
newDoc.setFontSettings(f);
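One thing that may be worth ruling out: the snippet above passes the literal string "fontsFolder", which is a relative path, so it only works if a folder with that name exists relative to the working directory and actually holds the extracted files. The following is a minimal sketch, using only java.nio, that lists the .ttf files in a folder before it is handed to the font settings. The class `FontFolderCheck` and its methods are hypothetical, and this check does not by itself make the fonts apply to the new document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class FontFolderCheck {
    // Returns the *.ttf files directly inside the folder, sorted by path.
    // The glob match may be case-sensitive, depending on the file system.
    static List<Path> listTtf(Path folder) {
        List<Path> fonts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(folder, "*.ttf")) {
            for (Path p : stream) {
                fonts.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(fonts);
        return fonts;
    }

    // Creates a temporary folder with one .ttf and one .txt file
    // and returns how many .ttf files were found.
    static int demoCount() {
        try {
            Path dir = Files.createTempDirectory("ttf-demo");
            Files.createFile(dir.resolve("font0.ttf"));
            Files.createFile(dir.resolve("notes.txt"));
            return listTtf(dir).size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demoCount());
    }
}
```

If the list comes back empty for the folder used in the conversion step, the path passed to `setFontsFolder` is a likely place to look first.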
I am attaching the source code, the source PDF, and the generated DOCX again for your reference. Please help as soon as possible, because I have been stuck on this for a long time.
Hi Gayatri,
Hi Gayatri