Blank page added while converting from HTML to PDF

Hi,

We are evaluating Aspose for our client, to generate PDF files from HTML.

We are trying to convert HTML to PDF using Aspose.Words.
I observed that one extra blank page is added at the end. The content is shown on the first page and is complete, but another page is added after the content.

Please find the attached input and output files for the same.

Code snippet used:

com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(
        saveAsposeWordDoc(mergedHtml.trim(), userId, propertiesMap));

// mergedHtml is the HTML content we receive from the UI.
public static String saveAsposeWordDoc(String mergedHtml, String userId,
        Map<String, String> propertiesMap) throws Exception {
    String returnFilePath = "C:\\Aspose\\pdf\\outputPDF.pdf";
    try {
        InputStream is = new ByteArrayInputStream(mergedHtml.getBytes("UTF-8"));
        com.aspose.words.Document doc = new com.aspose.words.Document(is);
        // A top margin of 30 is required for the first page.
        doc.getFirstSection().getPageSetup().setTopMargin(30);
        com.aspose.words.DocumentBuilder builder = new com.aspose.words.DocumentBuilder(doc);
        // We need to leave this space at the bottom of the page.
        builder.getPageSetup().setBottomMargin(210);
        builder.getDocument().save(returnFilePath);
    } catch (Exception e) {
        e.printStackTrace();
    }
    return returnFilePath;
}

Can you please check this and let me know if I have missed anything in setting the page dimensions?

Thanks.
Shivaji

Hi Shivaji,

Thanks for your inquiry. You are setting the bottom page margin to 210, and the extra page is caused by the page margins.

Please note that Aspose.Words mimics the behavior of MS Word. If you open the same HTML in MS Word and set the same top and bottom margins, you will also get two pages in the output. However, the second page should then contain some content as well, and it does not. This is an issue.
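
To make the effect of the margins concrete, here is a rough arithmetic sketch in plain Java. It assumes a Letter page 792 points tall and a hypothetical content height of 560 points; the thread states neither value, and a real layout engine also accounts for headers, footers, and paragraph spacing, so treat this only as an illustration of how a 210-point bottom margin shrinks the usable area.

```java
final class MarginMath {

    /** Height left for body text once both margins are subtracted (all values in points). */
    static double bodyHeight(double pageHeight, double topMargin, double bottomMargin) {
        double height = pageHeight - topMargin - bottomMargin;
        if (height <= 0) {
            throw new IllegalArgumentException("Margins leave no room for content");
        }
        return height;
    }

    /** Simplified page count: content height divided by body height, rounded up. */
    static int pagesNeeded(double contentHeight, double bodyHeight) {
        return Math.max(1, (int) Math.ceil(contentHeight / bodyHeight));
    }

    public static void main(String[] args) {
        // Assumed page height: Letter, 792 pt. Margins are the ones from the thread.
        double body = bodyHeight(792.0, 30.0, 210.0);
        System.out.println(body); // prints 552.0
        // Hypothetical content height of 560 pt: slightly too tall for one page.
        System.out.println(pagesNeeded(560.0, body)); // prints 2
    }
}
```

Under these assumptions, content that would fit a page with small margins spills onto a second page as soon as it exceeds 552 points.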

Could you please share some details about your requirement? We will then provide more information and log this issue according to your requirements.

Thank you, Tahir, for the reply.

We want to leave some space at the bottom of the page, where we will display a stamp and a signature, aligned right and left respectively.
The first page will always have the stamp, and the signature is displayed at the end of the content. If the content fits on a single page, the signature and the stamp should be displayed on the same page.

Here, the reserved space appears at the end of the content, but one extra blank page is still generated, as explained.

Please let us know how we can fix this issue.

Thanks,
Shivaji.

Hi Shivaji,

Thanks for sharing the details. I have tested the scenario and managed to reproduce the same issue on my side. I have logged this problem in our issue tracking system as WORDSNET-12560 so that it can be fixed. I have linked this forum thread to the issue, and you will be notified via this thread once it is resolved.

We apologize for the inconvenience.

Hi,

We are using Aspose.Words for Java to generate our PDF file, not the .NET version.

Please make sure it is fixed in the Java component as well, as I noticed the issue is tracked against .NET. I wanted to clarify this.

Thanks in advance!
Shivaji

Hi Shivaji,

Thanks for your inquiry. The same issue will be fixed in both Aspose.Words for .NET and Aspose.Words for Java.

Please note that the latest version of Aspose.Words for Java is completely auto-ported from .NET, i.e. we do not write code for Aspose.Words for Java; it is generated automatically from the C# code of Aspose.Words for .NET. So there should not be any significant difference in functionality between the Java and .NET versions, because the code is mostly the same.

Please let us know if you have any more queries.

Thank you for clarifying this.

Hi,

Can you please provide a workaround for this? Can I remove the page that is added at the end?

I was trying to remove it using the snippet below, but it is not working as expected.

Page page = pdfDocument.getPages().get_Item(pdfDocument.getPages().size());
OperatorCollection collection = page.getContents();
if (collection.size() <= 0 || page.getAnnotations().size() <= 0) {
    pdfDocument.getPages().delete(pdfDocument.getPages().size());
}

Here getAnnotations().size() always returns 0, whereas collection.size() returns 20, so the check does not work as expected.
I don't understand why the page reports content. How can I identify the blank page and remove it?

Please help in this regard, and let me know if more details are required.

Thanks.
Shivaji

Hi Shivaji,


Thanks for your inquiry. Please note that your output document will have two pages after setting the top margin to 30 and the bottom margin to 210. Please load the input HTML in MS Word and set the same top and bottom margins. The output document will look like the attached image.

shivaji_dole:
I was trying to remove it using the below snippet, but its not working as expected.

Page page = pdfDocument.getPages().get_Item(pdfDocument.getPages().size());
OperatorCollection collection = page.getContents();
if (collection.size() <= 0 || page.getAnnotations().size() <= 0) {
    pdfDocument.getPages().delete(pdfDocument.getPages().size());
}

Your query is related to Aspose.Pdf. I am moving this forum thread to the Aspose.Total forum. My colleagues from the Aspose.Pdf team will reply to you shortly.

Hi Shivaji,

Can you please share why you are using the OperatorCollection object? Is it because you need to check whether the page contains any objects, i.e. images, text, etc.? If so, you may follow the instructions at the following link to find whether a PDF file contains images or text only. When using these instructions, you may consider creating a temporary PDF file containing only the last page and then running the above-stated code against that file. Furthermore, in order to delete the page, the Page.delete(…) method should work.

PS: the code in the above article is in .NET, but it gives a clear idea of the approach to follow.
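
The decision logic behind that suggestion can be sketched in plain Java without any Aspose types. The posted condition joins its two tests with `||`, so it is true whenever the page has no annotations, regardless of content; and the operator count of 20 suggests that counting content-stream operators is not a reliable emptiness test. The sketch below instead treats a page as blank only when it has no text, no images, and no annotations. `PageSummary` and its fields are hypothetical stand-ins; in real code they would be filled from whatever text and image checks the PDF library offers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BlankPageCheck {

    /** Hypothetical summary of one page; illustrative only, not an Aspose type. */
    static final class PageSummary {
        final String text;
        final int imageCount;
        final int annotationCount;

        PageSummary(String text, int imageCount, int annotationCount) {
            this.text = text;
            this.imageCount = imageCount;
            this.annotationCount = annotationCount;
        }
    }

    /** A page counts as blank only when ALL checks come back empty (AND, not OR). */
    static boolean isBlank(PageSummary page) {
        return page.text.trim().isEmpty()
                && page.imageCount == 0
                && page.annotationCount == 0;
    }

    /** Returns a copy of the list with blank pages removed from the end only. */
    static List<PageSummary> dropTrailingBlankPages(List<PageSummary> pages) {
        int end = pages.size();
        while (end > 0 && isBlank(pages.get(end - 1))) {
            end--;
        }
        return new ArrayList<>(pages.subList(0, end));
    }

    public static void main(String[] args) {
        List<PageSummary> pages = Arrays.asList(
                new PageSummary("Agreement text ...", 1, 0), // page with content
                new PageSummary("   ", 0, 0));               // whitespace-only trailing page
        System.out.println(dropTrailingBlankPages(pages).size()); // prints 1
    }
}
```

Only trailing pages are dropped, so an intentionally blank page in the middle of a document would be left in place.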

The issues you have found earlier (filed as WORDSNET-4794) have been fixed in this Aspose.Words for .NET 22.8 update also available on NuGet.