Free Support Forum - aspose.com

Converting PDF to an HTML page produces a huge space where page breaks should be

Hi,

I am using a temporary license to evaluate the product.
When I convert a multi-page PDF file to an HTML page, I get a large gap where the page breaks should be.
I am calling the API from a Java program:

com.aspose.pdf.Document doc = new com.aspose.pdf.Document("pdffile.pdf");
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// Output file path
String outHtmlFile = "pdftohtml.html";
// Save the output file
doc.save(outHtmlFile, newOptions);


Can you please help me fix this?

Thanks

Hi Sarmir,


Thanks for using our APIs.

Can you please share the input PDF document so that we can test the conversion in our environment? We are sorry for this inconvenience.

Hi Sarmir,


Thanks for sharing the resource file.

I have tested the scenario using the latest release of Aspose.Pdf for .NET 11.3.0 and I was unable to notice any large-space issue in the resultant HTML file. For your reference, I have also attached the output generated on my end.

[Java]

com.aspose.pdf.Document doc = new com.aspose.pdf.Document("c:/pdftest/document.pdf");
// Instantiate HTML Save options object
HtmlSaveOptions newOptions = new HtmlSaveOptions();

// Enable option to embed all resources inside the HTML
newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedAllIntoHtml;

// This is just optimization for IE and can be omitted
newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsEmbeddedPartsOfPngPageBackground;
newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;
// Output file path
String outHtmlFile = "c:/pdftest/document_pdftohtml.html";
// Save the output file
doc.save(outHtmlFile, newOptions);

Thanks for your response.
Actually, the HTML you sent me shows the same issue that I raised in the first place.
In the attached segment from the HTML, notice the huge space between the line that ends with the word ‘bonds,’ and the line that starts with ‘SB 114 (Liu) 4/7/15’.
In the PDF file, that is where a page break occurs; the HTML did not treat it as a single empty line.
Having looked at the generated HTML, we think the issue may be related to the way the PDF pages are translated and the size of the corresponding div allocated to contain each page's text. So we anticipate that if a PDF page has a single line of text, the space generated will be much bigger than in our example.
Please advise if there is any workaround, since we need a single HTML page generated for however many pages a PDF file contains.
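
To illustrate what we mean, here is a minimal sketch of the arithmetic, assuming each PDF page ends up in a block as tall as the page itself and the text is laid out from the top. The class name, the method name, and the point values (a 792 pt US Letter page) are our own assumptions for illustration; none of them come from the Aspose API or from the generated HTML.

```java
// Hypothetical illustration only: none of these names come from the Aspose API.
final class PageGapSketch {

    // Blank space left below the last line of text, assuming the page
    // container is exactly as tall as the PDF page (all values in points,
    // measured from the top of the page).
    static double trailingGap(double pageHeightPt, double lastLineBottomPt) {
        if (lastLineBottomPt < 0 || lastLineBottomPt > pageHeightPt) {
            throw new IllegalArgumentException("last line must lie within the page");
        }
        return pageHeightPt - lastLineBottomPt;
    }

    public static void main(String[] args) {
        double letterHeightPt = 792.0; // assumed US Letter page height

        // A page whose text runs most of the way down leaves a moderate gap.
        System.out.println(trailingGap(letterHeightPt, 600.0));

        // A page with a single line of text near the top leaves a much larger gap.
        System.out.println(trailingGap(letterHeightPt, 90.0));
    }
}
```

If this assumption holds, the gap depends only on how much of the page the text fills, which would explain why shorter pages look worse in the combined HTML.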

Thanks

Charge the committee to determine whether it is “necessary or desirable” to issue the

bonds,

SB 114 (Liu) 4/7/15

Page 2 of 8

Hi Sarmir,


Thanks for sharing the details.

I have tested the scenario and I am able to reproduce the problem. I have logged it as PDFNEWJAVA-35612 in our issue tracking system. We will look further into the details of this problem and keep you updated on the status of the fix. Please be patient and allow us a little time. We are sorry for this inconvenience.

Thanks for logging the issue.
This issue is causing us to postpone the license purchase until it is resolved.
Do you have an estimate of when this can be resolved, and what is your typical issue resolution turnaround time?

Hi Sarmir,


Thanks for your patience.

The time taken to resolve an issue depends on the complexity and severity of the problem. As this problem was only recently reported in this thread, it is pending review, and until the product team has analyzed it, we are not able to share a possible timeline for its resolution. However, as soon as we have definite updates regarding the resolution, we will let you know.