Generate HTML from PDF using Aspose.PDF for Java - String Index Out Of Bounds Exception while saving PDF

pariyani · May 7, 2020, 12:39pm

Hi Support Team,

I am trying to generate HTML from a PDF file. The conversion first failed with an exception saying that a font file was missing, so I replaced that particular font file. After doing that, I am getting the following exception:
Exception in thread "main" java.lang.StringIndexOutOfBoundsException: String index out of range: 15
    at java.lang.String.charAt(String.java:658)
    at com.aspose.pdf.internal.l4u.lj.lI(Unknown Source)
    at com.aspose.pdf.internal.l4u.lj.lj(Unknown Source)
    at com.aspose.pdf.internal.l4u.lj.lI(Unknown Source)

The code I am using is below.

      Document doc = new Document(ConvertPDFtoXLSX.class.getClassLoader().getResourceAsStream("test.pdf"));

      // Instantiate HTML Save options object
      HtmlSaveOptions newOptions = new HtmlSaveOptions();
      // Embed only the CSS inside the HTML (EmbedCssOnly)
      newOptions.PartsEmbeddingMode = HtmlSaveOptions.PartsEmbeddingModes.EmbedCssOnly;
      // This is just an optimization for IE and can be omitted
      newOptions.LettersPositioningMethod = LettersPositioningMethods.UseEmUnitsAndCompensationOfRoundingErrorsInCss;
      newOptions.RasterImagesSavingMode = HtmlSaveOptions.RasterImagesSavingModes.AsExternalPngFilesReferencedViaSvg;
      // newOptions.FontSavingMode = HtmlSaveOptions.FontSavingModes.SaveInAllFormats;

      TextFragmentAbsorber absorber = new TextFragmentAbsorber(new TextEditOptions(TextEditOptions.FontReplace.RemoveUnusedFonts));
      // Accept the absorber for all pages
      doc.getPages().accept(absorber);
      // Get the extracted text fragments as a collection
      TextFragmentCollection textFragmentCollection = absorber.getTextFragments();

      // Loop through the fragments
      for (TextFragment textFragment : (Iterable<TextFragment>) textFragmentCollection) {
         String fontName = textFragment.getTextState().getFont().getFontName();
         if (fontName.equals("font0000000024642c3f")) {
            textFragment.getTextState().setFont(FontRepository.findFont("Verdana"));
         }
      }

      // Output file path
      String outHtmlFile = "Single_output.html";
      // Save the output file
      doc.save(outHtmlFile, newOptions);

The PDF file I am using is attached:
test.pdf (28.0 KB)

Thanks for your help.

Best regards,
Imran Pariyani

Adnan.Ahmad · May 7, 2020, 8:19pm

@pariyani,

Thanks for contacting support.

We were able to observe the issue you described and have logged it as PDFJAVA-39401 in our issue tracking system. We will look into its details and keep you posted on the status of the fix. Please be patient and give us some time.

We are sorry for the inconvenience.

aspose.notifier · November 24, 2020, 1:38am

The issue you reported earlier (filed as PDFJAVA-39401) has been fixed in Aspose.PDF for Java 20.11.