I have multiple file types and I need to convert them to text to count lines and sentences.
For PPT files, I converted them to HTML, e.g. “KM Sem.ppt” -> “KM Sem.html”.
For PDF files, I converted them to DOC, e.g. “KM ABA.pdf” -> “KM ABA.doc”.
Then I extracted the text from the converted files (HTML and DOC).
However, I found that some converted files have a different layout depending on the OS (Windows vs. Linux).
As a result, when I extracted text from the converted files and counted lines and sentences, the results differed as well.
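To illustrate why the counts diverge, here is a minimal sketch of a counting step. It assumes plain Java with only the standard library. The class and method names (`TextStats`, `countLines`, `countSentences`) are hypothetical and are not part of any Aspose API. It shows that a line count depends on where the converter wraps the text, while a sentence count taken after collapsing whitespace does not. It does not address the layout difference in the conversion itself.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of Aspose: counts lines and sentences in extracted text.
class TextStats {

    // Convert Windows (\r\n) and bare \r line endings to \n.
    public static String normalizeNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Count non-blank lines after normalizing line endings.
    public static int countLines(String text) {
        int count = 0;
        for (String line : normalizeNewlines(text).split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Count sentences after collapsing all whitespace, so that
    // layout-dependent line wrapping does not affect the result.
    public static int countSentences(String text) {
        String flat = text.replaceAll("\\s+", " ").trim();
        if (flat.isEmpty()) {
            return 0;
        }
        // Assumption: the text is English; pick another Locale if it is not.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(flat);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (!flat.substring(start, end).trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Same content, wrapped differently, as two conversions might produce it.
        String a = "First sentence. Second\r\nsentence wraps here.\r\n\r\nThird one.";
        String b = "First sentence. Second sentence\nwraps here. Third one.";
        System.out.println(countLines(a) + " lines, " + countSentences(a) + " sentences");
        System.out.println(countLines(b) + " lines, " + countSentences(b) + " sentences");
    }
}
```

In this sketch the two sample strings give different line counts but the same sentence count, which matches the symptom described above: the line count follows the layout, not the content.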
Here is my sample PDF file: KM ABA.pdf (267.6 KB)
Here is my code for ConvertPptToHtml:
// Load the presentation and save it as HTML in the temp folder.
com.aspose.slides.Presentation pres = new com.aspose.slides.Presentation(sInputFile);
pres.save(sTempFolder + sFileName + ".html", com.aspose.slides.SaveFormat.Html);
And for ConvertPdfToDoc:
// Load the PDF and save it as DOC in the temp folder.
com.aspose.pdf.Document pdfDoc = new com.aspose.pdf.Document(sInputFile);
pdfDoc.save(sTempFolder + sFileName + ".doc", com.aspose.pdf.SaveFormat.Doc);