Free Support Forum - aspose.com

Pdf.extractor problem with the pdf file attached

I have two PDF files in the directory.
When I run this subroutine:

Sub pdf_extractor(ByVal fname)
    'Instantiate PdfExtractor object
    Dim extractor As PdfExtractor = New PdfExtractor()

    'Set Password for input PDF file
    extractor.Password = ""

    'Bind the input PDF document to extractor
    extractor.BindPdf("d:/inetpub/engrAdmin/larry/templates/" & fname)

    'Save the extracted text to a text file
    'create text file
    Dim rnd As New Random()
    Dim rndNumber = rnd.Next(99, 1000)
    Dim fileName As String = "text" & CStr(rndNumber) & ".txt"

    Dim strPath As String = "d:/inetpub/engrAdmin/larry/notes/"
    Try
        'Extract text from the input PDF document
        extractor.ExtractText()
        File.CreateText(strPath & fileName)
        extractor.GetText("d:/inetpub/engrAdmin/larry/notes/text.txt")
    Catch ex As Exception
        Dim strerr As String = ex.ToString
        Dim msg As String = "An error has occurred with this file:" & fname & vbCrLf & strerr

        Response.Write(msg)
    End Try
    Response.Write("the pdf extractor is finished processing at " & Today())
End Sub

I get an error with the second PDF file (mse.pdf). I do not understand why the first PDF file is extracted but the second one is not. This is the error:

System.ArgumentException: The output char buffer is too small to
contain the decoded characters, encoding ‘Unicode (UTF-8)’ fallback
‘System.Text.DecoderReplacementFallback’.
Parameter name: chars at System.Text.Encoding.ThrowCharsOverflow() at
System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean
nothingDecoded) at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32
byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder) at
System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char*
chars, Int32 charCount, Boolean flush) at
System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32
byteCount, Char[] chars, Int32 charIndex, Boolean flush) at
System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32
byteCount, Char[] chars, Int32 charIndex) at
System.IO.BinaryReader.InternalReadOneChar() at Ӓ.Ὺ.₱.Read() at Ӓ.ₒ.₩()
at Ӓ.△.▸() at Ӓ.△.᜵() at Ӓ.⛄.⛉(Stream ࡴ, String ॽ, Int32 ῑ, Int32 ῒ) at
Aspose.Pdf.Kit.PdfExtractor.ExtractText() at
read_pdf.pdf_extractor(Object fname)

Hi,

Thank you for considering Aspose.

I tested the attached PDF file with Aspose.Pdf.Kit 3.1.0.0 and was not able to reproduce the error.

BTW, you needn't call File.CreateText(string), since the method PdfExtractor.GetText(string) creates the .txt file automatically.
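
As a rough illustration of that point, here is a minimal sketch of the extraction step without the File.CreateText call. It uses only the PdfExtractor members already shown in the code above (BindPdf, ExtractText, GetText); the module name, the helper name ExtractToTextFile and its two parameters are hypothetical, and the sketch assumes the Aspose.Pdf.Kit assembly is referenced by the project.

```vbnet
Imports Aspose.Pdf.Kit

Module ExtractSketch
    ' Hypothetical helper: inputPath and outputPath are illustrative parameters,
    ' not part of the original page.
    Sub ExtractToTextFile(ByVal inputPath As String, ByVal outputPath As String)
        Dim extractor As PdfExtractor = New PdfExtractor()

        ' Bind the input PDF and extract its text
        extractor.BindPdf(inputPath)
        extractor.ExtractText()

        ' GetText writes the output file itself, so no File.CreateText call is made
        extractor.GetText(outputPath)
    End Sub
End Module
```

Note that in the posted code File.CreateText is given the random name (strPath & fileName) while GetText writes to text.txt, so the two calls do not refer to the same file. File.CreateText also returns a StreamWriter that keeps its file open until it is closed, which is one more reason to drop the call.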

Thanks,

Hi Felix

Thank you for testing mse.pdf, which was created by Adobe Acrobat 7.0.

I am not running Aspose.Pdf.Kit 3.1.0.0.

I am running a version that is over a year old, and I am not sure which version I have.

Could it be that I need the newer version of the program? Please let me know.

Every time I have a pdf file created by Adobe Acrobat 7.0 I get this error:

System.ArgumentException: The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'. Parameter name: chars at System.Text.Encoding.ThrowCharsOverflow() at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded) at System.Text.UTF8Encoding.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, DecoderNLS baseDecoder) at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush) at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex, Boolean flush) at System.Text.DecoderNLS.GetChars(Byte[] bytes, Int32 byteIndex, Int32 byteCount, Char[] chars, Int32 charIndex) at System.IO.BinaryReader.InternalReadOneChar() at Ӓ.Ὺ.₱.Read() at Ӓ.ₒ.₩() at Ӓ.△.▸() at Ӓ.△.᜵() at Ӓ.⛄.⛉(Stream ࡴ, String ॽ, Int32 ῑ, Int32 ῒ) at Aspose.Pdf.Kit.PdfExtractor.ExtractText() at read_pdf.pdf_extractor(Object fname)

jaydean edson

Hello Jaydean,

I have also tested your code with "mse.pdf" and it worked fine at my end. I tested it using Aspose.Pdf.Kit version 3.1.0.0.

We recommend that you switch to the latest version of Aspose.Pdf.Kit, as it supports files created with Adobe Acrobat 7.0.
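
If it is unclear which version is currently installed, one way to find out is to read the version from the loaded assembly through standard .NET reflection. The following is only a sketch: the module and function names are hypothetical, and GetType(String) is used as a stand-in so that it runs without Aspose. In the page above, GetType(PdfExtractor) would be passed instead.

```vbnet
Imports System

Module VersionCheck
    ' Hypothetical helper: returns the version recorded in the assembly
    ' that defines the given type.
    Function AssemblyVersionOf(ByVal t As Type) As Version
        Return t.Assembly.GetName().Version
    End Function

    Sub Main()
        ' Stand-in type so the sketch is self-contained;
        ' replace with GetType(PdfExtractor) to check Aspose.Pdf.Kit.
        Console.WriteLine(AssemblyVersionOf(GetType(String)))
    End Sub
End Module
```

In an ASP.NET page the result could be written out with Response.Write instead of Console.WriteLine.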