Free Support Forum - aspose.com

Error when extracting text from a PDF to a txt file: is extraction of Chinese content supported?

Extracting text from a PDF to a txt file throws an error. The sample code is as follows:
private static void ExtractText() throws IOException
{
	// Input and output files sit in the root of external storage (android.os.Environment)
	String inputPath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/page_1.pdf";
	String outputPath = Environment.getExternalStorageDirectory().getAbsolutePath() + "/extractedtext.txt";

	// Open the PDF document
	com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document(inputPath);
	// Create a TextAbsorber object to extract text
	com.aspose.pdf.TextAbsorber textAbsorber = new com.aspose.pdf.TextAbsorber();
	// Accept the absorber for all the pages
	pdfDocument.getPages().accept(textAbsorber);
	// To extract from a single page instead:
	//pdfDocument.getPages().get_Item(2).accept(textAbsorber);
	// Get the extracted text
	String extractedText = textAbsorber.getText();

	// Write the text as UTF-8; try-with-resources closes the writer even if write() fails
	try (java.io.Writer writer = new java.io.OutputStreamWriter(
			new java.io.FileOutputStream(outputPath), java.nio.charset.StandardCharsets.UTF_8)) {
		writer.write(extractedText);
	}
}

The error is as follows:
2022-03-08 10:09:52.835 24218-24218/com.aspose.pdf.examples E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.aspose.pdf.examples, PID: 24218
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.aspose.pdf.examples/com.aspose.pdf.examples.MainActivity}: class com.aspose.pdf.engine.io.serialization.PdfSerializationException: Culture Name: zh-CN-#Hans is not a supported culture
com.aspose.pdf.engine.data.PdfArray$z1.deserialize(Unknown Source:185)
com.aspose.pdf.engine.io.serialization.PdfSerializer.deserialize(Unknown Source:44)
com.aspose.pdf.engine.data.PdfDictionary$z1.deserialize(Unknown Source:220)
com.aspose.pdf.engine.io.serialization.PdfSerializer.deserialize(Unknown Source:44)
com.aspose.pdf.engine.data.PdfTrailer.m1(Unknown Source:255)
com.aspose.pdf.engine.data.PdfTrailer.m1(Unknown Source:230)
com.aspose.pdf.engine.data.PdfTrailer$XrefSerializer.deserialize(Unknown Source:11)
com.aspose.pdf.engine.io.serialization.PdfSerializer.deserialize(Unknown Source:44)
com.aspose.pdf.engine.io.PdfReader.m1021(Unknown Source:368)
com.aspose.pdf.engine.io.PdfReader.<init>(Unknown Source:197)
com.aspose.pdf.engine.io.PdfReader.<init>(Unknown Source:9)
com.aspose.pdf.internal.p41.z1.m289(Unknown Source:2)
com.aspose.pdf.engine.io.PdfFile.<init>(Unknown Source:3)
com.aspose.pdf.internal.p41.z1.m291(Unknown Source:2)
com.aspose.pdf.engine.PdfDocument.open(Unknown Source:7)
com.aspose.pdf.engine.PdfDocument.<init>(Unknown Source:12)
com.aspose.pdf.ADocument.init(Unknown Source:12)
com.aspose.pdf.ADocument.<init>(Unknown Source:46)
Is extraction of Chinese content supported?

@Matt1986

Could you attach your input PDF file here for testing? We will investigate the issue and provide you with more information.

apose-pdf-android-test.zip (141.0 KB)

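The attachment is the Android project. The PDF jar version used is aspose-pdf-20.11-android.via.java.jar. It was too large to upload, so I removed it from the archive; it needs to be placed in the libs directory.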

HelloWorld.pdf (580.3 KB)
The attachment is the test file.

Hello,

The test PDF and the test demo have been posted in the replies in this thread. Please reply as soon as possible. Thank you.

@Matt1986

We called the ExtractText method from your code and could not reproduce the issue you reported. Please make sure you are using the same PDF file on your end.
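
The thread does not identify a cause. One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the culture name in the exception, zh-CN-#Hans, resembles the string form Java gives a locale that carries a script subtag. Locale.toString() yields zh_CN_#Hans for Simplified Chinese in China, which could explain why the error shows up only on devices set to that locale. The sketch below prints that string form and uses a hypothetical helper, runWithLocale (not part of Aspose or the original post), to switch the default locale temporarily while a task runs. Whether calling ExtractText this way avoids the exception in aspose-pdf 20.11 is untested here.

```java
import java.util.Locale;
import java.util.function.Supplier;

class LocaleCheck {
    // String form of a locale, exactly as Locale.toString() reports it.
    public static String describe(Locale locale) {
        return locale.toString();
    }

    // Hypothetical helper: run a task under a temporary default locale,
    // then restore the previous default even if the task throws.
    public static <T> T runWithLocale(Locale temporary, Supplier<T> task) {
        Locale previous = Locale.getDefault();
        Locale.setDefault(temporary);
        try {
            return task.get();
        } finally {
            Locale.setDefault(previous);
        }
    }

    public static void main(String[] args) {
        // A locale with language zh, script Hans, and region CN.
        Locale simplifiedChinese = Locale.forLanguageTag("zh-Hans-CN");
        System.out.println(describe(simplifiedChinese)); // zh_CN_#Hans

        // Inside the task, the default locale is the temporary one.
        String seen = runWithLocale(Locale.US, () -> Locale.getDefault().toString());
        System.out.println(seen); // en_US
    }
}
```

In the Android project, the task passed to runWithLocale would be the code that constructs com.aspose.pdf.Document and runs the TextAbsorber. Note that Locale.setDefault affects the whole process, so the helper restores the previous value as soon as the task finishes.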