We're sorry Aspose doesn't work properply without JavaScript enabled.

Free Support Forum - aspose.com

OutOfMemoryError on searching anchor in PDF

The issue is tested on java 8 environment with Aspose.PDF 19.3 (currently in production) and 21.5 (latest available).
The result is the same - OOM error is thrown.

Sample code and document are below.

Please, provide any workaround and/or register this behavior as a bug.

import com.aspose.pdf.Document;
import com.aspose.pdf.TextFragmentAbsorber;
import com.aspose.pdf.TextFragmentCollection;
import com.aspose.pdf.facades.PdfExtractor;
import org.apache.commons.lang.StringUtils;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class FindAnchorOnPdf {
    private static String ANCHOR = "МЕСТО_ДЛЯ_ПЕЧАТИ";

     * @param args
     * 0 - путь до файла, в котором будем искать "якорь"
     * 1 - признак активации кода проверки на наличие текста (если есть параметр, то активируем)
    public static void main(String[] args) throws IOException {
        if(0 == args.length || 0 == args[0].length()) {
            System.out.println("Need set one argument - path to file");
        File sourceFile = new File(args[0]);

        if (!sourceFile.exists()) {
            System.out.println("File do not exist: " + sourceFile.getAbsolutePath());
        try (InputStream pdfStream = new FileInputStream(sourceFile)) {
            Document document = new Document(pdfStream);
            TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(ANCHOR);

            if (2 == args.length) {
                //проверяем, есть ли во вложении текст для поиска.
                try (final ByteArrayOutputStream docStream = new ByteArrayOutputStream()) {
                    PdfExtractor pdfExtractor = new PdfExtractor();
                    //Если текста нет, то перед нами скан и дальнейшие проверки не имеют смысла, выходим из метода.
                    if (StringUtils.isEmpty(docStream.toString().replaceAll("\\r?\\n?\0", ""))) {

            TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
            System.out.println(String.format("Find %1$d anchors", textFragmentCollection.size()));

link for testing pdf https://we.tl/t-ImAUYsEv4e


I can not reproduce the issue as it prints “Find 0 anchors” after running the code. Can you please share your system details.

We tested on OS Windows 7 Ent SP1, java 8 with the following parameters:

-Xms100M -Xmx200M -Xloggc:<path to log file> -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=10000k -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<path to store heap dump>

it is assumed that our system should work stable when more than 100 users could work with big files simultaneously, that’s why we selected this synthetic environment (only 100mb for jvm).


The specified memory is not enough so please increase it for smooth processing.