PDF Document Resources

MirAddy · January 10, 2019, 8:25pm

Hello,

I have Java code that opens a PDF and extracts the text while preserving the newline characters (converting them to XML-compliant versions). The code executes and returns the correct data; however, some resources still seem to be holding onto the PDF. I've tried the document.dispose() and document.close() methods, but that doesn't seem to work. Is there a way to fix this? We're using aspose-pdf-18.9.1.jar.

Thank you!

String filePath = "C:/Temp/file.pdf";
Document document = new com.aspose.pdf.Document(filePath);

 	Map<String,Integer> combineSegment = new LinkedHashMap<String , Integer>();
 	
 	ParagraphAbsorber absorber = new ParagraphAbsorber();
	//absorber.visit(document);
	PdfAnnotationEditor editor = new PdfAnnotationEditor();

	editor.bindPdf(filePath);

	PageCollection pgColl = editor.getDocument().getPages();

	for(int i = 0; i<pgColl.size(); i++){
		absorber.visit(editor.getDocument().getPages().get_Item(i+1));
	}

	List<PageMarkup> pageMarkups = absorber.getPageMarkups();
	// Paragraph Counter
	Integer count = 0;
	for (PageMarkup markup : pageMarkups) {

		List<MarkupSection> sections = markup.getSections();
		
		for (MarkupSection section : sections) {

			List<MarkupParagraph> paragraphs = section.getParagraphs();
			
			for (MarkupParagraph paragraph : paragraphs) {

				count = count + 1;
				List<List<TextFragment>> lines = paragraph.getLines();
				for (List<TextFragment> line : lines) {
					
					for (TextFragment fragment : line) {
						// Iterate on paragraph and combine the lines in same paragraph
						// Map Contains position of text fragment as key to paragraph number as value
							combineSegment.put(fragment.getBaselinePosition().toString(), count);
							
					}
				}
			lines.clear();

			}
		}
	}
 	
	
	StringBuilder line = new StringBuilder();
    StringBuilder output = new StringBuilder();
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber();
    //Set Extraction Options as Raw This will give new line as space.
    textFragmentAbsorber.setExtractionOptions(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    // Accept the absorber 
    document.getPages().accept(textFragmentAbsorber);
    // Get the extracted text fragments

    com.aspose.pdf.TextFragmentCollection textFragmentCollection = textFragmentAbsorber.getTextFragments();
    for (TextFragment fragment : textFragmentCollection) {
    		Position baselinePosition = fragment.getBaselinePosition();
    		// If key not present then it's a new line 
    		if(!combineSegment.containsKey(baselinePosition.toString())) {
    			// If line buffer is not empty that means we are processing lines in a paragraph and these new lines should not be printed.
    			if(line.toString().isEmpty()) {
    				// We get space for new line in this version of API.
    				if(fragment.getText().trim().isEmpty())
    					output.append("&#13;");
    				else
    					output.append(fragment.getText());
    			}
    		} else {
    			
    			// If key present then remove the position from map and get the paragraph number on which we are working
    			Integer remove = combineSegment.remove(baselinePosition.toString());
    			
    			//If more text present for same paragraph then continue appending
    			boolean contains = combineSegment.values().contains(remove);
    			if(contains) {
    				line.append(fragment.getText());
    				continue;
    			} else {
    			//If all the data for particular paragraph is extracted then display the data and reset the buffer for next paragraph 	
    				line.append(fragment.getText());
    				output.append(line.toString());
    				line = new StringBuilder();
    			}
    		
    		}
    		
        }

    String extractedText = output.toString();
    document.close();
    document.dispose();
    return extractedText;

Farhan.Raza · January 11, 2019, 7:17am

@MirAddy

Thank you for contacting support.

Could you please elaborate on how you are noticing that resources are being held? Please also share screenshots, environment details, and sample files so that we can try to reproduce and investigate the issue in our environment. Before sharing the requested data, please make sure you are using Aspose.PDF for Java 18.12.
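
One detail in the posted code may be worth checking while gathering that information, though nothing in this thread confirms it as the cause: the file is opened twice, once through new com.aspose.pdf.Document(filePath) and once through editor.bindPdf(filePath), and only document is closed at the end. The sketch below illustrates that general pattern with a hypothetical stand-in class (FileHandle and HandleDemo are invented for illustration and are not Aspose classes). It assumes each object holds its own handle on the file, which may or may not match how the library behaves.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class HandleDemo {
    // Number of stand-in handles that are currently open.
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for any object that opens the PDF; not an Aspose class.
    static final class FileHandle implements AutoCloseable {
        private boolean closed;

        FileHandle(String path) {
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            // Closing twice is harmless; only the first call releases the handle.
            if (!closed) {
                closed = true;
                OPEN.decrementAndGet();
            }
        }
    }

    // Mirrors the shape of the posted code: two handles opened, only one closed.
    static int closeOnlyFirst(String path) {
        FileHandle document = new FileHandle(path);
        FileHandle editor = new FileHandle(path);
        document.close();
        return OPEN.get(); // handles still open afterwards
    }

    // try-with-resources closes every declared handle, even if an exception is thrown.
    static int closeBoth(String path) {
        try (FileHandle document = new FileHandle(path);
             FileHandle editor = new FileHandle(path)) {
            // the text extraction work would go here
        }
        return OPEN.get(); // handles still open afterwards
    }

    public static void main(String[] args) {
        System.out.println("Open after closing only the first: " + closeOnlyFirst("C:/Temp/file.pdf"));
        OPEN.set(0);
        System.out.println("Open after closing both: " + closeBoth("C:/Temp/file.pdf"));
    }
}
```

If the editor object in your version exposes its own close method, calling it alongside document.close(), or reading the pages from the already opened document instead of binding the file a second time, would be the equivalent thing to try in the real code.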