The image extraction uses the figure caption paragraph node as the keyword for locating images.
The extraction reads the next sibling and extracts the images from it.
However, the new source document has the figure caption as the previous sibling, and it also contains consecutive images.
As a result, the extraction process skips some images.
Please help me extract the images when the figure caption is the previous sibling.
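To make the intended behaviour concrete, here is a minimal sketch of the sibling walk I have in mind. It is not taken from the attached sample code and does not use any document library's API: `Block`, `images_for_captions`, and the `direction` parameter are hypothetical names, and the document is modelled as a flat list of sibling blocks. The idea is to keep stepping from the caption until a non-image sibling is reached, instead of reading only one sibling.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Hypothetical stand-in for a sibling node in the document body."""
    kind: str   # "caption", "image", or "text"
    value: str  # caption text or image name


def images_for_captions(blocks, direction=1):
    """Map each caption to the unbroken run of images adjacent to it.

    direction=1 walks the following siblings of the caption,
    direction=-1 walks the preceding siblings.
    """
    result = {}
    for i, block in enumerate(blocks):
        if block.kind != "caption":
            continue
        run = []
        j = i + direction
        # Keep stepping while the sibling is an image, so that
        # consecutive images are all collected rather than only the first.
        while 0 <= j < len(blocks) and blocks[j].kind == "image":
            run.append(blocks[j].value)
            j += direction
        if direction < 0:
            run.reverse()  # restore document order
        result[block.value] = run
    return result


blocks = [
    Block("caption", "Fig 1"),
    Block("image", "a.png"),
    Block("image", "b.png"),
    Block("text", "body text"),
    Block("caption", "Fig 2"),
    Block("image", "c.png"),
]
forward = images_for_captions(blocks, direction=1)
```

With this sample list, the walk collects both `a.png` and `b.png` for `Fig 1`. Passing `direction=-1` covers a layout where the images sit before the caption instead.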
Sample code: Test.zip (34.7 KB)
Input document: source.zip (1.2 MB)
Actual output: ActualOutput.zip (1.2 MB)
Expected output: expected_output.zip (1.1 MB)
Thanks & regards,