Extract entire text from a Presentation...in PHP

Hi,

I am trying to use your tutorial to extract all the text from a 2003 PPT file…using PHP.

I am getting stuck trying to rewrite the Java example in PHP…Does the tutorial exist in PHP? If not, can you help me please?

I am trying to learn how to convert Java into PHP - I have tried using the wrapperClasses, but there seem to be certain functions that are unavailable in the PHP library. So, I have reverted to using Java in PHP.

regards,
Rob

Presentation pres = new Presentation(new FileInputStream("c:\\source.ppt"));

//iterate all slides
int lastSlidePosition = pres.getSlides().getLastSlidePosition();
for (int pos = 1; pos <= lastSlidePosition; pos++)
{
    Slide sld = pres.getSlideByPosition(pos);

    //iterate all shapes
    int shapesCount = sld.getShapes().size();
    for (int shpIdx = 0; shpIdx < shapesCount; shpIdx++)
    {
        com.aspose.slides.Shape shp = sld.getShapes().get(shpIdx);

        //Get the paragraphs from textholder or textframe
        Paragraphs paras = null;

        //Check if shape holds a textholder
        if (shp.getPlaceholder() != null && shp.isTextHolder() == true)
        {
            TextHolder thld = (TextHolder)shp.getPlaceholder();
            paras = thld.getParagraphs();
        }
        else
        {
            if (shp.getTextFrame() != null)
            {
                paras = shp.getTextFrame().getParagraphs();
            }//if
        }//else

        //Print the text on Console
        if (paras != null)
        {
            int parasCount = paras.size();
            for (int paraIdx = 0; paraIdx < parasCount; paraIdx++)
            {
                Paragraph para = paras.get(paraIdx);
                //print the text on console
                System.out.println(para.getText());
            }
        }//end if
    }//end for
}//end for
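
A side note on the file path in the Java snippet above: inside a Java string literal a backslash starts an escape sequence, so a Windows path needs doubled backslashes (or forward slashes). The short sketch below only illustrates that point. The class name PathLiteralDemo and the second path are made up for this example, and nothing in it touches Aspose.Slides.

```java
// Illustrative only: PathLiteralDemo is a hypothetical name, not part of any library.
final class PathLiteralDemo {

    // "\\" in source code is a single backslash character at run time.
    static final String ESCAPED = "c:\\source.ppt";

    // "\t" is a tab character, so this string is NOT the path c:\test1.ppt.
    static final String ACCIDENTAL_TAB = "c:\test1.ppt";

    public static void main(String[] args) {
        System.out.println(ESCAPED);                       // c:\source.ppt
        System.out.println(ESCAPED.length());              // 13
        System.out.println(ACCIDENTAL_TAB.indexOf('\t'));  // 2
    }
}
```

PHP double-quoted strings interpret sequences such as \t in the same way, so the same care applies to a path passed through the PHP/Java bridge.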

Hi Rob,

Thanks for your interest in Aspose.Slides.

Please use the following Java/PHP code snippet to extract text from a presentation file.

try
{
//Instantiate a Presentation object that represents a PPT file
$pres= new Java("com.aspose.slides.Presentation", "C:\\demo.ppt");

$lastSlidePosition = $pres->getSlides()->getLastSlidePosition();


for ($pos = 1; $pos <= $lastSlidePosition; $pos++)
{
$sld = $pres->getSlideByPosition($pos);

//iterate all shapes
$shapesCount = $sld->getShapes()->size();

for ($shpIdx = 0; $shpIdx < $shapesCount; $shpIdx++)
{
$shp = $sld->getShapes()->get($shpIdx);
$paras = null;

//Check if shape holds a textholder
if ($shp->getPlaceholder() != null && $shp->isTextHolder() == true)
{
//Get the place holder as an Object instance
$obj = $obj->getPlaceholder();

//First type of place holder. It is TextHolder
if($obj instanceof TextHolder)
{
//Cast object into TextHolder object
$txtHolder = $obj;
$paras = $txtHolder->getParagraphs();
//iterateParagraphs(paras, portionList);
}
else if($obj instanceof Placeholder) //Second type of place holder(Shape)
{
//Cast the object into Placeholder object
$placeHolder = $obj;
// getShapeRef() returns the Shape object which contains real properties of a Placeholder.
$paras = $placeHolder->getShapeRef()->getTextFrame()->getParagraphs();
//iterateParagraphs(paras, portionList);
}
}
else
{
if ($shp->getTextFrame() != null)
{
$paras = $shp->getTextFrame()->getParagraphs();
}//if
}//else

//Print the text on Console
if ($paras != null)
{
$parasCount = $paras->size();
for ($paraIdx = 0; $paraIdx < $parasCount; $paraIdx++)
{
$para = $paras->get($paraIdx);
//print the text on console
echo $para->getText();
}
}//end if
}
}


}
catch(JavaException $ex)
{
echo $ex;
}

Hi,

Thanks for the reply and the code, but it doesn’t work…

type Exception report

message

description The server encountered an internal error () that prevented it from fulfilling this request.

exception

javax.servlet.ServletException: java.lang.RuntimeException: PHP Warning:  java_require() not supported anymore. Please use <a href="http://php-java-bridge.sourceforge.net/pjb/webapp.php>tomcat or jee hot deployment instead in /var/lib/tomcat6/webapps/slides/java/Java.inc on line 1656
PHP Notice: Object of class java_InternalJava could not be converted to int in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 16
PHP Notice: Object of class java_InternalJava could not be converted to int in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 23
PHP Notice: Undefined variable: obj in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 32
PHP Fatal error: Call to a member function getPlaceholder() on a non-object in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 32
php.java.servlet.fastcgi.FastCGIServlet.handle(FastCGIServlet.java:499)
php.java.servlet.fastcgi.FastCGIServlet.doGet(FastCGIServlet.java:521)
javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
php.java.servlet.PhpCGIFilter.doFilter(PhpCGIFilter.java:126)

root cause

java.lang.RuntimeException: PHP Warning:  java_require() not supported anymore. Please use <a href="http://php-java-bridge.sourceforge.net/pjb/webapp.php>tomcat or jee hot deployment instead in /var/lib/tomcat6/webapps/slides/java/Java.inc on line 1656
PHP Notice: Object of class java_InternalJava could not be converted to int in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 16
PHP Notice: Object of class java_InternalJava could not be converted to int in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 23
PHP Notice: Undefined variable: obj in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 32
PHP Fatal error: Call to a member function getPlaceholder() on a non-object in /var/lib/tomcat6/webapps/slides/test_extractText_java.php on line 32
php.java.servlet.fastcgi.FastCGIServlet.parseBody(FastCGIServlet.java:409)
php.java.servlet.fastcgi.FastCGIServlet.execute(FastCGIServlet.java:433)
php.java.servlet.fastcgi.FastCGIServlet.handle(FastCGIServlet.java:481)
php.java.servlet.fastcgi.FastCGIServlet.doGet(FastCGIServlet.java:521)
javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
php.java.servlet.PhpCGIFilter.doFilter(PhpCGIFilter.java:126)

Seems that there is an error on line 32...I'm using version 2.4 on Debian.

regards,
Rob

Also,

I have noticed that…

$slideCount = $pres->getSlides()->getLastSlidePosition();
for ($pos = 1; $pos <= $slideCount; $pos++)

$slideCount is NOT a plain number - it seems to be an object (the notices above mention class java_InternalJava), and as such it cannot be used as a value in the FOR loop

regards,
Rob

also…

should…
//Get the place holder as an Object instance
$obj = $obj->getPlaceholder();

Be…?
$obj = $shp->getPlaceholder();

if so, it still doesn’t work…I just get a blank page

regards,
Rob

Hi Again…

it seems by using:
$obj = $shp->getPlaceholder();

that $obj is neither…
$obj instanceof TextHolder
or
$obj instanceof Placeholder

regards,
Rob

as a test…

var_dump($obj instanceof TextHolder);
var_dump($obj instanceof Placeholder);

gives:
bool(false)
bool(false)

Hi Rob,

Please use Aspose.Slides for Java 2.4.0 to work with PHP/Java. If you still face the problem, please share your presentation file with us so that we can investigate.

We apologize for the inconvenience.

Hi,

I am using Aspose.Slides.Java 2.4.0

I have tested the above code with different slide presentations and get the same results.

Can you tell me if your code was wrong?:
$obj = $obj->getPlaceholder();
which should be:
$obj = $shp->getPlaceholder();

regards,
Rob

Here is a simple ppt file that I have used to test with…Any ppt file I use FAILS.


Can you please comment on all of the posts above that question the accuracy of the code you suggested?

Best regards,
Rob

…and judging by the number of views this post has received - there seems to be a lot of other people interested in seeing a solution.


Kind regards,
Rob

Hi Rob,

Please use Aspose.Slides for Java 2.4.0 to work with PHP/Java. I have made some modifications to the code; please see below.

try
{
//Instantiate a Presentation object that represents a PPT file
$pres= new Java("com.aspose.slides.Presentation", "C:\\eslideManager.ppt");

//get last position of presentation
$lastSlidePosition = $pres->getSlides()->getLastSlidePosition();

//iterate all slides
for ($pos = 1; $pos <= $lastSlidePosition; $pos++)
{
$sld = $pres->getSlideByPosition($pos);

//iterate all shapes
$shapesCount = $sld->getShapes()->size();
//iterate all shapes
for ($shpIdx = 0; $shpIdx < $shapesCount; $shpIdx++)
{
$shp = $sld->getShapes()->get($shpIdx);

if ($shp->getPlaceholder() != null && $shp->isTextHolder() == true)
{
//Get the place holder as an Object instance
$obj = $shp->getPlaceholder();

$paras = $obj->getParagraphs();
}

//Print the text
if ($paras != null)
{
$parasCount = $paras->size();
for ($paraIdx = 0; $paraIdx < $parasCount; $paraIdx++)
{
$para = $paras->get($paraIdx);
//print the text
echo $para->getText();
echo "\n";
}
}//end if
}
}


}
catch(JavaException $ex)
{
echo $ex;
}

Hi Tahir,

Thank you for the reply, but I was hoping for a little more information…I did post a lot of questions and findings to you…
So,
I was right in saying that the code should be:
$obj = $shp->getPlaceholder();

BUT, you have removed the ‘$obj instanceof’ etc - Why?
Do I not need to test for either TextHolder or PlaceHolder any more?

Yes, the code you have posted works with the very simple ppt I sent you, but it does NOT work with any other ppts (sorry, I cannot post the ppt that won’t work as it contains commercially sensitive material) - I need it to work with ANY .ppt file that might get thrown at it…

I am trying to develop a server based solution for a client, who will expect their users to attempt to use any PowerPoint (.ppt) with the system.

I would really appreciate some help here in understanding what the issue is and how to write an efficient routine that works, and works without fail.

Yes, I am evaluating Aspose.Slides for Java (and YES, I am using version 2.4.0), but unless I can be confident that it will work FULLY, and that I can get good/helpful tech support, I cannot use it commercially.

Please help,

Kind regards,
Rob

Hi Rob,

I have modified the code that handles the instanceof check. I also recommend that you visit our documentation for Aspose.Slides for Java. I hope this answers your queries.

try
{
//Instantiate a Presentation object that represents a PPT file
$pres= new Java("com.aspose.slides.Presentation", "C:\\eslideManager.ppt");

//get last position of presentation
$lastSlidePosition = $pres->getSlides()->getLastSlidePosition();
echo $pres;
//iterate all slides
for ($pos = 1; $pos <= $lastSlidePosition; $pos++)
{
$sld = $pres->getSlideByPosition($pos);

//iterate all shapes
$shapesCount = $sld->getShapes()->size();

//iterate all shapes
for ($shpIdx = 0; $shpIdx < $shapesCount; $shpIdx++)
{
$shp = $sld->getShapes()->get($shpIdx);

$paras = null;

if ($shp->getPlaceholder() != null && $shp->isTextHolder() == true)
{
//Get the place holder as an Object instance
$obj = $shp->getPlaceholder();
if ($obj instanceof Java) {

$type = "" . $obj->getClass();

if ($type == "class com.aspose.slides.TextHolder") {
$paras = $obj->getParagraphs();
}
else if ($type == "class com.aspose.slides.Placeholder") {
$paras = $obj->getShapeRef()->getTextFrame()->getParagraphs();
}
}
}

//Print the text
if ($paras != null)
{
$parasCount = $paras->size();
for ($paraIdx = 0; $paraIdx < $parasCount; $paraIdx++)
{
$para = $paras->get($paraIdx);
//print the text
echo $para->getText();
echo "\n";
}
}//end if
}
}

}
catch(JavaException $ex)
{
echo $ex;
}

Hi Tahir,

Many thanks for the new code…but unfortunately there is still a problem.

if ($obj instanceof Java) {

$type = "" . $obj->getClass();

getClass() fails with an error because $obj is null - I did a test using:
if ($obj instanceof Java) {
if (java_is_null($obj) != true)
{
$type = "" . $obj->getClass();
if ($type == "class com.aspose.slides.TextHolder") {
$paras = $obj->getParagraphs();
}
else if ($type == "class com.aspose.slides.Placeholder") {
$paras = $obj->getShapeRef()->getTextFrame()->getParagraphs();
}
} else {
echo "Java is NULL\n";
}
}
}
And for every shape in the slides, I get "Java is NULL"

Why is this? The ppt files I am testing do not have anything special in them just text and images.

Kind regards,
Rob

Hi Tahir,

here is a test ppt that I am using - no longer contains any commercially sensitive content.

This file FAILS with your new code

Best regards,
Rob

Hi Rob,

I have successfully extracted text from the shared PPT file with the following code snippet. Please use it and let me know if you still face any problem.

try
{
//Instantiate a Presentation object that represents a PPT file
$pres= new Java("com.aspose.slides.Presentation", "C:\\test1.ppt");

//get last position of presentation
$lastSlidePosition = $pres->getSlides()->getLastSlidePosition();
echo $pres;
//iterate all slides
for ($pos = 1; $pos <= $lastSlidePosition; $pos++)
{
$sld = $pres->getSlideByPosition($pos);

//iterate all shapes
$shapesCount = $sld->getShapes()->size();

//iterate all shapes
for ($shpIdx = 0; $shpIdx < $shapesCount; $shpIdx++)
{
$shp = $sld->getShapes()->get($shpIdx);

$paras = null;

if ($shp->getPlaceholder() != null && $shp->isTextHolder() == true)
{
//Get the place holder as an Object instance
$obj = $shp->getPlaceholder();
if ($obj instanceof Java) {

$type = "" . $obj->getClass();

if ($type == "class com.aspose.slides.TextHolder") {
$paras = $obj->getParagraphs();
}
else if ($type == "class com.aspose.slides.Placeholder") {
$paras = $obj->getShapeRef()->getTextFrame()->getParagraphs();
}
}
}
else
{
if ($shp->getTextFrame() != null)
{
$paras = $shp->getTextFrame()->getParagraphs();
}//if
}

//Print the text
if ($paras != null)
{
$parasCount = $paras->size();
for ($paraIdx = 0; $paraIdx < $parasCount; $paraIdx++)
{
$para = $paras->get($paraIdx);
//print the text
echo $para->getText();
echo "\n";
}
}//end if
}
}

}
catch(JavaException $ex)
{
echo $ex;
}

Hi Tahir,

Did you use the test1.ppt I sent you?
It doesn’t work for me - here is the output:

[[o:Exception]:“java.lang.Exception: Invoke failed: [[o:Request$PhpNull]]->getClass. Cause: java.lang.NullPointerException: cannot call “getClass()” on a Java null object. A previous Java call has returned a null value, use java_is_null($jvalue) to check. VM: 1.6.0_24@http://java.sun.com/” at: #-5 php.java.bridge.JavaBridge.Invoke(JavaBridge.java:999) #-4 php.java.bridge.Request.handleRequest(Request.java:417) #-3 php.java.bridge.Request.handleRequests(Request.java:500) #-2 php.java.bridge.http.ContextRunner.run(ContextRunner.java:145) #-1 php.java.bridge.ThreadPool$Delegate.run(ThreadPool.java:60) #0 /var/lib/tomcat6/webapps/slides/java/Java.inc(232): java_ThrowExceptionProxyFactory->getProxy(12, ‘java.lang.Boole…’, ‘T’, true) #1 /var/lib/tomcat6/webapps/slides/java/Java.inc(360): java_Arg->getResult(true) #2 /var/lib/tomcat6/webapps/slides/java/Java.inc(366): java_Client->getWrappedResult(true) #3 /var/lib/tomcat6/webapps/slides/java/Java.inc(560): java_Client->getResult() #4 /var/lib/tomcat6/webapps/slides/java/Java.inc(1752): java_Client->invokeMethod(11, ‘getClass’, Array) #5 /var/lib/tomcat6/webapps/slides/java/Java.inc(1851): java_JavaProxy->__call(‘getClass’, Array) #6 /var/lib/tomcat6/webapps/slides/java/Java.inc(1999): java_AbstractJava->__call(‘getClass’, Array) #7 /var/lib/tomcat6/webapps/slides/test_extractText_java.php(43): Java->__call(‘getClass’, Array) #8 /var/lib/tomcat6/webapps/slides/test_extractText_java.php(43): java_JavaProxyProxy->getClass() #9 {main}]

It seems that ->getClass() fails because $obj is a NULL Java object…?

Please test your code with the attached PPT

regards,
Rob
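
The trace above reports a NullPointerException from getClass() and suggests checking with java_is_null(). The plain-Java sketch below, which uses neither Aspose.Slides nor the PHP/Java bridge, shows the two facts the PHP comparison relies on: concatenating a Class object to a string yields the text "class " followed by the fully qualified name, and getClass() can only be called on a non-null reference. ClassNameCheck and typeNameOrNull are names invented for this illustration.

```java
// Illustrative only: ClassNameCheck and typeNameOrNull are hypothetical names.
final class ClassNameCheck {

    /** Returns the fully qualified class name, or null when obj is null. */
    static String typeNameOrNull(Object obj) {
        // Guard first: calling getClass() on a null reference throws NullPointerException.
        return obj == null ? null : obj.getClass().getName();
    }

    public static void main(String[] args) {
        Object text = "hello";

        // String concatenation calls Class.toString(), which prefixes "class ".
        System.out.println("" + text.getClass());   // class java.lang.String

        // getName() returns the name without that prefix.
        System.out.println(typeNameOrNull(text));   // java.lang.String

        // A null reference is reported instead of throwing.
        System.out.println(typeNameOrNull(null));   // null
    }
}
```

On the PHP side, the equivalent guard would presumably be the java_is_null() check that the error message itself recommends, applied before getClass() is called.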

Hi Rob,

I successfully extracted text from the shared presentation file with the same code. I have extracted the text from your PPT file again; please see the attachment.

Hi Tahir,

I copied and used your code as is - only changing the path to the ppt file and it does NOT work for me - error as above (Java null object for $obj when ->getClass() ).

This isn’t helping at all…What version of Aspose.Slides for Java are you using? 2.4.0?
There must be something different in our implementations…I am using Debian, are you?

Please let me know what versions of Aspose, Java, Java PHP Bridge etc you are using…There must be something different.

regards,
Rob