Aspose.Slides is giving unexpected results for a particular PowerPoint slide that we have.
We are using the following code to get the text from PowerPoint slides (so that we can search them for specific words).
Dim pptxPresentation As Presentation = New Presentation(_strFilePath)
Dim textFramesPPTX As ITextFrame() = Util.SlideUtil.GetAllTextFrames(pptxPresentation, True)
For i As Integer = 0 To textFramesPPTX.Length - 1
    For Each port As IPortion In From para In textFramesPPTX(i).Paragraphs From port1 In para.Portions Select port1
        sb.Append(port.Text)
        sb.AppendLine()
    Next
Next
We found a problem with the linked PowerPoint file, which has only one word in it.
The word Associate for some reason gets split into the following pieces, which then creates problems with our search.
The code works with most slides, but we seem to have some hidden characters here that Aspose is picking up.
NOTE: If I copy and paste the word into a brand new PowerPoint document, the same problem exists in the new document, but if I delete the word in the slide and retype the same word over it, the problem is fixed. It appears there are some hidden characters that Aspose is picking up.
The problem was discovered by a user, and I am hoping there is some way of getting it to work on his slide instead of asking him to retype the word (in case it also occurs in other documents).
Any help would be greatly appreciated.
Thanks in advance.
Sanjay
I have worked with the presentation file you shared using Aspose.Slides for .NET 19.11 and have been able to reproduce the issue. An issue with ID SLIDESNET-41594 has been created in our issue tracking system to further investigate and resolve it. This thread has been linked with the issue so that you will be notified once it is fixed.
Generally, in such cases the following line of code merges all the portions in a paragraph together if they have the same formatting. The text appears to have the same formatting throughout, but the following call also fails to merge the portions.
pres.JoinPortionsWithSameFormatting();
We will work on this issue and will share feedback once it is addressed.
A quick question… is there any single line of code to copy out all the text from a PowerPoint file (the text can be anywhere in a slide, including comments, headers, etc.)?
Sorry, that was poorly worded. I just need to get all the text from the PowerPoint file into a string variable. I will then use it to search for specific words or patterns.
We are currently looping through different components of the slides but I was wondering if there was a GetAllText command.
I have looked at your requirements and suggest that you visit this documentation article on extracting text at the presentation paragraph level. I hope this will be helpful.
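On the single-call question: as far as I know there is no one-line GetAllText command, but the sketch below shows one way to get close to it. It assumes that your Aspose.Slides for .NET version exposes PresentationFactory.GetPresentationText and the ISlideText properties used here, so please verify that against the API reference for your version. The module name AllTextSketch and the function name GetAllText are made up for illustration, and comment text is not collected by this sketch.

```vbnet
Imports System.Text
Imports Aspose.Slides

Module AllTextSketch
    ' Hypothetical helper: returns slide, layout, master and notes text as one string.
    Function GetAllText(filePath As String) As String
        ' Assumption: this overload (file path + arranging mode) is available in your version.
        Dim presText As IPresentationText =
            PresentationFactory.Instance.GetPresentationText(filePath, TextExtractionArrangingMode.Unarranged)

        Dim sb As New StringBuilder()
        For Each slideText As ISlideText In presText.SlidesText
            sb.AppendLine(slideText.Text)        ' text on the slide itself
            sb.AppendLine(slideText.LayoutText)  ' text on the slide's layout
            sb.AppendLine(slideText.MasterText)  ' text on the slide's master
            sb.AppendLine(slideText.NotesText)   ' text on the notes page
        Next
        Return sb.ToString()
    End Function
End Module
```

The returned string can then be searched directly, for example with GetAllText(path).IndexOf("Associate", StringComparison.OrdinalIgnoreCase) >= 0.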
The issue is still unresolved and is pending in the issues queue at the moment. However, I have already shared a workaround that you can adopt in the meantime: instead of comparing text at portion level, perform the comparison at paragraph level. This way you will be able to get the expected results.
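For the search scenario, a minimal sketch of the paragraph-level approach could look like the following. It relies on IParagraph.Text returning the joined text of all portions in the paragraph, so a word that is split across several portions still comes back in one piece. The module name ParagraphTextSketch and the function name GetParagraphText are hypothetical and only used for illustration.

```vbnet
Imports System.Text
Imports Aspose.Slides
Imports Aspose.Slides.Util

Module ParagraphTextSketch
    ' Hypothetical helper: collects the text of every paragraph, one paragraph per line.
    Function GetParagraphText(filePath As String) As String
        Dim sb As New StringBuilder()
        Using pres As New Presentation(filePath)
            ' True also includes the text frames of master slides.
            For Each frame As ITextFrame In SlideUtil.GetAllTextFrames(pres, True)
                For Each para As IParagraph In frame.Paragraphs
                    ' Append the whole paragraph at once instead of portion by portion.
                    sb.AppendLine(para.Text)
                Next
            Next
        End Using
        Return sb.ToString()
    End Function
End Module
```

Because the line break is added once per paragraph rather than once per portion, a search for Associate in the returned string is not affected by how the word is split internally.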
Please try the following sample code on your end until the issue is resolved. The code identifies the string at paragraph level and then replaces it at portion level.
Imports Aspose.Slides

Public Shared Sub TestReplaceText2()
    Dim stToFind As String = "Associate"
    Dim stToAppend As String = "New Text"
    Dim _strFilePath As String = "C:\data\"
    Dim pptxPresentation = New Presentation(_strFilePath & "Associate_Aspose.pptx")
    Dim textFramesPPTX As ITextFrame() = Aspose.Slides.Util.SlideUtil.GetAllTextFrames(pptxPresentation, False)
    For i As Integer = 0 To textFramesPPTX.Length - 1
        For Each para As IParagraph In textFramesPPTX(i).Paragraphs
            ' Compare at paragraph level: para.Text is the joined text of all portions.
            If para.Text.Equals(stToFind) Then
                ' Remove every portion after the first one.
                Dim portionCount As Integer = para.Portions.Count
                For j As Integer = 1 To portionCount - 1
                    para.Portions.Remove(para.Portions(1))
                Next
                ' The first portion held only a fragment of the word, so write the
                ' full matched text back before appending the new text.
                para.Portions(0).Text = stToFind & " " & stToAppend
            End If
        Next
    Next
    pptxPresentation.Save(_strFilePath & "NewSaved.pptx", Aspose.Slides.Export.SaveFormat.Pptx)
End Sub