Word splits into multiple portions of text on extraction (C# .NET)

Aspose Slides is giving unexpected results for a particular PowerPoint slide that we have.

We are using the following code to get text from PowerPoint slides (so that we can search them for specific words).

        Dim pptxPresentation As Presentation = New Presentation(_strFilePath)
        Dim textFramesPPTX As ITextFrame() = Util.SlideUtil.GetAllTextFrames(pptxPresentation, True)

        For i As Integer = 0 To textFramesPPTX.Length - 1
            For Each port As IPortion In From para In textFramesPPTX(i).Paragraphs From port1 In para.Portions Select port1
                sb.Append(port.Text)
                sb.AppendLine()
            Next
        Next

We found a problem with the linked PowerPoint file, which has only one word in it.

The word Associate for some reason gets split into the following pieces, which then creates problems with our search.

"Ass" & vbCrLf & "o" & vbCrLf & "c" & vbCrLf & "ia" & vbCrLf & "t" & vbCrLf & "e"

The code works with most slides, but we seem to have some hidden characters here that Aspose is picking up.

NOTE: If I copy and paste the word into a brand new PowerPoint document, the same problem exists in the new document. But if I delete the word in the slide and retype the same word over it, the problem is fixed. It appears there are some hidden characters that Aspose is picking up.

The problem was discovered by a user, and I am hoping that there is some way of getting it to work on his slide instead of asking him to retype the text (in case it also occurs in other documents).

Any help would be greatly appreciated.
Thanks in advance.
Sanjay

@singhsy,

I have worked with the presentation file you shared using Aspose.Slides for .NET 19.11 and have been able to reproduce the issue. An issue with ID SLIDESNET-41594 has been created in our issue tracking system to investigate and resolve it. This thread has been linked to the issue so that you will be notified once it is fixed.

Generally, in such cases the following line of code merges all the portions in a paragraph together if they have the same formatting. The text here appears to have the same formatting throughout, but the following call also fails to merge it.

pres.JoinPortionsWithSameFormatting();

We will work on this issue and share feedback once it is addressed.

@mudassir.fayyaz
Thank you for looking into this.

A quick question… is there a single line of code to copy out all the text from a PowerPoint file (the text can be anywhere in a slide, including comments, headers, etc.)?

Thanks
Sanjay

Can you please elaborate on where you want to copy that text to? Do you mean from one presentation to another?

Sorry, that was poorly worded. I just need to get all the text from the PowerPoint file into a string variable. I will then use it to search for specific words or patterns.

We are currently looping through different components of the slides but I was wondering if there was a GetAllText command.

Thanks
Sanjay

@singhsy,

I have reviewed your requirements and suggest that you visit this documentation article on extracting text at the presentation paragraph level. I hope this will be helpful.
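As a rough sketch of what paragraph-level extraction might look like (not an official single-call API): the helper name GetAllParagraphText and the module name below are hypothetical, and only SlideUtil.GetAllTextFrames and IParagraph.Text, both already used in this thread, are relied upon. It covers text frames only, so comments and notes would likely need separate handling.

```vb
Imports System.Text
Imports Aspose.Slides
Imports Aspose.Slides.Util

Module TextExtractionSketch
    ' Hypothetical helper: returns the text of every paragraph, one paragraph per line.
    Public Function GetAllParagraphText(filePath As String) As String
        Dim sb As New StringBuilder()

        Using pres As New Presentation(filePath)
            ' True also scans master slides, as in the original post.
            Dim frames As ITextFrame() = SlideUtil.GetAllTextFrames(pres, True)

            For Each frame As ITextFrame In frames
                For Each para As IParagraph In frame.Paragraphs
                    ' para.Text is the joined text of all portions in the paragraph,
                    ' so a word split across portions stays in one piece.
                    sb.AppendLine(para.Text)
                Next
            Next
        End Using

        Return sb.ToString()
    End Function
End Module
```

The resulting string can then be searched as a whole, for example with String.Contains or a regular expression.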

Quick check to see if there is an update on this issue (When working with the file that I had linked to).

Users were asking for a solution and I was wondering if there has been an update that I can hopefully download to fix this problem.

Thanks in advance
Sanjay

@singhsy,

I would like to share that the issue is still unresolved and is pending in the issues queue at the moment. However, I have already shared a workaround that you can adopt for now: instead of comparing text at the portion level, please perform the comparison at the paragraph level. This way you should be able to get the expected results.
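A minimal sketch of a paragraph-level comparison is given below. The function name ContainsWord and the module name are hypothetical, the case-insensitive match is an assumption about what the search needs, and it has not been verified against the affected file.

```vb
Imports System
Imports Aspose.Slides
Imports Aspose.Slides.Util

Module ParagraphSearchSketch
    ' Hypothetical helper: True if any paragraph in the presentation contains the word.
    Public Function ContainsWord(pres As IPresentation, word As String) As Boolean
        For Each frame As ITextFrame In SlideUtil.GetAllTextFrames(pres, True)
            For Each para As IParagraph In frame.Paragraphs
                ' Compare against the whole paragraph text rather than each portion.
                If para.Text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Return True
                End If
            Next
        Next

        Return False
    End Function
End Module
```

Because the check runs on para.Text, it does not matter how many portions the word has been split into.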

@mudassir.fayyaz
We tried your suggestions but unfortunately they all still gave the same result i.e. they break the word up into several pieces.

@singhsy,

I suggest that you try the following sample code on your end for the time being, until the issue is resolved. The code identifies the string at the paragraph level and then replaces it at the portion level.

Imports Aspose.Slides

Public Shared Sub TestReplaceText2()
    Dim stToFind As String = "Associate"
    Dim stToAppend As String = "New Text"
    Dim _strFilePath As String = "C:\data\"
    Dim pptxPresentation = New Presentation(_strFilePath & "Associate_Aspose.pptx")
    Dim textFramesPPTX As ITextFrame() = Aspose.Slides.Util.SlideUtil.GetAllTextFrames(pptxPresentation, False)

    For Each textFrame As ITextFrame In textFramesPPTX
        For Each para As IParagraph In textFrame.Paragraphs

            ' Match on the paragraph text, which joins all portions together.
            If para.Text.Equals(stToFind) Then
                Dim portionCount As Integer = para.Portions.Count

                ' Remove every portion after the first. The collection shrinks
                ' on each removal, so index 1 is always the next one to drop.
                For j As Integer = 1 To portionCount - 1
                    para.Portions.Remove(para.Portions(1))
                Next

                ' Write the full matched text plus the appended text into the
                ' remaining portion, so the removed pieces are not lost.
                para.Portions(0).Text = stToFind & " " & stToAppend
            End If
        Next
    Next

    pptxPresentation.Save(_strFilePath & "NewSaved.pptx", Aspose.Slides.Export.SaveFormat.Pptx)
End Sub

The issues you have found earlier (filed as SLIDESNET-41594) have been fixed in this update.