Spaces after strings between spans nested in divs crash PDF generation

When this string

<DIV align=center> <SPAN>abc <SPAN >&nbsp;</SPAN>def </SPAN> </DIV>

is used to create an HtmlFragment, it breaks the PDF generation process.

If the spaces after abc and def are removed, generation of the document works.

Here is the code that breaks, followed by the same code with a ‘while’ loop that allows generation to succeed:

    protected static HtmlFragment GetHtmlFragment(string text)
    {
        if (text == null)
        {
            text = "";
        }
        text = text.Replace("\r", "");
        text = text.Replace("\n", " ");
        text = text.Replace("¿", "");

        // remove HTML/XML comments, if any
        text = Regex.Replace(text, "(?s)<!--.*?-->", " ");

        text = Regex.Replace(text, @"url\((['""].*?[""'])\)", "");
        text = Regex.Replace(text, @"url['""].*?[""']", "");
        text = Regex.Replace(text, @"url\((.*?)\)", "");
        text = Regex.Replace(text, @"face=['""].*?[""']", "");
        text = Regex.Replace(text, @"</select.*?>", "");
        text = Regex.Replace(text, @"<select.*?>", "");
        text = Regex.Replace(text, @"</img.*?>", "");
        text = Regex.Replace(text, @"<img.*?>", "");
        text = Regex.Replace(text, @"</font.*?>", "");
        text = Regex.Replace(text, @"<font.*?>", "");
        text = Regex.Replace(text, @"</b.*?>", "");
        text = Regex.Replace(text, @"<b.*?>", "");

        var fragment = new HtmlFragment(text);
        fragment.TextState = new TextState();
        return fragment;
    }

If the string above is processed by this method, the application crashes later, during generation of the document.

Here is the same code with a loop that arbitrarily removes all spaces if the string contains 'SPAN':

    protected static HtmlFragment GetHtmlFragment(string text)
    {
        if (text == null)
        {
            text = "";
        }
        text = text.Replace("\r", "");
        text = text.Replace("\n", " ");
        text = text.Replace("¿", "");

        // remove HTML/XML comments, if any
        text = Regex.Replace(text, "(?s)<!--.*?-->", " ");

        text = Regex.Replace(text, @"url\((['""].*?[""'])\)", "");
        text = Regex.Replace(text, @"url['""].*?[""']", "");
        text = Regex.Replace(text, @"url\((.*?)\)", "");
        text = Regex.Replace(text, @"face=['""].*?[""']", "");
        text = Regex.Replace(text, @"</select.*?>", "");
        text = Regex.Replace(text, @"<select.*?>", "");
        text = Regex.Replace(text, @"</img.*?>", "");
        text = Regex.Replace(text, @"<img.*?>", "");
        text = Regex.Replace(text, @"</font.*?>", "");
        text = Regex.Replace(text, @"<font.*?>", "");
        text = Regex.Replace(text, @"</b.*?>", "");
        text = Regex.Replace(text, @"<b.*?>", "");
        //arbitrarily remove spaces in this use case to prove malfunction
        if(text.Contains("SPAN"))
        {
            while(text.Contains(" "))
            {
                text = text.Replace(" ", string.Empty);
            }
        }
        var fragment = new HtmlFragment(text);
        fragment.TextState = new TextState();
        return fragment;
    }
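The loop above removes every space in the string, including spaces that belong to the text itself (for example "abc def" would become "abcdef"). Since removing only the spaces after abc and def was reported to be enough, a narrower workaround could delete just the whitespace that sits between a text run and the next tag. The following is a minimal C# sketch under that assumption; `HtmlSpaceTrimmer` and `TrimSpacesAfterText` are hypothetical names, not part of Aspose.PDF, and the pattern is aimed at markup like the sample string in this thread rather than arbitrary HTML.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Aspose.PDF: a narrower alternative to the
// while loop above. Instead of deleting every space, it only deletes a run of
// whitespace that follows text and is immediately followed by a tag.
public static class HtmlSpaceTrimmer
{
    // (?<=[^\s>]) : the previous character is text (neither whitespace nor the '>' ending a tag)
    // \s+         : the whitespace run to remove
    // (?=<)       : the next character starts a tag
    private static readonly Regex SpaceBeforeTag =
        new Regex(@"(?<=[^\s>])\s+(?=<)", RegexOptions.Compiled);

    public static string TrimSpacesAfterText(string html)
    {
        if (html == null)
        {
            return string.Empty;
        }
        // Whitespace between two tags ("> <") and inside text ("abc def") is left alone.
        return SpaceBeforeTag.Replace(html, string.Empty);
    }
}
```

For the sample string, this turns `abc <SPAN >` into `abc<SPAN >` and `def </SPAN>` into `def</SPAN>`, while the spaces between `>` and `<` are kept. In GetHtmlFragment it would take the place of the `if`/`while` block, being applied to `text` just before `new HtmlFragment(text)`.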

If you need a console project that demonstrates this malfunction, I can create one. Please let me know whether you can reproduce the malfunction on your end by running this code with the string above. I think you will.

If not, let me know.

@Ohio_Mike

Thank you for contacting support.

We have worked with the data you shared and were able to observe a problem with the GetHtmlFragment method: the resulting fragment causes a NullReferenceException to be thrown from the Document.Save method. Please confirm whether this is the same issue you described as the crash during PDF generation, so that we can proceed to help you.

I thought I had already responded; we were wondering why we hadn't heard back.

What you got on your end confirms what we’re seeing on ours. Only spaces AFTER a string between the opening and closing tags will cause the issue.

There’s nothing more to elaborate; we’re all seeing the same malfunction.

Have you done any further testing?

@Ohio_Mike

Thank you for getting back to us.

We asked for your feedback to make sure we were on the same page, because the previous post described the problem only as a malfunction and a crash during PDF generation. That is why we requested explicit confirmation of the problem, so that we could address your concerns properly. Also, part of the PDF generation code was missing, so we considered it necessary to ask for confirmation.

Furthermore, a ticket with ID PDFNET-44992 has been logged in our issue management system for further investigation and resolution. The ticket ID has been linked with this thread so that you will receive notification as soon as the ticket is resolved.

We are sorry for the inconvenience.

The issue you found earlier (filed as PDFNET-44992) has been fixed in Aspose.PDF for .NET 22.7.