Replace short text with long text

Hi


I’m currently running a trial version of Aspose PDF to see if we can use it for our future document handling system…

First of all, I’m trying to do a simple search and replace…

I have a placeholder in an existing PDF, let’s call it ##TextPart1.

Now I replace “##TextPart1”, but I need to replace it with a long text that contains no line breaks…

This almost works, but all the text ends up on one line and no line breaks are made at the page margin…
This means that the text runs off the edge of the page.

I’m using the code from the last reply in this post:



Is there any way to get around this? I need to use Aspose for handling replacements in PDF template files, if that is possible, and I need to supply the replacement texts as single lines that are then wrapped automatically…

Thanks in advance.


Hi Allan,


Thanks for your inquiry. We would appreciate it if you could share your sample code and document here. It will help us address your issue exactly.

We are sorry for the inconvenience caused.

Best Regards,

Hi Allan,


Thanks for using our APIs.

As per my understanding, when performing a text replacement, the long string is being truncated at the right page margin instead of wrapping onto subsequent lines. Could you please confirm this and also share your sample files and code snippet, so that we can test the scenario in our environment?

We are sorry for this inconvenience.

See attached input PDF and program.cs


Taken straight from your examples…

The long text just keeps running outside the right margin…

If I make the long text with line breaks I also want the text below ‘##TextForm.1’ to move downwards even onto a new page. Now the new paragraph is placed on top of the existing text…

I know that I initially thought that PDFs behave like Word documents, and I now know this is not the case. I just need to know whether what I want is possible with Aspose Pdf; otherwise I must look for another solution, perhaps working with Word instead.

But since this is for a browser based system I really think that PDF is the best choice…




Hi Allan,


Thanks for sharing the resource files.

I have tested the scenario and I am able to reproduce the same problem. I have logged it in our issue tracking system as PDFNEWNET-38246. We will investigate this issue in detail and will keep you updated on the status of a correction.

We apologize for your inconvenience.

Any news here?

Hi Oliver,


Thanks for your inquiry. I am afraid the issue in question is still not resolved, as the product team is busy investigating and resolving other issues in the queue. However, we have recorded your concern and will notify you as soon as the issue is resolved.

We are sorry for the inconvenience caused.

Best Regards,

Hi,

Any update?

Hi Oliver,


Thanks for your patience.

The issue reported earlier is still pending review and is not yet resolved. However, I have asked the product team to share any updates regarding its resolution, and as soon as we have further news, we will let you know.

We are sorry for this delay and inconvenience.

@dr_oli

Thanks for your patience.

We are glad to inform you that your earlier reported issue PDFNET-38246 has been investigated. As per the investigation results, replacing a text fragment with a text paragraph is a good solution for scenarios that process tag words, as it gives more control over the replaced text. However, one important thing should be taken into account when creating text paragraphs: it is necessary to set the paragraph position (or rectangle) before appending any lines to the paragraph. (Nevertheless, we have created PDFNET-43659 to investigate whether it is possible to remove this restriction in the future.)

Please consider the following (corrected) code snippet with the latest version, Aspose.Pdf for .NET 17.12:

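// Requires: using Aspose.Pdf; using Aspose.Pdf.Text;
// myDir is the folder that contains input.pdf
Document doc = new Document(myDir + "input.pdf");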

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("##TextForm.1");

//accept the absorber for all the pages
doc.Pages.Accept(textFragmentAbsorber);

//get the text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;

//loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    //create new text fragment with the long replacement string (wrapping is handled by the paragraph below)
    string updatedText = textFragment.Text.Replace("##TextForm.1", "dslkjf dsaælk sdaflkjd sfælkds fælskdf" +
                                                                    " æasdkf dasælfj dsalkf aæsdlk asædlk æasdlk" +
                                                                    " æsdalk æasdlk æsdalkæsdljf afjæsadfjæafjæsakfj" +
                                                                    " æslfjslfjdælkjsæ dkjsækfjdkfjsdlfj uuuuu");
    TextFragment updatedFragment = new TextFragment(updatedText);

    //set new text fragment properties if necessary
    updatedFragment.TextState.Font = FontRepository.FindFont("Arial");
    updatedFragment.TextState.FontSize = textFragment.TextState.FontSize;
    updatedFragment.TextState.LineSpacing = 6.32f;

    //set old fragment text to empty string
    textFragment.Text = String.Empty;

    //create TextParagraph object
    TextParagraph par = new TextParagraph();

    //set paragraph position
    par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent
        - updatedFragment.TextState.FontSize - updatedFragment.TextState.LineSpacing);
    // Specify word wrapping mode
    par.FormattingOptions.WrapMode = TextFormattingOptions.WordWrapMode.ByWords;

    //add new TextFragment to paragraph
    par.AppendLine(updatedFragment);

    //add the TextParagraph using TextBuilder
    TextBuilder textBuilder = new TextBuilder(textFragment.Page);
    textBuilder.AppendParagraph(par);
}
// Save resulting PDF document.
doc.Save(myDir + "38246_out.pdf");

38246_out.pdf (193.2 KB)

In case of any further assistance, please feel free to contact us.

Hi,

This topic is of interest to me, as I am testing a similar situation.
The (corrected) code snippet above worked almost OK for me, except that in some cases the replaced text overlaps the following lines in the paragraph.
To avoid the overlap, I believe I need to adjust the position more precisely, using the correct properties and calculations based on the font size, YIndent, LineSpacing, etc.
Could someone kindly advise me how to achieve a cleaner replacement of a short source text with a longer text, without overlapping?
I’m using Aspose.PDF, Version=18.2.0.0.

Thank you very much in advance.
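
One way to reason about the overlap: a longer replacement wraps onto more lines than the original, so it needs more vertical space than the original fragment occupied. The following is only a rough sketch for estimating that space before placing the paragraph. It does not use the Aspose API; the class `WrapEstimator`, its parameters, and the idea of a fixed average character width are all assumptions made for illustration, and real glyph widths depend on the font.

```csharp
using System;

// Hypothetical helper, not part of Aspose.PDF.
public static class WrapEstimator
{
    // Greedy word wrap: count how many lines the text needs when every
    // character (including the separating space) is avgCharWidth wide.
    public static int EstimateLineCount(string text, double availableWidth, double avgCharWidth)
    {
        string[] words = (text ?? string.Empty).Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        if (words.Length == 0) return 0;

        int lines = 1;
        double current = 0;
        bool lineIsEmpty = true;
        foreach (string word in words)
        {
            double w = word.Length * avgCharWidth;
            if (lineIsEmpty)
            {
                // first word on a line is always placed, even if it is too wide
                current = w;
                lineIsEmpty = false;
            }
            else if (current + avgCharWidth + w <= availableWidth)
            {
                current += avgCharWidth + w; // space plus word fits on this line
            }
            else
            {
                lines++;                     // start a new line with this word
                current = w;
            }
        }
        return lines;
    }

    // Estimated paragraph height for a caller-supplied line height.
    public static double EstimateHeight(string text, double availableWidth,
                                        double avgCharWidth, double lineHeight)
    {
        return EstimateLineCount(text, availableWidth, avgCharWidth) * lineHeight;
    }
}
```

If the estimated height is larger than the gap to the next fragment on the page, one option is to reduce the font size of the replacement until the estimate fits. The choice of line height and average character width is left to the caller here, since the exact values depend on the font in use.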

@KDSSHO

Thanks for contacting support.

Would you please share your sample PDF document and indicate which text needs to be replaced? We will test the scenario in our environment and address it accordingly.

Thank you for your prompt reply. Please find the attached 2 files.
Input with shorter texts 2017112144212.pdf (105.0 KB)
Output with replaced longer texts 2017112144212_translated.pdf (206.5 KB)

What I want to do is translate all the Japanese text in 2017112144212.pdf into English, as in 2017112144212_translated.pdf, keeping the layout as similar as possible to the input file.
You will easily find the “overlapping” I mentioned in the middle of the output file.

As for my sample app, I initialized TextFragmentAbsorber just as follows, in order to search all the text.

TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(".+"); 

I then feed textFragment.Text to our external Japanese - English translation module as follows.

string updatedText = textTranslator.doTranslation(textFragment.Text.ToString());

Font I assigned to the updatedFragment was “MS UI Gothic”.
All the other operations are just like the above snippet.

I really appreciate your support!

@KDSSHO

Thanks for sharing further details.

We have managed to replicate the issue in our environment. However, it would be helpful if you could share your Japanese - English translation code snippet with us, so that we can log an issue with all details in our issue tracking system. We translated the Japanese text using the following method, but it did not decode some special characters that seem to be decoded in the output PDF document you shared.

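// Requires: using System; using System.Net; using System.Web;
public static string TranslateText(string input, string langpair)
{
    // URL-encode the parameters so that characters such as '&', '#' and spaces
    // in the input do not break the query string
    string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}",
        Uri.EscapeDataString(input), Uri.EscapeDataString(langpair));
    string result;
    using (WebClient webClient = new WebClient())
    {
        result = webClient.DownloadString(url);
    }
    // extract the translated text from the returned page
    const string marker = "TRANSLATED_TEXT='";
    int bas = result.IndexOf(marker) + marker.Length;
    int bit = result.Substring(bas).IndexOf("';var");
    result = result.Substring(bas, bit);
    return System.Web.HttpUtility.HtmlDecode(result).Trim();
}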

2017112144212_translated.pdf (335.5 KB)

An output PDF generated in our environment has been attached for your reference.

Hi

Glad to hear that.

I’m afraid I cannot disclose the Ja-En translation module itself because it is confidential. However, I hope the following snippet (the inside of the textTranslator.doTranslation() method I showed you above) helps you see a bit more of how the input and output strings are treated.

public string doTranslation(string strSrcText)
{
    byte[] bySrcText = System.Text.Encoding.UTF8.GetBytes(strSrcText);
    byte[] byDstText = new byte[strSrcText.Length * 10 + 256];

    // some setups here

    // external "confidential" translation module API
    Translate(byDstText, bySrcText, "jaen");

    // some wrap-ups here

    return System.Text.Encoding.UTF8.GetString(byDstText);
}
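
A side note on the snippet above: the output buffer is deliberately oversized, and decoding the whole buffer with GetString would also turn any unused zero bytes into trailing '\0' characters in the returned string, unless the elided wrap-up code already deals with that. The sketch below shows one way to decode only the part that was written. It assumes the translation module writes NUL-terminated UTF-8 into the buffer, which the post does not state; the class and method names are hypothetical.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original sample app.
public static class TranslationBufferHelper
{
    // Decode only the bytes before the first NUL byte of an oversized buffer.
    // Assumption: the producer writes NUL-terminated UTF-8 text.
    public static string DecodeNulTerminatedUtf8(byte[] buffer)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));

        int length = Array.IndexOf(buffer, (byte)0);
        if (length < 0) length = buffer.Length; // no terminator: decode everything

        return Encoding.UTF8.GetString(buffer, 0, length);
    }
}
```

Under that assumption, the last line of doTranslation could return TranslationBufferHelper.DecodeNulTerminatedUtf8(byDstText) instead of decoding the full buffer.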

In case you need it, my actual sample app code follows; it is merely a copy of the “(corrected) code snippet.”

public void translatePDF(string inputfilename, string outputfilename)
{
    // Open an existing PDF file
    Document doc = new Document(inputfilename);
    
    // Instantiate TextFragmentAbsorber with a regular expression that matches all text
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(".+");

    // Set text search option to specify regular expression usage
    textFragmentAbsorber.TextSearchOptions = new TextSearchOptions(true);

    // Accept the absorber for all the pages
    doc.Pages.Accept(textFragmentAbsorber);

    // Get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
    
    // Loop through the fragments
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        string translatedText = textFragment.Text;

        translatedText = textTranslator.doTranslation(textFragment.Text.ToString());
        
        TextFragment updatedFragment = new TextFragment(translatedText);

        //set new text fragment properties if necessary
        updatedFragment.TextState.Font = FontRepository.FindFont("MS UI Gothic");
        updatedFragment.TextState.FontSize = textFragment.TextState.FontSize; //* (float)0.8;
                
        //updatedFragment.TextState.BackgroundColor = textFragment.TextState.BackgroundColor; // this throws an exception. I don't know why.
        updatedFragment.TextState.ForegroundColor = textFragment.TextState.ForegroundColor;
        updatedFragment.TextState.StrokingColor = textFragment.TextState.StrokingColor;
        updatedFragment.TextState.LineSpacing = textFragment.TextState.LineSpacing;
        //updatedFragment.TextState.LineSpacing = 6.32f;

        //set old fragment text to empty string
        textFragment.Text = String.Empty;

        //create TextParagraph object
        TextParagraph par = new TextParagraph();

        //set paragraph position
        //par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent - updatedFragment.TextState.FontSize - updatedFragment.TextState.LineSpacing);
        par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent); // trying to reproduce the original text position
        // Specify word wraping mode
        par.FormattingOptions.WrapMode = TextFormattingOptions.WordWrapMode.ByWords;

        //add new TextFragment to paragraph
        par.AppendLine(updatedFragment);

        //add the TextParagraph using TextBuilder
        TextBuilder textBuilder = new TextBuilder(textFragment.Page);
        textBuilder.AppendParagraph(par);

    }
    doc.Save(outputfilename);
    return;
}

For the investigation on your side, I think it should be fine to replace my

textTranslator.doTranslation(textFragment.Text.ToString());

with your Google Translate version below.

TranslateText(textFragment.Text.ToString(), "ja|en");

I couldn’t get the file.
It says “Sorry, this file is private. Only visible to topic owner and staff members.” when I hit the link.

@KDSSHO

Thanks for writing back.

We have logged an issue as PDFNET-44712 in our issue tracking system with all relevant details, i.e. the code snippet and the input/output documents. We will investigate the issue further and keep you informed of the status of its rectification.

We would also like to share with you that text replacement operations are expected to be improved and enhanced in the upcoming release, Aspose.PDF for .NET 18.6. We will also look into the details of the recently logged issue to see whether it can be resolved in that release. As soon as we have definite updates regarding its resolution, we will let you know. Please allow us a little time.

Please make sure that you are properly logged in to the website before downloading the attachment, or try logging out and logging in again. If the issue still persists, please let us know and we will look into this as well.

We are sorry for the inconvenience.

Thank you!

Is this topic below relevant to the planned update?

I regret that I have not succeeded yet. I logged out, logged in again and retried, but the same message still appears.

@KDSSHO

Thanks for getting back to us.

The topic you referred to concerns the performance of the API during text replacement operations. Please note that the performance of the API varies across different types of scenarios. You will receive the respective updates regarding API performance in that other topic.

Please download the PDF from this link at your convenience.

Thank you, I finally received it!

And I found this helpful.

@KDSSHO

Thanks for your kind feedback.

As soon as we have definite updates regarding the issue resolution, we will inform you. Please allow us a little time.

We are sorry for the inconvenience.