Hi
Hi Allan,
See attached input PDF and program.cs
Hi Allan,
I have tested the scenario and was able to reproduce the problem. I have logged it in our issue tracking system as PDFNEWNET-38246. We will investigate this issue in detail and keep you updated on the status of a correction.
We apologize for the inconvenience.
Any news here?
Hi Oliver,
@dr_oli
Thanks for your patience.
We are glad to inform you that your earlier reported issue PDFNET-38246 has been investigated. According to the results, replacing a text fragment with a text paragraph is a good solution for processing tag words, because it gives more control over the replaced text. However, one important point must be taken into account when creating text paragraphs: the paragraph position (or rectangle) must be set before any lines are appended to the paragraph. (We have created PDFNET-43659 to investigate whether this restriction can be removed in the future.)
Please consider the following (corrected) code snippet with the latest version, Aspose.Pdf for .NET 17.12:
// Requires: using System; using Aspose.Pdf; using Aspose.Pdf.Text;
Document doc = new Document(myDir + "input.pdf");
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber("##TextForm.1");
// Accept the absorber for all the pages
doc.Pages.Accept(textFragmentAbsorber);
// Get the text fragments
TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
// Loop through the fragments
foreach (TextFragment textFragment in textFragmentCollection)
{
    // Build the replacement string and create a new text fragment from it
    string updatedText = textFragment.Text.Replace("##TextForm.1", "dslkjf dsaælk sdaflkjd sfælkds fælskdf" +
        " æasdkf dasælfj dsalkf aæsdlk asædlk æasdlk" +
        " æsdalk æasdlk æsdalkæsdljf afjæsadfjæafjæsakfj" +
        " æslfjslfjdælkjsæ dkjsækfjdkfjsdlfj uuuuu");
    TextFragment updatedFragment = new TextFragment(updatedText);
    // Set new text fragment properties if necessary
    updatedFragment.TextState.Font = FontRepository.FindFont("Arial");
    updatedFragment.TextState.FontSize = textFragment.TextState.FontSize;
    updatedFragment.TextState.LineSpacing = 6.32f;
    // Set old fragment text to empty string
    textFragment.Text = String.Empty;
    // Create TextParagraph object
    TextParagraph par = new TextParagraph();
    // Set paragraph position (this must happen before any line is appended)
    par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent
        - updatedFragment.TextState.FontSize - updatedFragment.TextState.LineSpacing);
    // Specify word wrapping mode
    par.FormattingOptions.WrapMode = TextFormattingOptions.WordWrapMode.ByWords;
    // Add new TextFragment to paragraph
    par.AppendLine(updatedFragment);
    // Add the TextParagraph using TextBuilder
    TextBuilder textBuilder = new TextBuilder(textFragment.Page);
    textBuilder.AppendParagraph(par);
}
// Save resulting PDF document.
doc.Save(myDir + "38246_out.pdf");
38246_out.pdf (193.2 KB)
In case of any further assistance, please feel free to contact us.
Hi,
This topic is of interest to me, as I am testing a similar situation.
The (corrected) code snippet above worked almost fine for me, except that in some cases the updated text overlaps the following lines in the paragraph.
To avoid the overlap, I believe I need to adjust the position more precisely, calculating it from properties such as FontSize, YIndent and LineSpacing.
Could someone kindly advise me how to replace a short source text with a longer text without overlapping?
I’m using Aspose.PDF, Version=18.2.0.0.
Thank you very much in advance.
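The calculation described above could be sketched roughly as follows. This is only an estimate under stated assumptions, not an Aspose.PDF API: the class ParagraphFit and its methods are hypothetical names, and the average character width of 0.5 × font size is a guess that real font metrics will not match exactly. It counts how many lines a word-wrapped text would need in a given width, converts that into a block height, and checks whether the block stays above the next fragment (PDF Y coordinates grow upwards).

```csharp
using System;

// Hypothetical helper (not part of Aspose.PDF): rough layout arithmetic only.
public static class ParagraphFit
{
    // Greedy word wrap. Every character is ASSUMED to be
    // avgCharWidthFactor * fontSize wide, which is a heuristic.
    public static int EstimateLineCount(string text, double fontSize, double maxWidth,
                                        double avgCharWidthFactor = 0.5)
    {
        if (fontSize <= 0 || avgCharWidthFactor <= 0)
            throw new ArgumentOutOfRangeException(nameof(fontSize));
        if (string.IsNullOrWhiteSpace(text)) return 0;

        double charWidth = fontSize * avgCharWidthFactor;
        int maxChars = Math.Max(1, (int)Math.Floor(maxWidth / charWidth));
        int lines = 1;
        int used = 0; // characters already placed on the current line

        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int needed = used == 0 ? word.Length : used + 1 + word.Length;
            if (needed <= maxChars)
            {
                used = needed;
                continue;
            }
            // The word does not fit: start a new line unless the line is empty.
            if (used > 0) lines++;
            // A word longer than one line spills onto additional lines.
            int extraLines = (word.Length - 1) / maxChars;
            lines += extraLines;
            used = word.Length - extraLines * maxChars;
        }
        return lines;
    }

    // Height of the block if each line takes fontSize + lineSpacing.
    public static double EstimateHeight(int lineCount, double fontSize, double lineSpacing)
    {
        return lineCount * (fontSize + lineSpacing);
    }

    // True if a block whose top is at topY ends at or above nextFragmentTopY.
    public static bool FitsAbove(double topY, double height, double nextFragmentTopY)
    {
        return topY - height >= nextFragmentTopY;
    }
}
```

If FitsAbove returns false for a fragment, the replacement text is likely to run into the line below it, so the font size could be reduced or the available width enlarged until the estimate fits.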
Thanks for contacting support.
Would you please share your sample PDF document and mention the text that needs to be replaced? We will test the scenario in our environment and address it accordingly.
Thank you for your prompt reply. Please find the attached 2 files.
Input with shorter texts 2017112144212.pdf (105.0 KB)
Output with replaced longer texts 2017112144212_translated.pdf (206.5 KB)
What I want to do is translate all the Japanese text in 2017112144212.pdf into English, as in 2017112144212_translated.pdf, keeping the layout as close as possible to the input file.
You will easily find the “overlapping” I mentioned in the middle of the output file.
As for my sample app, I initialized TextFragmentAbsorber just as follows, in order to search all the text.
TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(".+");
And then feed textFragment.Text to our external Japanese - English translation module as follows.
string updatedText = textTranslator.doTranslation(textFragment.Text.ToString());
The font I assigned to updatedFragment was “MS UI Gothic”.
All the other operations are just like the above snippet.
I really appreciate your support!
@KDSSHO
Thanks for sharing further details.
We have managed to replicate the issue in our environment. However, it would be helpful if you could share your Japanese-English translation code snippet with us, so that we can log an issue with all details in our issue tracking system. We translated the Japanese text using the following method, but it did not decode some special characters that seem to be decoded in the output PDF document you shared.
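// Requires: using System; using System.Net; and a reference to System.Web
public static string TranslateText(string input, string langpair)
{
    // URL-encode both values so that non-ASCII text and the "|" in the language pair survive the request
    string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}",
        Uri.EscapeDataString(input), Uri.EscapeDataString(langpair));
    string result;
    using (WebClient webClient = new WebClient())
    {
        result = webClient.DownloadString(url);
    }
    // Extract the translated text embedded in the response page
    int bas = result.IndexOf("TRANSLATED_TEXT='") + "TRANSLATED_TEXT='".Length;
    int bit = result.Substring(bas).IndexOf("';var");
    result = result.Substring(bas, bit);
    return System.Web.HttpUtility.HtmlDecode(result).Trim();
}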
2017112144212_translated.pdf (335.5 KB)
An output PDF generated in our environment has been attached for your reference.
Hi
Glad to hear that.
I’m afraid I cannot disclose the Ja-En translation module itself because it is confidential. However, I hope the following snippet (the inside of the textTranslator.doTranslation() method I showed above) helps you see a bit more of how the input and output strings are treated.
public string doTranslation(string strSrcText)
{
    byte[] bySrcText = System.Text.Encoding.UTF8.GetBytes(strSrcText);
    byte[] byDstText = new byte[strSrcText.Length * 10 + 256];
    // some setups here
    // external "confidential" translation module API,
    Translate(byDstText, bySrcText, "jaen");
    // some wrap-ups here
    return System.Text.Encoding.UTF8.GetString(byDstText);
}
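One detail worth checking in a method like the one above: decoding the whole fixed-size output buffer also decodes any unused zero bytes after the translated text, which can leave trailing NUL characters in the string handed to TextFragment. The elided wrap-up code may already take care of this. If it does not, a minimal sketch, assuming the module writes a NUL-terminated UTF-8 string (the class and method names here are hypothetical), would be:

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes only the bytes before the first NUL terminator.
public static class BufferText
{
    public static string DecodeNullTerminatedUtf8(byte[] buffer)
    {
        // Find the first zero byte; if there is none, use the whole buffer.
        int end = Array.IndexOf(buffer, (byte)0);
        if (end < 0) end = buffer.Length;
        return Encoding.UTF8.GetString(buffer, 0, end);
    }
}
```

With such a helper, the last line of the method would return BufferText.DecodeNullTerminatedUtf8(byDstText) instead of decoding the full buffer.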
In case you need it, my actual sample app code follows; it is essentially a copy of the “(corrected) code snippet.”
public void translatePDF(string inputfilename, string outputfilename)
{
    // Open an existing PDF file
    Document doc = new Document(inputfilename);
    // Instantiate TextFragmentAbsorber with a regular expression that matches all text
    TextFragmentAbsorber textFragmentAbsorber = new TextFragmentAbsorber(".+");
    // Set text search option to specify regular expression usage
    textFragmentAbsorber.TextSearchOptions = new TextSearchOptions(true);
    // Accept the absorber for all the pages
    doc.Pages.Accept(textFragmentAbsorber);
    // Get the extracted text fragments
    TextFragmentCollection textFragmentCollection = textFragmentAbsorber.TextFragments;
    // Loop through the fragments
    foreach (TextFragment textFragment in textFragmentCollection)
    {
        // Translate the fragment text and create a new fragment from the result
        string translatedText = textTranslator.doTranslation(textFragment.Text.ToString());
        TextFragment updatedFragment = new TextFragment(translatedText);
        // Set new text fragment properties if necessary
        updatedFragment.TextState.Font = FontRepository.FindFont("MS UI Gothic");
        updatedFragment.TextState.FontSize = textFragment.TextState.FontSize; //* (float)0.8;
        //updatedFragment.TextState.BackgroundColor = textFragment.TextState.BackgroundColor; // this throws an exception. I don't know why.
        updatedFragment.TextState.ForegroundColor = textFragment.TextState.ForegroundColor;
        updatedFragment.TextState.StrokingColor = textFragment.TextState.StrokingColor;
        updatedFragment.TextState.LineSpacing = textFragment.TextState.LineSpacing;
        //updatedFragment.TextState.LineSpacing = 6.32f;
        // Set old fragment text to empty string
        textFragment.Text = String.Empty;
        // Create TextParagraph object
        TextParagraph par = new TextParagraph();
        // Set paragraph position
        //par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent - updatedFragment.TextState.FontSize - updatedFragment.TextState.LineSpacing);
        par.Position = new Position(textFragment.Position.XIndent, textFragment.Position.YIndent); // trying to reproduce the original text position
        // Specify word wrapping mode
        par.FormattingOptions.WrapMode = TextFormattingOptions.WordWrapMode.ByWords;
        // Add new TextFragment to paragraph
        par.AppendLine(updatedFragment);
        // Add the TextParagraph using TextBuilder
        TextBuilder textBuilder = new TextBuilder(textFragment.Page);
        textBuilder.AppendParagraph(par);
    }
    doc.Save(outputfilename);
}
I think there’s no problem for the investigation on your side if you change my
textTranslator.doTranslation(textFragment.Text.ToString());
to your Google Translate version below.
TranslateText(textFragment.Text.ToString(), "ja|en");
I couldn’t get the file.
It says “Sorry, this file is private. Only visible to topic owner and staff members.” when I hit the link.
Thanks for writing back.
We have logged an issue as PDFNET-44712 in our issue tracking system with all relevant details, i.e., the code snippet and the input/output documents. We will investigate the issue further and keep you informed of the status of its rectification.
We would also like to share that text replacement operations are expected to be improved and enhanced in the upcoming release, Aspose.PDF for .NET 18.6. We will also look into the recently logged issue to see whether it can be resolved in that release. As soon as we have definite updates regarding its resolution, we will let you know. Please allow us some time.
Please make sure that you are properly logged in to the website before downloading the attachment, or try logging out and logging in again. If the issue persists, please let us know and we will look into it further.
We are sorry for the inconvenience.
Thank you!
Is this topic below relevant to the planned update?
I regret that I have not succeeded yet. I logged out, logged in again and retried, but the same message still appears.
@KDSSHO
Thanks for getting back to us.
The referred topic concerns the performance of the API during text replacement operations. Please note that API performance varies between scenarios. You will receive updates regarding API performance in that other topic.
Please download PDF from this link as per your convenience.
Thank you I finally received it!
And I found this helpful.
Thanks for your kind feedback.
As soon as we have definite updates regarding the issue resolution, we will inform you. Please allow us some time.
We are sorry for the inconvenience.