All you technically have to do is create a PDF document, call document.FreeMemory() and document.Dispose(), and watch: the memory is not released at all. You can even set document = null; afterwards and it still isn’t released. You will also see my excessive, and somewhat dangerous, use of GC.Collect() and GC.WaitForPendingFinalizers() everywhere. That is an attempt to force garbage collection, and even that does not release the memory.
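Before concluding that the objects are never freed, it may help to separate two numbers: the managed heap, which GC.Collect() can shrink, and the process working set or private bytes shown by Task Manager. The process numbers can stay high after a collection because the .NET GC does not necessarily hand freed memory back to the operating system right away, and it does not compact the Large Object Heap by default. Memory allocated outside the managed heap (for example by native code) does not show up in the managed number either. A minimal sketch using only base-library APIs (GC, GCSettings, Process); MemorySnapshot is a hypothetical helper name, not part of Aspose or the original code:

```csharp
using System;
using System.Diagnostics;
using System.Runtime;

// Hypothetical diagnostic helper: reports the managed heap size after a full collection
// next to the process-level counters that Task Manager shows.
public static class MemorySnapshot
{
    public readonly record struct Snapshot(long ManagedBytes, long WorkingSetBytes, long PrivateBytes);

    public static Snapshot Take(bool compactLargeObjectHeap = false)
    {
        if (compactLargeObjectHeap)
        {
            // Ask the next blocking full GC to also compact the Large Object Heap
            // (the setting resets to Default after that collection).
            GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        }

        // Full blocking collection, let finalizers run, then collect what they released.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        using var process = Process.GetCurrentProcess();
        return new Snapshot(
            GC.GetTotalMemory(forceFullCollection: false),
            process.WorkingSet64,
            process.PrivateMemorySize64);
    }

    public static string Format(Snapshot s) =>
        $"managed heap: {s.ManagedBytes / (1024 * 1024)} MB, " +
        $"working set: {s.WorkingSetBytes / (1024 * 1024)} MB, " +
        $"private bytes: {s.PrivateBytes / (1024 * 1024)} MB";
}
```

For example, logging MemorySnapshot.Format(MemorySnapshot.Take()) before and after the per-page loop would show which number stays high. If the managed number drops back but the working set does not, the objects are being collected and the open question is how the runtime returns memory to the OS; if the managed number stays high, something is most likely still holding references.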
If you really want to put the system to work, use the code I sent previously that finds and replaces text, and watch it fail to release its memory. This is becoming extremely urgent for us. At this point I am looking for any other solution to this problem, and I’m starting to evaluate other libraries, because it is causing havoc for the customers we serve. Just so you know, my code is very messy right now because I have rewritten it over and over again trying to find a workaround.
In the code below I have added comments so you can see where the problems are. Keep in mind that the code is convoluted: I create a document, save out its pages, and then process each page individually to work around this horrible problem.
```csharp
private async Task ReplaceAttachmentWithSignatures(CompileAttachment compileAttachment)
{
    var savePath = GetSavePath();
    var tempPath = Path.Combine(savePath, "temp.pdf");
    var flattenedFile = Path.Combine(savePath, "FlattenedPdf.pdf");
    var filePath = Path.Combine(savePath, "ToBeFlattenedPdf.pdf");
    await _azureProvider.DownloadToFileAsync(compileAttachment.AzurePdfPath, filePath);
    try
    {
        // In this using block, tempDocument never releases its memory until the GC.Collect() in the finally block
        // happens to take effect. That is random: the method can run several times, and maybe only on the 6th run
        // do I see the memory go down again.
        using (var tempDocument = new Aspose.Pdf.Document(filePath))
        {
            File.Delete(filePath);
            // pdfForm behaves the same as tempDocument above.
            using (var pdfForm = new Form())
            {
                ELSLogHelper.InsertInfoLog(ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod()?.DeclaringType?.Name, Environment.StackTrace);
                pdfForm.BindPdf(tempDocument);
                pdfForm.FlattenAllFields();
                pdfForm.Save(flattenedFile);
                ELSLogHelper.InsertInfoLog(ELSLogHelper.AsposeLogMessage("Save"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod()?.DeclaringType?.Name, Environment.StackTrace);
            }
        }
        ELSLogHelper.InsertInfoLog(ELSLogHelper.AsposeLogMessage("Open"), MethodBase.GetCurrentMethod()?.Name, MethodBase.GetCurrentMethod()?.DeclaringType?.Name, Environment.StackTrace);
        // Again, document behaves the same as above.
        using (var document = new Aspose.Pdf.Document(flattenedFile))
        {
            File.Delete(flattenedFile);
            foreach (var page in document.Pages)
            {
                var pdfFilePath = Path.Combine(savePath, $"{page.Number}.pdf");
                // newDocument behaves the same as above. I have watched this loop run over and over,
                // and despite calling FreeMemory() and disposing it, the memory is never released.
                using (var newDocument = new Aspose.Pdf.Document())
                {
                    newDocument.Pages.Add(page);
                    newDocument.Optimize();
                    newDocument.Save(pdfFilePath);
                    newDocument.FreeMemory();
                } // Dispose() runs here, even if Save() throws.
                GC.Collect();
                GC.WaitForPendingFinalizers();
            }
        }
        var filesToProcess = Directory.GetFiles(savePath, "*.pdf", SearchOption.TopDirectoryOnly)
            .OrderBy(x => Convert.ToInt32(Path.GetFileNameWithoutExtension(x)))
            .ToArray();
        ConcatanatePdfFiles(filesToProcess, compileAttachment);
    }
    catch (Exception ex)
    {
        var logManagerModel = new LogManagerModel
        {
            Exception = ex,
            ExceptionData = new Dictionary<string, string>()
            {
                { "Message", "Failed to replace attachment with magic tag value." },
                { "CallerMemberName", $"{typeof(BaseCompiler).FullName}" },
                { "CallerMethodName", $"{MethodBase.GetCurrentMethod()?.Name}" },
                { "CallerLineNumber", $"{new StackTrace(ex, true).GetFrame(0)?.GetFileLineNumber()}" }
            }
        };
        _customerCallContext.LogManager.Error(logManagerModel);
        File.Delete(tempPath);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        throw;
    }
    finally
    {
        // temp.pdf is missing if the catch block above already deleted it; reading it unconditionally
        // would replace the original exception with a FileNotFoundException.
        if (File.Exists(tempPath))
        {
            await _azureProvider.SaveAzureFileAsync(compileAttachment.AzurePdfPath, File.ReadAllBytes(tempPath));
            File.Delete(tempPath);
        }
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
```
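If the finally block reads temp.pdf unconditionally after the catch block has deleted it, the FileNotFoundException thrown from finally replaces the exception being rethrown, so the caller never sees the original failure (the log entry written in catch still records it). A small, standard-library-only demonstration of that pattern; FinallyMaskingDemo is a hypothetical name, and a plain File.ReadAllBytes stands in for the Azure upload:

```csharp
using System;
using System.IO;

// Hypothetical demo: same catch/finally shape as ReplaceAttachmentWithSignatures,
// with File.ReadAllBytes standing in for the upload of temp.pdf.
public static class FinallyMaskingDemo
{
    // Returns the exception that escapes the try/catch/finally.
    public static Exception Run(bool guardWithExists)
    {
        var tempPath = Path.Combine(Path.GetTempPath(), $"masking-demo-{Guid.NewGuid():N}.pdf");
        File.WriteAllText(tempPath, "placeholder");
        Exception escaped = null;
        try
        {
            try
            {
                // Stands in for any failure inside the original try block.
                throw new InvalidOperationException("original failure");
            }
            catch
            {
                // Like the original catch: delete the temp file, then rethrow.
                File.Delete(tempPath);
                throw;
            }
            finally
            {
                if (!guardWithExists || File.Exists(tempPath))
                {
                    // Throws FileNotFoundException once the file is gone,
                    // replacing the InvalidOperationException being rethrown.
                    File.ReadAllBytes(tempPath);
                }
            }
        }
        catch (Exception ex)
        {
            escaped = ex;
        }
        return escaped;
    }
}
```

Run(guardWithExists: false) returns a FileNotFoundException, while Run(guardWithExists: true) returns the original InvalidOperationException.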
```csharp
private void ProcessFilesInDirectory(string fileToProcess, CompileAttachment compileAttachment)
{
    var savePath = GetSavePath();
    // The using block disposes the document on every path, including when MagicTags is null.
    using (var newPdfDocument = new Aspose.Pdf.Document(fileToProcess))
    {
        // Each input file holds a single page, so this always resolves to "1.pdf".
        var pdfFilePath = Path.Combine(savePath, $"{newPdfDocument.Pages[1].Number}.pdf");
        HandlePdfRightAlignmentText(newPdfDocument.Pages[1], compileAttachment);
        // Look up the tag only after checking that MagicTags is not null.
        var magicTag = compileAttachment.MagicTags?.FirstOrDefault(x => x.Tag == "{{item.number}}");
        if (magicTag != null)
        {
            using (var editor = new PdfContentEditor())
            {
                // Replace all the matching keys in the text.
                // ****THIS IS THE HUGE PROBLEM HERE****
                // This consumes enormous amounts of memory, and despite all my attempts below it never gives the memory back.
                // A roughly 90 MB file grows to 11 GB during this process.
                editor.BindPdf(newPdfDocument);
                editor.ReplaceTextStrategy.ReplaceScope = ReplaceTextStrategy.Scope.ReplaceAll;
                editor.ReplaceText(magicTag.Tag, magicTag.Value ?? "");
                editor.Document.Optimize();
                editor.Save(pdfFilePath);
                editor.Document.FreeMemory();
            }
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```
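When a Document is disposed only on some code paths (for example only inside an if block, or only when no exception is thrown), whatever cleanup its Dispose() performs is skipped on the other paths. One way to check whether every document you create is actually disposed is to count creations and disposals. A minimal sketch that relies only on System.Threading; DisposalTracker and Tracked<T> are hypothetical names:

```csharp
using System;
using System.Threading;

// Hypothetical diagnostic helper: counts tracked objects that were created but not yet disposed.
public static class DisposalTracker
{
    private static int _created;
    private static int _disposed;

    // Number of tracked objects created so far that have not been disposed.
    public static int Outstanding => Volatile.Read(ref _created) - Volatile.Read(ref _disposed);

    // Wraps an IDisposable; disposing the wrapper disposes the wrapped object and records it.
    public static Tracked<T> Track<T>(T value) where T : IDisposable
    {
        Interlocked.Increment(ref _created);
        return new Tracked<T>(value);
    }

    public sealed class Tracked<T> : IDisposable where T : IDisposable
    {
        private int _isDisposed;

        internal Tracked(T value) => Value = value;

        public T Value { get; }

        public void Dispose()
        {
            // Count each wrapper once, even if Dispose() is called more than once.
            if (Interlocked.Exchange(ref _isDisposed, 1) == 0)
            {
                Value.Dispose();
                Interlocked.Increment(ref _disposed);
            }
        }
    }
}
```

For example, each document could be created as `using var tracked = DisposalTracker.Track(new Aspose.Pdf.Document(fileToProcess));` and used through `tracked.Value`, with DisposalTracker.Outstanding logged after the page loop. A value of 0 while memory is still high would suggest the growth is not caused by undisposed documents.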
```csharp
private void ConcatanatePdfFiles(string[] filesToProcess, CompileAttachment compileAttachment)
{
    var savePath = GetSavePath();
    var tempPath = Path.Combine(savePath, "temp.pdf");
    try
    {
        foreach (var fileToProcess in filesToProcess)
        {
            ProcessFilesInDirectory(fileToProcess, compileAttachment);
        }
        // pdfFileEditor does the same thing. When this runs I watch the memory climb, and despite setting
        // pdfFileEditor = null the memory is never given back.
        var pdfFileEditor = new PdfFileEditor();
        pdfFileEditor.CloseConcatenatedStreams = true;
        pdfFileEditor.UseDiskBuffer = true;
        pdfFileEditor.Concatenate(filesToProcess, tempPath);
        pdfFileEditor = null;
        foreach (var file in filesToProcess)
        {
            File.Delete(file);
        }
    }
    catch (Exception ex)
    {
        var logManagerModel = new LogManagerModel
        {
            Exception = ex,
            ExceptionData = new Dictionary<string, string>()
            {
                { "Message", $"Failed to process each page of {compileAttachment.Name}; FileId: {compileAttachment.DocumentId}." },
                { "CallerMemberName", $"{typeof(BaseCompiler).FullName}" },
                { "CallerMethodName", $"{MethodBase.GetCurrentMethod()?.Name}" },
                { "CallerLineNumber", $"{new StackTrace(ex, true).GetFrame(0)?.GetFileLineNumber()}" }
            }
        };
        _customerCallContext.LogManager.Error(logManagerModel);
        foreach (var file in filesToProcess)
        {
            File.Delete(file);
        }
        File.Delete(tempPath);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        throw;
    }
    finally
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
    }
}
```
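The file list passed to ConcatanatePdfFiles is sorted with Convert.ToInt32 on each file name, which throws a FormatException if any non-numeric PDF (for example a temp.pdf left behind by a crashed run) is in the same folder. A sketch of a more tolerant version using only base-library APIs; PageFiles and OrderByPageNumber are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;

// Hypothetical helper for building the ordered list of per-page files to concatenate.
public static class PageFiles
{
    // Keeps only files named like "12.pdf" and sorts them by page number (2 before 10).
    // Other PDFs in the folder (e.g. temp.pdf) are skipped instead of making the sort throw.
    public static string[] OrderByPageNumber(IEnumerable<string> paths)
    {
        var pages = new List<(int Number, string FilePath)>();
        foreach (var path in paths)
        {
            var name = Path.GetFileNameWithoutExtension(path);
            // NumberStyles.None accepts digits only: no sign, spaces or separators.
            if (int.TryParse(name, NumberStyles.None, CultureInfo.InvariantCulture, out var number))
            {
                pages.Add((number, path));
            }
        }
        return pages.OrderBy(p => p.Number).Select(p => p.FilePath).ToArray();
    }
}
```

It could stand in for the Directory.GetFiles(...).OrderBy(...) chain as `PageFiles.OrderByPageNumber(Directory.GetFiles(savePath, "*.pdf", SearchOption.TopDirectoryOnly))`. Sorting on the parsed number rather than the file name string keeps 10.pdf after 2.pdf.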
```csharp
public void HandlePdfRightAlignmentText(Aspose.Pdf.Page page, CompileAttachment compileAttachment)
{
    // Create a TextFragmentAbsorber to find all instances of the search pattern.
    // Regex pattern for text like [[! any text {{magictag}} !]]
    var textFragmentAbsorber = new Aspose.Pdf.Text.TextFragmentAbsorber(@"\[+\[+\!+[^\[\]]+\!+\]+\]");
    // Paragraph alignment
    textFragmentAbsorber.TextReplaceOptions.ReplaceAdjustmentAction = Aspose.Pdf.Text.TextReplaceOptions.ReplaceAdjustment.WholeWordsHyphenation;
    // Enable regex search
    textFragmentAbsorber.TextSearchOptions = new Aspose.Pdf.Text.TextSearchOptions(true);
    // Run the absorber on this page only
    page.Accept(textFragmentAbsorber);
    // Convert the magic tags to a key/value dictionary
    var dic = new Dictionary<string, string>();
    if (compileAttachment.MagicTags != null)
    {
        dic = compileAttachment.MagicTags.DistinctBy(x => new { x.Tag, x.Value }).ToDictionary(x => x.Tag, y => y.Value);
    }
    // Loop through the extracted text fragments
    foreach (Aspose.Pdf.Text.TextFragment textFragment in textFragmentAbsorber.TextFragments)
    {
        // Remove the [[! and !]] markers from the text
        var replacedText = textFragment.Text.Replace("[[!", "").Replace("!]]", "");
        textFragment.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Right;
        textFragment.TextState.Underline = false;
        foreach (var k in dic.Keys.ToList())
        {
            if (replacedText.Contains(k))
            {
                // Remove every occurrence of the tag, then append its value
                var splittedText = replacedText.Split(new string[] { k }, StringSplitOptions.None);
                var composedValue = string.Concat(splittedText) + dic[k];
                textFragment.TextState.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Right;
                // "u=1" in the composed text turns underlining on
                var isUnderLine = false;
                if (composedValue.Contains("u=1"))
                {
                    composedValue = composedValue.Replace("u=1", "");
                    isUnderLine = true;
                }
                // Right-hand indent; defaults to 45 unless configured
                var indent = CustomerConfig.For(_customerCallContext).PdfXIndent;
                double pdfXIndent = 45;
                if (!string.IsNullOrEmpty(indent))
                {
                    pdfXIndent = Convert.ToDouble(indent);
                }
                textFragment.Text = composedValue;
                textFragment.TextState.Underline = isUnderLine;
                textFragment.TextState.HorizontalAlignment = Aspose.Pdf.HorizontalAlignment.Right;
                textFragment.Position = new Aspose.Pdf.Text.Position(
                    page.Rect.LLX + (page.Rect.Width - textFragment.Rectangle.Width - pdfXIndent),
                    textFragment.Position.YIndent);
                break;
            }
        }
    }
}
```
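The string and coordinate logic in HandlePdfRightAlignmentText does not depend on Aspose, so it can be pulled out into pure functions and tested without loading a PDF. A sketch under that assumption; RightAlignedTags, Compose and RightAlignedX are hypothetical names, and the regex is the same pattern passed to TextFragmentAbsorber:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical pure-function version of the text and position logic in HandlePdfRightAlignmentText.
public static class RightAlignedTags
{
    // Same pattern the original passes to TextFragmentAbsorber: text wrapped in [[! ... !]].
    public static readonly Regex Marker = new Regex(@"\[+\[+\!+[^\[\]]+\!+\]+\]");

    // Returns the text to draw and whether to underline it, or null if no tag key occurs in the fragment.
    public static (string Text, bool Underline)? Compose(string fragmentText, IReadOnlyDictionary<string, string> tags)
    {
        // Strip the [[! and !]] markers.
        var inner = fragmentText.Replace("[[!", "").Replace("!]]", "");
        foreach (var tag in tags)
        {
            // Empty keys are skipped (string.Replace rejects an empty search string).
            if (string.IsNullOrEmpty(tag.Key) || !inner.Contains(tag.Key))
            {
                continue;
            }
            // Remove every occurrence of the tag, then append its value; the first matching tag wins.
            var composed = inner.Replace(tag.Key, "") + tag.Value;
            // "u=1" anywhere in the composed text switches underlining on and is removed.
            var underline = composed.Contains("u=1");
            if (underline)
            {
                composed = composed.Replace("u=1", "");
            }
            return (composed, underline);
        }
        return null;
    }

    // X position that leaves `indent` between the fragment's right edge and the page's right edge.
    public static double RightAlignedX(double pageLlx, double pageWidth, double fragmentWidth, double indent) =>
        pageLlx + (pageWidth - fragmentWidth - indent);
}
```

Inside HandlePdfRightAlignmentText, Compose would take textFragment.Text and dic, and RightAlignedX would take page.Rect.LLX, page.Rect.Width, textFragment.Rectangle.Width and pdfXIndent.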
Also, calling document.Convert() CONSUMES HUGE AMOUNTS OF MEMORY as well. I took a screenshot so you can see how much memory the application consumes when I call document.Convert():
[Screenshot: application memory usage while document.Convert() runs]
Here is the file that causes HUGE memory consumption, even when I process one page at a time, and whose memory the editor never releases:
022024-Ordinance-24-attachment.pdf
Let me know if you need more examples. I’d be happy to provide anything else that helps get this resolved; it is killing us right now. I have been working on this issue for over two weeks looking for a solution or workaround.