Hi Team,
I see the following warning message while converting a PDF to a Word document:
** (process:1): WARNING **: 18:17:27.711: Requested 0 bytes. Maximum size for region is 262144 bytes.
The conversion code runs in AWS EKS. I don't know what the warning means. Does it indicate that something is wrong with the conversion? Please share your ideas and a solution.
Aspose.PDF version: 22.10.0
Container image: mcr.microsoft.com/dotnet/aspnet:6.0
Thanks
Raj
@rajkumar.vedhasiva
Such a warning may occur when working with memory streams, for example when reading files from a database. It does not mean that the conversion was done incorrectly. If you notice any anomaly in the generated .doc/.docx files, you can share your PDF document with us and we will try to replicate the issue in our environment.
Sure. Before I share the file: I have run into a serious memory issue. Memory consumption keeps increasing after document conversion because the objects are not deallocated, which eventually results in a restart of the k8s pod. Here is the code used for the conversion:
using System.IO;
using Aspose.Pdf;

public byte[] PdfToWordDocument(byte[] byteValue)
{
    using (MemoryStream docStream = new())
    using (Stream pdfStream = new MemoryStream(byteValue))
    using (Document pdf = new(pdfStream, true))
    {
        DocSaveOptions so = new()
        {
            Format = DocSaveOptions.DocFormat.DocX,
            Mode = DocSaveOptions.RecognitionMode.Flow
        };
        pdf.OptimizeSize = true;
        pdf.Optimize();
        pdf.OptimizeResources();
        pdf.Save(docStream, so);
        // ToArray() copies only the bytes written; GetBuffer() would also return unused capacity.
        byte[] result = docStream.ToArray();
        pdf.FreeMemory();
        // No explicit pdf.Dispose() call: the using statement disposes the document.
        return result;
    }
}
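A side note on how the converted bytes are returned: MemoryStream.GetBuffer() hands back the stream's entire internal array, whose length is the stream's capacity rather than its Length, so a DOCX returned that way can carry trailing zero bytes after the real data. MemoryStream.ToArray() copies exactly Length bytes. The BCL-only sketch below does not involve Aspose.PDF at all, and the class and method names are illustrative, not part of any library:

```csharp
using System;
using System.IO;

// Illustrative helper (hypothetical name): compares the two ways of
// reading bytes back out of a MemoryStream after writing to it.
public static class BufferDemo
{
    public static (int bufferLength, int contentLength) CompareViews(byte[] payload)
    {
        using var stream = new MemoryStream();
        stream.Write(payload, 0, payload.Length);

        // GetBuffer() exposes the whole internal array, including any
        // unused capacity beyond stream.Length (filled with zeros).
        byte[] raw = stream.GetBuffer();

        // ToArray() returns a copy containing exactly stream.Length bytes.
        byte[] exact = stream.ToArray();

        return (raw.Length, exact.Length);
    }
}
```

For example, CompareViews(new byte[] { 1, 2, 3, 4, 5 }) reports a content length of 5, while the buffer length reflects the stream's current capacity.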
The Kubernetes pod resource configuration is:
resources:
  limits:
    memory: "1G"
    cpu: "500m"
Please share your feedback and any suggestions for resolving this.
@rajkumar.vedhasiva
Is it happening with every PDF? Could you please also share a sample PDF for our reference? We will test the scenario in our environment and address it accordingly.
The PDF file is confidential and classified, so I am not allowed to share it.
Instead, here are the infrastructure details and how the program is set up.
We run a .NET 6 worker service that reads the PDF byte[] from the database, uses the Aspose.PDF API to convert it to a DOCX byte[], and pushes the converted content to a datastore.
The service is configured to sleep for a few seconds and then poll the database for any new PDF byte[] content.
The worker service image is hosted in AWS EKS with a 500m CPU and 1G memory limit. Memory usage grows constantly until, at some point, the pod gets re-provisioned. See the attached image:
[Attached image: image.png]
The image above was taken from DataDog's monitoring of the Kubernetes pod.
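To narrow down where the growth happens, one option is to log memory after each polling iteration. The sketch below is hypothetical: fetchNextPdf, convertToDocx, and storeDocx stand in for the real database read, the Aspose.PDF conversion, and the datastore write, and maxIterations exists only so the loop can terminate in a test. It compares GC.GetTotalMemory (managed heap, after a forced full collection) with Environment.WorkingSet (the whole process). As a rough signal, if the working set keeps climbing while the managed heap stays roughly flat, the growth is more likely in native allocations than in undisposed .NET objects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic sketch of the poll -> convert -> store loop.
// The delegates stand in for the real database, Aspose.PDF, and datastore code.
public static class PollingWorker
{
    public static async Task<int> RunAsync(
        Func<byte[]> fetchNextPdf,           // returns null when no new PDF is pending
        Func<byte[], byte[]> convertToDocx,  // e.g. a PdfToWordDocument-style method
        Action<byte[]> storeDocx,            // writes the DOCX bytes to the datastore
        TimeSpan pollInterval,
        int maxIterations,                   // bounded only so the sketch can terminate
        CancellationToken token)
    {
        int converted = 0;
        for (int i = 0; i < maxIterations && !token.IsCancellationRequested; i++)
        {
            byte[] pdf = fetchNextPdf();
            if (pdf is not null)
            {
                storeDocx(convertToDocx(pdf));
                converted++;
            }

            // Diagnostic only: forcing a full collection makes this figure
            // reflect live managed objects rather than pending garbage.
            long managedKb = GC.GetTotalMemory(forceFullCollection: true) / 1024;
            // The working set covers the whole process, including native memory.
            long workingSetKb = Environment.WorkingSet / 1024;
            Console.WriteLine(
                $"iteration {i}: converted={converted}, managed={managedKb} KB, workingSet={workingSetKb} KB");

            // Sleep between polls, as the worker service described above does.
            await Task.Delay(pollInterval, token);
        }
        return converted;
    }
}
```

In a real worker service, the loop would run until the host's stopping token is cancelled rather than for a fixed number of iterations, and the forced collection would be removed once the diagnosis is done.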
@rajkumar.vedhasiva
An investigation ticket, PDFNET-52939, has been logged in our issue tracking system for further analysis of this case. We will look into the details of the ticket and keep you posted on the status of its resolution. Please be patient and spare us some time.
We are sorry for the inconvenience.
Is there any update available on the defect?
@rajkumar.vedhasiva
The ticket has recently been logged in our issue management system, and it will be investigated and resolved on a first-come, first-served basis. As soon as we make some progress towards its fix, we will update you via this forum thread. Please be patient and spare us some time.
We are sorry for the inconvenience.
Any update on the ticket?
@rajkumar.vedhasiva
We are afraid that the ticket logged earlier has not been resolved yet because of other pending issues in the queue. Nevertheless, we will inform you as soon as we make significant progress toward its resolution. Please be patient and spare us some time.
We are sorry for the inconvenience.
I totally understand your situation.
Could you please answer: are there any plans to remove the System.Drawing dependency from Aspose.PDF? We are facing tough challenges with workloads running on AWS Lambda because of the Linux dependency. If so, is there a tentative release month or date?
@rajkumar.vedhasiva
We would like to share with you that we have launched the Aspose.PDF.Drawing API, which you can install through NuGet. It depends on the Aspose.Drawing component instead of System.Drawing.Common. We launched it for non-Windows environments, where System.Drawing.Common is no longer supported. You can uninstall Aspose.PDF and replace it with Aspose.PDF.Drawing. Once things are finalized, we will integrate it permanently into Aspose.PDF.
Thank you, Asad, for the details.
Aspose.PDF.Drawing shows intermittent CPU spikes on a Windows Server EC2 instance. Have any similar issues been reported?
@rajkumar.vedhasiva
Since the API is still in its testing/beta phase, some issues are expected, and we welcome customers to report any they encounter. Please share some sample files and a code snippet along with screenshots of the spikes. We will log the issue for investigation and share the ticket ID with you.