Aspose.Words for .NET: high CPU and memory usage when creating an Aspose.Words.Document object

Hi team,

I am trying to extract the text content of a 100 MB file with Aspose.Words, using the code below:

public override string GetFileContent(string filePath)
{
    string extractedText;
    // Open document
    try
    {
        var doc = new Aspose.Words.Document(filePath);
        extractedText = doc.ToString(SaveFormat.Text);
    }
    catch (Exception ex)
    {
        extractedText = "[Error]" + ex.ToString();
    }

    return extractedText;
} 

When the line var doc = new Aspose.Words.Document(filePath); executes, CPU and memory usage spike sharply.

I tried to reduce memory usage by reading the file in chunks, using the code below:

public override string GetFileContent(string filePath)
{
    StringBuilder extractedText = new StringBuilder();
    const int bufferSize = 4 * 1024 * 1024; // 4 MB chunks (adjust based on your needs)

    try
    {
        // Open the file stream for reading
        using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            byte[] buffer = new byte[bufferSize];
            int bytesRead;

            // Read the file in chunks
            while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Create a memory stream for the current chunk
                using (var chunkStream = new MemoryStream(buffer, 0, bytesRead))
                {
                    // Load the document from the chunk stream
                    var doc = new Aspose.Words.Document(chunkStream);
                    //extractedText.Append(doc.ToString(SaveFormat.Text));
                    // Process the document and extract text
                    foreach (Aspose.Words.Section section in doc.Sections)
                    {
                        foreach (Aspose.Words.Paragraph paragraph in section.Body.Paragraphs)
                        {
                            extractedText.Append(paragraph.GetText());
                        }
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        return "[Error] " + ex.Message;
    }

    return extractedText.ToString();
}

The above approach reduces memory usage somewhat, but CPU utilization remains high.
Can you suggest any alternatives? I am using version 23.7 of Aspose.Total.

@charu.sharma Memory and CPU utilization depend entirely on the input document's file size, format, and complexity. Aspose.Words always allocates more memory than the actual document size. This is expected. Please see our documentation for more information:
https://docs.aspose.com/words/net/memory-requirements/
To reduce memory usage when processing extremely large documents, you can try the LoadOptions.TempFolder, SaveOptions.TempFolder, and SaveOptions.MemoryOptimization properties.
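
A minimal sketch of how those three properties might be set for text extraction is shown here. The class name LargeDocumentTextExtractor, the method ExtractTextToFile, and its parameters are hypothetical names chosen for illustration; only LoadOptions.TempFolder, SaveOptions.TempFolder, and SaveOptions.MemoryOptimization come from the advice above. The sketch also assumes that saving straight to a text file, rather than calling doc.ToString(SaveFormat.Text), is acceptable, so that the extracted text does not have to be held in memory as one large string. How much these settings help will depend on the document.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Loading;
using Aspose.Words.Saving;

public static class LargeDocumentTextExtractor
{
    // Hypothetical helper (not part of Aspose.Words): loads the whole document,
    // as Aspose.Words requires, but lets the library use a temp folder on disk
    // while loading and saving.
    public static void ExtractTextToFile(string inputPath, string outputPath, string tempFolder)
    {
        // The temp folder must exist before Aspose.Words can use it.
        Directory.CreateDirectory(tempFolder);

        // LoadOptions.TempFolder: allows temporary files on disk during loading.
        var loadOptions = new LoadOptions { TempFolder = tempFolder };
        var doc = new Document(inputPath, loadOptions);

        // SaveOptions.TempFolder and SaveOptions.MemoryOptimization are set here
        // through TxtSaveOptions, which derives from SaveOptions.
        var saveOptions = new TxtSaveOptions
        {
            TempFolder = tempFolder,
            MemoryOptimization = true // trades speed for a lower memory footprint
        };

        // Write the plain text directly to a file.
        doc.Save(outputPath, saveOptions);
    }
}
```

MemoryOptimization trades processing time for lower memory use, so this sketch targets memory consumption and is not expected to lower CPU utilization.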

@alexey.noskov Is there any way to create the Aspose object page by page? For instance, if I want to read the content of only the first 10 pages, I would rather not create an Aspose object for the entire file, since loading only those pages might make my text extraction faster.
Please advise.

@charu.sharma No, unfortunately, there is no way to load a document part by part. Aspose.Words can only load the whole document into the DOM.

Is there any possibility that such functionality will be available in the future?

@charu.sharma I am afraid it is unlikely that such functionality will be available in the future.