Convert Word, PDF to TXT | Define Maximum Length of Characters in Line of Text File (C# .NET or Java)

I have a document with some content and want to save it in TXT format with a fixed line length.
The requirement is 60 characters per line. I tried this in Java but could not restrict the lines to a fixed length.
How can this be done?

@amar651,

Please ZIP and attach your source PDF/Word document and your expected TXT file showing the desired output here for our reference. You can create this expected Text file manually by using Notepad etc. We will then start further investigation into your particular scenario/issue and provide you more information.

Data-TEXT60.zip (875 Bytes)
I have added the expected output file. The input is a DOC file with this content, and I want the output to look like the attached text file.

@amar651,

We have logged your requirement in our issue tracking system. Your ticket number is WORDSNET-22090. We will further look into the details of this requirement and will keep you updated on the status of the linked ticket.

ok sure thanks.

@amar651,
You can use the existing features of Aspose.Words to get the desired result. For example, the following C# code produces the TXT file you attached from the related DOC file.

Document doc = new Document(@"C:\Temp\in.docx");
const int maxChars = 60;
using (MemoryStream ms = new MemoryStream())
{
    // Save the document into MemoryStream in TXT format.
    doc.Save(ms, SaveFormat.Text);
    // Reset stream position since after saving stream position is at the end.
    ms.Position = 0;

    // Save the resulting string into a file, use stream writer to write to the file.
    using (FileStream fs = File.Create(@"C:\Temp\out.txt"))
    using (StreamWriter sw = new StreamWriter(fs))
    using (StreamReader reader = new StreamReader(ms))
    {
        do
        {
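            string textLine = reader.ReadLine();
            // ReadLine returns null when the stream has no more lines (for example, an empty document).
            if (textLine == null)
                break;
            // If the line is blank or contains only spaces, then write a newline and go to the next iteration.
            if (textLine.Trim().Length == 0)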
            {
                sw.WriteLine();
                continue;
            }
            string[] words = textLine.Split(' ');

            int newLineLength = 0;
            foreach (string word in words)
            {
                // Do not add whitespace at the beginning.
                if (newLineLength != 0)
                {
                    sw.Write(" ");
                    newLineLength++;
                }

                if (newLineLength + word.Length < maxChars)
                {
                    sw.Write(word);
                    newLineLength += word.Length;
                }
                else
                {
                    sw.WriteLine();

                    sw.Write(word);
                    newLineLength = word.Length;
                }
            }
            if (newLineLength != 0)
                sw.WriteLine();

        }
        while (!reader.EndOfStream);
    }
}
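Since the question asks about Java, here is a minimal Java sketch of the same greedy word-wrap idea, written against plain strings so it does not depend on Aspose.Words. The class name LineWrapper and the method wrap are hypothetical names chosen for illustration. Note the assumptions: unlike the C# code above, this sketch trims each line, collapses runs of spaces, allows a line of exactly maxChars characters, and leaves a single word longer than maxChars on its own line rather than splitting it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class LineWrapper {
    private LineWrapper() {}

    /** Greedily splits one input line into pieces of at most maxChars characters, breaking at spaces. */
    static List<String> wrap(String line, int maxChars) {
        List<String> out = new ArrayList<>();
        // A blank line (or one holding only whitespace) becomes a single empty output line.
        if (line.trim().isEmpty()) {
            out.add("");
            return out;
        }
        StringBuilder current = new StringBuilder();
        for (String word : line.trim().split(" +")) {
            // The +1 accounts for the space that would separate the word from the current text.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxChars) {
                out.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) {
                current.append(' ');
            }
            // Assumption: a word longer than maxChars is kept whole on its own line.
            current.append(word);
        }
        out.add(current.toString());
        return out;
    }
}
```

To use it, call LineWrapper.wrap(textLine, 60) for every line of the saved TXT output and write the returned pieces one per line.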

It seems this won’t work in my case, because my output might look like this:

Name : ABC XYZ
<blank> Pvt Ltd
Location : UK

I am trying to write this in Java.

@amar651,
It seems that you want to keep the whitespace at the beginning of the line. In that case, you will need to add the following condition at line 28 of the previous example (the start of the foreach loop body):

if (word == "")
{
    sw.Write(" ");
    newLineLength++;
    continue;
}

You will also need to change how whitespace is added between words. Otherwise, the code will insert an extra space whenever a line starts with whitespace rather than with a word.

The final C# code will look like this:

Document doc = new Document(@"C:\Temp\in.docx");
const int maxChars = 60;
using (MemoryStream ms = new MemoryStream())
{
    // Save the document into MemoryStream in TXT format.
    doc.Save(ms, SaveFormat.Text);
    // Reset stream position since after saving stream position is at the end.
    ms.Position = 0;

    // Save the resulting string into a file, use stream writer to write to the file.
    using (FileStream fs = File.Create(@"C:\Temp\out.txt"))
    using (StreamWriter sw = new StreamWriter(fs))
    using (StreamReader reader = new StreamReader(ms))
    {
        do
        {
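            string textLine = reader.ReadLine();
            // ReadLine returns null when the stream has no more lines (for example, an empty document).
            if (textLine == null)
                break;
            // If the line is blank or contains only spaces, then write a newline and go to the next iteration.
            if (textLine.Trim().Length == 0)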
            {
                sw.WriteLine();
                continue;
            }
            string[] words = textLine.Split(' ');

            int newLineLength = 0;
            foreach (string word in words)
            {
                //This is the condition we added
                if (word == "")
                {
                    sw.Write(" ");
                    newLineLength++;
                    continue;
                }

                if (newLineLength + word.Length < maxChars)
                {
                    sw.Write(word + " ");
                    newLineLength += (word.Length + 1);
                }
                else
                {
                    sw.WriteLine();

                    sw.Write(word + " ");
                    newLineLength = word.Length + 1;
                }
            }
            if (newLineLength != 0)
                sw.WriteLine();

        }
        while (!reader.EndOfStream);
    }
}

This code won’t remove excess whitespace if a line contains more than 60 characters. For example, the line

                                                           Management Liability

which starts with 60 whitespace characters, will look like

                                                           
Management Liability

in the resulting file.
For your file, however, it works well.
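For a Java version that keeps leading whitespace, one possible approach is to break each line at the last space that fits within the limit instead of splitting it into words. The sketch below is an assumption-laden illustration, not the code from this thread: IndentPreservingWrapper and wrap are hypothetical names, maxChars is assumed to be positive, and a word longer than maxChars is left unsplit. As with the C# code above, a line that begins with a long run of spaces still produces a whitespace-only piece followed by the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class IndentPreservingWrapper {
    private IndentPreservingWrapper() {}

    /** Splits a line at spaces so each piece has at most maxChars characters, keeping leading spaces. */
    static List<String> wrap(String line, int maxChars) {
        List<String> out = new ArrayList<>();
        String rest = line;
        while (rest.length() > maxChars) {
            // Last space at or before index maxChars, so the piece before it fits the limit.
            int breakAt = rest.lastIndexOf(' ', maxChars);
            if (breakAt <= 0) {
                // No usable space within the limit: the word is too long, so break after it instead.
                breakAt = rest.indexOf(' ', maxChars);
                if (breakAt < 0) {
                    break; // No further spaces; emit the remainder as is.
                }
            }
            out.add(rest.substring(0, breakAt));
            // Drop the single space used as the break point; other spaces are preserved.
            rest = rest.substring(breakAt + 1);
        }
        out.add(rest);
        return out;
    }
}
```

Because only the space at the break point is consumed, indentation at the start of the original line and runs of spaces inside it are carried over unchanged.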

Hi @awais.hafeez, could you suggest an alternative solution that I can try?

@amar651,

Regarding WORDSNET-22090, there isn’t any workaround available at the moment, but we have logged your concern in our issue tracking system and will keep you posted here on any further updates. We apologize for any inconvenience.

The issues you have found earlier (filed as WORDSNET-22090) have been fixed in this Aspose.Words for .NET 21.6 update and this Aspose.Words for Java 21.6 update.

@amar651,

Regarding WORDSNET-22090, the following new public property was added to the TxtSaveOptions class: MaxCharactersPerLine (set in Java through setMaxCharactersPerLine).

The following use case shows how to limit the maximum number of characters per line in TXT output (Java):

Document doc = new Document("C:\\Temp\\word.docx");

TxtSaveOptions saveOptions = new TxtSaveOptions();
saveOptions.setMaxCharactersPerLine(60);

doc.save("C:\\Temp\\text file with limited maximum characters in lines.txt", saveOptions);
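To confirm that the saved file respects the limit, a small check like the following could be run on its lines. This is a sketch with hypothetical names (LineLengthCheck, linesOver) that uses only the Java standard library. It counts every character in a line, including trailing spaces, which is an assumption about how the limit should be measured.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of Aspose.Words.
final class LineLengthCheck {
    private LineLengthCheck() {}

    /** Returns the 1-based numbers of all lines longer than maxChars characters. */
    static List<Integer> linesOver(List<String> lines, int maxChars) {
        List<Integer> tooLong = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).length() > maxChars) {
                tooLong.add(i + 1);
            }
        }
        return tooLong;
    }
}
```

For example, passing the result of java.nio.file.Files.readAllLines for the output path together with 60 should return an empty list when every line fits.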