Loading from plain-text file into Aspose.Words.Document

Hello Aspose Team,

I have several plain-text files that use FF (form feed, ALT + 012) as a new-page symbol. I need to convert them into another format, for instance PDF, and make sure new pages are inserted accordingly.

If I store the content of the file in a string (string BodyText = File.ReadAllText(sourceFile)) and then use a DocumentBuilder to write that text into a Document, everything works as expected:
form feeds are rendered correctly and content shows on its corresponding page.
However, when working with big input files (100 MB+), I get an exception at the Save call:

builder.Writeln(BodyText);
wordDoco.Save(FileName, Aspose.Words.SaveFormat.Pdf); // OutOfMemoryException here
If I load the input text file directly into the Document object (Document wordDoco = new Document(sourceFile)) and then save as PDF, it creates the file correctly but ignores the form feed (new page) symbols.
I tried using LoadOptions with different load formats and encodings, but I have not managed to get form feeds honored in the output document.

Any help would be greatly appreciated.

Thanks

Hi Giovanni,


Thanks for your inquiry. I suggest using the second approach, i.e. loading the text file directly through the Document constructor. Have you tried the latest version, Aspose.Words 15.8.0?
http://www.aspose.com/community/files/51/.net-components/aspose.words-for-.net/default.aspx

If the problem persists, please ZIP and attach your input file and the output PDF file showing the undesired behavior here for testing. We will investigate the issue on our end and provide you with more information.

Best regards,

Hello Awais,


I tried using both DocumentBuilder and the Document constructor.

When I use DocumentBuilder, FF (form feed) is recognized and new pages are inserted in the resulting PDF (see attachment TextInputFile.txt.Builder.PDF), but if I use a big 100 MB file, it fails with an OutOfMemoryException.

When I use the Document constructor, FF (form feed) is not recognized and all text ends up on one page (see attachment TextInputFile.txt.Constructor.PDF). This works OK for big files, but form feeds are not recognized.

I need to work with big files and get a new page every time an FF is found in the input document.

public void ConvertUsingDocuBuilder(string SourceFileName, string DestFileName)
{
    string BodyText = File.ReadAllText(SourceFileName);

    LoadAsposeWordLicense();
    Aspose.Words.Document wordDoco = new Aspose.Words.Document();
    Aspose.Words.DocumentBuilder builder = new DocumentBuilder(wordDoco);
    builder.Writeln(BodyText);
    wordDoco.Save(DestFileName, Aspose.Words.SaveFormat.Pdf);
}

public void ConvertUsingDocoConstructor(string SourceFileName, string DestFileName)
{
    LoadAsposeWordLicense();
    Aspose.Words.Document wordDoco = new Aspose.Words.Document(SourceFileName);
    wordDoco.Save(DestFileName, Aspose.Words.SaveFormat.Pdf);
}

Thank you for your help!


Hi Giovanni,


Thanks for your inquiry. After an initial test with Aspose.Words for .NET 15.8.0, I was unable to reproduce this issue on my side (please see the attached 15.8.0.pdf). I suggest upgrading to the latest version of Aspose.Words. You can download it from the following link. I hope this helps.

I used the following code to generate 15.8.0.pdf on my end:

Document doc = new Document(MyDir + @"TextInputFile.txt");

doc.Save(MyDir + @"15.8.0.pdf");


Best regards,

Hi Awais,

I used the latest version and it worked fine with small text input files.
However, when I try big text input files (100 MB+), it keeps failing in the doc.Save method with an OutOfMemoryException.

I tried using SaveOptions to set a temporary folder, but I can't see any temp files being written to that folder, and it still fails with an OutOfMemoryException:

LoadAsposeWordLicense();
Document wordDoco = new Document(SourceFileName);
SaveOptions mySaveOPtion = new PdfSaveOptions();
mySaveOPtion.TempFolder = MyAsposeTemp;
mySaveOPtion.SaveFormat = SaveFormat.Pdf;
wordDoco.Save(DestFileName, mySaveOPtion); //Fails Here OutOfMemoryException

Is there anything you can suggest? Am I missing a setting to make it work?

Thank you,
Gio

Hi Giovanni,


Thanks for your inquiry. The SaveOptions.TempFolder property works only when saving to a DOC or DOCX file. It has no effect when saving to PDF.

Given the limited RAM, the document you are trying to convert to PDF is simply too large. Please note that when you render a document (to PDF), Aspose.Words needs to build two models in memory: one for the document and one for the rendered document. That is why Aspose.Words uses more memory when you render a document than when you simply save it in a flow format (such as DOC/DOCX).

You should also note that Aspose.Words usually needs several times more memory than the document size to build a model of the document in memory. For example, if your document's size is 1 MB, Aspose.Words needs 10-20 MB of RAM to build its DOM in memory. The multiplier depends on the format because some formats are more compact than others. For example, DOCX is more compact than DOC and RTF, and DOC is more compact than RTF.

So I am not sure the issue you reported can be resolved in Aspose.Words. I would advise you to use several smaller documents instead of one huge document.
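
One way to follow the "several smaller documents" advice is to stream the text file and cut it at form feed boundaries, converting each piece to its own PDF. The following is only a rough sketch under stated assumptions, not a tested solution: the names FormFeedSplitter, ReadChunks, ConvertInChunks, pagesPerChunk and the part-NNNN.pdf naming are hypothetical, and it relies on DocumentBuilder honoring form feeds as reported earlier in this thread. A suitable chunk size would have to be found by experiment.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using Aspose.Words;

public static class FormFeedSplitter
{
    // Streams the input and yields chunks holding at most pagesPerChunk pages.
    // Pages inside a chunk stay separated by '\f'; the form feed that falls
    // on a chunk boundary is dropped because each chunk becomes its own file.
    public static IEnumerable<string> ReadChunks(TextReader reader, int pagesPerChunk)
    {
        if (pagesPerChunk < 1)
            throw new ArgumentOutOfRangeException("pagesPerChunk");

        var chunk = new StringBuilder();
        int formFeedsSeen = 0; // form feeds read since the current chunk started
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\f' && ++formFeedsSeen == pagesPerChunk)
            {
                yield return chunk.ToString();
                chunk.Clear();
                formFeedsSeen = 0;
                continue;
            }
            chunk.Append((char)c);
        }
        if (chunk.Length > 0)
            yield return chunk.ToString();
    }

    // Converts each chunk separately, so only one small document model
    // needs to exist at a time. Assumes the license is already loaded.
    public static void ConvertInChunks(string sourceFileName, string destFolder, int pagesPerChunk)
    {
        int index = 0;
        using (var reader = new StreamReader(sourceFileName))
        {
            foreach (string chunk in ReadChunks(reader, pagesPerChunk))
            {
                var doc = new Document();
                var builder = new DocumentBuilder(doc);
                builder.Writeln(chunk); // same approach as ConvertUsingDocuBuilder above
                string dest = Path.Combine(destFolder, string.Format("part-{0:D4}.pdf", ++index));
                doc.Save(dest, SaveFormat.Pdf);
            }
        }
    }
}
```

This produces several PDF files rather than one, so it only helps if split output is acceptable or the parts are merged afterwards with a tool that does not need the whole document in memory.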

Best regards,