.NET Core C# beginner, needing to parse a Word doc


#1

I have a project where I would like a form to upload a Word document, parse it into clauses/paragraphs as HTML, and store those clause/paragraph parts as HTML in SQL. I have been tasked with writing a program similar to an earlier one, whose programmer chose to use Mammoth and store the output in a NoSQL solution. Where do I start with Aspose.Words to achieve this? Thanks so much for your time.

I would also like to use the Aspose.PDF product to do the same thing.


#2

@Fett10,

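For this case, the Aspose.Words for .NET API code is pretty straightforward. You may first store the HTML string of every Paragraph found in the Word document in a list, and then store the list items in the database.

using System.Collections.Generic;
using Aspose.Words;
using Aspose.Words.Saving;

List<string> htmls = new List<string>();
Document doc = new Document("E:\\temp\\in.docx");

HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
opts.PrettyFormat = true;          // indent the generated HTML for readability
opts.ExportImagesAsBase64 = true;  // embed images inline; set other options as needed
// isDeep = true also visits paragraphs nested inside tables and other containers
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    htmls.Add(para.ToString(opts)); // one HTML string per paragraph
}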
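
For the "store the list items in the database" step, here is a minimal sketch, assuming SQL Server accessed through the Microsoft.Data.SqlClient package. The `ClauseStore` class, the `Clauses` table, and its `DocumentId`, `Position`, and `Html` columns are hypothetical names chosen for illustration; none of them come from Aspose or from the original project.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

// Hypothetical helper; the table and column names below are assumptions.
public static class ClauseStore
{
    // Assumed schema:
    //   CREATE TABLE Clauses (DocumentId INT, Position INT, Html NVARCHAR(MAX))
    public static async Task SaveAsync(
        string connectionString, int documentId, IReadOnlyList<string> htmls)
    {
        if (htmls == null) throw new ArgumentNullException(nameof(htmls));

        const string sql =
            "INSERT INTO Clauses (DocumentId, Position, Html) " +
            "VALUES (@documentId, @position, @html)";

        using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();

        // One transaction so a document is stored completely or not at all.
        using var tx = conn.BeginTransaction();
        using var cmd = new SqlCommand(sql, conn, tx);

        // Parameterized values keep the HTML from being interpreted as SQL.
        cmd.Parameters.Add("@documentId", SqlDbType.Int).Value = documentId;
        SqlParameter position = cmd.Parameters.Add("@position", SqlDbType.Int);
        SqlParameter html = cmd.Parameters.Add("@html", SqlDbType.NVarChar, -1); // -1 = NVARCHAR(MAX)

        for (int i = 0; i < htmls.Count; i++)
        {
            position.Value = i;   // preserves the paragraph order from the document
            html.Value = htmls[i];
            await cmd.ExecuteNonQueryAsync();
        }

        tx.Commit();
    }
}
```

Storing a position alongside each fragment is one way to let the clauses be read back in document order; a different schema may suit the project better.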

Please also refer to Aspose.Words’ documentation.

Please post your Aspose.PDF-related queries in the Aspose.PDF forum, where you will be guided appropriately.


#3

Thank you very much. How would I use new Document() to load the uploaded document as a stream, for example from a controller action like this?

```
[HttpPost]
public async Task PostFormData([FromForm] IFormFile file)
{
    using (var sr = new StreamReader(file.OpenReadStream()))
    {
        var content = await sr.ReadToEndAsync();
    }
}
```

Thanks again for your reply.


#4

@Fett10,

The Document class represents a document loaded into memory. Document has several overloaded constructors allowing you to create a blank document or to load it from a file or stream. For more details, please refer to the following article:

Open Word Document from a Stream
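
As a rough sketch of how the stream constructor could fit the upload action from the question: a .docx file is a binary ZIP package, so reading it as text with StreamReader is not suitable, and the stream itself is passed to Document instead. The `ClauseExtractor` helper, the `UploadController` class, and the route are hypothetical names for illustration. Copying the upload into a MemoryStream first is an assumption made here so that the stream is certain to be seekable.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Aspose.Words;
using Aspose.Words.Saving;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

// Hypothetical helper; the name and shape are illustrative only.
public static class ClauseExtractor
{
    public static List<string> ToHtmlParagraphs(Stream wordStream)
    {
        // Document(Stream) loads the document from the stream's contents.
        Document doc = new Document(wordStream);

        HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html)
        {
            PrettyFormat = true,
            ExportImagesAsBase64 = true
        };

        List<string> htmls = new List<string>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
        {
            htmls.Add(para.ToString(opts)); // one HTML string per paragraph
        }
        return htmls;
    }
}

// Hypothetical controller; the route is an assumption.
[ApiController]
[Route("api/[controller]")]
public class UploadController : ControllerBase
{
    [HttpPost]
    public async Task<ActionResult<List<string>>> PostFormData([FromForm] IFormFile file)
    {
        if (file == null || file.Length == 0)
            return BadRequest("No file uploaded.");

        // Buffer the upload in memory and rewind before handing it to Aspose.Words.
        using var buffer = new MemoryStream();
        await file.CopyToAsync(buffer);
        buffer.Position = 0;

        return ClauseExtractor.ToHtmlParagraphs(buffer);
    }
}
```

Buffering the whole file in memory is simple but may not suit very large uploads; in that case a temporary file stream could be used instead.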