.NET Core C# beginner, needing to parse a Word doc


I have a project where a form uploads a Word doc, which is then parsed into clauses/paragraphs as HTML. Each clause/paragraph will be stored in SQL as its own piece of HTML. I have been tasked with writing a program similar to an earlier one, in which the previous programmer chose to use Mammoth and store the output in a NoSQL solution. Where do I start with Aspose.Words to achieve this? Thanks so much for your time.

I would also like to use the Aspose.PDF product to do the same thing.



For this case, the Aspose.Words for .NET code is pretty straightforward. You can first store the HTML string of every Paragraph found in the Word document in an ArrayList, and then store the list items in the database.

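using System.Collections;
using Aspose.Words;
using Aspose.Words.Saving;

ArrayList htmls = new ArrayList();
Document doc = new Document("E:\\temp\\in.docx");

HtmlSaveOptions opts = new HtmlSaveOptions(SaveFormat.Html);
opts.PrettyFormat = true;
opts.ExportImagesAsBase64 = true; // etc
foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
{
    htmls.Add(para.ToString(opts)); // HTML string of this paragraph
}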

Please also refer to Aspose.Words’ documentation.
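
The "store list items in database" step is not shown in the reply. A minimal sketch of it follows, assuming SQL Server and the Microsoft.Data.SqlClient package. The `ClauseStore` class, the `Clauses` table and its columns are hypothetical names used only for illustration, and the paragraphs are passed in as a typed list of strings rather than an ArrayList.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class ClauseStore
{
    // Hypothetical schema (an assumption, not from the thread):
    //   Clauses(DocumentName NVARCHAR(260), Position INT, Html NVARCHAR(MAX))
    public static async Task SaveAsync(
        string connectionString, string documentName, IReadOnlyList<string> htmls)
    {
        await using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();

        // One transaction so a document's paragraphs are stored all-or-nothing.
        using var tx = conn.BeginTransaction();
        for (int i = 0; i < htmls.Count; i++)
        {
            using var cmd = new SqlCommand(
                "INSERT INTO Clauses (DocumentName, Position, Html) " +
                "VALUES (@name, @pos, @html)", conn, tx);
            cmd.Parameters.AddWithValue("@name", documentName);
            cmd.Parameters.AddWithValue("@pos", i); // keeps the original paragraph order
            cmd.Parameters.Add("@html", SqlDbType.NVarChar, -1).Value = htmls[i]; // -1 = NVARCHAR(MAX)
            await cmd.ExecuteNonQueryAsync();
        }
        tx.Commit();
    }
}
```

Storing the position alongside each HTML fragment lets the document be reassembled in order later with a simple `ORDER BY Position`.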

Please post your Aspose.PDF related queries in Aspose.PDF forum where you will be guided appropriately.


Thank you very much. How would I use new Document() to load the uploaded document as a stream, for example from a handler like this?
```
public async Task PostFormData([FromForm] IFormFile file)
{
    using (var sr = new StreamReader(file.OpenReadStream()))
    {
        var content = await sr.ReadToEndAsync();
    }
}
```

Thanks again for your reply.



The Document class represents a document loaded into memory. Document has several overloaded constructors allowing you to create a blank document or to load it from a file or stream. For more details, please refer to the following article:

Open Word Document from a Stream
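
As a rough sketch of how the stream constructor might be used with the upload handler from the question: a .docx file is a binary package, so reading it as text with StreamReader will not work. Instead, the upload can be copied into a MemoryStream (which is seekable) and passed to `new Document(stream)`. The `UploadController` name, the route and the return type are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Aspose.Words;
using Aspose.Words.Saving;
using Microsoft.AspNetCore.Http;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/[controller]")] // hypothetical route
public class UploadController : ControllerBase
{
    [HttpPost]
    public async Task<ActionResult<List<string>>> PostFormData([FromForm] IFormFile file)
    {
        if (file == null || file.Length == 0)
            return BadRequest("No file uploaded.");

        // Copy the upload into a seekable in-memory stream and rewind it.
        using var ms = new MemoryStream();
        await file.CopyToAsync(ms);
        ms.Position = 0;

        // Load the Word document directly from the stream.
        var doc = new Document(ms);

        var opts = new HtmlSaveOptions(SaveFormat.Html)
        {
            PrettyFormat = true,
            ExportImagesAsBase64 = true
        };

        // Convert each paragraph to its own HTML string.
        var htmls = new List<string>();
        foreach (Paragraph para in doc.GetChildNodes(NodeType.Paragraph, true))
            htmls.Add(para.ToString(opts));

        return htmls;
    }
}
```

Buffering the whole file in memory is fine for typical Word documents; for very large uploads, a temporary file stream would be a reasonable alternative.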