Hi Team, I am getting the error below while reading data from a Word document:
"Image file cannot be written to disk. When saving the node ImagesFolder should be specified or custom streams should be provided via ImageSavingCallback or ExportImagesAsBase64 option should be true. Please see documentation for details."
Code:
// Get bookmarks parent SDTs
StructuredDocumentTag startTag_From = (StructuredDocumentTag)bk_From.BookmarkEnd.GetAncestor(NodeType.StructuredDocumentTag);
StructuredDocumentTag endTag_To = (StructuredDocumentTag)bk_To.BookmarkStart.GetAncestor(NodeType.StructuredDocumentTag);
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(startTag_From, endTag_To, false);
Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);
StringBuilder sb = new StringBuilder();
sb.Append(dstDoc.ToString(new HtmlSaveOptions() { PrettyFormat = true }));
@pravinghadge
The error message means that Aspose.Words needs somewhere to put the images when it saves the document to HTML. Document.ToString writes to a string rather than to a file, so there is no output folder where the images could be saved by default. You therefore need to specify a folder for the images, provide custom streams through an ImageSavingCallback, or set the ExportImagesAsBase64 option to true, which embeds the images directly in the output as Base64 strings.
To resolve this issue, use one of the following approaches:
- Specify an images folder: create a directory for the images and set its path as ImagesFolder in your save options. The folder should exist before saving.
- Use an image saving callback: if you want more control over how images are saved, implement the IImageSavingCallback interface and assign your implementation to the ImageSavingCallback property.
- Set ExportImagesAsBase64: if you prefer to embed images directly in the HTML output, set the ExportImagesAsBase64 property of your HtmlSaveOptions to true.
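For illustration only, the Java sketch below (Java being the language of the other example in this reply) shows what embedding an image as Base64 means in HTML terms: the image bytes are encoded and placed in a data: URI, so the HTML no longer references a separate file on disk. It assumes Aspose.Words produces a similar data URI when ExportImagesAsBase64 is true; the class and method names are hypothetical and are not part of the Aspose.Words API.

```java
import java.util.Base64;

public class Base64ImageSketch {

    // Hypothetical helper: builds an <img> tag whose src is a data URI that
    // carries the image bytes inline, so no image file has to be written.
    static String toDataUriImg(byte[] imageBytes, String mimeType) {
        String encoded = Base64.getEncoder().encodeToString(imageBytes);
        return "<img src=\"data:" + mimeType + ";base64," + encoded + "\" />";
    }

    public static void main(String[] args) {
        // Stand-in for real image data: just the first four bytes of a PNG signature.
        byte[] fakePng = {(byte) 0x89, 0x50, 0x4E, 0x47};
        System.out.println(toDataUriImg(fakePng, "image/png"));
        // prints <img src="data:image/png;base64,iVBORw==" />
    }
}
```

The trade-off is size: Base64 makes each image roughly a third larger, but the HTML string becomes fully self-contained, which suits Document.ToString.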
Here is a Java example of the first option, specifying an images folder (in .NET the equivalent properties are ImagesFolder and ExportImagesAsBase64):
import com.aspose.words.Document;
import com.aspose.words.HtmlSaveOptions;
import java.io.File;

// Create a directory for images
String imagesDir = "path/to/your/images/directory";
File imagesFolder = new File(imagesDir);
if (imagesFolder.exists()) {
    // Optionally clear the folder if it already exists
    File[] existing = imagesFolder.listFiles(); // null if the path is not a directory
    if (existing != null) {
        for (File file : existing) {
            file.delete();
        }
    }
} else {
    imagesFolder.mkdirs(); // Create the directory if it does not exist
}
// Set up save options
HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.setImagesFolder(imagesDir);
saveOptions.setExportImagesAsBase64(false); // Set to true to embed images instead of writing files
// Save the document
doc.save("output.html", saveOptions);
Make sure to replace "path/to/your/images/directory"
with the actual path where you want to save the images. This should help you avoid the error and successfully save your document with images.
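File.listFiles() returns only the direct children of a folder, and File.delete() silently fails on a non-empty subdirectory, so a simple loop over listFiles() cannot empty nested folders. If that matters, a java.nio.file-based sketch that either creates the folder or empties it recursively could look like this; prepareEmptyFolder is a hypothetical helper name, not an Aspose.Words API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class ImagesFolderSketch {

    // Hypothetical helper: makes sure dir exists and is empty. Creates it if
    // missing, otherwise deletes every file and subdirectory inside it.
    static void prepareEmptyFolder(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            Files.createDirectories(dir);
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order visits children before their parents, so each
            // subdirectory is already empty when it is deleted; dir itself is kept.
            paths.sorted(Comparator.reverseOrder())
                 .filter(p -> !p.equals(dir))
                 .forEach(p -> {
                     try {
                         Files.delete(p);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a folder left over from a previous export.
        Path dir = Files.createTempDirectory("images");
        Files.createDirectories(dir.resolve("sub"));
        Files.writeString(dir.resolve("sub").resolve("old.png"), "stale");
        Files.writeString(dir.resolve("old.png"), "stale");

        prepareEmptyFolder(dir);
        try (Stream<Path> left = Files.list(dir)) {
            System.out.println("entries left: " + left.count()); // prints "entries left: 0"
        }
    }
}
```

The resulting path string can then be passed to setImagesFolder as in the example above.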
If you need further assistance or have additional questions, feel free to ask!
Thank you for the reply.
After making the code changes, I am getting the error below:
Image file cannot be written to disk. When saving the node ImagesFolder should be specified or custom streams should be provided via ImageSavingCallback or ExportImagesAsBase64 option should be true
@pravinghadge You should do exactly what is suggested in the exception message, i.e. specify the folder where images will be saved:
Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.ImagesFolder = "C:\\Temp\\";
string html = doc.ToString(options);
Or specify an ImageSavingCallback:
Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.ImageSavingCallback = new ImageSavingCallback();
string html = doc.ToString(options);
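using System.IO;
using Aspose.Words.Saving;

public class ImageSavingCallback : IImageSavingCallback
{
    public void ImageSaving(ImageSavingArgs args)
    {
        // Save the image to nowhere. Aspose.Words writes the image into this stream
        // after the callback returns, so the stream must not be disposed here;
        // KeepImageStreamOpen is false by default, so Aspose.Words closes it afterwards.
        args.ImageStream = new MemoryStream();
    }
}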
Or export images as embedded Base64:
Document doc = new Document("C:\\Temp\\in.docx");
HtmlSaveOptions options = new HtmlSaveOptions(SaveFormat.Html);
options.ExportImagesAsBase64 = true;
string html = doc.ToString(options);
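To make the callback option concrete, here is a library-free Java model of the pattern (Java being the language of the earlier example in this thread). ImageCallback and saveImages are hypothetical names, not Aspose.Words API; the model only assumes what the C# callback above implies, namely that the saver asks the callback for one stream per image and writes the image into it after the callback returns.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class CallbackSketch {

    // Hypothetical callback: given an image name, returns the stream the
    // saver should write that image to.
    interface ImageCallback {
        OutputStream streamFor(String imageName);
    }

    // Hypothetical saver: asks the callback for a stream per image, writes the
    // bytes after the callback has returned, then closes the stream itself.
    static void saveImages(Map<String, byte[]> images, ImageCallback callback) throws IOException {
        for (Map.Entry<String, byte[]> entry : images.entrySet()) {
            try (OutputStream out = callback.streamFor(entry.getKey())) {
                out.write(entry.getValue());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> images = new LinkedHashMap<>();
        images.put("image001.png", "png-bytes".getBytes(StandardCharsets.UTF_8));
        images.put("image002.emf", "emf-bytes".getBytes(StandardCharsets.UTF_8));

        // Discard every image ("save to nowhere").
        saveImages(images, name -> OutputStream.nullOutputStream());

        // Or keep the images in memory instead of writing them to disk.
        Map<String, ByteArrayOutputStream> captured = new LinkedHashMap<>();
        saveImages(images, name -> captured.computeIfAbsent(name, k -> new ByteArrayOutputStream()));
        System.out.println(captured.keySet()); // prints [image001.png, image002.emf]
    }
}
```

Because the saver writes only after the callback returns, the callback has to hand over a stream that is still open; a stream disposed inside the callback would fail on the first write.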
Hi @alexey.noskov,
The document I am trying to read has an Excel workbook embedded in it.
I want to skip that embedded object when reading the document.
I am attaching the document for reference:
TestRAC_Aspose.zip (39.1 KB)
Bookmark bk_From = doc.Range.Bookmarks["KeyFinancialIndicators"];
Bookmark bk_To = doc.Range.Bookmarks["ApplicableCriteria"];
StructuredDocumentTag startTag_From = (StructuredDocumentTag)bk_From.BookmarkEnd.GetAncestor(NodeType.StructuredDocumentTag);
StructuredDocumentTag endTag_To = (StructuredDocumentTag)bk_To.BookmarkStart.GetAncestor(NodeType.StructuredDocumentTag);
List<Node> extractedNodes = ExtractContentHelper.ExtractContent(startTag_From, endTag_To, false);
Document dstDoc = ExtractContentHelper.GenerateDocument(doc, extractedNodes);
StringBuilder sb = new StringBuilder();
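HtmlSaveOptions saveOptions = new HtmlSaveOptions();
saveOptions.PrettyFormat = true;
saveOptions.ExportImagesAsBase64 = true; // Embed images so that no images folder is needed
// Pass the configured options; a new HtmlSaveOptions instance would not have ExportImagesAsBase64 set.
sb.Append(dstDoc.ToString(saveOptions));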
@pravinghadge You can use the following code to remove embedded OLE objects from the document:
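using Aspose.Words;
using Aspose.Words.Drawing;

Document doc = new Document(@"C:\Temp\in.docx");
// Remove embedded OLE objects (for example, an embedded Excel workbook).
// ToArray() takes a snapshot, so removing shapes cannot disturb the iteration.
foreach (Shape s in doc.GetChildNodes(NodeType.Shape, true).ToArray())
{
    if (s.OleFormat != null)
    {
        s.Remove();
    }
}
doc.Save(@"C:\Temp\out.docx");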
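If only embedded Excel workbooks should be skipped while other OLE objects are kept, the removal can be narrowed by the object's ProgId (Aspose.Words exposes it as OleFormat.ProgId; Excel workbooks typically use ProgIds such as Excel.Sheet.12). The Java sketch below models that filter without Aspose.Words: ShapeInfo, isEmbeddedExcel and removeExcelObjects are hypothetical names, and a plain list stands in for the document's shape collection. It iterates over a copy so that removals cannot disturb the loop.

```java
import java.util.ArrayList;
import java.util.List;

public class SkipExcelSketch {

    // Hypothetical stand-in for a shape; progId is null when the shape is not an OLE object.
    static final class ShapeInfo {
        final String name;
        final String progId;

        ShapeInfo(String name, String progId) {
            this.name = name;
            this.progId = progId;
        }
    }

    // Assumption: Excel OLE objects have ProgIds starting with "Excel." (e.g. "Excel.Sheet.12").
    static boolean isEmbeddedExcel(ShapeInfo s) {
        return s.progId != null && s.progId.startsWith("Excel.");
    }

    // Removes Excel OLE shapes; iterating over a snapshot copy keeps the loop
    // independent of the removals made to the original list.
    static void removeExcelObjects(List<ShapeInfo> shapes) {
        for (ShapeInfo s : new ArrayList<>(shapes)) {
            if (isEmbeddedExcel(s)) {
                shapes.remove(s);
            }
        }
    }

    public static void main(String[] args) {
        List<ShapeInfo> shapes = new ArrayList<>();
        shapes.add(new ShapeInfo("logo", null));
        shapes.add(new ShapeInfo("workbook", "Excel.Sheet.12"));
        shapes.add(new ShapeInfo("letter", "Word.Document.12"));

        removeExcelObjects(shapes);
        for (ShapeInfo s : shapes) {
            System.out.println(s.name); // prints "logo", then "letter"
        }
    }
}
```

In the C# code above, the corresponding change would be to test the ProgId inside the existing OleFormat check before calling Remove.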