Our project has the following requirements:
- Build a Word document from HTML chunks (which are user-generated and therefore dynamic).
- The HTML chunks reference images by relative path, and the images are served from behind a URL (e.g. …/image.aspx?ID=3).
- The page image.aspx checks the session/cookies to validate the user before sending the image.
While evaluating ASPOSE and reading the forums, I understand that ASPOSE has issues with relative paths, which I can handle with a search and replace, but I would like to know the following:
- Does ASPOSE have any mechanism to attach cookies before calling InsertHtml, so that any subsequent image resolving/downloading sends the provided cookies with each request?
- If not, are there any suggestions for getting around this problem, short of manually downloading the images (through custom code), storing them locally, and mapping the URLs to point to the local disk?
- Another issue I see is that when an image fails to download, the whole method crashes with an error. Is there a mechanism to ignore such errors so that ASPOSE continues to insert the HTML even if the images cannot be downloaded?
Thanks in advance for any suggestions or solutions…
Hi
Thanks for your inquiry. Since Aspose.Words supports base64 as an image source, you can write your own method to fetch the images and replace the image path in each src attribute with the base64 representation of the image. Here is very simple code that demonstrates the technique:
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;
using Aspose.Words;
using NUnit.Framework; // [Test] is assumed to be the NUnit attribute

[Test]
public void Test001()
{
    // Read the HTML string.
    string html = File.ReadAllText(@"Test001\in.html");
    // Regular expression that finds absolute http(s) URLs in src attributes.
    Regex urlRegex = new Regex("src\\s*=\\s*[\"']+(http(s)?://([\\w-]+\\.)+[\\w-]+(/[\\w- ./?%&=]*)?)[\"']+");
    // Search for the src URLs.
    MatchCollection matches = urlRegex.Matches(html);
    foreach (Match match in matches)
    {
        // Replace each URL with the embedded base64 image.
        html = html.Replace(match.Groups[1].Value, GetBase64(match.Groups[1].Value));
    }
    // Now insert the HTML into the document. All images are embedded in the HTML string.
    DocumentBuilder builder = new DocumentBuilder();
    builder.InsertHtml(html);
    builder.Document.Save(@"Test001\out.doc");
}
private string GetBase64(string imageUrl)
{
    string base64Data = "";
    try
    {
        // Prepare the request for the image.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(imageUrl);
        request.Method = "GET";
        request.UserAgent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
        // To send session cookies with the request, assign request.CookieContainer here.
        // Execute the request and copy the whole response stream into memory.
        // ContentLength can be -1 (unknown), so read to the end of the stream instead of relying on it.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (Stream resStream = response.GetResponseStream())
        using (MemoryStream buffer = new MemoryStream())
        {
            resStream.CopyTo(buffer);
            // Build the base64 data URI. The MIME type is hard-coded to JPEG.
            base64Data = string.Format("data:image/jpeg;base64,{0}",
                Convert.ToBase64String(buffer.ToArray()));
        }
    }
    catch (Exception)
    {
        // Ignore download errors: an empty string is returned and the image is skipped.
    }
    return base64Data;
}
I hope this approach helps you achieve what you need.
Best regards,
Thanks for the suggestion and taking the time to respond.
I will look over your solution, but our project is somewhat atypical: the HTML chunks, when put together, reference a large amount of content (on average about 300 pages of printable material) with 100+ images on average, each usually about half a page in width/height. Given the size inflation with base64, I doubt this would be a good solution for us, but we will give it a try to see the memory usage.
We just wanted to know whether there were any shortcut solutions before we attempt downloading the images and remapping the URLs to local disk. Since there seem to be none, we will probably take that approach.
One additional question, given the nature of our application: does Aspose.Words have any memory, size, or performance limitations when dealing with such a large amount of content and images?
Thanks again for your quick response.
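The base64 size inflation mentioned in this post can be estimated with simple arithmetic: base64 encodes every 3 input bytes as 4 output characters, so the encoded text is roughly one third larger than the raw image. The following Java sketch (Java being the other language used in this thread) checks that formula against the standard library encoder. The class name `Base64Overhead` and the 200 KB per-image figure are hypothetical values for illustration, not measurements from the project.

```java
import java.util.Base64;

final class Base64Overhead {
    /** Length of the padded base64 text produced for rawBytes input bytes. */
    static long encodedLength(long rawBytes) {
        // Every group of up to 3 bytes becomes exactly 4 characters.
        return 4 * ((rawBytes + 2) / 3);
    }

    public static void main(String[] args) {
        // Cross-check the formula against the real encoder on a small buffer.
        byte[] sample = new byte[1000];
        int actual = Base64.getEncoder().encodeToString(sample).length();
        System.out.println("encoder: " + actual + ", formula: " + encodedLength(sample.length));

        // Hypothetical workload: 100 images of 200 KB each.
        long perImage = 200L * 1024;
        long raw = 100 * perImage;
        long encoded = 100 * encodedLength(perImage);
        System.out.println("raw bytes: " + raw + ", base64 characters: " + encoded);
    }
}
```

The estimate covers only the encoded text itself. How much additional memory the HTML string and the document model need on top of that is not stated in this thread and would have to be measured.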
Hi
Thanks for your inquiry. You can also save the images to disk and replace the URLs in your HTML with local paths. The approach will be the same as the one I suggested.
No, Aspose.Words does not impose any such limitations. The only limit is the amount of memory available on your side.
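The save-to-disk approach described here can be sketched as follows. This is a minimal, hedged illustration in Java (the thread also covers Aspose.Words for Java) and not part of Aspose's own API: the class `ImageLocalizer`, its methods, and the `.img` file extension are hypothetical names chosen for the example. It downloads each absolute image URL with a caller-supplied Cookie header, writes the bytes to a local folder, and rewrites the src to the local path. If a download fails, the src is blanked instead of throwing, mirroring the empty string that GetBase64 returns on error.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImageLocalizer {
    // Captures the absolute http(s) URL inside src="..." or src='...'.
    private static final Pattern SRC = Pattern.compile(
            "src\\s*=\\s*[\"'](https?://[^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    /** Downloads one image, sending the given Cookie header; returns null on any failure. */
    static byte[] fetchWithCookies(String imageUrl, String cookieHeader) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(imageUrl).toURL().openConnection();
            conn.setRequestProperty("Cookie", cookieHeader);
            conn.setConnectTimeout(10_000);
            conn.setReadTimeout(10_000);
            if (conn.getResponseCode() != HttpURLConnection.HTTP_OK) {
                return null;
            }
            try (InputStream in = conn.getInputStream()) {
                return in.readAllBytes(); // requires Java 9 or later
            }
        } catch (IOException | RuntimeException e) {
            return null; // a failed download must not abort the whole conversion
        }
    }

    /** Saves every fetched image under dir and points its src at the local file. */
    static String localize(String html, Function<String, byte[]> fetcher, Path dir) {
        Matcher m = SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        int index = 0;
        while (m.find()) {
            String replacement = ""; // blank the src if the image cannot be stored
            try {
                byte[] data = fetcher.apply(m.group(1));
                if (data != null) {
                    Files.createDirectories(dir);
                    // The ".img" extension is a placeholder, not a detected image type.
                    Path file = dir.resolve("img" + index++ + ".img");
                    Files.write(file, data);
                    replacement = file.toAbsolutePath().toString();
                }
            } catch (IOException | RuntimeException e) {
                // Keep the blank replacement and continue with the next image.
            }
            // Copy the text before the URL, then the replacement for the URL itself.
            out.append(html, last, m.start(1)).append(replacement);
            last = m.end(1);
        }
        return out.append(html, last, html.length()).toString();
    }
}
```

A call might look like `ImageLocalizer.localize(html, url -> ImageLocalizer.fetchWithCookies(url, cookieHeader), Paths.get("images"))`, where `cookieHeader` holds the session cookie of the current user, and the returned string is then passed to the builder's HTML insertion method. Whether a blank src or the placeholder extension is acceptable to a particular Aspose.Words version is an assumption that should be verified.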
Hi,
Is this possible in the Java version? I tried it using the following code, but it didn't work.
Code:
------
String html = readFileAsString("input.html");
DocumentBuilder builder = new DocumentBuilder();
builder.insertHtml(html);
builder.getDocument().save("imgOut.doc");
Am I doing anything wrong?
Thanks
Hi Anbu,
Thanks for your inquiry. Aspose.Words for Java does not support base64 data in HTML yet. This feature will be supported in the next version of Aspose.Words for Java, which is coming soon.
Best regards.