I am using Aspose.Words to export a certain file as HTML, PDF, or Word depending on the request. I want to be able to cache these HTML files for a speedy response.
All of these HTML files use the same template (as do the Word and PDF exports). When using HtmlExportCSSStyleSheetType.External, it will create a new CSS file, correct?
Is there any way (other than continuously writing to the same filename) to have them all reference this one file if it exists?
Jon
Hello!
Thank you for your interesting question.
Several HTML files sharing the same CSS style sheet is quite a common case. Currently it is only possible to specify the same file name for the CSS style sheet using this property:
https://reference.aspose.com/words/net/aspose.words.saving/htmlsaveoptions/cssstylesheetfilename/
You cannot suppress writing the file itself. In your case the CSS file will be overwritten several times, which is not good. The most natural solution would be to implement the same mechanism that we already have for saving images. When HTML is saved, supplementary images are written as separate files. You can override the default behavior via this event:
https://reference.aspose.com/words/net/aspose.words.saving/htmlsaveoptions/imagesavingcallback/
By analogy, we could provide a new event for saving the CSS style sheet to a non-standard location, namely to a stream specified by the caller. If the stream's implementation ignores all input data, the CSS won't be written anywhere. But of course it will still be generated every time.
We cannot fully suppress generation even if we provide a special option for that. Any implementation should at least build a map from model style names to CSS classes. In any case such an option can be provided. For instance, we can add a fourth enumerator to this enumeration:
https://reference.aspose.com/words/net/aspose.words.saving/cssstylesheettype/
What could it be called? Maybe ExternalReferenceOnly, simply ReferenceOnly, ExternalClassesOnly or something else? Please share your suggestions, since you are the first to ask for this improvement and I am inclined to implement it. Thank you in advance.
Regards,
I'm not sure I follow what the fourth enum value would be for.
One way is to follow External from CSSStyleSheetType, but suppress actually writing the file. The only problem there is that I don't know how you could easily tell all your files to use that without having to go through each one and replace the CSS filename.
Or maybe the CSS export/document could have a few more options, like the external CSS filename, which CSS file the document should use, etc.
What I am currently doing is just using Embedded, since it really reduces the file size compared with the default. I pass these back to another page, where I will be adding some extra header information. It would be good to be able to reuse this CSS file for both, so formatting would be the same for my web page as well as for the HTML returned from Aspose.Words.
I think a hack to get around it right now would possibly be to send it back as External, and use a regex to replace that string with a link to a saved one. If only I could then suppress writing out any CSS, that would work.
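A minimal sketch of that regex workaround, using only System.Text.RegularExpressions. It assumes the exported HTML references its style sheet through a link element whose href ends in .css; the class name CssLinkRewriter, the method name PointToSharedCss and the shared URL are hypothetical. The CSS file that Aspose.Words writes would still have to be deleted or ignored separately.

```csharp
using System.Text.RegularExpressions;

public static class CssLinkRewriter
{
    // Matches a <link ...> tag whose href value ends in ".css".
    // Group 1 is everything up to and including the opening quote of href,
    // group 2 is the closing quote and the rest of the tag.
    private static readonly Regex CssLink = new Regex(
        "(<link\\b[^>]*?\\bhref\\s*=\\s*[\"'])[^\"']*\\.css([\"'][^>]*>)",
        RegexOptions.IgnoreCase);

    // Returns the HTML with every matching style sheet link pointing at sharedCssUrl.
    public static string PointToSharedCss(string html, string sharedCssUrl)
    {
        // A MatchEvaluator is used so that characters such as '$' in the URL
        // are inserted literally rather than read as substitution patterns.
        return CssLink.Replace(
            html,
            m => m.Groups[1].Value + sharedCssUrl + m.Groups[2].Value);
    }
}
```

Calling CssLinkRewriter.PointToSharedCss(html, "/styles/report.css") on the exported HTML string before caching it would make every cached page reference the same saved style sheet.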
I have actually figured out something I can do in the meantime to strip out a bunch of code, but I have a question. At the top of my HTML, I get this when using Embedded:
<STYLE type=text/css>
body { font-family:'Times New Roman'; font-size:12pt; }
p { margin-bottom:0pt; margin-left:0pt; margin-right:0pt; margin-top:0pt; }
table { margin-bottom:0pt; margin-top:0pt; }
.Normal0 { font-size:12pt; }
</STYLE>
but then on almost every piece of text, I still get <span style="font-size:10pt; ">. Is there any way to force it not to put these spans in, or at least to set a CSS class? If not, I'll have to parse through all the HTML and replace these with something else. At least my template won't change much.
Hello!
Let me explain how I see your task. You have several documents with an identical set of styles, most probably derived from the same template. You convert them to HTML and would like them all to reference a single style sheet. When converting the first document, the style sheet should be written with the External option. But for all subsequent documents we should somehow suppress the style sheet output (or overwrite it). After exporting multiple documents you might want to change (or add) something in the common style sheet.
Why do we need another option? We can customize the output CSS file name, but we cannot currently suppress the writing itself. I mean that the new enumerator would provide this: export HTML as with External CSS but without actually writing the CSS file. The HTML will have class references and a link to a CSS file. Does this suit your needs?
Regarding your second post: in the mode with Embedded or External CSS, Aspose.Words still outputs direct formatting applied to paragraphs or runs as a style attribute on the corresponding nodes. This redundancy comes from your source document. If you attach it, I can take a look at how it can be redesigned. You can create a new style in the document and assign it to all those runs. Well-designed Microsoft Word documents usually have only a minimum of direct formatting.
Regards,
You are correct on the first question. If anything, just suppress writing the CSS file; then at least we don't have to worry about the disk writes. And yes, these are all coming from a standard template. I have attached the template so you can help with the formatting.
So yes, that would suit my needs. I plan to take this one general CSS file, and then could apply other styles to it across the web project. The reason we are building the webpage in HTML is that we would like one universal template to provide quick web access, plus PDF and Word export.
Hello Jon!
Thank you for the clarification. You can preprocess your template with this code:
// Requires: using Aspose.Words;
private static void RemoveRedundantFormatting()
{
    Document doc = new Document("Jon_Lumpkin/incident_crime_report.doc");

    // Set 10pt default for Normal style.
    doc.Styles[StyleIdentifier.Normal].Font.Size = 10;

    // Create a new style for bold text.
    Style boldStyle = doc.Styles.Add(StyleType.Character, "Bold");
    boldStyle.Font.Bold = true;

    // Create a new style for underlined text.
    Style underlinedStyle = doc.Styles.Add(StyleType.Character, "Underlined");
    underlinedStyle.Font.Underline = Underline.Single;

    // Traverse all runs, clear formatting and reference Bold and Underlined styles where needed.
    NodeCollection runs = doc.GetChildNodes(NodeType.Run, true);
    foreach (Run run in runs)
    {
        Font font = run.Font;

        // Remember the direct formatting before it is cleared.
        bool isBold = font.Bold;
        bool isUnderlined = font.Underline == Underline.Single;
        font.ClearFormatting();

        // Font.Style holds a single character style, so a run that is both
        // bold and underlined ends up with the Underlined style only.
        if (isBold)
            font.Style = boldStyle;
        if (isUnderlined)
            font.Style = underlinedStyle;
    }

    // Save both DOC and HTML.
    doc.Save(doc.OriginalFileName.Replace(".doc", "_out.doc"));
    doc.SaveOptions.ExportPrettyFormat = true;
    doc.SaveOptions.HtmlExportCssStyleSheetType = CssStyleSheetType.Embedded;
    doc.Save(doc.OriginalFileName.Replace(".doc", "_out.html"));
}
Of course this is very restrictive: it works only if the source document contains nothing but bold and underlined formatting. It creates two new styles and refers to them where appropriate. I have tested it with your template and got what was needed: output HTML without redundancy.
Regarding the external CSS mode, I have registered a new issue. You'll be notified when it's fixed. Please also note that you can reference one style sheet from another with the @import at-rule. Using this approach you can avoid editing the output style sheet every time you generate it. Several independent CSS style sheets are normally referenced from some root CSS.
Regards,
So I can run this code before I do anything on that file?
Is there any way to do something like this AFTER the file is built? I have a bunch of templates that build a 'master' document, if you will. It would be nice to go back through that document once built to clean out formatting before exporting it.
It's better to do this once with the template that you attached. In that case none of the populated documents will contain redundant formatting. You can also post-process all merged documents. But I think it's better to do things once if that's possible.
If I'm reading this right, I'd only have to do this one time EVER, as it will give me a new 'template'? And then I use this new template and it shouldn't have the spans throughout?
As a quick test I ran this against a 'finished' file, and it strips out all of the bold formatting (everything has a text. That's not a problem, just wondering if that's expected.
Secondly, another thing I notice is this for paragraph breaks:
Not sure if this is something we can fix from preprocessing everything. Finally, just need an answer to my previous questions as well - do I run this once for all my templates, or do I need to do it every time a report is generated?
Empty paragraphs are output in a special way. This logic doesn't depend on inheritance of the font size; the font size is output unconditionally. We can consider changing this, but the issue is very minor. I remember some other issues that showed up when we omitted this font size. If this is really important, you can remove it from the resulting HTML using regular expressions.
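A rough sketch of such a regex clean-up, using only System.Text.RegularExpressions. The exact markup of an empty paragraph is not shown above, so the pattern is a guess: it assumes a p element whose only content is a non-breaking space (or whitespace) and whose style attribute carries a font-size declaration. The class and method names are hypothetical, and the pattern would need adjusting to the real output.

```csharp
using System.Text.RegularExpressions;

public static class EmptyParagraphCleaner
{
    // Assumed shape of an empty paragraph: <p style="...">&#xa0;</p>
    private static readonly Regex EmptyParagraph = new Regex(
        "<p\\s+style=\"(?<style>[^\"]*)\"\\s*>(?<body>(?:&#xa0;|&nbsp;|\\s)*)</p>",
        RegexOptions.IgnoreCase);

    // A single font-size declaration inside a style attribute value.
    private static readonly Regex FontSize = new Regex(
        "font-size\\s*:[^;\"]*;?\\s*",
        RegexOptions.IgnoreCase);

    // Removes font-size from empty paragraphs only; paragraphs with text are left alone.
    public static string StripFontSize(string html)
    {
        return EmptyParagraph.Replace(html, m =>
        {
            string style = FontSize.Replace(m.Groups["style"].Value, "").Trim();
            // Drop the style attribute entirely if nothing else remains in it.
            string attribute = style.Length == 0 ? "" : " style=\"" + style + "\"";
            return "<p" + attribute + ">" + m.Groups["body"].Value + "</p>";
        });
    }
}
```

Restricting the match to empty paragraphs keeps any font size that is set on paragraphs containing text.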
Yes, you can process your template once and get a new template free of redundancy. In that case nothing needs to be done when reports are generated. If you have other templates, please check carefully whether they use any formatting other than bold and underlined, since my naïve code won't preserve it. Just add the appropriate code in the same way. And of course you can do the same by editing the templates by hand in Microsoft Word.
Regards,
Thank you for your reply. I will just handle those extra paragraphs with a regex.
As for cleaning this up in Word, how would one do that by hand? Clear all formatting, then create styles and use them?
Hello!
In Microsoft Word there is a facility to create a new style from a selected fragment. If you have some existing text in a document, you can create a style using its formatting. You can also create the required styles from scratch, assign them where appropriate and then edit them until they look right. If document elements with different semantics have different styles, it's easier to change their appearance independently of the others. That's why it's recommended to keep all formatting in styles and avoid direct formatting.
Regards,
The issues you have found earlier (filed as WORDSNET-2748) have been fixed in this .NET update and this Java update.