Beautify HtmlAgilityPack
As you can read all over the internet, HtmlAgilityPack is not meant for producing beautiful, i.e. human-readable, HTML files.
“[…] it’s a ‘by design’ choice.”
So everyone redirects you to some other library.
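Now, I am a bit stubborn. I want to use HtmlAgilityPack and I want to have indented, human-readable HTML files. The magic lies in the text nodes of the DOM: line breaks and indentation are nothing more than whitespace text nodes sitting between the elements. So, I wrote two utility functions to help me out.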
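Before getting to them, here is a minimal sketch of the text-node idea in isolation. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `TextNodeDemo` and the method `IndentChildren` are made up for this illustration. It inserts a newline-plus-tab text node in front of every child of a `div` and then prints the node types and the resulting markup.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TextNodeDemo
{
    // Hypothetical helper: indents the direct children of the first <div> by one tab.
    public static string IndentChildren(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode div = doc.DocumentNode.Element("div");

        // Iterate over a copy, because the child list is modified inside the loop.
        foreach (HtmlNode child in div.ChildNodes.ToArray())
        {
            // A text node holding "\n\t" puts the child on its own, indented line.
            div.InsertBefore(doc.CreateTextNode("\n\t"), child);
        }

        // One more newline so the closing </div> starts on its own line.
        div.AppendChild(doc.CreateTextNode("\n"));

        // The inserted whitespace shows up as ordinary nodes of type Text.
        foreach (HtmlNode child in div.ChildNodes)
        {
            Console.WriteLine(child.NodeType);
        }

        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        Console.WriteLine(IndentChildren("<div><span>a</span><span>b</span></div>"));
    }
}
```

The library never adds such nodes on its own, so any formatting has to be put into the tree by hand like this.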
First, to get rid of all unwanted whitespace. This one might be a bit aggressive, since it removes every text node that contains only whitespace, but it was OK for me:
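// Needs "using System.Linq;" (for ToArray) and "using HtmlAgilityPack;".
private static void removeWhitespace(HtmlNode node)
{
    // Iterate over a copy, because children are removed during the loop.
    foreach (HtmlNode n in node.ChildNodes.ToArray())
    {
        if (n.NodeType == HtmlNodeType.Text)
        {
            // Drop text nodes that contain nothing but whitespace.
            if (string.IsNullOrWhiteSpace(n.InnerHtml))
            {
                node.RemoveChild(n);
            }
        }
        else
        {
            // Everything else is searched recursively.
            removeWhitespace(n);
        }
    }
}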
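If that is too aggressive for your markup, one possible variation is to leave elements alone whose whitespace matters. The following is only a sketch and not part of my original code: the names `WhitespaceDemo`, `Preserve`, `RemoveWhitespaceExcept` and `Clean` are made up, and the list of preserved elements is an assumption you should adjust. It again assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class WhitespaceDemo
{
    // Assumed list of elements whose content should stay untouched.
    static readonly HashSet<string> Preserve =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "pre", "textarea", "script", "style"
        };

    // Same idea as removeWhitespace, but it does not descend into preserved elements.
    public static void RemoveWhitespaceExcept(HtmlNode node)
    {
        foreach (HtmlNode n in node.ChildNodes.ToArray())
        {
            if (n.NodeType == HtmlNodeType.Text)
            {
                if (string.IsNullOrWhiteSpace(n.InnerHtml))
                {
                    node.RemoveChild(n);
                }
            }
            else if (!Preserve.Contains(n.Name))
            {
                RemoveWhitespaceExcept(n);
            }
        }
    }

    // Hypothetical convenience wrapper: parse, clean, and return the markup.
    public static string Clean(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        RemoveWhitespaceExcept(doc.DocumentNode);
        return doc.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // The line break inside <pre> is kept, the ones around it are removed.
        Console.WriteLine(Clean("<div>\n  <span>x</span>\n  <pre><b>a</b>\n<b>b</b></pre>\n</div>"));
    }
}
```

Note that this does not cover every case. For example, a single space between two inline elements outside of `pre` is still removed.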
And second, to create the whitespace for line breaks and indentation:
// Needs "using System.Linq;" (for ToArray) and "using HtmlAgilityPack;".
internal static void beautify(HtmlDocument doc)
{
    foreach (var topNode in doc.DocumentNode.ChildNodes.ToArray())
    {
        switch (topNode.NodeType)
        {
            case HtmlNodeType.Comment:
                {
                    // Make sure a top-level comment is followed by a line break.
                    HtmlCommentNode cn = (HtmlCommentNode)topNode;
                    if (string.IsNullOrEmpty(cn.Comment)) continue;
                    if (!cn.Comment.EndsWith("\n")) cn.Comment += "\n";
                }
                break;
            case HtmlNodeType.Element:
                {
                    // Indent the content, then put the closing tag on its own line.
                    beautify(topNode, 0);
                    topNode.AppendChild(doc.CreateTextNode("\n"));
                    //doc.DocumentNode.InsertAfter(doc.CreateTextNode("\n"), topNode);
                }
                break;
            case HtmlNodeType.Text:
                break;
            default:
                break;
        }
    }
}

// Returns true if line breaks were inserted in front of the children of node.
private static bool beautify(HtmlNode node, int level)
{
    if (!node.HasChildNodes)
        return false;

    var children = node.ChildNodes.ToArray();

    // Elements that contain nothing but text stay on a single line.
    bool onlyText = true;
    foreach (var c in children)
    {
        if (c.NodeType != HtmlNodeType.Text)
            onlyText = false;
    }
    if (onlyText)
        return false;

    // Line break plus one tab per nesting level.
    string nli = "\n" + new string('\t', level);
    foreach (var c in children)
    {
        node.InsertBefore(node.OwnerDocument.CreateTextNode(nli), c);
        if (c.NodeType == HtmlNodeType.Element)
        {
            if (c.HasChildNodes)
            {
                if (beautify(c, level + 1))
                {
                    // Align the closing tag of c with its opening tag.
                    c.AppendChild(c.OwnerDocument.CreateTextNode(nli));
                }
            }
        }
    }
    return true;
}
As you can see, the code is pretty hacky. But it works for me. Maybe it also works for you, or it can serve as a starting point.