Write-Only Memory:
The Blog of Kevin Thomasson

DIY, electronics, programming / whatever.

QuasiXml - an XML-ish parser for .NET

I decided to write my own XML parser when faced with the task of loading non-well-formed XML files into an object model in C#.

Why?

You might rightfully wonder why on earth I wasted time on creating an XML parser when the .NET Framework can already parse XML perfectly well in several ways. My answer:

The ideal thing to do is obviously to get the producer of the markup to fix existing well-formedness issues, but in reality this is not always an option.

However, in my case fixing the bad XML at the source is very much an option, because I produce the markup myself. I just don't want to! I am producing markup in a simplified HTML syntax for a flat-file-based content management system. This markup is in turn embedded in an XML file containing metadata. At a later stage, the metadata is extracted and the markup is transformed into well-formed XHTML. It is not possible to handle such a file using System.Xml, nor is it possible (or at least practical) to do so using a library such as HTML Agility Pack.

One important reason for writing my own parser is the set of issues I have experienced when trying to insert character entities into an XmlDocument instance.
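As a rough illustration of this kind of issue, here is a minimal sketch that uses only System.Xml. The class and method names (EntityDemo, SetViaInnerText, LoadRejects) are made up for this example, and it shows just two behaviours that can get in the way, not necessarily every issue meant above: text assigned through InnerText has its ampersand escaped, and markup containing an entity that is not declared in a DTD is rejected on load.

```csharp
using System;
using System.Xml;

// Hypothetical helper class, for illustration only.
public static class EntityDemo
{
    // Assigns the text through InnerText and returns the serialized document.
    // The ampersand in the text is escaped when the document is serialized.
    public static string SetViaInnerText(string text)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root><element>This is a text ... </element></root>");
        XmlNode element = doc.DocumentElement.FirstChild;
        element.InnerText = text;
        return doc.OuterXml;
    }

    // Returns true if XmlDocument refuses to load the given markup.
    public static bool LoadRejects(string markup)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(markup);
            return false;
        }
        catch (XmlException)
        {
            // For example: a reference to an entity that no DTD declares.
            return true;
        }
    }
}
```

Calling EntityDemo.SetViaInnerText("This is a text &hellip;") yields markup containing &amp;hellip; rather than &hellip;, and EntityDemo.LoadRejects("<root><element>&hellip;</element></root>") returns true because hellip is not one of the five entities predefined by XML.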

Strictly speaking, it is not an XML parser. It is only an XML-ish parser capable of parsing XML to some extent, hence the name, QuasiXml.

Features

Example usage

This simple C# example demonstrates how the string "..." can be replaced with the HTML character entity &hellip; in a text node.

            string markup =
            @"<root>
                <element>This is a text ... </element>
            </root>";

            // Parse markup
            var root = new QuasiXmlNode();
            root.OuterMarkup = markup;

            // Modify object model
            QuasiXmlNode textNode = root["element"].Children[0];
            textNode.Value = textNode.Value.Replace("...", "&hellip;");

The following code snippet demonstrates how all links in an XHTML document can be modified:

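            ...
            var root = new QuasiXmlNode();
            root.OuterMarkup = markup;

            // Select every <a> element that has an href attribute (requires using System.Linq)
            var links = root.Descendants.Where(node => node.Name == "a" &&
                node.Attributes.ContainsKey("href")).ToList();

            // Foo stands in for whatever computes the new href value
            foreach (QuasiXmlNode node in links)
                node.Attributes["href"] = Foo(node);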

Codeplex

Visit http://quasixml.codeplex.com/ to download binaries and source code if you want to try it out.
