XML Renderer in Clojure

September 08, 2009 By: erik Category: Geeky, Programming, Reviews

I’ve spent the past few days playing around with Clojure. Clojure is a dialect of Lisp, the most powerful programming language, that compiles to bytecode and runs on the Java Virtual Machine. I won’t go into just how awesome that is, but there are many technical reasons why this platform decision is equivalent to standing on the shoulders of giants.

Clojure comes with a built-in library for parsing XML files into Clojure data structures, but, for the life of me, I could absolutely not find any implementations that went the other way, to render XML from the Clojure structure that the default parser creates. So I wrote one…in 25 lines of code.
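For reference, here is a small sketch of the structure the built-in parser produces. The `parse-string` helper is my own addition for illustration (it is not part of clojure.xml); it only wraps a string in an InputStream so the example does not need a file on disk.

```clojure
(require 'clojure.xml)

;; Hypothetical helper (not part of clojure.xml): parse XML held in a string
;; by wrapping its bytes in an InputStream, which clojure.xml/parse accepts.
(defn parse-string [^String s]
  (clojure.xml/parse
    (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(def node (parse-string "<p id=\"message\">hi</p>"))

(:tag node)     ; the element name, as a keyword
(:attrs node)   ; a map from attribute keyword to string value
(:content node) ; a vector of child strings and child element maps
```

Every element is a map with the same three keys, and text shows up as plain strings inside :content, so a renderer only has to distinguish strings from maps.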

Update: I did find a function in clojure-contrib’s lazy-xml.clj that will emit XML nodes to a stream, but it’s (gasp!) not remotely functional: it writes its output as a side effect instead of returning a string.

(def *always-open* #{:div :script :textarea})

(defn render-attributes [attributes]
  (when attributes
    (apply str
      (for [[key value] attributes]
        (str \space (name key) "=\"" value \")))))

(defn render [node]
  (if (string? node)
    (.trim node)
    (let [tag (:tag node)
          children (:content node)
          has-children? (not-empty children)
          open? (or has-children? (contains? *always-open* tag))
          open-tag (str \< (name tag)
                     (render-attributes (:attrs node))
                     (if open? \> "/>"))
          close-tag (when open? (str "</" (name tag) \>))]
      (str
        open-tag
        (apply str (when has-children?
                     (for [child children]
                       (render child))))
        close-tag))))

There is a little extra HTML-specific logic in there so that <script>, <div>, and <textarea> tags are never self-closed (an empty <div> renders as <div></div>, not <div/>), as some web browsers can’t handle the self-closed forms.
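To isolate that rule, here is a minimal sketch of just the empty-element decision. The names `always-open` and `empty-element` are hypothetical and exist only for this illustration; the render function above makes the same choice inline.

```clojure
;; Tags that must keep an explicit close tag even when they have no children.
(def always-open #{:div :script :textarea})

;; Hypothetical helper: render an element that has no attributes or children.
(defn empty-element [tag]
  (if (contains? always-open tag)
    (str "<" (name tag) "></" (name tag) ">")  ; explicit open and close pair
    (str "<" (name tag) "/>")))                ; self-closed form

(empty-element :div) ; "<div></div>"
(empty-element :br)  ; "<br/>"
```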

So if you have an HTML file that looks like this…
<?xml version="1.0" encoding="ISO-8859-1"?>
<html>
 <head>
  <title>Testing Title</title>
  <style type="text/css">
   .some-class { font-weight: bold; }
  </style>
  <script type="text/javascript" src="myjs.js"></script>
 </head>
 <body>
  <p id="message" class="some-class">
   This is a totally awesome test!
  </p>
 </body>
</html>

And you run the following command at the REPL…
(println (render (clojure.xml/parse "index.html")))
…you should get this back:
<html><head><title>Testing Title</title><style type="text/css">.some-class { font-weight: bold; }</style><script type="text/javascript" src="myjs.js"></script></head><body><p id="message" class="some-class">This is a totally awesome test!</p></body></html>
While this is valid XML, it’d be nice to have it prettily formatted. To do this, we must add a little complexity to keep track of depth and indentation.

(def *always-open* #{:div :script :textarea})

(defn render-attributes [attributes]
  (when attributes
    (apply str
      (for [[key value] attributes]
        (str \space (name key) "=\"" value \")))))

(defn render
  ([node] (render node 0 false))
  ([node pretty?] (render node 0 pretty?))
  ([node depth pretty?]
   (let [indent (when pretty? (apply str (repeat depth "  ")))]
     (if (string? node)
       (str indent (.trim node) (when pretty? "\n"))
       (let [tag (:tag node)
             children (:content node)
             has-children? (not-empty children)
             always-open? (contains? *always-open* tag)
             open? (or has-children? always-open?)
             open-tag (str indent \< (name tag)
                        (render-attributes (:attrs node))
                        (if open? \> "/>"))
             close-tag (when open?
                         (str (when has-children? indent) ; own line only after children
                           "</" (name tag) \>))]
         (str
           open-tag
           (when (and pretty? has-children?) "\n") ; break only before children
           (apply str (when has-children?
                        (for [child children]
                          (render child (inc depth) pretty?))))
           close-tag
           (when (and pretty? (> depth 0)) "\n")))))))

This has bumped us up to 34 lines of code with proper formatting. Now if we call:
(println (render (clojure.xml/parse "index.html")))
We still get the same unformatted HTML back because it defaults to non-pretty formatting. But if we request pretty formatting…
(println (render (clojure.xml/parse "index.html") true))
We get back this:
<html>
  <head>
    <title>
      Testing Title
    </title>
    <style type="text/css">
      .some-class { font-weight: bold; }
    </style>
    <script type="text/javascript" src="myjs.js"></script>
  </head>
  <body>
    <p id="message" class="some-class">
      This is a totally awesome test!
    </p>
  </body>
</html>

Perfect!

What’s amazing is that, after only a few days of working with Clojure and with very little background in Lisp, I find this code perfectly readable. I could not figure out a way to walk the tree with Clojure’s tail-recursion idiom (loop/recur), so I had to use ordinary stack-consuming recursion. For most XML files, that is going to be just fine, since the recursion only goes as deep as the elements nest.
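One way the traversal might be done without stack recursion is to keep the pending work in an explicit list and drive it with loop/recur. This is only a sketch under some assumptions: `render-iter` is a hypothetical name, it produces the compact output only (no pretty printing, no always-open set), and the `[:raw ...]` vectors are an invented marker for close tags that have been queued for later.

```clojure
;; Iterative variant of render: compact output only, no always-open set.
(defn render-iter [root]
  (loop [stack (list root)          ; work still to do, next item first
         out   (StringBuilder.)]    ; output accumulated so far
    (if-let [item (first stack)]
      (let [more (rest stack)]
        (cond
          ;; text node: emit it trimmed, as the recursive version does
          (string? item)
          (recur more (.append out (.trim ^String item)))

          ;; [:raw "..."] marker: a close tag queued earlier, emit verbatim
          (vector? item)
          (recur more (.append out (str (second item))))

          ;; element map: emit the open tag, queue children and the close tag
          :else
          (let [tag      (name (:tag item))
                attrs    (apply str (for [[k v] (:attrs item)]
                                      (str \space (name k) "=\"" v \")))
                children (:content item)]
            (if (seq children)
              (recur (into more (reverse (concat children
                                                 [[:raw (str "</" tag ">")]])))
                     (.append out (str "<" tag attrs ">")))
              (recur more (.append out (str "<" tag attrs "/>")))))))
      (str out))))

(render-iter {:tag :p :attrs {:id "m"}
              :content ["hi" {:tag :br :attrs nil :content nil}]})
```

Because every recur is in tail position, the nesting depth of the document costs heap (the pending list) rather than call stack. The trade-off is that the code no longer mirrors the shape of the tree as directly as the recursive version does.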

I am still not convinced that Clojure deserves a place in any of my business applications, but it definitely has the best shot of any Lisp dialect I’ve seen, mainly because of its trivial interoperation with Java.

If any Clojure ninjas out there would like to suggest improvements to my algorithm, I’m all ears.

 
  • You should add escaping.
    Btw, Enlive (http://github.com/cgrand/enlive/tree/master) has an HTML serializer (emit*) but doesn’t auto-indent (whitespace/original formatting is preserved if present).

    • You’re right. This won’t be ready for any live environment without escaping and good entity validation.

      My goal was specifically to use the data structures generated by the default clojure.xml/parse. It seems that no one had done that yet…perhaps this is because the clojure.xml/parse function is only ever used/useful for reading XML as data for third party interoperability and serious XHTML-specific code should do its own parsing. I see you’re using TagSoup as a parser.

    • Oh, wait! I see now that your Enlive project does actually parse to the same {:tag :attrs :content} structure that the clojure.xml/parse gives. Nice!

      • The structures used in Enlive are indeed a superset of those returned by clojure.xml/parse, so Enlive can work with both. I started with clojure.xml/parse and switched later to some specific parsing code.
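The escaping suggested at the top of this thread could be sketched roughly as follows. This is an assumption-laden sketch, not part of the post's code: `escape-xml` is a hypothetical helper built on clojure.string/escape, and the `render-attributes` shown here is a variant of the post's function that escapes each value.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: replace the five characters XML treats as markup.
(defn escape-xml [s]
  (string/escape (str s)
                 {\& "&amp;" \< "&lt;" \> "&gt;" \" "&quot;" \' "&apos;"}))

;; Variant of the post's render-attributes that escapes each value.
(defn render-attributes [attributes]
  (apply str
    (for [[key value] attributes]
      (str \space (name key) "=\"" (escape-xml value) \"))))

(escape-xml "a & b <c>")
(render-attributes {:title "say \"hi\""})
```

Text nodes would need the same treatment, for example by calling (escape-xml (.trim node)) in the string branch of render, because the parser hands back text with its entities already decoded.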

  • Rob J.

    `(with-out-str (xml/emit nodes))` will do the trick. Or, if you want an element and not a document, use `emit-element` instead. Kinda stupid they both print to stdout by default imho.
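A minimal sketch of that suggestion, using a hand-built node instead of a parsed file. The node literal is made up for the example; `with-out-str` and `clojure.xml/emit-element` are the calls named in the comment above.

```clojure
(require 'clojure.xml)

;; A hand-built node in the same shape that clojure.xml/parse returns.
(def node {:tag :p :attrs {:id "message"} :content ["hi"]})

;; emit-element prints to *out*; with-out-str rebinds *out* for the duration
;; of its body and returns everything printed as a single string.
(def rendered (with-out-str (clojure.xml/emit-element node)))

(println rendered)
```

This captures the built-in emitter's output as a value, but the layout is whatever the emitter decides to print, so it does not replace the pretty-printing option in the post.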