
Parsing an OPML With Ruby


And Ruby just doesn’t stop surprising us! In the past we had to parse XML files, an incredibly easy task with the Hpricot library. This time it was the turn of OPML (Outline Processor Markup Language) files. In case you are not familiar with this format, its most common use is to exchange lists of web feeds between feed aggregators.

On the desktop weblog we found a function that walks an OPML document recursively, following its nested structure, and extracts the feeds. We modified it a bit: it now returns a flat hash with the title of each feed as the key and its link (the xmlUrl attribute) as the value.
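To make the input concrete, here is a minimal sketch of the kind of document the function has to walk. The OPML content is invented for illustration (the folder name, feed titles and example.com URLs are assumptions, not real subscriptions). It only lists the top-level outlines and tells folders apart from feeds, which is the distinction the recursive function relies on.

```ruby
require 'rexml/document'

# Hypothetical OPML document: one folder holding a feed, plus a top-level feed.
SAMPLE_OPML = <<~XML
  <opml version="1.0">
    <head><title>Sample subscriptions</title></head>
    <body>
      <outline text="Ruby">
        <outline text="Example Ruby Blog" title="Example Ruby Blog"
                 type="rss" xmlUrl="http://example.com/ruby.xml"/>
      </outline>
      <outline text="Example News" title="Example News"
               type="rss" xmlUrl="http://example.com/news.xml"/>
    </body>
  </opml>
XML

doc  = REXML::Document.new(SAMPLE_OPML)
body = doc.elements['opml/body']

# A feed outline carries an xmlUrl attribute; a folder outline only has children.
top_level = body.elements.to_a('outline').map do |el|
  kind = el.attributes['xmlUrl'] ? 'feed' : 'folder'
  "#{kind}: #{el.attributes['text']}"
end

puts top_level
```

For a document shaped like this one, the flat hash described above would have two entries, one per outline that has an xmlUrl, regardless of how deeply it is nested.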

Here’s the function:

def self.parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # An outline with child elements is a folder: recurse and merge its feeds.
    if el.elements.size != 0
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # An outline with an xmlUrl attribute is a feed: store title => URL.
    if el.attributes['xmlUrl']
      feeds[el.attributes['title']] = el.attributes['xmlUrl']
    end
  end
  feeds
end

All you have to do is call it this way:

require 'rexml/document'

opml = REXML::Document.new(File.read('my_feeds.opml'))
feeds = parse_opml(opml.elements['opml/body'])
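If you want to try the call without an OPML file at hand, the following self-contained sketch may help. It assumes an inline OPML string standing in for my_feeds.opml (the feed names and example.com URLs are invented), and it repeats the function as a plain top-level method so the snippet runs on its own.

```ruby
require 'rexml/document'

# Same logic as the function above, repeated so this sketch is runnable alone.
def parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # Folder: recurse into the nested outlines.
    if el.elements.size != 0
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # Feed: keep title => xmlUrl.
    feeds[el.attributes['title']] = el.attributes['xmlUrl'] if el.attributes['xmlUrl']
  end
  feeds
end

# Hypothetical content standing in for File.read('my_feeds.opml').
xml = <<~XML
  <opml version="1.0">
    <body>
      <outline text="Ruby">
        <outline text="Example Ruby Blog" title="Example Ruby Blog" xmlUrl="http://example.com/ruby.xml"/>
      </outline>
      <outline text="Example News" title="Example News" xmlUrl="http://example.com/news.xml"/>
    </body>
  </opml>
XML

opml  = REXML::Document.new(xml)
feeds = parse_opml(opml.elements['opml/body'])

# The result is an ordinary Hash, so the usual iteration applies.
feeds.each do |title, url|
  puts "#{title} -> #{url}"
end
```

Note that the hash is keyed by the title attribute, so two feeds with the same title (or outlines that omit the title) would overwrite each other.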

Pretty easy, huh? Try it out and leave your comments…
