Parsing an OPML with Ruby

A simple recursive Ruby function to parse OPML files and extract web feed URLs, preserving the document structure using REXML.

Sebastián Martínez Published on Feb 20, 2009

And Ruby just doesn’t stop surprising us! In the past we had to parse XML files, an incredibly easy task with the Hpricot library. This time it was the turn of OPML (Outline Processor Markup Language) files. In case you are not familiar with this format, its most common use is exchanging lists of web feeds between feed aggregators.
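If you have never opened one, here is a rough sketch of what an OPML file looks like and how REXML sees it. The sample document, feed names and URLs are made up for illustration: folders are `outline` elements that contain other outlines, and feeds are `outline` elements that carry an `xmlUrl` attribute.

```ruby
require 'rexml/document'

# Hypothetical sample: the feed names and URLs are invented for this example.
SAMPLE_OPML = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <opml version="2.0">
    <head><title>My feeds</title></head>
    <body>
      <outline text="Ruby">
        <outline type="rss" text="Example Ruby Blog" title="Example Ruby Blog"
                 xmlUrl="http://example.com/ruby.xml"/>
      </outline>
      <outline type="rss" text="Example News" title="Example News"
               xmlUrl="http://example.com/news.xml"/>
    </body>
  </opml>
XML

doc = REXML::Document.new(SAMPLE_OPML)

# Select every outline that has an xmlUrl attribute, at any nesting depth,
# and collect the feed URLs.
feed_urls = REXML::XPath.match(doc, '//outline[@xmlUrl]').map do |el|
  el.attributes['xmlUrl']
end

puts feed_urls
```

The XPath query flattens everything in one go, which is fine if all you need is the list of URLs.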

We found a function in the desktop weblog that parses the OPML document recursively, following its nested structure to extract the feeds, and modified it a bit. Now it returns a hash with the feed titles as keys and their URLs as values.

Here’s the function:

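def self.parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # An outline with child elements is a folder: recurse into it.
    if el.has_elements?
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # An outline with an xmlUrl attribute is a feed: store title => URL.
    # Fall back to 'text' when 'title' is missing, to avoid a nil key.
    if el.attributes['xmlUrl']
      feeds[el.attributes['title'] || el.attributes['text']] = el.attributes['xmlUrl']
    end
  end
  feeds
end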

All you have to do is call it this way:

require 'rexml/document'

opml = REXML::Document.new(File.read('my_feeds.opml'))
feeds = parse_opml(opml.elements['opml/body'])
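Note that the function collects the folder names in `parent_names` but returns a flat hash, so the nesting does not show up in the result. If you need to know which folder each feed came from, one option is to use the folder path as the key. The sketch below is a hypothetical variant, not part of the original function: the name `parse_opml_with_folders` and the sample OPML are made up, and it assumes every outline has a `title` or `text` attribute.

```ruby
require 'rexml/document'

# Hypothetical variant of parse_opml: each key is an array holding the
# folder names followed by the feed name, instead of the title alone.
def parse_opml_with_folders(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    name = el.attributes['title'] || el.attributes['text']
    # Folder: recurse, extending the path with this outline's name.
    if el.has_elements?
      feeds.merge!(parse_opml_with_folders(el, parent_names + [name]))
    end
    # Feed: store the full path => URL.
    feeds[parent_names + [name]] = el.attributes['xmlUrl'] if el.attributes['xmlUrl']
  end
  feeds
end

# Invented sample document, inlined so the example runs on its own.
opml = REXML::Document.new(<<~XML)
  <opml version="2.0">
    <body>
      <outline text="Ruby">
        <outline text="Example Ruby Blog" xmlUrl="http://example.com/ruby.xml"/>
      </outline>
      <outline text="Example News" xmlUrl="http://example.com/news.xml"/>
    </body>
  </opml>
XML

feeds = parse_opml_with_folders(opml.elements['opml/body'])
feeds.each { |path, url| puts "#{path.join(' / ')} -> #{url}" }
```

Using the path as the key also keeps two feeds with the same title in different folders from overwriting each other.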

Pretty easy, huh? Try it out and leave your comments…
