Parsing an OPML file with Ruby

Ruby just doesn’t stop surprising us! In the past we had to deal with XML files and parse them, an incredibly easy task using the Hpricot library. This time it was the turn of OPML (Outline Processor Markup Language) files. In case you are not familiar with this type of file, its most common use is to exchange lists of web feeds between feed aggregators.

We found this function, which parses the OPML document recursively while preserving its structure, on the desktop weblog. It does the job of extracting the feeds, and we modified it a bit: it now returns a hash with the feed titles as keys and the feed URLs as values.

Here’s the function:

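def self.parse_opml(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # An outline with children is a folder: recurse and merge its feeds.
    # parent_names records the folder path but is not used in the result.
    unless el.elements.empty?
      feeds.merge!(parse_opml(el, parent_names + [el.attributes['text']]))
    end
    # An outline with an xmlUrl attribute is a feed. The title attribute is
    # optional in OPML, so fall back to text when it is missing.
    if el.attributes['xmlUrl']
      name = el.attributes['title'] || el.attributes['text']
      feeds[name] = el.attributes['xmlUrl']
    end
  end
  feeds
end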

All you have to do is call it this way:

require 'rexml/document'

opml = REXML::Document.new(File.read('my_feeds.opml'))
feeds = parse_opml(opml.elements['opml/body'])
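
If you don’t have an OPML file at hand, here is a small self-contained sketch you can paste into a script to see what comes back. The sample document, its feed names and the example.com URLs are made up for illustration, and the function is repeated in compact form so the snippet runs on its own.

```ruby
require 'rexml/document'

# Made-up sample data: one folder ("News") holding a feed, plus a top-level feed.
SAMPLE_OPML = <<~XML
  <opml version="2.0">
    <head><title>My feeds</title></head>
    <body>
      <outline text="News">
        <outline text="Example News" title="Example News" type="rss" xmlUrl="http://example.com/news.xml"/>
      </outline>
      <outline text="Example Blog" title="Example Blog" type="rss" xmlUrl="http://example.com/blog.xml"/>
    </body>
  </opml>
XML

# Compact version of the function from the post (without the unused parent_names).
def parse_opml(opml_node)
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # Folders are flattened: feeds nested inside them are merged into one hash.
    feeds.merge!(parse_opml(el)) unless el.elements.empty?
    feeds[el.attributes['title']] = el.attributes['xmlUrl'] if el.attributes['xmlUrl']
  end
  feeds
end

opml = REXML::Document.new(SAMPLE_OPML)
feeds = parse_opml(opml.elements['opml/body'])

# Print one "title: url" line per feed.
feeds.each { |title, url| puts "#{title}: #{url}" }
```

With this sample, `feeds` should contain two entries, "Example News" and "Example Blog", each pointing at its `xmlUrl`. The "News" folder itself does not show up because it has no `xmlUrl` attribute.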

Pretty easy, huh? Try it out and leave your comments…
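
One more thing: the function accepts `parent_names` and extends it on every recursive call, but never reads it. If you also want to know which folder each feed came from, something like the sketch below might do. `parse_opml_with_folders` and the sample outline are hypothetical names of ours, not part of the original function.

```ruby
require 'rexml/document'

# Hypothetical variant: stores the folder path next to each feed URL.
def parse_opml_with_folders(opml_node, parent_names = [])
  feeds = {}
  opml_node.elements.each('outline') do |el|
    # Recurse into folders, extending the path with this folder's text.
    unless el.elements.empty?
      feeds.merge!(parse_opml_with_folders(el, parent_names + [el.attributes['text']]))
    end
    next unless el.attributes['xmlUrl']

    # title is optional in OPML, so fall back to text.
    name = el.attributes['title'] || el.attributes['text']
    feeds[name] = { url: el.attributes['xmlUrl'], folders: parent_names }
  end
  feeds
end

# Made-up sample: a feed nested two folders deep.
xml = <<~XML
  <opml version="2.0">
    <body>
      <outline text="Tech">
        <outline text="Ruby">
          <outline text="Example Ruby Feed" xmlUrl="http://example.com/ruby.xml"/>
        </outline>
      </outline>
    </body>
  </opml>
XML

nested = parse_opml_with_folders(REXML::Document.new(xml).elements['opml/body'])
nested.each { |name, info| puts "#{info[:folders].join(' / ')} / #{name}: #{info[:url]}" }
```

Keeping the path as an array leaves it up to the caller how to display it. Note that feeds sharing the same name still overwrite each other, just as in the flat version.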
