Vita Rara: A Life Uncommon

Hpricot to Nokogiri Day 1



Nokogiri's #xpath != Hpricot's #xpath

In Hpricot you can call xpath on a node to get the XPath expression that will retrieve that node from the document. In Nokogiri the equivalent is path; Nokogiri's xpath runs an XPath query against the node instead.

I ran into this trying to figure out the xpath to a node in an HTML document. My normal routine is to load up the document in IRB and poke around to find the things I need.

Whitespace is Different

In Nokogiri, &nbsp; entities are converted to whitespace, but the result is a non-breaking space (U+00A0), not a normal space, so the standard String#strip and friends don't remove it. Tenderlove on IRC gave me the following snippet to remove them:

Nokogiri::HTML.parse("&nbsp;y").at("p").inner_text.gsub(/\302\240/, ' ').strip == 'y'
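
The \302\240 in that regex is the UTF-8 encoding of the non-breaking space, written as two octal byte escapes. A quick check of that, sketched for Ruby 1.9 or later (the \u escape is not available in 1.8), with variable names chosen just for this example:

```ruby
nbsp = "\u00A0" # non-breaking space

# In UTF-8 it is the two bytes 0302 0240 (octal), i.e. 0xC2 0xA0.
nbsp_bytes = nbsp.bytes.map { |b| b.to_s(8) }

# The built-in strip only trims ASCII whitespace, so the leading nbsp survives.
stripped = "#{nbsp}y ".strip

# Swapping the nbsp for a plain space first lets strip do its job.
cleaned = "#{nbsp}y ".gsub(nbsp, ' ').strip
```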

I incorporated this right into String#strip and String#strip! because in the context of my application these are whitespace.

class String
  # Leading or trailing runs of ASCII whitespace and UTF-8 non-breaking spaces (\302\240).
  # \A and \z anchor to the whole string; ^ and $ would match at every line break.
  NBSP_OR_SPACE = /\A(?:\302\240|\s)+|(?:\302\240|\s)+\z/

  alias_method :old_strip, :strip # keep the built-in reachable as old_strip
  def strip
    gsub(NBSP_OR_SPACE, '')
  end

  # Like the built-in strip!, returns nil when nothing was removed.
  def strip!
    gsub!(NBSP_OR_SPACE, '')
  end
end
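
If patching String globally feels too invasive, the same idea can live in a standalone helper. A rough sketch, assuming Ruby 1.9 or later; the TextCleanup module and the strip_nbsp name are made up for this example and are not part of Nokogiri:

```ruby
module TextCleanup
  # Leading or trailing runs of ASCII whitespace and non-breaking spaces.
  EDGE_SPACE = /\A[\s\u00A0]+|[\s\u00A0]+\z/

  # Returns a copy of str with whitespace and nbsp trimmed from both ends.
  def self.strip_nbsp(str)
    str.gsub(EDGE_SPACE, '')
  end
end

TextCleanup.strip_nbsp("\u00A0 y\u00A0\n") # => "y"
TextCleanup.strip_nbsp("a\u00A0b")         # => "a\u00A0b", interior nbsp is kept
```

Callers have to opt in explicitly, so the rest of the application keeps the stock String#strip behavior.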