tech note
Refactoring myself.
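I'm trying out Hpricot, which has a reputation for being nicer than scrAPI.
Unlike scrAPI, you don't have to write a Scraper.define block first, so people seem to like how lightweight it is.
Official: An Hpricot Showcase
Detailed write-up 1: HTMLパーサ Hpricot (HTML parser Hpricot)
Detailed write-up 2: Hpricot Showcase-Ja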
# gem install hpricot
Bulk updating Gem source index for: http://gems.rubyforge.org
Select which gem to install for your platform (i386-linux)
1. hpricot 0.6 (mswin32)
2. hpricot 0.6 (jruby)
3. hpricot 0.6 (ruby)
4. hpricot 0.5 (ruby)
5. hpricot 0.5 (mswin32)
6. Skip this gem
7. Cancel installation
> 3
Building native extensions. This could take a while...
Successfully installed hpricot-0.6
Installing ri documentation for hpricot-0.6...
Installing RDoc documentation for hpricot-0.6...

<h1>Scraping#index</h1>
<%= start_form_tag(:action => 'scrape') %>
  <%= radio_button_tag('api', 'hpricot', :checked => true) %>Hpricot
  <%= radio_button_tag('api', 'scrapi') %>scrAPI<br />
  <%= text_field_tag('url',
        'http://fx.himawari-group.co.jp/price/blogparts.html') %>
  <%= submit_tag("Extract FX rates") %>
  <% if @now != nil -%>
    <br /><hr />
    <%= @now -%>
    <table border="1">
      <tr><th>Currency pair</th><th>Current</th><th>Change from previous day</th></tr>
      <% @fx.each{|f| -%>
        <tr>
          <td><%= f[0] -%></td>
          <td><%= f[1] -%></td>
          <td><%= f[2].sub(/▲/, "+").sub(/▼/, "-") -%></td>
        </tr>
      <% } -%>
    </table>
  <% end -%>
<%= end_form_tag %>

require 'open-uri'
require 'nkf'   # NKF.nkf is used below for the UTF-8 conversion
require 'hpricot'

class ScrapingController < ApplicationController
  def index
    @now = nil
  end

  def scrape
    case params[:api]
    when 'hpricot'
      # Calling Hpricot() directly also works
      doc = Hpricot.parse(NKF.nkf('-w', open(params[:url]).read))
      # at returns only the first matching element
      @now = doc.at("tr td")
      # Hpricot's nth-child seems off. Is it a bug?
      # The search only works with nth(0) or nth-child(-1)!
      currency = doc.search("tr td.tdborder-center:nth(0)").map { |e| e.inner_text }
      # search is also available through the / alias
      rate = (doc/"tr td.tdborder-center:nth(1)").map { |e| e.inner_text }
      change = doc.search("tr td.tdborder-right").map { |e| e.inner_text }
      # Drop the first match before lining the columns up
      change.shift
      @fx = currency.zip(rate, change)
    when 'scrapi'
      html = NKF.nkf('-w', open(params[:url]).read)
      @now = html.scrape("tr td")[0]
      currency = html.scrape("tr td.tdborder-center:nth-child(1)")
      rate = html.scrape("tr td.tdborder-center:nth-child(2)")
      change = html.scrape("tr td.tdborder-right")
      # Drop the first match before lining the columns up
      change.shift
      @fx = currency.zip(rate, change)
    end
    render :action => 'index'
  end
end
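
Both branches finish the same way: drop the first change cell, zip the three arrays into rows, and let the view swap the ▲/▼ marks for signs. That part is plain Ruby, so it can be tried without either library. Below is a minimal sketch under a few assumptions: the helper names build_fx_rows and normalize_change are hypothetical (they are not in the controller), the sample values are made up, and I am assuming the first td.tdborder-right match is a non-data cell, which would explain the shift.

```ruby
# Replace the up/down triangles used on the page with plain signs,
# the same substitution the view applies to f[2].
def normalize_change(text)
  text.sub(/▲/, "+").sub(/▼/, "-")
end

# Combine the three parallel arrays into [pair, rate, change] rows.
# Assumption: the first entry of `change` is a non-data cell, which is
# why the controller calls change.shift before zipping.
def build_fx_rows(currency, rate, change)
  change = change.drop(1) # non-mutating equivalent of change.shift
  currency.zip(rate, change)
end

# Hypothetical sample values, not real scraped data.
currency = ["USD/JPY", "EUR/JPY"]
rate     = ["110.25", "160.10"]
change   = ["(non-data cell)", "▲0.15", "▼0.32"]

build_fx_rows(currency, rate, change).each do |pair, now, diff|
  puts [pair, now, normalize_change(diff)].join("  ")
end
```

Note that zip pads with nil when an argument array is shorter than the receiver, so if the selectors return different numbers of cells the rows will silently contain nil rather than raising.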
