| tech note |
| Refactoring myself. |
Scraping, which means extracting only the data you need from a web page and putting it to use, looks interesting.
As a first step, I tried scrAPI.
http://blog.labnotes.org/category/scrapi/
# gem install scrapi
Bulk updating Gem source index for: http://gems.rubyforge.org
Install required dependency tidy? [Yn] Y
Successfully installed scrapi-1.2.0
Successfully installed tidy-1.1.2
Installing ri documentation for scrapi-1.2.0...
Installing ri documentation for tidy-1.1.2...
Installing RDoc documentation for scrapi-1.2.0...
Installing RDoc documentation for tidy-1.1.2...
$ rails scraping
$ cd scraping/
$ ruby script/generate controller Scraping index scrape
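# app/controllers/scraping_controller.rb
require 'rubygems'
require 'scrapi'
require 'open-uri'

class ScrapingController < ApplicationController
  # Show the empty form; no links have been extracted yet.
  def index
    @links = nil
  end

  # Fetch the submitted URL and collect the href of every <a> element.
  def scrape
    links = Scraper.define do
      process("a[href]", "urls[]" => "@href")
      result :urls
    end
    # Fall back to an empty list in case the page yields no links.
    @links = (links.scrape(URI.parse(params[:url])) || []).sort.uniq
    render :action => 'index'
  end
end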
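To make it easier to see what the `scrape` action computes, here is a rough sketch of the same "collect every href, sort, de-duplicate" step using only the Ruby standard library. It is not how scrAPI works internally: scrAPI parses the HTML and applies a CSS selector, while this uses a regular expression that only handles quoted `href` values. The names `HREF_PATTERN`, `extract_links` and `sample_html` are made up for this illustration.

```ruby
# Hypothetical helper: collect the href of every <a> tag, then sort and
# de-duplicate, mirroring `.sort.uniq` in the controller.
# A regex is only an approximation of real HTML parsing.
HREF_PATTERN = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/im

def extract_links(html)
  # scan returns [quote, href] pairs because the pattern has two groups.
  html.scan(HREF_PATTERN).map { |_quote, href| href }.sort.uniq
end

# Illustrative input; example.com is a placeholder domain.
sample_html = <<~HTML
  <p><a href="http://example.com/b">B</a>
  <a class="x" href='http://example.com/a'>A</a>
  <a href="http://example.com/b">B again</a>
  <a name="anchor-only">no href</a></p>
HTML

puts extract_links(sample_html)
```

The anchor without an `href` is skipped and the duplicate link appears once, which is the behaviour the `a[href]` selector plus `.sort.uniq` aims for in the controller.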
The view, app/views/scraping/index.rhtml (assuming the .rhtml extension used by Rails 1.x):
<h1>Scraping#index</h1>
<%= start_form_tag(:action => 'scrape') %>
<%= text_field_tag('url', 'http://tech.x-neon.com/') %>
<%= submit_tag("Extract links with ScrAPI") %>
<% if @links -%>
  <br /><hr />
  <% @links.each do |link| -%>
    <%= link_to(h(link), link) -%><br />
  <% end -%>
<% end -%>
<%= end_form_tag %>
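One thing to watch for: the extracted `href` values are used as-is, so a relative link such as `/about` would be rendered relative to this Rails app rather than to the scraped site. A minimal sketch of one way to handle that, assuming the base URL is the one submitted in the form, is to resolve each link with `URI.join` from the standard library. The helper name `absolutize_links` and the sample URLs are hypothetical.

```ruby
require 'uri'

# Hypothetical helper: resolve each extracted href against the URL of the
# page it came from. Hrefs that cannot be parsed as URIs are skipped.
def absolutize_links(base_url, hrefs)
  resolved = hrefs.map do |href|
    begin
      URI.join(base_url, href).to_s
    rescue URI::Error
      nil
    end
  end
  resolved.compact.sort.uniq
end

# Illustrative input; example.com and example.org are placeholder domains.
hrefs = ['/about', 'archive/2007.html', 'http://example.org/', '/about']
puts absolutize_links('http://example.com/blog/', hrefs)
```

In the controller this could be applied to the result of `links.scrape(...)` before assigning `@links`, with `params[:url]` as the base.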
