Big Sky

From the muumoo.jp article "I wrote Filter:GoogleBookmarksFeed to tidy up the Google Bookmarks feed fetched with Plagger, but the Japanese disappears" (管理人日記):

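"My joy was short-lived: when I write tags or comments that contain Japanese characters, those characters seem to disappear. It feels like a common problem with Plagger, and it has happened with this plugin too."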

Indeed, the Japanese shows up fine in a browser, but Google apparently looks at the User-Agent and switches the encoding to ISO-8859-1 on its own.


# curl -L 'https://www.google.com/bookmarks/?output=rss' -u username:password

<?xml version="1.0" encoding="ISO-8859-1"?><rss vers
...
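
To check which encoding a captured response declares, the XML declaration can be read directly. The following is only a minimal sketch, assuming the start of the response is already held in a string; `declared_encoding` is a hypothetical helper name, not part of Plagger or curl.

```perl
use strict;
use warnings;

# Hypothetical helper: return the encoding named in the XML declaration,
# or undef when the text does not declare one.
sub declared_encoding {
    my ($xml) = @_;
    return $xml =~ /\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*(["'])(.*?)\1/s ? $2 : undef;
}

# The first bytes of the response shown above.
my $head = '<?xml version="1.0" encoding="ISO-8859-1"?><rss vers';
print declared_encoding($head), "\n";
```

Running the same check on responses fetched with different User-Agent strings would show whether the declared encoding really changes.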

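After adding the following to the top of config.yaml, I was able to fetch the feed properly:

global:
  timezone: Asia/Tokyo
  user_agent:
    agent: Mozilla/5.0

I considered putting this in a bookmark comment, but the article is about half a month old and the author may no longer be watching it, so I wrote it up as a post instead.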
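
Outside Plagger, the same idea can be tried with LWP::UserAgent by setting its `agent` attribute. This is a rough sketch rather than what Plagger does internally; the credentials are placeholders, and the actual request is left commented out so the snippet runs without network access.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Identify ourselves the same way as the config.yaml setting above.
my $ua = LWP::UserAgent->new(agent => 'Mozilla/5.0', timeout => 10);

# Placeholder credentials (assumptions; replace with real ones).
my ($user, $pass) = ('username', 'password');

my $req = HTTP::Request->new(GET => 'https://www.google.com/bookmarks/?output=rss');
$req->authorization_basic($user, $pass);

# Uncomment to actually send the request:
# my $res = $ua->request($req);
# print $res->decoded_content if $res->is_success;

print "User-Agent: ", $ua->agent, "\n";
```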

More to the point... LivedoorClip.pm dies with the following error:

Plagger [info] plugin Plagger::Plugin::Subscription::Config loaded.
Plagger [info] plugin Plagger::Plugin::UserAgent::AuthenRequest loaded.
Plagger [info] plugin Plagger::Plugin::Filter::GoogleBookmarksFeed loaded.
Plagger [info] plugin Plagger::Plugin::Publish::LivedoorClip loaded.
Plagger [info] plugin Plagger::Plugin::Bundle::Defaults loaded.
Plagger [info] plugin Plagger::Plugin::Aggregator::Simple loaded.
Plagger [info] plugin Plagger::Plugin::Summary::Auto loaded.
Plagger [info] plugin Plagger::Plugin::Summary::Simple loaded.
Plagger [info] plugin Plagger::Plugin::Namespace::HatenaFotolife loaded.
Plagger [info] plugin Plagger::Plugin::Namespace::MediaRSS loaded.
Plagger [info] plugin Plagger::Plugin::Namespace::ApplePhotocast loaded.
Plagger::Plugin::Aggregator::Simple [info] Fetch https://www.google.com/bookmarks/?output=rss
Plagger::Plugin::UserAgent::AuthenRequest [info] Adding credential to Google Search History at www.google.com:443
Plagger::Cache [debug] Cache HIT: Aggregator-Simple|https://www.google.com/bookmarks/?output=rss
Plagger::Plugin::Aggregator::Simple [debug] 200: https://www.google.com/bookmarks/?output=rss
Plagger::Plugin::Aggregator::Simple [info] Aggregate https://www.google.com/bookmarks/?output=rss success: 15 entries.
Died at C:/Perl/site/lib/WWW/Mechanize.pm line 1705.

What is going on here?
For now, I'm off to run a cpan upgrade.

Update 1
In GoogleBookmarksFeed, tags seemed to come back as an array even when there is only one, so I changed the code as follows. I may well be wrong about this.


*** GoogleBookmarksFeed.pm.orig Tue Sep 04 11:39:49 2007
--- GoogleBookmarksFeed.pm  Tue Sep 04 11:40:15 2007
***************
*** 22,28 ****
              $args->{entry}->body($orig_body);
              $context->log(info => "Parsing Google Bookmarks title " . $args->{entry}->permalink);
          }
!         if (my @orig_tags = @{$args->{orig_entry}->{entry}->{$ns}->{bkmk_label}}) {
              $args->{entry}->tags(@orig_tags);
          }
      }
--- 22,28 ----
              $args->{entry}->body($orig_body);
              $context->log(info => "Parsing Google Bookmarks title " . $args->{entry}->permalink);
          }
!         if (my @orig_tags = $args->{orig_entry}->{entry}->{$ns}->{bkmk_label}) {
              $args->{entry}->tags(@orig_tags);
          }
      }

Update 2
I was completely wrong. It looks like tags come back as a string when there is one tag and as an array when there are two or more.


*** GoogleBookmarksFeed.pm.orig Tue Sep 04 11:39:49 2007
--- GoogleBookmarksFeed.pm  Tue Sep 04 14:54:17 2007
***************
*** 22,29 ****
              $args->{entry}->body($orig_body);
              $context->log(info => "Parsing Google Bookmarks title " . $args->{entry}->permalink);
          }
!         if (my @orig_tags = @{$args->{orig_entry}->{entry}->{$ns}->{bkmk_label}}) {
!             $args->{entry}->tags(@orig_tags);
          }
      }
  }
--- 22,33 ----
              $args->{entry}->body($orig_body);
              $context->log(info => "Parsing Google Bookmarks title " . $args->{entry}->permalink);
          }
!         if (my $orig_tags = $args->{orig_entry}->{entry}->{$ns}->{bkmk_label}) {
!           if (ref($orig_tags) eq "ARRAY") {
!               $args->{entry}->tags($orig_tags);
!           } else {
!               $args->{entry}->tags([$orig_tags]);
!           }
          }
      }
  }
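
The "string for one, array for many" behaviour described in Update 2 can also be handled with a small normalizing function. This is just a standalone sketch of the idea; `as_array_ref` is a hypothetical name and is not part of Plagger or the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper: always return an array reference, whether the
# parser produced nothing, a single string, or an array reference.
sub as_array_ref {
    my ($value) = @_;
    return []     unless defined $value;
    return $value if ref($value) eq 'ARRAY';
    return [$value];
}

# Try it with no tag, one tag, and two tags.
for my $label (undef, 'perl', ['perl', 'plagger']) {
    my $tags = as_array_ref($label);
    print scalar(@$tags), " tag(s): @$tags\n";
}
```

With a helper like this, the caller no longer needs to branch on `ref()` at every place the value is used.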

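It has been a while since the twitter API started returning only 100 friends...
I put together a sample that fetches all of your twitter friends using WWW::Mechanize and XPath. Overdo it and the official side might get angry, though...
What you do with the result afterwards is up to you.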


#!/usr/local/bin/perl

use warnings;
use strict;
use Compress::Zlib;    # memGunzip / uncompress, used below
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use HTML::Selector::XPath qw(selector_to_xpath);
use Data::Dumper;

my $username = 'your_username';
my $password = 'your_password';

# Log in through the web form so the friends pages can be read.
my $m = WWW::Mechanize->new(timeout => 10);
$m->get('http://twitter.com/login');
$m->submit_form(
    form_number => 1,
    fields    => {
        username_or_email  => $username,
        password           => $password,
    },
    button    => 'commit',
);

# Convert the CSS selector for one friend row into an XPath expression.
my $xpath = selector_to_xpath('tr.vcard');
my @friends;

my $num_page = 1;
while (1) {
    my $res = $m->get("http://twitter.com/friends/?page=$num_page");
    # The header may be absent; default to '' to avoid undef warnings.
    my $encoding = $res->header('Content-Encoding') || '';
    my $content = $res->content;
    $content = Compress::Zlib::memGunzip($content) if $encoding =~ /gzip/i;
    $content = Compress::Zlib::uncompress($content) if $encoding =~ /deflate/i;

    my $tree = HTML::TreeBuilder::XPath->new;
    $tree->parse($content);
    $tree->eof;
    my @nodes = $tree->findnodes($xpath);
    for my $tr (@nodes) {
        # findvalue already yields the string value, so ->as_string is not called on it.
        push(@friends, {
                nick => $tr->findnodes('td/strong/a')->[0]->as_text,
                image => $tr->findvalue('td[@class="thumb"]//img/@src'),
                name => $tr->findvalue('td[@class="thumb"]//img/@alt'),
                description => $tr->findvalue('td/strong/a/@title'),
                url => $tr->findvalue('td[@class="thumb"]/a/@href'),
            });
    }
    $tree->delete;
    # Stop once a page contains no friend rows.
    @nodes or last;
    $num_page++;
}

print Dumper @friends;
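
The paging logic in the script (keep requesting the next page until one comes back with no rows) can be tried in isolation without touching the network. This is only a sketch of that loop: `collect_pages`, its `$max_pages` safety cap, and the fake data are all assumptions made for illustration and do not appear in the script above.

```perl
use strict;
use warnings;

# Collect items from numbered pages until a page comes back empty.
# $fetch_page is a caller-supplied code ref (an assumption of this sketch)
# that takes a page number and returns the list of items on that page.
sub collect_pages {
    my ($fetch_page, $max_pages) = @_;
    $max_pages ||= 100;    # hypothetical cap so a misbehaving site cannot loop forever
    my @all;
    for my $page (1 .. $max_pages) {
        my @items = $fetch_page->($page);
        last unless @items;
        push @all, @items;
    }
    return @all;
}

# Stand-in for the real HTTP fetch: two pages of fake data, then an empty page.
my %fake_site = (1 => [qw(alice bob)], 2 => [qw(carol)], 3 => []);
my @found = collect_pages(sub { @{ $fake_site{ $_[0] } || [] } });
print scalar(@found), " friends: @found\n";
```

In a real run, the code ref would wrap the fetch-and-parse step, and a short `sleep` between pages would be a considerate addition.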