I wrote a wrapper for Amazon Web Services - Solr, Python, MacBook Air in Shinagawa Seaside

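I wrote a wrapper for searching products through Amazon Web Services from Python.

There is a library called PyAWS, and some bloggers recommend it, so I gave it a try. However, I could not get the bundled sample to run, and it parses XML with minidom, so I decided to write my own using ElementTree.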

Usage

Looking at the main section of the source below may be the easiest way to understand it.

AmazonWebServicesWrapper() : Creating an instance

awsapw = AmazonWebServicesWrapper( AccessKeyID, associateTag=False, logger=False )

In

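AccessKeyID : The Amazon Web Services access key issued by Amazon.
associateTag : If you are registered as an Amazon Associate, put your tag here and it might make you a little happier. Something like hoge-22 is an associate tag. The tag is included in the product detail page URLs in the search results, so used well it can earn you a bit of pocket change.
logger : Pass a logger if you want logging. When the wrapper is used from CGI and the like, you will want logs.

Out

awsapw : An AmazonWebServicesWrapper instance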

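ItemSearch() : Running an ItemSearch query

root, xmlns = awsapw.ItemSearch( Keywords, SearchIndex, ResponseGroup )

In

Keywords : The search words. Whatever you like.
SearchIndex : The index to search. Use Blended to search everything, or narrow the search to Books and so on.

The values you can specify look like this.
- Blended : All products
- Books : Japanese books
- Music : Music
- MusicTracks : Track titles
- Classical : Classical music
- Video : DVD & VHS
- DVD : DVD
- VHS : VHS
- VideoGames : Games
- Electronics : Consumer electronics
- Kitchen : Home & kitchen
- Toys : Toys & hobbies
- Software : PC software
- and so on

ResponseGroup : The elements you want included in the XML returned as the search result. To request several, pass them as 'ItemAttributes,BrowseNodes'. There are too many to list here, so see the Amazon API reference for details.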
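To make the three parameters more concrete, here is a minimal sketch of how they could be assembled into a query string. It is an illustration only, not the code used by the wrapper: it relies on Python 3's urllib.parse, whereas the listing in this post targets Python 2, and the keyword and index values are just sample inputs.

```python
from urllib.parse import urlencode  # Python 3; the wrapper itself uses Python 2's urllib

# Sample inputs chosen for illustration only.
keywords = 'みんなのPython'
search_index = 'Books'
response_groups = ['ItemAttributes', 'BrowseNodes']

# Several response groups are joined with commas into one value.
params = [
    ('Operation', 'ItemSearch'),
    ('Keywords', keywords),
    ('SearchIndex', search_index),
    ('ResponseGroup', ','.join(response_groups)),
]

# urlencode percent-encodes each value (UTF-8 for non-ASCII text).
query = urlencode(params)
print(query)
```

Unlike the wrapper, which quotes only Keywords, this sketch encodes every value, so the comma in ResponseGroup becomes %2C.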

Out

root : The search result, as an ElementTree element.
xmlns : The XML namespace. A prefix such as {http://webservices.amazon.com/AWSECommerceService/2005-10-05} is prepended to every tag name.
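The reason xmlns is returned together with root can be seen in a small sketch. The XML fragment below is hand-written for illustration and is not a real API response. ElementTree stores each tag as {namespace-URI}LocalName, so a lookup without the prefix finds nothing. The sketch also shows that the prefix can be read back from root.tag, which is an alternative to scanning the raw XML text.

```python
from xml.etree import ElementTree

# Hypothetical, hand-written fragment shaped like an ItemSearch response.
SAMPLE = (
    '<ItemSearchResponse '
    'xmlns="http://webservices.amazon.com/AWSECommerceService/2005-10-05">'
    '<Items><Item><ASIN>479733665X</ASIN></Item></Items>'
    '</ItemSearchResponse>'
)

root = ElementTree.fromstring(SAMPLE)

# The root tag already has the form {uri}Name, so the prefix
# (including both braces) can be sliced out of it.
xmlns = root.tag[:root.tag.index('}') + 1]

# Without the prefix nothing matches; with it the element is found.
missing = root.find('Items/Item/ASIN')
found = root.find(xmlns + 'Items/' + xmlns + 'Item/' + xmlns + 'ASIN')

print(xmlns)
print(missing is None, found.text)
```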

getItem0() : Extracting search results. A quick version with a fixed set of tags.

items = awsapw.getItem0( root, xmlns )

In
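- root : The ElementTree of the search result
- xmlns : The xmlns of the search result

Out
- items : A list of Items. Each entry is a dictionary holding the ASIN, DetailPageURL, Title and ProductGroup of one Item, plus the Item element itself under 'Tree'.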

{ 'ASIN':'479733665X',
  'Title':'みんなのPython',
  'ProductGroup':'Book',
  'DetailPageURL':'http://www.amazon.co.jp/%E3%81%BF%E3%82%93%E3%81%AA%E3%81%AEPython-%E6%9F%B4%E7%94%B0-%E6%B7%B3/dp479733665X%3FSubscriptionId%3D0G51CM450ERBYXGF6V02%26tag%3Dhoge-22%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D479733665X'}

You can walk through items like this.

 items = awsapw.getItem0( root, xmlns )
 for item in items:
  for key, val in item.items():
   print key, val

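getItem1() : Extracting search results. Lets you choose which tags to extract.

items = awsapw.getItem1( root, xmlns, extractAttributest )

In
- root : The ElementTree of the search result
- xmlns : The xmlns of the search result
- extractAttributest : A dictionary mapping each name to extract to its XPath. Specify it as shown below. Every step of the XPath must be prefixed with xmlns.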

  extractAttributest = {
   'ASIN':xmlns + 'ASIN',
   'DetailPageURL':xmlns + 'DetailPageURL',
   'Title':xmlns + 'ItemAttributes/' + xmlns + 'Title',
   'ProductGroup':xmlns + 'ItemAttributes/' + xmlns + 'ProductGroup' }
  items = awsapw.getItem1( root, xmlns, extractAttributest )

Out
- items : A list of Items. You walk through it the same way as with the quick version, getItem0.

 items = awsapw.getItem1( root, xmlns, extractAttributest )
 for item in items:
  for key, val in item.items():
   print key, val
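The dictionary-of-paths idea behind getItem1 can be tried without calling the API. The sketch below runs against a hand-written XML fragment, and extract_items is a hypothetical stand-in, not the wrapper's method. It differs from getItem1 in the listing in two ways that are assumptions of mine: it stores None when an element is missing instead of raising, and it neither encodes the text nor adds the 'Tree' entry.

```python
from xml.etree import ElementTree

NS = '{http://webservices.amazon.com/AWSECommerceService/2005-10-05}'

# Hypothetical fragment: the second Item has no Title or ProductGroup.
SAMPLE = (
    '<ItemSearchResponse xmlns="' + NS[1:-1] + '"><Items>'
    '<Item><ASIN>479733665X</ASIN>'
    '<ItemAttributes><Title>Sample Title</Title>'
    '<ProductGroup>Book</ProductGroup></ItemAttributes></Item>'
    '<Item><ASIN>0000000000</ASIN><ItemAttributes/></Item>'
    '</Items></ItemSearchResponse>'
)


def extract_items(root, xmlns, paths):
    """Return one dict per Item, keyed like the paths dictionary."""
    items = []
    for item in root.findall(xmlns + 'Items/' + xmlns + 'Item'):
        row = {}
        for key, path in paths.items():
            node = item.find(path)
            # A missing element yields None instead of raising.
            row[key] = node.text if node is not None else None
        items.append(row)
    return items


root = ElementTree.fromstring(SAMPLE)
paths = {
    'ASIN': NS + 'ASIN',
    'Title': NS + 'ItemAttributes/' + NS + 'Title',
    'ProductGroup': NS + 'ItemAttributes/' + NS + 'ProductGroup',
}
items = extract_items(root, NS, paths)
for row in items:
    print(row)
```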

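checkBrowseNodeCheckWord() : Checking the BrowseNodes (categories)

num = awsapw.checkBrowseNodeCheckWord( item, BrowseNodeCheckWord, logger=False )

Counts how many times the given string appears in the BrowseNodes (categories) attached to an Item (product). More precisely, it counts the elements under the Item whose text contains the string. Some services will not want products in the "アダルト" (adult) category to show up in search results (or will want to focus on exactly those), so I wrote this.

In

item : The Element of an Item (product)
BrowseNodeCheckWord : The string to look for in the BrowseNodes.
logger : Pass a logger if you want logging.

Out

num : The number of times the given string was found in the BrowseNodes (categories).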
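As a rough illustration of the counting, here is a minimal sketch on a hand-written Item fragment. The fragment, the category names and the function name count_word are all made up for this example. Like the wrapper's method, it looks at the text of every element under the Item, not only at BrowseNode names.

```python
from xml.etree import ElementTree

# Hypothetical Item fragment with made-up category names.
SAMPLE = (
    '<Item><ASIN>0000000000</ASIN><BrowseNodes>'
    '<BrowseNode><Name>Programming</Name>'
    '<Ancestors><BrowseNode><Name>Computer Books</Name></BrowseNode></Ancestors>'
    '</BrowseNode>'
    '<BrowseNode><Name>Python Books</Name></BrowseNode>'
    '</BrowseNodes></Item>'
)


def count_word(item, word):
    """Count elements under item whose text contains word."""
    if not word:
        return 0
    # iter() walks the element itself and all of its descendants.
    return sum(1 for e in item.iter() if e.text and word in e.text)


item = ElementTree.fromstring(SAMPLE)
books = count_word(item, 'Books')
adult = count_word(item, 'Adult')
print(books, adult)
```

A count above zero can then be used either to filter an Item out or to keep only such Items, depending on the service.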

Some parts are still trial and error, but for now the source looks like this.

#!/usr/bin/env python
# coding=utf-8

import sys, urllib, logging
from xml.etree import ElementTree

class AmazonWebServicesWrapper:
 def __init__(self, appid, associateTag=False, logger=False ):
  self.opener = urllib.FancyURLopener()
  self.logger = logger
  self.AmazonWebServicesWrapperLocalSearchUrl = \
   "http://webservices.amazon.co.jp/onca/xml?" +\
   "Service=AWSECommerceService&" +\
   "SubscriptionId=" + appid

  if associateTag:
   self.AmazonWebServicesWrapperLocalSearchUrl = \
    self.AmazonWebServicesWrapperLocalSearchUrl + \
    "&AssociateTag=" + associateTag

  if self.logger:
   self.logger.debug( 'AmazonWebServicesWrapperLocalSearchUrl: ' +
                      self.AmazonWebServicesWrapperLocalSearchUrl )

 # When xmlns is declared, the URI it defines is
 # prepended to every tag name, e.g.
 # {http://webservices.amazon.com/AWSECommerceService/2005-10-05}Title
 def getXmlns( self, xml ):

  xmlns = ''
  marker = 'xmlns="'
  pos = xml.find( marker )

  # find() returns -1 when the marker is absent, so test the
  # position before adding the marker length to it
  if pos >= 0:
   start = pos + len( marker )
   end = xml.find( '"', start )
   if end >= 0:
    xmlns = '{' + xml[ start : end ] + '}'

  if self.logger:
   self.logger.debug( 'xmlns: ' + xmlns )

  return xmlns

 # Run an ItemSearch
 def ItemSearch( self, Keywords, SearchIndex,
                                     ResponseGroup ):
  try:
   # Set the request parameters
   Keywords = urllib.quote( Keywords )
   get_val = 'Operation=ItemSearch' +\
             '&Keywords=' + Keywords +\
             '&SearchIndex=' + SearchIndex +\
             '&ResponseGroup=' + ResponseGroup

   # Log output
   if self.logger:
    self.logger.debug( 'ItemSearch: ' +
     self.AmazonWebServicesWrapperLocalSearchUrl + '&' + get_val )
  
   # Call the API.
   # Note: passing get_val as the second argument makes urllib
   # send it as POST data rather than appending it to the URL.
   xml = self.opener.open(self.AmazonWebServicesWrapperLocalSearchUrl ,
                        get_val).read()

   # Get the xmlns
   xmlns = self.getXmlns( xml )

   # Build the ElementTree
   root = ElementTree.fromstring( xml )

   return root, xmlns
  except Exception:
   msg = 'ItemSearch failed %s: %s \n' \
            %(sys.exc_info()[0], sys.exc_info()[1])
   # Use the instance's logger, not a module-level name
   if self.logger:
    self.logger.error( msg )
   # Return a pair so that "root, xmlns = ..." still unpacks
   return None, None

 # Extract elements from an ItemSearch result.
 # Quick version that returns ASIN, DetailPageURL, Title, ProductGroup
 # and the Item element itself ('Tree').
 def getItem0( self, root, xmlns ):
  try:
   # Build the list of Items that matched the search
   ItemList = root.findall( xmlns + 'Items/' + xmlns + 'Item' )

   # Parse the XML and return ASIN, DetailPageURL, Title and ProductGroup.
   # Each Item becomes a dictionary, and the dictionaries are collected in a list.
   items = []
   for item in ItemList:
    asin = item.find( xmlns + 'ASIN').text
    detailPageURL = item.find( xmlns + 'DetailPageURL').text
    title = item.find( xmlns + 'ItemAttributes/' + xmlns + 'Title').text.encode('utf-8')
    productGroup = item.find( xmlns + 'ItemAttributes/' +
                              xmlns + 'ProductGroup').text
    # print asin, title, productGroup, detailPageURL

    items.append( { 'ASIN':asin, 'Title':title,
                    'ProductGroup':productGroup,
                    'DetailPageURL':detailPageURL,
                    'Tree':item } )

   return items
  except Exception:
   msg = 'getItem0 failed %s: %s \n' \
            %(sys.exc_info()[0], sys.exc_info()[1])
   # Use the instance's logger, not a module-level name
   if self.logger:
    self.logger.error( msg )

 # Extract elements from an ItemSearch result.
 # The caller chooses which elements to extract.
 # 'Tree' (the Item's own Element) is always returned.
 def getItem1( self, root, xmlns, extractAttributes ):
  try:

   # Build the list of Items that matched the search
   ItemList = root.findall( xmlns + 'Items/' + xmlns + 'Item' )

   # From each Item, take the elements named in extractAttributes
   # and return them in items.
   # items is a list of dictionaries.
   items = []
   for item in ItemList:
    Attr = {}
    Attr['Tree'] = item
    for key, val in extractAttributes.items():
     Attr[ key ] = item.find( val ).text.encode( 'utf-8' )
    items.append( Attr )
    
   return items
  except Exception:
   msg = 'getItem1 failed %s: %s \n' \
            %(sys.exc_info()[0], sys.exc_info()[1])
   # Use the instance's logger, not a module-level name
   if self.logger:
    self.logger.error( msg )

 # Count how many times the keyword given in BrowseNodeCheckWord
 # appears in the BrowseNodes of an Item.
 # Written mainly with adult-content checks in mind.
 def checkBrowseNodeCheckWord( self, item, BrowseNodeCheckWord,
                               logger=False):
  try:
   browseNodeCheckCount = 0
   if BrowseNodeCheckWord:
    # Look at the text of every element under the Item
    for e in item.getiterator():
     if e.text:
      browseNode = e.text.encode('utf-8')
      if browseNode.find( BrowseNodeCheckWord ) >= 0:
       browseNodeCheckCount += 1
 
   return browseNodeCheckCount

  except Exception:
   msg = 'checkBrowseNodeCheckWord failed %s: %s \n' \
            %(sys.exc_info()[0], sys.exc_info()[1])
   # Use the instance's logger, not a module-level name
   if self.logger:
    self.logger.error( msg )

if  __name__    ==  '__main__':
 # ---------- Logger set up ---------- #
 DEBUG_LEVEL=logging.DEBUG
 #DEBUG_LEVEL=logging.INFO
 LOG_FILE="./AmazonWebServicesWrapper.log"
 LOGGER_NAME="AmazonWebServicesWrapper"
 logger = logging.getLogger(LOGGER_NAME)
 logger.setLevel(DEBUG_LEVEL)
 formatter =\
  logging.Formatter("%(asctime)s, %(levelname)s, %(module)s, %(lineno)d, %(message)s")
 hdlr = logging.FileHandler(LOG_FILE)
 hdlr.setFormatter(formatter)
 logger.addHandler(hdlr)
 # ---------- End Logger set up ---------- #
 try:
  # Required
  AccessKeyID = 'Your Access Key ID'
  # If you have an Amazon Associates account, put the tag here
  AssociateTag = 'hoge-22'
  
  #awsapw = AmazonWebServicesWrapper( AccessKeyID )
  awsapw = AmazonWebServicesWrapper( AccessKeyID, AssociateTag, logger )

  # Search settings
  #searchWord = 'Python'
  #searchWord = 'ブラッディ・マンディ'
  #searchWord = '涼宮ハルヒ'
  searchWord = '凛乎新香'
  SearchIndex = 'Blended'
  #SearchIndex = 'Books'
  # Run the ItemSearch
  root, xmlns = awsapw.ItemSearch( 
           searchWord, SearchIndex, 'ItemAttributes,BrowseNodes',)

  # Extract values from the ItemSearch result
  #items = awsapw.getItem0( root, xmlns )

  extractAttributest = { 
   'ASIN':xmlns + 'ASIN', 
   'DetailPageURL':xmlns + 'DetailPageURL',
   'Title':xmlns + 'ItemAttributes/' + xmlns + 'Title',
   'ProductGroup':xmlns + 'ItemAttributes/' + xmlns + 'ProductGroup' }
  items = awsapw.getItem1( root, xmlns, extractAttributest )
  #for item in items:
  # for key, val in item.items():
  #  print key, val

  # Display the search results
  counter = 0.0
  # Kept in Japanese because it is matched against amazon.co.jp category names
  BrowseNodeCheckWord = 'アダルト'
  if items:
   for item in items:
    #for f in item['Tree'].getiterator():
    # print f.tag, f.text

    BrowseNodeCheckCount = \
     awsapw.checkBrowseNodeCheckWord( item['Tree'], BrowseNodeCheckWord, logger )
    #print item['ASIN'], item['Title'], item['ProductGroup'],\
    #      BrowseNodeCheckCount, item['DetailPageURL'] 
    print item['Title']
    if BrowseNodeCheckCount > 0:
     counter += 1.0

   print searchWord, BrowseNodeCheckWord + ' rate',\
         counter / len( items) * 100.0, '%'
  else:
   print searchWord, "was not found on Amazon Web Services."

 except Exception:
  msg = 'main failed %s: %s \n' \
           %(sys.exc_info()[0], sys.exc_info()[1])
  logger.error( msg )

That's all for now.