
[Python] Check your site's popularity [1008 Updated]

 

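A long time ago Che Dong wrote an article, "How to evaluate a website's popularity (Link Popularity Check)" (http://www.chedong.com/tech/link_pop_check.html), describing how to gauge a site's "popularity" with some of the hidden commands that search engines support.

Doing this in Python is actually easy.

We take an approach he did not mention, since del.icio.us probably did not exist yet when he wrote it. [Added 2:41] We also added support for counting backlinks through the alltheweb.com search engine.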

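Our Python version of getURLRank is built on this idea:
A site's popularity can be judged in many ways. One is to look at del.icio.us and see how many people have bookmarked the page. Another is to count the backlinks reported by google/alltheweb.com/msn/sogou/baidu.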
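As a minimal sketch of this idea: each check comes down to percent-encoding the target URL and appending it to a service-specific query prefix. The sketch is written for Python 3, where `quote_plus` lives in `urllib.parse`, while the script itself targets Python 2. The prefixes are the ones that appear in the script and its sample output. The `PREFIXES` dictionary and the `build_query_url` helper are hypothetical names used only for illustration, and these 2006-era endpoints may no longer respond.

```python
from urllib.parse import quote_plus

# Query prefixes as they appear in the script and its sample output.
PREFIXES = {
    'delicious': 'http://del.icio.us/url/check?url=',
    'google': 'http://www.google.com/search?hl=en&q=link%3A',
    'alltheweb': 'http://www.alltheweb.com/urlinfo?_sb_lang=any&q=',
    'msn': 'http://search.msn.com/results.aspx?FORM=QBRE&q=link%3A',
    'sogou': 'http://www.sogou.com/web?num=10&query=link%3A',
}


def build_query_url(service, target_url):
    """Return the lookup URL for one service (hypothetical helper)."""
    return PREFIXES[service] + quote_plus(target_url)


if __name__ == '__main__':
    # Print the lookup URL of every service for one example target.
    for service in sorted(PREFIXES):
        print(service, build_query_url(service, 'http://blog.sina.com.cn/m/xujinglei'))
```

Only the URL construction is shown here. Fetching the page and extracting the count are separate steps that depend on each service's result page.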

The code is below; it runs successfully under Python 2.5c.

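[Added 2:41] At 2:30 a.m., added support for the alltheweb.com search engine and, at pandaxiaoxi's suggestion, a getattr-based dispatcher.

[Added 20:43] Added support for the google.com/search.msn.com/sogou.com search engines and, at limodou's suggestion, dictionary-based replacement.

[Added 23:00] Following limodou's suggestion, the URL is now encoded with urllib.quote_plus().

[Added 16:46] Added support for Inlinks detection through the siteexplorer.search.yahoo.com search engine.

[Added 2006-10-08] Added support for checking how well baidu has indexed a blog.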
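The getattr dispatcher mentioned in the notes above can be sketched roughly as follows. This is an illustration under assumptions, not the script's own dispatch code, which is not shown in this part of the listing. The `Checkers` class and the `dispatch` function are hypothetical, and the handlers only return a string instead of querying a service.

```python
class Checkers(object):
    """Hypothetical container; the real script defines module-level parse_* functions."""

    def parse_delicious(self, url):
        return 'delicious check for ' + url

    def parse_google(self, url):
        return 'google check for ' + url


def dispatch(checkers, service, url):
    """Look up the handler named parse_<service> with getattr and call it."""
    handler = getattr(checkers, 'parse_' + service, None)
    if handler is None:
        raise ValueError('unsupported service: %s' % service)
    return handler(url)


if __name__ == '__main__':
    print(dispatch(Checkers(), 'google', 'http://blog.csdn.net'))
```

The appeal of this pattern is that supporting a new search engine only requires adding another `parse_<name>` handler; the command-line argument selects it by name without an if/elif chain.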

The file is named getURLRank.py.

Running the program produces:
>>python geturlrank.py http://blog.sina.com.cn/m/xujinglei

enter parse_delicious function...
del.ici.ous query url:http://del.icio.us/url/check?url=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
get URL : http://del.icio.us/url/check?url=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
How many people have bookmarked your url:
148
out parse_delicious function...
enter parse_google function...
google query url:http://www.google.com/search?hl=en&q=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
How many backlinks does google report:
5760
out parse_google function...
enter parse_alltheweb function...
Alltheweb query url:http://www.alltheweb.com/urlinfo?_sb_lang=any&q=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
get URL : http://www.alltheweb.com/urlinfo?_sb_lang=any&q=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
How many backlinks does Alltheweb report:
1
out parse_alltheweb function...
enter parse_msn function...
msn query url:http://search.msn.com/results.aspx?FORM=QBRE&q=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
How many backlinks does msn report:
217107
out parse_msn function...
enter parse_sogou function...
sogou query url:http://www.sogou.com/web?num=10&query=link%3Ahttp%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
www.sogou.com rank:
66
out parse_sogou function...
enter parse_yahoo function...
yahoo siteexplorer query url:http://siteexplorer.search.yahoo.com/search?bwm=i&bwms=p&bwmf=u&fr=FP-tab-web-t500&fr2=seo-rd-se&p=http%3A%2F%2Fblog.sina.com.cn%2Fm%2Fxujinglei
siteexplorer.search.yahoo.com Inlinks:
228334
out parse_yahoo function...
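The counts in this output are pulled out of each result page with a regular expression. The sketch below (Python 3) reuses the del.icio.us pattern from the script and shows why `search` is needed rather than `match`: `match` only succeeds at the very start of the string. The sample page text and the `extract_saved_count` helper are made up for illustration, and the `strip()` call is an addition that the script does not make.

```python
import re

# Pattern taken from the script's del.icio.us check.
SAVED_BY = re.compile(r'url\s*has\s*been\s*saved\s*by(\s*?[^p]*\s*)people')


def extract_saved_count(html):
    """Return the bookmark count found in html, or None if the phrase is absent."""
    found = SAVED_BY.search(html)  # search scans the whole page
    if found is None:
        return None
    # group(1) holds only the number, possibly with spaces and thousands separators.
    return int(found.group(1).strip().replace(',', ''))


if __name__ == '__main__':
    sample = '<p>this url has been saved by 1,148 people</p>'  # made-up page text
    print(extract_saved_count(sample))
    print(SAVED_BY.match(sample))  # None: the phrase is not at the start of the page
```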

The code:

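# -*- coding: UTF-8 -*-
#
# getURLRank documentation generated: 2006.09.02
#
# (1) Overview:
#
# Module name: getURLRank.py
#
# Module description:
# Parses the site address supplied by the user to see how many people have bookmarked it and how many backlinks it has,
# and uses that to judge how popular the site is.
#
# Subsystem: getURLRank
#
# System description:
# Our Python version of getURLRank is built on this idea:
# A site's popularity can be judged in many ways, for example by looking at del.icio.us to see how many people have bookmarked the page.
# It can also be judged by the number of backlinks reported by google.com/alltheweb.com.
#
# Usage:
# python getURLRank.py -- by default, checks all popularity metrics for my blog (delicious/google/alltheweb/msn/sogou/siteexplorer.search.yahoo.com)
# python getURLRank.py delicious -- checks how often my blog has been bookmarked on del.icio.us
# python getURLRank.py alltheweb -- checks how many backlinks alltheweb.com finds for my blog
# python getURLRank.py google -- checks how many backlinks google.com finds for my blog
# python getURLRank.py msn -- checks how many backlinks search.msn.com finds for my blog
# python getURLRank.py sogou -- checks the SogouRank that www.sogou.com gives my blog
# python getURLRank.py yahoo -- checks how many Inlinks siteexplorer.search.yahoo.com has collected for my blog
#
# python getURLRank.py http://blog.csdn.net -- checks all popularity metrics for the csdn blog (delicious/google/alltheweb/msn/sogou)
# python getURLRank.py http://blog.csdn.net alltheweb -- checks how many backlinks alltheweb finds for the csdn blog
# python getURLRank.py http://blog.csdn.net google -- checks how many backlinks google finds for the csdn blog
# python getURLRank.py http://blog.csdn.net msn -- checks how many backlinks search.msn.com finds for the csdn blog
# python getURLRank.py http://blog.csdn.net sogou -- checks the SogouRank of the csdn blog
#
# (2) History:
#
# Created by: zhengyun_ustc (September 3, 2006, 1:00 a.m.)
# Modified by: zhengyun_ustc (September 3, 2006, 2:30 a.m.; added support for the alltheweb.com search engine and, at pandaxiaoxi's suggestion, a getattr dispatcher)
# Modified by: zhengyun_ustc (September 3, 2006, 4:30 p.m.; added support for the google.com and search.msn.com search engines)
# Modified by: zhengyun_ustc (September 3, 2006, 5:10 p.m.; added SogouRank detection through the www.sogou.com search engine)
# Modified by: zhengyun_ustc (September 5, 2006, 2:10 p.m.; added Inlinks detection through the siteexplorer.search.yahoo.com search engine)
#
# Contact me: Google Talk >> zhengyun(at)gmail.com
# Blogs: http://blog.csdn.net/zhengyun_ustc/
#
# (3) Copyright notice:
#
# You may borrow from the code of zhengyun_ustc's Python version of getURLRank, but it may not be used commercially without authorization from zhengyun_ustc.
#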

from sgmllib import SGMLParser
import os, sys, re
import socket
import httplib
import urllib
import urllib2
from xml.dom import minidom


name = 'zhengyun'
google_email = 'your@gmail.com'
google_password = 'pass'

# The things to replace can be written as a list or dictionary of pairs and then applied in a loop
aReplaceURL = [('://', '%3A%2F%2F'), ('/', '%2F')]


# Google states: "Without express prior permission from Google, you may not send automated queries of any kind to Google's system. Note that 'automated queries' includes using software to send queries to Google to determine how a website ranks on Google for various searches."
# So to use certain Google services, such as Calendar, we have to log in first.
# Queries, however, do not require a login.
def initGoogleLogin(email, passwd):
    params = urllib.urlencode({'Email': email,
                               'Passwd': passwd,
                               'service': 'cl',
                               'source': 'upbylunch-googie-0'})
    headers = {"Content-type": "application/x-www-form-urlencoded"}

    print 'enter initGoogleLogin function...'
    conn = httplib.HTTPSConnection("www.google.com")
    conn.request("POST", "/accounts/ClientLogin", params, headers)

    response = conn.getresponse()
    data = response.read()

    # If the login fails Google returns 403
    if response.status == 403:
        google_auth = None
    else:
        google_auth = data.splitlines()[2].split('=')[1]
        print 'google_auth=' + google_auth

    # close() must be called; the bare attribute name does nothing
    conn.close()
    return google_auth

class GoogleClRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, hdrs):
        result = urllib2.HTTPRedirectHandler.http_error_301(self, req, fp, code, msg, hdrs)
        result.status = code
        return result

    def http_error_201(self, req, fp, code, msg, hdrs):
        return 'Success'

    def http_error_302(self, req, fp, code, msg, hdrs):
        result = urllib2.HTTPRedirectHandler.http_error_302(self, req, fp, code, msg, hdrs)
        result.status = code
        return hdrs.dict['location']

# Fetch the content of a web page and return it
def getURLContent(url):
    print "get URL : %s" % url
    f = urllib.urlopen(url)
    data = f.read()
    f.close()
    return data

# Check how often del.icio.us users have bookmarked your url
def parse_delicious(url):
    print 'enter parse_delicious function...'

    # Prefix of the del.icio.us URL that checks whether a link has been bookmarked
    delicousPrefixURL = 'http://del.icio.us/url/check?url='
    # for i, j in aReplaceURL:
    #     postData = url.replace(i, j)
    postData = delicousPrefixURL + urllib.quote_plus(url)
    print 'del.ici.ous query url:' + postData
    # data holds the HTML of the del.icio.us check result
    data = getURLContent(postData)
    # print data
    # Use the regular expression
    #   url\s*has\s*been\s*saved\s*by(\s*?[^p]*\s*)people
    # to work out how many people have bookmarked it:
    pParsere = re.compile(r'url\s*has\s*been\s*saved\s*by(\s*?[^p]*\s*)people')
    # print 'regular expression object:'
    # print pParsere
    # Start parsing; pParsere.match(data) does not work here, only pParsere.search(data) does
    matchDelicious = pParsere.search(data)
    if matchDelicious:
        # print 'How many people bookmarked it? Did the parse succeed?:'
        # print matchDelicious
        print 'How many people have bookmarked your url:'
        # Printing matchDelicious.group(0) here would give the whole match, e.g. "url has been saved by 2 people";
        # group(1) gives just the number: 2
        print matchDelicious.group(1).replace(',', '')
    else:
        # Otherwise fall back to the regular expression
        #   There\s*is\s*no\s*del.icio.us\s*history\s*for\s*this\s*url
        # to find out whether nobody has bookmarked it:
        pParsere = re.compile(r'There\s*is\s*no\s*del.icio.us\s*history\s*for\s*this\s*url')
        # print 'regular expression object:'
        # print pParsere
        # Start parsing; pParsere.match(data) does not work here, only pParsere.search(data) does
        matchDelicious = pParsere.search(data)
        if matchDelicious:
            print 'Has nobody bookmarked it? Did the parse succeed?:'
            print matchDelicious
            print 'Nobody has bookmarked this url!'

    print 'out parse_delicious function...'

# Check how many backlinks Alltheweb reports for your url
def parse_alltheweb(url):
    print 'enter parse_alltheweb function...'

    # Prefix of the Alltheweb URL that checks for backlinks.
    # The _sb_lang=any parameter is extremely important!
    # Without it, alltheweb searches English pages only, which distorts the result: my blog gets just 12 results.
    # Only when the request includes "_sb_lang=any&" does the search cover all languages, and my blog then gets 212 results.
    AllthewebPrefixURL = 'http://www.alltheweb.com/urlinfo?_sb_lang=any&q='
    # for i, j in aReplaceURL:
    #     postData = url.replace(i, j)
    postData = AllthewebPrefixURL + urllib.quote_plus(url)
    print 'Alltheweb query url:' + postData
    # data holds the HTML of the Alltheweb check result
    data = getURLContent(postData)
    # print data
    # Use the regular expression
#<spans>(?<howmanyalltheweb>[^<span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">来解析出到底有多少个反向链接:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">pParsere</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(r</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000"><spans>(?P<howmanyalltheweb>[^<span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print'输出正则表达式对象:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printpParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">正式开始解析,此处使用pParsere.match(data)是不行的;只能使用pParsere.search(data)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">matchAlltheweb</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchAlltheweb):<br><img alt="" 
align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print'有多少个反向链接?是否解析成功呢?:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printmatchAlltheweb</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Alltheweb有多少个反向链接呢:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">而如果是group(1),则是正确的数字:212</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000">matchAlltheweb.group(</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">).replace(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">''</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span 
style="COLOR: #008000">那我们只有通过正则表达式</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Nos*Webs*pagess*founds*thats*matchs*yours*query</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">来解析出是不是没有反向链接:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">pParsere</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">No/s*Web/s*pages/s*found/s*that/s*match/s*your/s*query</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print'输出正则表达式对象:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printpParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">正式开始解析,此处使用pParsere.match(data)是不行的;只能使用pParsere.search(data)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span 
style="COLOR: #000000">matchAlltheweb</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchAlltheweb):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">是不是没有反向链接?:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000">matchAlltheweb<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">这个url没有反向链接!</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">outparse_allthewebfunction...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">解析google网页内容,得到有多少个连接</span><span style="COLOR: 
#008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000">parseGoogleResult(page):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">m</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.search(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">(?ofabout<b>)([0-9]|,)+</b></span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,page)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">return</span><span style="COLOR: #000000">m.group(0).replace(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">''</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">检查google站点对你的反向链接情况的函数</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000">parse_google(url):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">enterparse_googlefunction...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">首先模拟google登录,获取验证信息:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">google_auth=initGoogleLogin(google_email,google_password)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">这是google网站检查是否有反向链接的前缀</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">GooglePrefixURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">http://www.google.com/search?hl=en&amp;q=link%3A</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">fori,jinaReplaceURL:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">postURL=url.replace(i,j)</span><span style="COLOR: #008000"><br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">postURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">GooglePrefixURL</span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">urllib.quote_plus(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">googlequeryurl:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">postURL<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">由于google某些服务对软件发起的查询是禁止的;</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">所以我们事先用自己的google账号登录</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">headers={"Authorization":"GoogleLoginauth=%s"%google_auth,</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">"Content-type":"text/html;charset=UTF-8"}</span><span style="COLOR: #008000"><br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printheaders</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">下面这个data元素是google网站检查结果的HTML</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">request</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.Request(postURL)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">request.add_header(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">User-Agent</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Mozilla/5.0(X11;U;Linuxi686;pt-BR;rv:1.8)Gecko/20051111Firefox/1.5</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">opener</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.build_opener(GoogleClRedirectHandler)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">data</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">opener.open(request).read()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printdata</span><span 
style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">下面我们要从中通过正则表达式</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">(?ofabout<b>)([0-9]|,)+</b></span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">来解析出到底有多少个反向链接:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">matchGoogleLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.search(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">(?ofabout<b>)([0-9]|,)+</b></span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchGoogleLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">google有多少个反向链接呢:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000">matchGoogleLinks.group(0).replace(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">''</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">那我们只有通过正则表达式</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">dids*nots*matchs*anys*documents.</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">来解析出是不是没有反向链接:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">pParsere</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">did/s*not/s*match/s*any/s*documents.</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print'输出正则表达式对象:'</span><span 
style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printpParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">正式开始解析,此处使用pParsere.match(data)是不行的;只能使用pParsere.search(data)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">matchGoogleLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchGoogleLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">是不是没有google反向链接?:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">printmatchGoogleLinks</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">url确实没有google反向链接!</span><span style="COLOR: #800000">'</span><span style="COLOR: 
#000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">outparse_googlefunction...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">检查msn站点对你的反向链接情况的函数</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000">parse_msn(url):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">enterparse_msnfunction...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">这是msn网站检查是否有反向链接的前缀</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">=</span><span 
style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">http://search.msn.com/results.aspx?FORM=QBRE&amp;q=link%3A</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">fori,jinaReplaceURL:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">postURL=url.replace(i,j)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">postURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">urllib.quote_plus(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">msnqueryurl:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">postURL<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">下面这个data元素是msn网站检查结果的HTML</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">request</span><span style="COLOR: 
#000000">=</span><span style="COLOR: #000000">urllib2.Request(postURL)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;request.add_header(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">User-Agent</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;opener</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.build_opener()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;data</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">opener.open(request).read()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print data</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Next, we use the regular expression</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Page\s*1\s*of(\s*?[^p]*\s*)results\s*containing</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: 
#000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to extract the number of backlinks:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.search(r</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Page\s*1\s*of(\s*?[^p]*\s*)results\s*containing</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">msn 有多少个反向链接呢:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Note: matchLinks.group(0) would print the whole phrase "Page 1 of 326 results containing"</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> matchLinks.group(</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">).replace(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">''</span><span style="COLOR: #000000">)<br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Otherwise, we use the regular expression</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">We\s*couldn't\s*find\s*any\s*results\s*containing</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to check whether there are no backlinks at all:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">pParsere</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(r</span><span style="COLOR: #800000">"</span><span style="COLOR: #800000">We\s*couldn't\s*find\s*any\s*results\s*containing</span><span style="COLOR: #800000">"</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print '输出正则表达式对象:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print pParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: 
#000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Run the match. pParsere.match(data) does not work here (it only matches at the start of the string), so pParsere.search(data) must be used</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">是不是没有msn反向链接?:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">url确实没有msn反向链接!</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">out parse_msn function...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: 
#008000">#</span><span style="COLOR: #008000">Function that checks the sogouRank the sogou site gives your URL</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000"> parse_sogou(url):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">enter parse_sogou function...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">URL prefix used by www.sogou.com to look up backlinks</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">http://www.sogou.com/web?num=10&amp;query=link%3A</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">for i,j in aReplaceURL:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">postURL=url.replace(i,j)</span><span style="COLOR: #008000"><br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">postURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">urllib.quote_plus(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">sogou query url:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">postURL<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">The data variable below holds the HTML of the sogou result page</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">request</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.Request(postURL)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;request.add_header(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">User-Agent</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;opener</span><span style="COLOR: #000000">=</span><span style="COLOR: 
#000000">urllib2.build_opener()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;data</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">opener.open(request).read()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print data</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Next, we use the regular expression</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">&lt;img\s*[^&gt;]*width="([^\%]*)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to extract the score that sogou gives this page:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.search(r</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">&lt;img\s*[^&gt;]*width="([^\%]*)</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" 
align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">www.sogou.com评分是多少呢:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Note: matchLinks.group(0) would print the entire matched text</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> matchLinks.group(</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Otherwise, we use a regular expression that matches the message</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000"><divs>抱歉,没有找到指向网页</divs></span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to check whether there are no backlinks at all:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">pParsere</span><span 
style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000"><div>抱歉,没有找到指向网页<span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print '输出正则表达式对象:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print pParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Run the match. pParsere.match(data) does not work here (it only matches at the start of the string), so pParsere.search(data) must be used</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">是不是没有sogou反向链接?:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span 
style="COLOR: #800000">'</span><span style="COLOR: #800000">url确实没有sogouRank!</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">out parse_sogou function...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Function that checks the Inlinks count Yahoo! Site Explorer reports for your URL</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000"> parse_yahoo(url):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">enter parse_yahoo function...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">URL prefix used by siteexplorer.search.yahoo.com to look up backlinks</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span 
style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">http://siteexplorer.search.yahoo.com/search?bwm=i&amp;bwms=p&amp;bwmf=u&amp;fr=FP-tab-web-t500&amp;fr2=seo-rd-se&amp;p=</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">for i,j in aReplaceURL:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">postURL=url.replace(i,j)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">postURL</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">PrefixURL</span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">urllib.quote_plus(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">yahoo siteexplorer query url:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">+</span><span style="COLOR: #000000">postURL<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">The data variable below holds the HTML of the Yahoo! result page</span><span style="COLOR: #008000"><br><img alt="" 
align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">request</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.Request(postURL)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;request.add_header(</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">User-Agent</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;opener</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">urllib2.build_opener()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;data</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">opener.open(request).read()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print data</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Next, we use the regular expression</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: 
#008000">([0-9]*)(?&lt;first&gt;[^&gt;]*)\s*of\s*about\s*&lt;strong&gt;(\s*?[^&lt;]*)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to extract the Inlinks count reported by Yahoo! Site Explorer:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.search(r</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">\s*of\s*about\s*&lt;strong&gt;(\s*?[^&lt;]*)</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">siteexplorer.search.yahoo.com Inlinks是多少呢:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Note: matchLinks.group(0) would print the entire matched text</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> matchLinks.group(</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">).replace(</span><span style="COLOR: #800000">'</span><span 
style="COLOR: #800000">,</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">,</span><span style="COLOR: #800000">''</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Otherwise, we use the regular expression</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">We\s*were\s*unable\s*to\s*find\s*any\s*results\s*for\s*the\s*given\s*URL\s*in\s*our\s*index</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">to check whether there are no backlinks at all:</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">pParsere</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">re.compile(r</span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">We\s*were\s*unable\s*to\s*find\s*any\s*results\s*for\s*the\s*given\s*URL\s*in\s*our\s*index</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print '输出正则表达式对象:'</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: 
#000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">print pParsere</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">Run the match. pParsere.match(data) does not work here (it only matches at the start of the string), so pParsere.search(data) must be used</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">matchLinks</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">pParsere.search(data)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(matchLinks):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">是不是没有siteexplorer.search.yahoo.com Inlinks?:</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">url确实没有siteexplorer.search.yahoo.com Inlinks!</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">'</span><span style="COLOR: 
#800000">out parse_yahoo function...</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000"></span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">#</span><span style="COLOR: #008000">Send the given site URL to each "big brother" (del.icio.us, Alltheweb, google, msn, sogou and so on)</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">#</span><span style="COLOR: #008000">and extract the required numbers from the pages they return.</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">def</span><span style="COLOR: #000000"> postURL2BigBrother(url</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">"</span><span style="COLOR: #800000">http://blog.csdn.net/zhengyun_ustc</span><span style="COLOR: #800000">"</span><span style="COLOR: #000000">,bigbrother</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">""</span><span style="COLOR: #000000">):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">sys.setdefaultencoding('utf-8')</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: 
#0000ff">if</span><span style="COLOR: #000000">(len(bigbrother)</span><span style="COLOR: #000000">&gt;</span><span style="COLOR: #000000">0):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;method_name</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"> </span><span style="COLOR: #800000">"</span><span style="COLOR: #800000">parse_%s</span><span style="COLOR: #800000">"</span><span style="COLOR: #000000"> </span><span style="COLOR: #000000">%</span><span style="COLOR: #000000"> bigbrother<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;RankMethod</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000"> </span><span style="COLOR: #800080">__import__</span><span style="COLOR: #000000">(</span><span style="COLOR: #800000">"</span><span style="COLOR: #800000">getURLRank</span><span style="COLOR: #800000">"</span><span style="COLOR: #000000">)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">getattr returns a reference to a function whose name is not known until run time</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #000000">method</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">getattr(RankMethod,method_name)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;method(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;parse_delicious(url)<br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">parse_google(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">parse_alltheweb(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">parse_msn(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">parse_sogou(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">parse_yahoo(url)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">应用入口</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000"></span><span style="COLOR: #800080">__name__</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">==</span><span style="COLOR: #000000"></span><span style="COLOR: #800000">'</span><span style="COLOR: #800000">__main__</span><span style="COLOR: #800000">'</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">argc</span><span style="COLOR: #000000">=</span><span style="COLOR: #000000">len(sys.argv)<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">if</span><span style="COLOR: #000000">(argc</span><span style="COLOR: #000000">==</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">3</span><span style="COLOR: #000000">):<br><img alt="" align="top" 
src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000">sys.argv[</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">]<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">print</span><span style="COLOR: #000000">sys.argv[</span><span style="COLOR: #000000">2</span><span style="COLOR: #000000">]<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">把postURL2BigBrother函数作为一个分发器,加一个参数,BigBrother='delicious'或者BigBrother='alltheweb'来指定使用谁解析</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000"></span><span style="COLOR: #008000">#</span><span style="COLOR: #008000">http://www.woodpecker.org.cn/diveintopython/power_of_introspection/getattr.html</span><span style="COLOR: #008000"><br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #000000">postURL2BigBrother(sys.argv[</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">],sys.argv[</span><span style="COLOR: #000000">2</span><span style="COLOR: #000000">])<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">elif</span><span style="COLOR: #000000">(argc</span><span style="COLOR: #000000">==</span><span style="COLOR: #000000"></span><span style="COLOR: #000000">2</span><span style="COLOR: #000000">):<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">postURL2BigBrother(</span><span style="COLOR: #800000">"</span><span style="COLOR: 
#800000">http://blog.csdn.net/zhengyun_ustc</span><span style="COLOR: #800000">"</span><span style="COLOR: #000000">,sys.argv[</span><span style="COLOR: #000000">1</span><span style="COLOR: #000000">])<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span><span style="COLOR: #0000ff">else</span><span style="COLOR: #000000">:<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif">postURL2BigBrother()<br><img alt="" align="top" src="http://images.csdn.net/syntaxhighlighting/OutliningIndicators/None.gif"></span> </div></span></howmanyalltheweb></spans></span></howmanyalltheweb></spans>
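The getattr forwarder used by postURL2BigBrother can be tried on its own. The following is only a minimal sketch written for Python 3, not the code from getURLRank.py: the `Parsers` class, the `dispatch` function and the stub return values are hypothetical names made up for illustration, and a class stands in for the module that the original obtains with `__import__("getURLRank")`.

```python
class Parsers(object):
    """Hypothetical namespace holding one parse_* function per engine."""

    @staticmethod
    def parse_delicious(url):
        # Stub: the real function would query del.icio.us over HTTP.
        return 'delicious: %s' % url

    @staticmethod
    def parse_google(url):
        # Stub: the real function would query google.com over HTTP.
        return 'google: %s' % url


# Engines to run, in order, when no single engine is requested.
ALL_ENGINES = ('delicious', 'google')


def dispatch(url, bigbrother=''):
    """Call parse_<bigbrother>(url), or every parser if none is named."""
    names = [bigbrother] if bigbrother else ALL_ENGINES
    results = []
    for name in names:
        # getattr resolves a function whose name is only known at run time.
        method = getattr(Parsers, 'parse_%s' % name, None)
        if method is None:
            raise ValueError('unknown engine: %s' % name)
        results.append(method(url))
    return results


print(dispatch('http://blog.csdn.net/zhengyun_ustc', 'google'))
print(dispatch('http://blog.csdn.net/zhengyun_ustc'))
```

Passing a default of `None` to getattr lets the sketch report an unknown engine name with a clear error instead of an AttributeError.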

[Update 2006-10-08] The code added for Baidu is:

# Check how Baidu has indexed your blog
def parse_baidu(url):
    print 'enter parse_baidu function...'

    # URL prefix for Baidu's backlink (domain:) query
    PrefixURL = 'http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=domain%3A'
    # for i, j in aReplaceURL:
    #     postURL = url.replace(i, j)
    postURL = PrefixURL + urllib.quote_plus(url)
    print 'baidu query url:' + postURL

    # data holds the HTML of Baidu's result page
    request = urllib2.Request(postURL)
    request.add_header(
        'User-Agent', 'Mozilla/5.0 (X11; U; Linux i686; pt-BR; rv:1.8) Gecko/20051111 Firefox/1.5')
    opener = urllib2.build_opener(GoogleClRedirectHandler)
    data = opener.open(request).read()
    # print data

    # Extract the result count from the page with a regular expression
    # (the pattern after '(?' is truncated in this listing)
    matchLinks = re.search(u'(?', unicode(data, 'cp936'))
    if (matchLinks is None):
        matchLinks = re.search(u'(?', unicode(data, 'cp936'))
    print matchLinks
    if (matchLinks):
        print 'baidu backlink count:'
        print matchLinks.group().replace(',', '')
    else:
        # Otherwise, fall back on the regular expression
        #   抱歉,没有找到与\s*[^”]*
        # to determine whether there are no backlinks at all:
        pParsere = re.compile(u'抱歉,没有找到与\s*[^”]*')
        # pParsere.match(data) does not work here; only pParsere.search(data) does
        matchLinks = pParsere.search(unicode(data, 'cp936'))
        if (matchLinks):
            print 'Is the url missing from baidu?:'
            # print matchLinks
            print 'The url is indeed not indexed by baidu!'

    print 'out parse_baidu function...'
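Two steps of parse_baidu can be checked without touching the network: building the query URL, and detecting the "nothing found" message. The following is a rough sketch for Python 3 (where `urllib.quote_plus` lives in `urllib.parse`), not the original Python 2.5 code. The names `build_query_url` and `is_not_indexed` and the sample page fragment are hypothetical; only the URL prefix and the "not found" pattern come from the listing above.

```python
import re
from urllib.parse import quote_plus

# Same query prefix as in parse_baidu.
PREFIX_URL = 'http://www.baidu.com/s?lm=0&si=&rn=10&ie=gb2312&ct=0&wd=domain%3A'

# Same "sorry, nothing found for ..." pattern as in parse_baidu.
NOT_FOUND = re.compile(r'抱歉,没有找到与\s*[^”]*')


def build_query_url(url):
    """Percent-encode the target URL and append it to the query prefix."""
    return PREFIX_URL + quote_plus(url)


def is_not_indexed(raw_bytes):
    """Return True if the cp936-encoded page contains the 'not found' message."""
    text = raw_bytes.decode('cp936')
    return NOT_FOUND.search(text) is not None


# Hypothetical page fragment, encoded as cp936 like the pages parse_baidu decodes.
sample = ('<html><body><p>抱歉,没有找到与“domain:example.com”相关的网页。'
          '</p></body></html>').encode('cp936')

print(build_query_url('http://blog.sina.com.cn/m/xujinglei'))
print(is_not_indexed(sample))                    # search scans the whole page
print(NOT_FOUND.match(sample.decode('cp936')))   # match only tries position 0
```

The last two calls show why the comment in parse_baidu insists on `search`: `match` anchors at the start of the string, where the page begins with markup rather than the message, so it returns None.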


