For most websites, the following code is enough to obtain the cookies:

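# Python 2 (urllib2 / cookielib)
import urllib2
import urllib
import cookielib

logurl = "https://www.digikey.com/classic/RegisteredUser/Login.aspx?"
# Cookies set by the server are collected in this jar.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Install the opener globally so that urllib2.urlopen() uses it.
urllib2.install_opener(opener)
resp = urllib2.urlopen(logurl)
# List every cookie received from the login page.
for index, cookie in enumerate(cj):
    print '[', index, ']', cookie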

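After that, build the POST data and send it to the target URL. (As for headers: some people say that submitting your own headers at this point overrides the cookie-carrying header obtained earlier. I have not tested this myself. If it is true, you could try adding the headers through the opener.addheaders attribute instead.)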
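The step above can be sketched as follows. This is only an illustration written against Python 3, where urllib2 and cookielib became urllib.request and http.cookiejar. The form field names "username" and "password" and the helper build_login_request are hypothetical; the real login form uses the long ctl00$... names. Nothing is sent over the network unless the last line is uncommented.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login_request(url, fields):
    """Return a POST request carrying the URL-encoded form fields."""
    body = urllib.parse.urlencode(fields).encode("ascii")
    # Supplying data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(url, data=body)


cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))

# addheaders is a list of (name, value) tuples that the opener sends with
# every request; the cookie processor still adds the Cookie header itself.
opener.addheaders = [
    ("User-Agent",
     "Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0"),
]

# Hypothetical field names, for illustration only.
req = build_login_request(
    "https://www.digikey.com/classic/RegisteredUser/Login.aspx",
    {"username": "xxxxx", "password": "xxxxx"},
)
# resp = opener.open(req)  # uncomment to actually send the request
```

Keeping the extra headers on the opener rather than on each Request leaves the Cookie header entirely to HTTPCookieProcessor, which is one way to sidestep the override concern mentioned above.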

For some reason, however, digikey does not return any cookies when accessed this way.

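After some experimenting, I found that copying the post-login cookies straight out of the browser into the headers and then requesting a logged-in page directly is enough to be logged in. Even the POST data seems to be unnecessary.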

teurl = "https://www.digikey.com/classic/RegisteredUser/MyDigikey.aspx"
headers = {
"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:23.0) Gecko/20100101 Firefox/23.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language":"q=0.8,en-us",
"Host":"www.digikey.com",
"Cookie":"TS6b482d=1aa460f525235eaaaf6763d718b53ba564323; sid=5278426151-44225; TS50f921=c7b09221315dc72ed80a2b24f014fc2ffa4b16ccf877976952494826a3c65dd840b6a4ae6bad0a341809f751ca85ab033df615a7; TS168127=b58c4c2296cc37b2a5515974e403a008cdc46701e181d2a252494823019cdd5115d4a2a6a3c65dd840b6a4ae6bad0a341809f751ca85ab033df615a724f3d3ac91dbdb26; cur=USD; SiteForCur=US; utag_main=_st:1380536340264$ses_id:1380534378638%3Bexp-session; TS50f921_77=6487_b8824fd7e25d22fc_rsb_0_rs_https%3A%2F%2Fwww.digikey.com%2Fclassic%2FRegisteredUser%2FLogin.aspx%3FReturnUrl%3D%252fclassic%252fregistereduser%252fmydigikey.aspx%253fsite%253dus%2526lang%253den%26site%3Dus%26lang%3Den_rs_0; WT_FPC=id=36664e24-272b-40b3-9c92-2bce365f251d:lv=1380484152114:ss=1380483959856"
}
#postdata = {
# "__EVENTARGUMENT":"",
# "__EVENTTARGET":"",
# "__EVENTVALIDATION":"BcGpYOslmB3LGgxIVeQ+h35cvehYPZQcz1tM4jAlXqyYqV/g1blGRZnSJ4itN0YHd4C7aQtlJT0qWTL7vspdqVLEZtyljs5BJJuR+NhrIxCG0sdcfegZ1ZR1hdl/qIcNf1qpWfClikXsLCYWLe1N/Q6P1kU=",
# "__LASTFOCUS":"",
# "__SCROLLPOSITIONX":0,
# "__SCROLLPOSITIONY":0,
# "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$btnLogin":"Log In",
# "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$txtPassword":"xxxxx",
# "ctl00$ctl00$mainContentPlaceHolder$mainContentPlaceHolder$txtUsername":"xxxxx",
# }
#postdata=urllib.urlencode(postdata)
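# postdata is commented out above, so pass None (a plain GET);
# the Cookie header alone carries the logged-in session.
req = urllib2.Request(teurl, None, headers)
res = urllib2.urlopen(req)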
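Pasting one long cookie string works, but it is awkward to update a single value later. The sketch below, written for Python 3 (urllib.request instead of urllib2), splits a browser cookie string into a dict and rebuilds the header from it. The helper names parse_cookie_header and build_cookie_header are my own, and the cookie string is a shortened copy of the one above.

```python
import urllib.request


def parse_cookie_header(raw):
    """Split a browser 'Cookie' header value into a name -> value dict."""
    cookies = {}
    for part in raw.split(";"):
        part = part.strip()
        if not part:
            continue
        # Split on the first '=' only: some values contain '=' themselves.
        name, _, value = part.partition("=")
        cookies[name] = value
    return cookies


def build_cookie_header(cookies):
    """Join a name -> value dict back into a 'Cookie' header value."""
    return "; ".join("%s=%s" % (name, value) for name, value in cookies.items())


raw = ("sid=5278426151-44225; cur=USD; SiteForCur=US; "
       "WT_FPC=id=36664e24-272b-40b3-9c92-2bce365f251d"
       ":lv=1380484152114:ss=1380483959856")
cookies = parse_cookie_header(raw)

teurl = "https://www.digikey.com/classic/RegisteredUser/MyDigikey.aspx"
# No data argument, so this is a GET; the request is built but not sent here.
req = urllib.request.Request(
    teurl, headers={"Cookie": build_cookie_header(cookies)})
```

With the cookies in a dict, replacing only the session value (for example cookies["sid"]) after a fresh browser login is a one-line change.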

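I did not expect the final solution to be this simple; it even allows scraping from dynamic IP addresses. What I do not know is whether the cookies will expire at some point.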

As for why no cookies were obtained, the 302 page redirect may be the cause.
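One way to check the redirect guess is to record what each redirect hop actually returns. The sketch below is written for Python 3 and runs against a small local stand-in server, not digikey: the DemoHandler class, its /login and /home paths, and the sid=demo cookie are all made up for the demonstration. LoggingRedirectHandler is my own subclass of the standard HTTPRedirectHandler.

```python
import http.cookiejar
import http.server
import threading
import urllib.request


class LoggingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Record each redirect hop and the Set-Cookie headers it carried."""

    def __init__(self):
        self.hops = []

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((code, newurl, headers.get_all("Set-Cookie") or []))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


class DemoHandler(http.server.BaseHTTPRequestHandler):
    """Hypothetical stand-in server: /login answers 302 and sets a cookie."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(302)
            self.send_header("Location", "/home")
            self.send_header("Set-Cookie", "sid=demo; Path=/")
            self.end_headers()
        else:
            self.send_response(200)
            self.send_header("Content-Length", "2")
            self.end_headers()
            self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # keep the demo quiet


server = http.server.HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

cj = http.cookiejar.CookieJar()
redirects = LoggingRedirectHandler()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local demo
    urllib.request.HTTPCookieProcessor(cj),
    redirects,
)
resp = opener.open("http://127.0.0.1:%d/login" % server.server_address[1])
body = resp.read()
server.shutdown()

for code, newurl, set_cookies in redirects.hops:
    print(code, newurl, set_cookies)
```

In this local demo the cookie sent with the 302 response does end up in the jar, so a redirect on its own does not have to lose cookies. If the same logging shows no Set-Cookie header on any hop for the real site, the cookies are probably being set some other way, for example by JavaScript in the browser.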

The obtained cookies can then be saved to a file with LWPCookieJar or MozillaCookieJar and loaded back later with load().
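A minimal sketch of the save and reload cycle, written for Python 3 where cookielib is named http.cookiejar. The cookie is built by hand so the example runs without a network connection; the helpers make_cookie and save_and_reload, the value "example-value", and the temporary file path are assumptions for illustration.

```python
import http.cookiejar
import os
import tempfile


def make_cookie(name, value, domain):
    """Build a session cookie by hand (stand-in for one set by a server)."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def save_and_reload(cookies, path):
    """Write the cookies to path, then read them back into a fresh jar."""
    jar = http.cookiejar.LWPCookieJar(path)
    for cookie in cookies:
        jar.set_cookie(cookie)
    # Session cookies are marked "discard"; without ignore_discard=True
    # save() would skip them and the file would be empty.
    jar.save(ignore_discard=True, ignore_expires=True)

    restored = http.cookiejar.LWPCookieJar(path)
    restored.load(ignore_discard=True, ignore_expires=True)
    return restored


path = os.path.join(tempfile.mkdtemp(), "cookies.txt")
restored = save_and_reload(
    [make_cookie("sid", "example-value", "www.digikey.com")], path)
names = [cookie.name for cookie in restored]
```

The restored jar can be passed to HTTPCookieProcessor exactly like a fresh CookieJar. MozillaCookieJar offers the same save() and load() methods but writes the Netscape cookies.txt format instead.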