宙飒天下网
1. URL parsing tool (purl)
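A URL is the address of a specific piece of content on the web. It can be broken down into several parts:
- scheme: the protocol, such as http or https
- host: the hostname, such as www.baidu.com
- path: the location of the content on that host
- query_params: the values passed to the path as parameters in the URL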
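The parts listed above can be seen without installing anything by using Python's standard library module urllib.parse. This is only an illustrative sketch and is not purl itself; the shortened URL and the variable names are assumptions made for the example.

```python
from urllib.parse import urlsplit, parse_qs

# Shortened example URL (an assumption for illustration).
example_url = "https://www.baidu.com/s?ie=utf-8&wd=newbee"

# urlsplit breaks the URL into scheme, network location, path, query and fragment.
parts = urlsplit(example_url)

scheme = parts.scheme          # the protocol
host = parts.hostname          # the hostname
path = parts.path              # where the content lives on the host
params = parse_qs(parts.query) # query parameters as a dict of lists

print(scheme, host, path, params)
```

Each query parameter maps to a list because the same key may appear more than once in a URL.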
purl is a simple, easy-to-use tool for taking URLs apart. It makes it easy to read each part of a URL.
Installation:
pip install purl
Usage:
url = "https://www.baidu.com/s?ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=newbee&rsv_pq=c87beda3000c6192&rsv_t=69e1zt6mVNzkBbdJDmwMn6Q%2FanmE7t9awTdR05ddZ2dL0Sxnf9BuLA3TB3g&rsv_enter=1&rsv_sug3=8&rsv_sug1=10&rsv_sug7=100"
from purl import URL
from_str = URL(url)
from_str.scheme()
'https'
from_str.host()
'www.baidu.com'
from_str.query()
'ie=utf-8&f=8&rsv_bp=0&rsv_idx=1&tn=baidu&wd=newbee&rsv_pq=c87beda3000c6192&rsv_t=69e1zt6mVNzkBbdJDmwMn6Q%2FanmE7t9awTdR05ddZ2dL0Sxnf9BuLA3TB3g&rsv_enter=1&rsv_sug3=8&rsv_sug1=10&rsv_sug7=100'