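Web Scraping: Batch-Downloading Songs from a Music Site with Python

This article is for technical exchange and learning only. Any commercial or other use is strictly prohibited; anyone who ignores this bears the consequences themselves.

A Python implementation of batch song downloading from a certain music site (Kugou, as the URLs below show).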

   Python 3.11

We use the search page as an example:
https://www.kugou.com/yy/html/search.html#searchType=song&searchKeyWord=%E5%91%A8%E6%9D%B0%E4%BC%A6

This page shows the song list, for example 周杰伦 - 晴天 (Jay Chou - Sunny Day).

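Capturing the API request

  1. Open the browser's developer console.
  2. Refresh the page.
  3. Switch to the Network panel and open its search box.
  4. Search for the keyword 【晴天】.
  5. Locate the request whose response contains it; that is the search API.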


Analyzing the API

  1. Right-click the request and copy its link address:
    https://complexsearch.kugou.com/v2/search/song?callback=callback123&srcappid=2919&clientver=1000&clienttime=1702885903212&mid=1c8f9939584a8e35ed5d8e5510b845fe&uuid=1c8f9939584a8e35ed5d8e5510b845fe&dfid=1txNPf3JNHSk05LR6l3Upjvi&keyword=%E5%91%A8%E6%9D%B0%E4%BC%A6&page=1&pagesize=30&bitrate=0&isfuzzy=0&inputtype=0&platform=WebFilter&userid=0&iscorrection=1&privilege_filter=0&filter=10&token=&appid=1014&signature=461cd68d2aee31af3b9a163205848891
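  2. The Payload and Headers tabs show the params and headers of this request.
  3. Comparing several requests shows that only a few fields change between them, and the signature field appears to be computed dynamically.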
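
To see the query parameters at a glance, the copied URL can be split with the standard library. This is only a minimal sketch: the names captured_url and query_params are illustrative, and the URL is the one copied in step 1.

```python
from urllib.parse import urlsplit, parse_qs

# URL copied from the Network panel (the same one shown in step 1).
captured_url = (
    "https://complexsearch.kugou.com/v2/search/song?callback=callback123"
    "&srcappid=2919&clientver=1000&clienttime=1702885903212"
    "&mid=1c8f9939584a8e35ed5d8e5510b845fe&uuid=1c8f9939584a8e35ed5d8e5510b845fe"
    "&dfid=1txNPf3JNHSk05LR6l3Upjvi&keyword=%E5%91%A8%E6%9D%B0%E4%BC%A6"
    "&page=1&pagesize=30&bitrate=0&isfuzzy=0&inputtype=0&platform=WebFilter"
    "&userid=0&iscorrection=1&privilege_filter=0&filter=10&token=&appid=1014"
    "&signature=461cd68d2aee31af3b9a163205848891"
)


def query_params(url):
    """Return the query string as a flat dict, keeping empty values such as token=."""
    pairs = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return {key: values[0] for key, values in pairs.items()}


captured_params = query_params(captured_url)
for key, value in captured_params.items():
    print(f"{key} = {value}")
```

Note that parse_qs percent-decodes the values, so keyword comes back as 周杰伦 rather than %E5%91%A8%E6%9D%B0%E4%BC%A6.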



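Reverse-engineering the JS to find the signing function

  1. Search for the keyword signature again and locate where the field is generated. Its value is the result of applying function d to the string s.
  2. Right-click and choose to open it in the Sources panel.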


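Breakpoint debugging

Set a breakpoint, capture the value of the parameter s, and compute its MD5 digest yourself as a test. The output of function d matches the MD5 digest we compute from s, so d is just plain MD5, with no salting or other processing applied on top.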
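
The comparison can be done in a few lines of Python. This is a minimal sketch, and is_plain_md5 is a hypothetical helper name: in practice you would pass the value of s copied from the debugger and the signature seen in the request. A well-known test value stands in for them here.

```python
import hashlib


def is_plain_md5(s, observed_signature):
    """Return True if observed_signature is the hex MD5 digest of the UTF-8 bytes of s."""
    digest = hashlib.md5(s.encode("utf-8")).hexdigest()
    return digest == observed_signature.lower()


# Sanity check with a well-known value: the MD5 digest of "hello".
print(is_plain_md5("hello", "5d41402abc4b2a76b9719d911017c592"))  # True
```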


Putting the code together

From the analysis above we have the music search endpoint
https://complexsearch.kugou.com/v2/search/song and a working understanding of its input parameters.

  1. Prepare the URL and headers

    url = "https://complexsearch.kugou.com/v2/search/song"
    header = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Referer": "https://www.kugou.com/",
    }


  2. Build the signature
    The signature field is an MD5 digest computed for each request; only the singer name and the timestamp change between requests.

    import hashlib

    def get_sign(name, req_time):
        # Adjacent string literals are concatenated into one string.
        # The same fixed key opens and closes the text that gets hashed.
        value = (
            "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt"
            "appid=1014"
            "bitrate=0"
            "callback=callback123"
            f"clienttime={req_time}"
            "clientver=1000"
            "dfid=1txNPf3JNHSk05LR6l3Upjvi"
            "filter=10"
            "inputtype=0"
            "iscorrection=1"
            "isfuzzy=0"
            f"keyword={name}"
            "mid=1c8f9939584a8e35ed5d8e5510b845fe"
            "page=1"
            "pagesize=30"
            "platform=WebFilter"
            "privilege_filter=0"
            "srcappid=2919"
            "token="
            "userid=0"
            "uuid=1c8f9939584a8e35ed5d8e5510b845fe"
            "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt"
        )
        return hashlib.md5(value.encode(encoding="UTF-8")).hexdigest()

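
The fields inside get_sign happen to be listed in alphabetical order by name, which suggests the signed text is simply the sorted name=value pairs wrapped in the fixed key. Assuming that pattern holds (it is inferred from the hard-coded string, not confirmed elsewhere), a more general signer could look like the sketch below; SIGN_KEY, sign_params and sample_params are illustrative names.

```python
import hashlib

# Fixed string that opens and closes the signed text in get_sign.
SIGN_KEY = "NVPh5oo715z5DIWAeQlhMDsWXXQV4hwt"


def sign_params(params):
    """MD5 over SIGN_KEY + sorted 'name=value' pairs (no separators) + SIGN_KEY."""
    body = "".join(f"{name}={params[name]}" for name in sorted(params))
    return hashlib.md5((SIGN_KEY + body + SIGN_KEY).encode("utf-8")).hexdigest()


# Every request field except signature itself, using the captured values.
sample_params = {
    "callback": "callback123", "srcappid": "2919", "clientver": "1000",
    "clienttime": 1702885903212,
    "mid": "1c8f9939584a8e35ed5d8e5510b845fe",
    "uuid": "1c8f9939584a8e35ed5d8e5510b845fe",
    "dfid": "1txNPf3JNHSk05LR6l3Upjvi", "keyword": "周杰伦",
    "page": "1", "pagesize": "30", "bitrate": "0", "isfuzzy": "0",
    "inputtype": "0", "platform": "WebFilter", "userid": "0",
    "iscorrection": "1", "privilege_filter": "0", "filter": "10",
    "token": "", "appid": "1014",
}

print(sign_params(sample_params))
```

With this form, changing page or pagesize would not require editing the hard-coded string, as long as the sorted-pairs assumption is right.
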
  3. Build the params
    Analysis: only three fields vary between search requests: the timestamp, the singer name, and the signature.

    # The first parameter is named req_time so it does not shadow the time module.
    def getParams(req_time, name, sign):
        return {
            "callback": "callback123",
            "srcappid": "2919",
            "clientver": "1000",
            "clienttime": f"{req_time}",
            "mid": "1c8f9939584a8e35ed5d8e5510b845fe",
            "uuid": "1c8f9939584a8e35ed5d8e5510b845fe",
            "dfid": "1txNPf3JNHSk05LR6l3Upjvi",
            "keyword": f"{name}",
            "page": "1",
            "pagesize": "30",
            "bitrate": "0",
            "isfuzzy": "0",
            "inputtype": "0",
            "platform": "WebFilter",
            "userid": "0",
            "iscorrection": "1",
            "privilege_filter": "0",
            "filter": "10",
            "token": "",
            "appid": "1014",
            "signature": f"{sign}",
        }
  4. Send the request
    We search for songs by 周杰伦 as an example.

    import time
    import requests

    SINGER_NAME = "周杰伦"
    # Millisecond timestamp, the same format as clienttime in the captured URL.
    req_time = int(time.time() * 1000)
    sign = get_sign(SINGER_NAME, req_time)
    params = getParams(req_time, SINGER_NAME, sign)

    r = requests.get(url, headers=header, params=params)
    r.encoding = "utf-8"
    # r.text applies the encoding set above; r.content would print raw bytes.
    print(r.text)
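
Because the request passes callback=callback123, the response body is presumably JSONP, that is, JSON wrapped in callback123(...). The sketch below strips such a wrapper before parsing. The helper name unwrap_jsonp and the sample payload are made up for illustration; the fields of the real response are not shown in this article and may differ.

```python
import json


def unwrap_jsonp(text):
    """Strip a JSONP wrapper such as callback123({...}) and parse the JSON inside."""
    start = text.index("(") + 1   # first opening parenthesis
    end = text.rindex(")")        # last closing parenthesis
    return json.loads(text[start:end])


# Synthetic payload for illustration only; real field names may differ.
sample = 'callback123({"status": 1, "data": {"total": 2}})'
result = unwrap_jsonp(sample)
print(result["data"]["total"])  # 2
```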