This article was last updated 1035 days ago; some of the information in it may have changed since then.
Environment setup
Python 3, with the third-party requests library installed (pip install requests)
Test script
Create a text file named url.txt and list the URLs you want to check in it, one per line. Every URL must start with http:// or https://.
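Because every entry needs a scheme, it can help to check the list before a run. The sketch below is one possible way to do that with the standard library only; the helper name find_bad_lines is hypothetical and the sample entries are made up for illustration.

```python
from urllib.parse import urlsplit


def find_bad_lines(lines):
    """Return (line_number, text) pairs for entries lacking an http(s) scheme or a host."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are ignored rather than reported
        parts = urlsplit(text)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            bad.append((number, text))
    return bad


# In-memory stand-in for the contents of url.txt (illustrative entries).
sample = [
    "http://example.com\n",
    "example.com\n",
    "\n",
    "https://a.example/path\n",
    "ftp://files.example\n",
]
for number, text in find_bad_lines(sample):
    print(f"line {number}: {text}")
```

To check the real file, pass an open file object for url.txt instead of the sample list.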
Then create a Python script named go2url.py and paste in the following code:
# -*- coding: utf-8 -*-
import argparse
import csv
import re
import threading
import time
from concurrent.futures import ThreadPoolExecutor

import requests
import urllib3

# Requests are sent with verify=False, so silence the certificate warnings.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
}
# Text between "<title>" and the next "<", across newlines.
TITLE_RE = re.compile(r"<title>(.+?)<", re.IGNORECASE | re.DOTALL)
write_lock = threading.Lock()  # serializes output from the worker threads


def parser_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", "--file", required=True, help="file containing the URLs to check")
    return parser.parse_args()


def getTitle(url, writer):
    try:
        # A single request yields the final URL, the status code and the page body.
        res = requests.get(url, headers=HEADERS, verify=False, allow_redirects=True, timeout=10)
    except Exception:
        row = [url, " ", "unreachable", " "]
    else:
        try:
            res.encoding = res.apparent_encoding
            title = TITLE_RE.search(res.text).group(1).strip()
        except Exception:
            title = "[ ]"  # no <title> found, or the body could not be decoded
        row = [url, res.url, str(res.status_code), title]
    with write_lock:
        writer.writerow(row)  # csv.writer quotes titles that contain commas
        print(",".join(row))


if __name__ == "__main__":
    args = parser_args()
    start = time.time()
    with open(args.file, errors="ignore") as src, \
            open("result.csv", "a", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["Source URL", "Redirect URL", "Status code", "Title"])
        # Leaving the executor block waits for all pending checks to finish.
        with ThreadPoolExecutor(max_workers=100) as executor:
            for line in src:
                url = line.strip().strip('\\')
                if url:
                    executor.submit(getTitle, url, writer)
    print("Total time:", time.time() - start, "seconds")
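To see what the title extraction in the script does in isolation, here is a minimal sketch that applies a comparable regular expression to an in-memory HTML string rather than a live response. The helper name extract_title and the sample HTML are illustrative assumptions. Like the script, it only matches a bare <title> tag without attributes and falls back to "[ ]" when nothing is found.

```python
import re

# Capture everything between "<title>" and the next "<", across newlines.
TITLE_RE = re.compile(r"<title>(.+?)<", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the stripped page title, or "[ ]" when no title is found."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else "[ ]"


sample = "<html><head><TITLE>\n  Example Domain\n</TITLE></head></html>"
print(extract_title(sample))
print(extract_title("<html><body>no title here</body></html>"))
```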
After saving the script, make sure the .txt file and the .py file are in the same directory, then open a terminal and run:
python go2url.py -f url.txt
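The script writes every result to a spreadsheet file named result.csv, creating the file automatically if it does not exist yet. Once all URLs in the text file have been tested, that file holds the complete set of results.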
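Once result.csv exists, a quick tally of status codes shows how many sites responded and how. The following is a rough sketch using the standard csv module. It assumes the status code is the third column of each row, as written by the script, and the function name count_status is made up for this example.

```python
import csv
import io
from collections import Counter


def count_status(csv_file):
    """Count rows per value of the third column (the status code), skipping the header row."""
    rows = csv.reader(csv_file)
    next(rows, None)  # drop the header line
    return Counter(row[2] for row in rows if len(row) >= 3)


# In-memory stand-in for result.csv so the sketch runs on its own.
demo = io.StringIO(
    "source,redirect,status,title\n"
    "http://a.example,https://a.example/,200,Home\n"
    "http://b.example,https://b.example/,200,Shop\n"
    "http://c.example,http://c.example/,404,Not Found\n"
)
print(count_status(demo))
```

For the real file, pass open("result.csv", encoding="utf-8", newline="") instead of demo. Because the script appends to result.csv, a file that has accumulated several runs contains repeated header rows, which this sketch would count as a status value.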
Appendix: batch URL conversion script
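Sometimes the addresses we are given lack the leading http:// or https://. The following Python script adds the required prefix to a whole list of addresses in one go.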
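First create a text file named ip.txt and fill in the addresses that are missing the prefix, one per line.
Then create a Python script named ip2url.py and paste in the following code: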
# -*- coding: utf-8 -*-
with open("ip.txt", "r") as f:
    # Drop blank lines and trailing newlines so every entry is handled the same way.
    hosts = [line.strip() for line in f if line.strip()]

with open("ip.txt", "w") as f2:
    for host in hosts:
        f2.write('http://' + host + '\n')
    # Optional: also generate an https:// variant of every address on its own line.
    for host in hosts:
        f2.write('https://' + host + '\n')
After saving, open a terminal and run:
python ip2url.py
When the command finishes, reopen ip.txt and you will see that the required prefix has been added in front of each address.
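The script above prefixes every line unconditionally, so entries that already start with http:// or https:// would end up with a doubled prefix. A small variation, sketched here with a hypothetical helper add_scheme, leaves such entries alone and skips blank lines. The sample addresses are placeholders.

```python
def add_scheme(lines, scheme="http"):
    """Return cleaned entries, prefixing scheme:// only where no http(s) prefix exists."""
    result = []
    for raw in lines:
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        if entry.lower().startswith(("http://", "https://")):
            result.append(entry)  # already has a prefix, keep it unchanged
        else:
            result.append(f"{scheme}://{entry}")
    return result


print(add_scheme(["192.0.2.10\n", "https://example.com\n", "\n", "example.org:8080"]))
```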
That's all.
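Impressive, still learning from this.
Tested and it works. Very occasionally the returned results are slightly off, possibly because of network conditions. In any case, thank you very much!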