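Almost every website is visited by many crawlers that are not search engine spiders. Most of them are written for content scraping or by beginners. Unlike search engine crawlers, they have no rate limiting, so they often consume a large amount of server resources and waste bandwidth.
nginx makes it easy to filter requests by User-Agent. A simple regular expression in the location that serves as the URL entry point is enough to reject crawler requests that do not meet your requirements: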
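location / {
    if ($http_user_agent ~* "python|curl|java|wget|httpclient|okhttp") {
        return 503;
    }
    # other normal configuration
    ...
}

Note: $http_user_agent is a built-in nginx variable that can be referenced directly inside a location block. The ~* operator performs a case-insensitive regular expression match. Matching on python alone filters out about 80% of Python crawlers.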
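The same check can also be written with a map block, which keeps the pattern in one place when several locations need it. The following is a minimal sketch, not the configuration used in this article: the variable name $blocked_agent is a hypothetical name chosen for illustration, and the map block is assumed to sit in the http context.

```nginx
# Assumed to be placed in the http context (map is not allowed inside server or location).
# $blocked_agent is a hypothetical variable name; it is 1 when the User-Agent matches.
map $http_user_agent $blocked_agent {
    default 0;
    # "~*" marks a case-insensitive regular expression, as in the if condition above
    "~*(python|curl|java|wget|httpclient|okhttp)" 1;
}

server {
    listen 80;
    server_name www.xxx.com;

    location / {
        # if treats an empty string or "0" as false, so only matching agents are rejected
        if ($blocked_agent) {
            return 503;
        }
        # other normal configuration
    }
}
```

With this layout, adding or removing a crawler name only requires changing the pattern in the map block.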
Blocking web crawlers in nginx
server {
    listen 80;
    server_name www.xxx.com;
    #charset koi8-r;
    #access_log logs/host.access.log main;
    #location / {
    #    root html;
    #    index index.html index.htm;
    #}
    # The pattern is quoted because some crawler names contain spaces
    if ($http_user_agent ~* "qihoobot|baiduspider|googlebot|googlebot-mobile|googlebot-image|mediapartners-google|adsbot-google|feedfetcher-google|yahoo! slurp|yahoo! slurp china|youdaobot|sosospider|sogou spider|sogou web spider|msnbot|ia_archiver|tomato bot") {
        return 403;
    }
    location ~ ^/(.*)$ {
        proxy_pass http://localhost:8080;
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        client_max_body_size 10m;
        client_body_buffer_size 128k;
        proxy_connect_timeout 90;
        proxy_send_timeout 90;
        proxy_read_timeout 90;
        proxy_buffer_size 4k;
        proxy_buffers 4 32k;
        proxy_busy_buffers_size 64k;
        proxy_temp_file_write_size 64k;
    }
    #error_page 404 /404.html;
    # redirect server error pages to the static page /50x.html
    #
    error_page 500 502 503 504 /50x.html;
    location = /50x.html {
        root html;
    }
    # proxy the PHP scripts to Apache listening on 127.0.0.1:80
    #
    #location ~ \.php$ {
    #    proxy_pass http://127.0.0.1;
    #}
    # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
    #
    #location ~ \.php$ {
    #    root html;
    #    fastcgi_pass 127.0.0.1:9000;
    #    fastcgi_index index.php;
    #    fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name;
    #    include fastcgi_params;
    #}
    # deny access to .htaccess files, if Apache's document root
    # concurs with nginx's one
    #
    #location ~ /\.ht {
    #    deny all;
    #}
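}

You can test this with curl by sending a blocked User-Agent. With the configuration above, the request should receive a 403 response:
curl -I -A "qihoobot" www.xxx.com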