php获取远程网址header头信息

一个用php获取远程网址header头信息的方法,这在采集时很有用,他可以让你判断出来,远程文件或网页是否正常,是否是404页 有二种方法 ### 用php的函数get_headers $url = 'http://www.example.com'; print_r(get_headers($url)); print_r(get_headers($url, 1)); 上例的输出如下 Array ( [0] => HTTP/1.1 200 OK [1] => Date: Sat, 29 May 2004 12:28:13 GMT [2] => Server: Apache/1.3.27 (Unix) (Red-Hat/Linux) [3] => Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT [4] => ETag: "3f80f-1b6-3e1cb03b" [5] => Accept-Ranges: bytes [6] => Content-Length: 438 [7] => Connection: close [8] => Content-Type: text/html ) Array ( [0] => HTTP/1.1 200 OK [Date] => Sat, 29 May 2004 12:28:14 GMT [Server] => Apache/1.3.27 (Unix) (Red-Hat/Linux) [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT [ETag] => "3f80f-1b6-3e1cb03b" [Accept-Ranges] => bytes [Content-Length] => 438 [Connection] => close [Content-Type] => text/html ) get_headers 是用来取得远程服务器的响应头信息的.用返回的第一个数组再加上正则就可以判断远程地址是否为200正常网页 ### 用curl CURLOPT_NOBODY参数只抓取header头信息 curl函数真是个好东西,curl参数里有一项可以配置只抓取远程网页的header头信息 如下代码,加红的地方是关健,他指定了curl抓的内容中包含header头,并且不要body内容. function get_header($url){ $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, $url); curl_setopt($ch, CURLOPT_HEADER, true); curl_setopt($ch, CURLOPT_NOBODY,true); curl_setopt($ch, CURLOPT_RETURNTRANSFER,true); curl_setopt($ch, CURLOPT_FOLLOWLOCATION,true); curl_setopt($ch, CURLOPT_AUTOREFERER,true); curl_setopt($ch, CURLOPT_TIMEOUT,30); curl_setopt($ch, CURLOPT_HTTPHEADER, array( 'Accept: */*', 'User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)', 'Connection: Keep-Alive')); $header = curl_exec($ch); return $header; }
联系我们

邮箱 626512443@qq.com
电话 18611320371(微信)
QQ群 235681453

Copyright © 2015-2022

备案号:京ICP备15003423号-3