1
sadara 2014-04-29 22:49:51 +08:00 via iPhone
I recall there's a Taobao affiliate (淘宝客) program called 单店宝.
|
2
mahone3297 2014-04-29 23:35:37 +08:00
Forked...
|
3
leyle 2014-04-29 23:55:55 +08:00 via Android
This looks interesting. Following for now; I'll take a look on my computer during the day.
|
4
bigshan 2014-04-30 01:49:46 +08:00 via iPhone
I'll take a look on my computer tomorrow.
|
5
huangsong 2014-04-30 10:35:31 +08:00
Forking it.
|
6
aWangami 2014-04-30 12:40:28 +08:00
C:\Users\Administrator\Desktop\Fetch-Taobao>php fetch.php 'http://shop65262430.taobao.com'
PHP Warning: file_put_contents(/tmp/fetchgoods.pid): failed to open stream: No such file or directory in C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php on line 13
PHP Stack trace:
PHP 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 2. file_put_contents() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:13
Warning: file_put_contents(/tmp/fetchgoods.pid): failed to open stream: No such file or directory in C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php on line 13
Call Stack:
  0.0010 127528 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0010 128008 2. file_put_contents() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:13
PHP Notice: Undefined index: scheme in C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php on line 59
PHP Stack trace:
PHP 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
PHP 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
Notice: Undefined index: scheme in C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php on line 59
Call Stack:
  0.0010 127528 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0068 192584 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
  0.0068 193152 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
PHP Notice: Undefined index: host in C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php on line 59
PHP Stack trace:
PHP 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
PHP 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
Notice: Undefined index: host in C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php on line 59
Call Stack:
  0.0010 127528 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0068 192584 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
  0.0068 193152 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
shop_url:'http://shop65262430.taobao.com' ... start_time:04-29 15:19:11 ... start!
PHP Fatal error: Call to undefined function curl_init() in C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php on line 127
PHP Stack trace:
PHP 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
PHP 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
PHP 4. HttpFetch->get() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:74
PHP 5. HttpFetch->disguise_curl() C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php:29
Fatal error: Call to undefined function curl_init() in C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php on line 127
Call Stack:
  0.0010 127528 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0068 192584 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
  0.0068 193152 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
  0.0215 197880 4. HttpFetch->get() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:74
  0.0215 197896 5. HttpFetch->disguise_curl() C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php:29
PHP Warning: unlink(/tmp/fetchgoods.pid): No such file or directory in C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php on line 15
PHP Stack trace:
PHP 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
PHP 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
PHP 4. HttpFetch->get() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:74
PHP 5. HttpFetch->disguise_curl() C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php:29
PHP 6. removePidFile() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
PHP 7. unlink() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:15
Warning: unlink(/tmp/fetchgoods.pid): No such file or directory in C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php on line 15
Call Stack:
  0.0010 127528 1. {main}() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0068 192584 2. FetchGoods->run() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:33
  0.0068 193152 3. FetchGoods->fetchOneShop() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:50
  0.0215 197880 4. HttpFetch->get() C:\Users\Administrator\Desktop\Fetch-Taobao\FetchGoods.class.php:74
  0.0215 197896 5. HttpFetch->disguise_curl() C:\Users\Administrator\Desktop\Fetch-Taobao\HttpFetch.class.php:29
  0.0342 194016 6. removePidFile() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:0
  0.0342 194128 7. unlink() C:\Users\Administrator\Desktop\Fetch-Taobao\fetch.php:15
C:\Users\Administrator\Desktop\Fetch-Taobao>
|
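A reading of the log in post 6, offered as a guess rather than a confirmed diagnosis: the script writes its PID file to /tmp, which normally does not exist on Windows; cmd.exe does not treat single quotes as quoting, so the URL reaches the script with the quotes still attached (the log prints shop_url:'http://shop65262430.taobao.com'), which would explain the missing scheme and host; and curl_init() being undefined means the PHP curl extension is not enabled. The sketch below shows the first two ideas in Python purely for illustration, not the project's PHP code. `clean_shop_url` and `pid_file_path` are hypothetical names that do not appear in the project.

```python
import os
import tempfile
from urllib.parse import urlsplit


def clean_shop_url(raw):
    """Strip stray quote characters and check the result is an absolute URL.

    Hypothetical helper: cmd.exe passes single quotes through literally,
    so "'http://...'" has to be unquoted before it can be parsed.
    """
    url = raw.strip().strip("'\"")
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError("not an absolute URL: %r" % raw)
    return url


def pid_file_path(name="fetchgoods.pid"):
    """Build the PID file path from the platform temp directory
    instead of hard-coding /tmp (hypothetical helper)."""
    return os.path.join(tempfile.gettempdir(), name)
```

On Windows, passing the URL in double quotes or without quotes avoids the problem at the source. In PHP, `sys_get_temp_dir()` plays the same role as `tempfile.gettempdir()` here.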
7
andyhu 2014-05-01 04:45:49 +08:00
Marking this to follow. That said, doing scraping in PHP is rather painful.
|
10
hanchengluo 2014-05-03 10:22:41 +08:00
@andyhu I also scrape with PHP. 2 GB of data took me about a month. Do you have a better recommendation?
|
11
andyhu 2014-05-03 10:43:43 +08:00
@hanchengluo Try node.js + request + cheerio. I actually use PHP at work, but for jobs that involve fetching remote pages, going back to PHP after using this combination feels very painful.
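The fetch-then-parse pattern behind that combination can be sketched roughly as follows. This is not the node.js + request + cheerio stack itself: it swaps in Python's standard-library `html.parser` for the parsing step and leaves the HTTP request out. `LinkCollector` and `extract_links` are hypothetical names, and the sample markup in the usage note is made up.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

For example, `extract_links('<a href="/item/1">A</a>')` returns a list containing `/item/1`. A selector-based library such as cheerio makes the same step shorter, which is part of the appeal described above.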
|
13
hanchengluo 2014-05-03 10:52:18 +08:00
|
14
andyhu 2014-05-03 10:57:42 +08:00
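HTML parsing also wastes time. Besides, PHP doesn't support multithreading, so each request has to wait, which is slow. For the database I use mongodb, and it is quite fast.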
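To make the point about waiting on each request concrete, here is a minimal sketch of fetching several pages concurrently instead of one after another. It is written in Python only as an illustration and is not the setup described in this thread. `fake_fetch` is a hypothetical stand-in that sleeps instead of making a real HTTP request, and the worker count of 8 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(url):
    """Hypothetical stand-in for an HTTP GET: sleeps to simulate latency."""
    time.sleep(0.1)
    return "<html>%s</html>" % url


def fetch_all(urls, fetch=fake_fetch, workers=8):
    """Run fetch() over all URLs on a thread pool.

    pool.map returns results in the same order as the input URLs,
    even though the requests overlap in time.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

With eight URLs and eight workers, the simulated waits overlap, so the batch takes roughly as long as one request rather than eight. PHP's `curl_multi_*` functions offer a comparable way to overlap requests without threads.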
|
15
andyhu 2014-05-03 11:01:49 +08:00
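@hanchengluo I just looked at your site. What do you use for the page snapshots? Is it done with phantomjs? Node has thumbbot, which is quite powerful: it handles thumbnail previews for web pages, images, and videos alike. It is based on phantomjs, though, so if you need to capture pages with flash you would probably still need a specially customized build, since the old version of phantomjs no longer supports flash. Overall, when it comes to scraping, PHP and node.js are simply not comparable. Even python is much nicer to use than PHP, and it has quite a few professional crawler modules.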
|
16
hanchengluo 2014-05-03 11:14:40 +08:00
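@andyhu Thanks for visiting. I only use CI on PHP, and I'm not familiar with JS. I once wanted to build a crawler and learn GoLang for it, but I didn't stick with it and went back to PHP. I'm getting old and can't keep up with learning. I'm planning to turn the site into a small portal and am still working out the idea: without scraping there is no content, but I'm also afraid of getting banned for scraping.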
|
17
laodao 2014-05-03 12:14:27 +08:00
|
18
ym1623 2014-09-03 14:30:13 +08:00
I found that this project doesn't work... it still gets blocked by Tmall all the same...
|