Web crawler - a case of Amazon

Philip Lai
Briefing

This project developed a crawler that collects information from Amazon so that managers can review market changes in a timely manner.


Key Challenge

The main issue in this project is Amazon's anti-crawler protection. We can add browser-like information to the request headers so that the crawler looks like an ordinary user, but Amazon keeps changing, so the program takes a lot of effort to maintain.
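As a minimal sketch of the header approach, the snippet below attaches browser-like headers to a request. It uses the standard-library `urllib` as an assumption (the original project may well have used another HTTP library), and the header values, `build_request` and `fetch_html` are hypothetical names chosen for illustration. Headers alone are not guaranteed to get past Amazon's checks.

```python
import urllib.request

# Hypothetical header values; any realistic browser string could be used here.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
    ),
    "Accept-Language": "en-US,en;q=0.9",
    "Accept": "text/html,application/xhtml+xml",
}


def build_request(url: str) -> urllib.request.Request:
    """Return a Request carrying browser-like headers (nothing is sent yet)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and decode it as text; raises on HTTP errors."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Keeping the headers in one dictionary means that when the site changes what it expects, only that dictionary needs to be updated.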

Once a request gets through and the page content is downloaded, we let BeautifulSoup parse it and use "find_all" to collect the required text, e.g. number, price and model name.
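The extraction step might look roughly like the sketch below. The markup, the class names (`item`, `title`, `price`) and the function `extract_items` are all made up for illustration; Amazon's real HTML is different and changes over time, so the selectors would need to be adapted.

```python
from bs4 import BeautifulSoup

# Hypothetical markup and class names, not Amazon's actual page structure.
SAMPLE_HTML = """
<div class="item"><span class="title">Acme X100 Laptop</span><span class="price">$499.00</span></div>
<div class="item"><span class="title">Acme Z5 Tablet</span><span class="price">$199.50</span></div>
"""


def extract_items(html: str) -> list[dict]:
    """Collect title and price from every product block in the page."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for node in soup.find_all("div", class_="item"):
        title = node.find("span", class_="title")
        price = node.find("span", class_="price")
        if title is None or price is None:
            continue  # skip blocks that do not match the expected layout
        price_text = price.get_text(strip=True).lstrip("$").replace(",", "")
        items.append({
            "title": title.get_text(strip=True),
            "price": float(price_text),
        })
    return items
```

Skipping blocks with a missing field, instead of raising, keeps one malformed listing from stopping the whole crawl.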

By looping over the "next page" link, we can download the content page by page.
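One way to sketch the page-by-page loop is shown below. The function `crawl_pages` and its parameters are hypothetical: `fetch` stands for whatever downloads a URL, and `find_next_url` stands for whatever reads the "next page" link out of the downloaded HTML. The page limit and the visited set are added assumptions to keep the loop from running forever.

```python
from typing import Callable, Iterator, Optional


def crawl_pages(
    start_url: str,
    fetch: Callable[[str], str],
    find_next_url: Callable[[str], Optional[str]],
    max_pages: int = 50,
) -> Iterator[str]:
    """Yield the HTML of each page, following "next page" links."""
    url: Optional[str] = start_url
    seen = set()
    # Stop when there is no next link, a link repeats, or the limit is hit.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield html
        url = find_next_url(html)
```

In practice `find_next_url` would return the `href` of the "next page" link, or `None` on the last page, which ends the loop.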

Another issue is the model name: it often contains too much redundant or even incorrect information, so sometimes we cannot split it or recognize it reliably.
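To illustrate why this is hard, here is a rough heuristic, not the method used in the project: it treats the first token that mixes letters and digits as the model name. The function name `guess_model` and the rule itself are assumptions made for this example.

```python
import re
from typing import Optional

# A token is a run of letters, digits and hyphens, e.g. "WH-1000XM4".
TOKEN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def guess_model(title: str) -> Optional[str]:
    """Return the first token containing both letters and digits, if any."""
    for token in TOKEN.findall(title):
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_letter and has_digit and len(token) >= 3:
            return token
    return None  # nothing in the title looks like a model code
```

A title such as "Sony WH-1000XM4 Wireless Headphones" gives a sensible result, but "2-Pack USB-C Cable 6ft" makes the heuristic pick "2-Pack", which is exactly the kind of misrecognition described above.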

Summary

This project did not proceed to the analysis step because the company later obtained sales data from the Amazon backend by registering an Amazon account. Also, because the program takes effort to maintain, all related requirements were consolidated under MIS.



