Visual scraping with Portia
Portia is a visual scraping tool. It lets non-engineers scrape web services on a regular basis.
How to configure scraping rules
- Go to https://app.scrapinghub.com/
- Log in (or register as a new user)
- Create a Portia project
- Create a spider and use the UI to configure how each field of an item is scraped
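Behind the UI, each annotated field boils down to an extraction rule that the spider applies to every matching page. The following is a minimal sketch of that idea using only Python's standard library, not Portia's actual extractor: the page markup, the field names `title` and `price`, and the paths in `FIELD_RULES` are hypothetical. `xml.etree.ElementTree` supports only a small XPath subset and needs well-formed markup, so real-world HTML would need a more forgiving parser.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical field rules: field name -> ElementTree path (limited XPath subset).
FIELD_RULES = {
    "title": ".//h1[@class='title']",
    "price": ".//span[@class='price']",
}


def extract_item(page: str, rules: dict[str, str]) -> dict[str, str | None]:
    """Apply each rule to the page and return one item as a dict of fields."""
    root = ET.fromstring(page.strip())  # requires well-formed (XHTML-like) markup
    item: dict[str, str | None] = {}
    for field, path in rules.items():
        node = root.find(path)
        # A rule that matches nothing yields None instead of raising.
        item[field] = node.text.strip() if node is not None and node.text else None
    return item


SAMPLE_PAGE = """
<html><body>
  <h1 class="title">Blue Mug</h1>
  <span class="price">$12</span>
</body></html>
"""

print(extract_item(SAMPLE_PAGE, FIELD_RULES))  # → {'title': 'Blue Mug', 'price': '$12'}
```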
Personal comments
- It can scrape HTML reasonably well.
- Sometimes the results are not good even when the XPath is configured manually, but this can be compensated for with additional processing (e.g. using Lambda after scraping).
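A post-scraping cleanup step like the one mentioned above might look roughly like this. It is a sketch that assumes "Lambda" means AWS Lambda, whose Python handlers take `(event, context)`; the event shape `{"items": [...]}`, the field names, and the `normalize_price` helper are all hypothetical, and how the scraped items reach the function is not covered here.

```python
from __future__ import annotations

import re


def normalize_price(raw: str | None) -> float | None:
    """Turn strings like '$1,200.50' into 1200.5; return None if no number is found."""
    if not raw:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", raw)
    return float(match.group().replace(",", "")) if match else None


def handler(event: dict, context: object = None) -> dict:
    """Lambda-style entry point; the event shape {"items": [...]} is hypothetical."""
    cleaned = []
    for item in event.get("items", []):
        title = (item.get("title") or "").strip()
        if not title:
            continue  # drop items where the title rule matched nothing
        cleaned.append({"title": title, "price": normalize_price(item.get("price"))})
    return {"items": cleaned}


sample_event = {"items": [{"title": " Blue Mug ", "price": "$1,200.50"}, {"title": "", "price": "$3"}]}
print(handler(sample_event))  # → {'items': [{'title': 'Blue Mug', 'price': 1200.5}]}
```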
Issues when using Portia
- You need to set DEPTH_LIMIT in the settings menu. Otherwise, the spider crawls a very large number of pages.
- It only scrapes HTML pages. To scrape an RSS feed, you need to add separate processing.
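`DEPTH_LIMIT` is a Scrapy setting (Portia spiders run on Scrapy), and its default of 0 means no limit, which is why an unconfigured spider keeps following links. The sketch below simulates the effect on a small, hypothetical in-memory link graph (`LINKS`) instead of real HTTP requests: the start page has depth 0, and any link that would exceed the limit is dropped.

```python
from __future__ import annotations

from collections import deque

# Hypothetical site structure: page -> pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/a/2"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/a/2": [],
    "/b/1": [],
    "/a/1/x": [],
}


def crawl(start: str, depth_limit: int = 0) -> list[str]:
    """Breadth-first crawl; depth_limit=0 means unlimited, like Scrapy's default."""
    seen = {start}
    order = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        for link in LINKS.get(url, []):
            # Skip links that would be deeper than the configured limit.
            if depth_limit and depth + 1 > depth_limit:
                continue
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order


print(crawl("/", depth_limit=1))  # → ['/', '/a', '/b']
```

With `depth_limit=1`, only the start page and the pages it links to directly are visited; with the default 0, every reachable page is crawled.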
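For the RSS case, one possible extra step is to read the feed directly instead of going through Portia. This sketch parses RSS 2.0 items with the standard library; the sample feed is made up, namespaced formats such as Atom are not handled, and in practice the XML could come from `urllib.request.urlopen(url).read()`, since `ET.fromstring` also accepts bytes.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed used only for illustration.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>https://example.com/1</link></item>
    <item><title>Second post</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def parse_rss(xml_text: str | bytes) -> list[dict[str, str]]:
    """Return title/link pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for node in root.findall("./channel/item"):
        items.append({
            "title": (node.findtext("title") or "").strip(),
            "link": (node.findtext("link") or "").strip(),
        })
    return items


for entry in parse_rss(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```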