Impression memo for Portia [Scrapy/Scraping tool]


Portia is a visual scraping tool built on Scrapy. It lets non-engineers scrape a web service on a regular basis.

👽 How to configure scraping rules

  • Go to https://app.scrapinghub.com/
  • Log in (or register a new user)
  • Create a Portia project
  • Create a spider and configure, through the UI, how each field of an item is scraped

Personal Comment

  • It scrapes HTML reasonably well.
  • Results are sometimes poor even with manually configured XPath, but this can be compensated for with additional processing (e.g. using Lambda after scraping).
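
The post-processing idea can be sketched roughly as follows, assuming "Lambda" means AWS Lambda with a Python handler. The event shape (an `items` list) and the field names `title` and `price` are hypothetical and chosen only for illustration; Portia does not define them.

```python
import re


def clean_item(item):
    """Normalize one scraped item (hypothetical fields: title, price)."""
    # Collapse runs of whitespace and newlines left over from the HTML.
    title = " ".join(item.get("title", "").split())
    # Pull the first number out of a price string such as "$1,299.50".
    raw_price = item.get("price", "").replace(",", "")
    match = re.search(r"\d+(?:\.\d+)?", raw_price)
    price = float(match.group()) if match else None
    return {"title": title, "price": price}


def handler(event, context):
    """Lambda-style entry point: clean every item found in the event."""
    return {"items": [clean_item(i) for i in event.get("items", [])]}
```

Calling `handler({"items": [...]}, None)` locally is enough to try it out; how the scraped items actually reach the function depends on your own setup.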

Issues when using Portia

  • You need to set DEPTH_LIMIT in the settings menu. Otherwise, the spider scrapes a lot of pages.
  • It only scrapes HTML pages. If you want to scrape an RSS feed, you need to add other processing.
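
Both points can be illustrated with a short Python sketch. `DEPTH_LIMIT` is a standard Scrapy setting (0, the default, means no limit); the value 2 is only an example. For RSS, one possible workaround is to parse the feed outside Portia with the standard library. The `parse_rss` helper and the sample feed are hypothetical.

```python
import xml.etree.ElementTree as ET

# Scrapy setting: maximum link depth the spider follows from the start URLs.
# 0 (the default) means unlimited; 2 is an arbitrary example value.
DEPTH_LIMIT = 2


def parse_rss(xml_text):
    """Return a list of {"title", "link"} dicts from an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
        })
    return items


# Made-up feed used only to exercise the helper.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://example.com/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/2</link>
    </item>
  </channel>
</rss>"""

entries = parse_rss(SAMPLE_FEED)
```

In a real setup the feed text would be downloaded first, and the resulting links could then be given to a spider as start URLs.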
