Web Scraping and “Elastic Search” it

  • Scrape the IMDB Top 250 movies page
  • We will use three languages: Python, JavaScript, and Go
  • This is static web page scraping. Dynamic web pages need a browser automation tool such as Selenium, which is not covered in this article
  • We will store the scraped information in Elasticsearch. (This is a typical use case: when you find useful information on a site that offers no search function, you can build your own)
  • A simple front-end web page built with Vue.js will demonstrate the search

1. Web Scrape using Python, JavaScript and Go

IMDB top 250 movies page, screen cap from IMDB.com
The Shawshank Redemption page screen cap from IMDB.com
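
To make the scraping step concrete, here is a minimal Python sketch that pulls the title, year, and link out of a list page. It runs on a small hand-written HTML sample rather than the live site, and it uses only the standard library. The class names "titleColumn" and "secondaryInfo" are assumptions about the page markup, and the names TitleColumnParser, parse_top_movies, and SAMPLE_HTML are made up for this example; check the real page source before relying on any of them.

```python
from html.parser import HTMLParser

# Hand-written sample that imitates the ASSUMED structure of the list page.
SAMPLE_HTML = """
<table><tbody>
<tr><td class="titleColumn">1.
  <a href="/title/tt0111161/">The Shawshank Redemption</a>
  <span class="secondaryInfo">(1994)</span></td></tr>
<tr><td class="titleColumn">2.
  <a href="/title/tt0068646/">The Godfather</a>
  <span class="secondaryInfo">(1972)</span></td></tr>
</tbody></table>
"""


class TitleColumnParser(HTMLParser):
    """Collect title, year and link from <td class="titleColumn"> cells."""

    def __init__(self):
        super().__init__()
        self.movies = []
        self._current = None  # movie being filled, or None outside a cell
        self._field = None    # "title" or "year" while inside that tag

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "td" and "titleColumn" in classes:
            self._current = {"title": "", "year": None, "link": None}
        elif self._current is not None and tag == "a":
            self._current["link"] = attrs.get("href")
            self._field = "title"
        elif (self._current is not None and tag == "span"
              and "secondaryInfo" in classes):
            self._field = "year"

    def handle_data(self, data):
        if self._current is None or self._field is None:
            return
        if self._field == "title":
            self._current["title"] += data
        else:
            # The year is shown as "(1994)"; keep only the digits.
            digits = "".join(ch for ch in data if ch.isdigit())
            if digits:
                self._current["year"] = int(digits)

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None
        elif tag == "td" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.movies.append(self._current)
            self._current = None


def parse_top_movies(html):
    """Return a list of movie dicts found in the given HTML string."""
    parser = TitleColumnParser()
    parser.feed(html)
    parser.close()
    return parser.movies


movies = parse_top_movies(SAMPLE_HTML)
```

For a real run, you would download the page first (for example with urllib.request or the requests library) and pass the response text to parse_top_movies. The same approach carries over to the JavaScript and Go versions: fetch the HTML, select the cells, then read the link text and the year.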

2. Store the scrape result in Elasticsearch

# pull the Elasticsearch image
docker pull docker.elastic.co/elasticsearch/elasticsearch:7.6.1
# start a single-node Elasticsearch container
docker run -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" docker.elastic.co/elasticsearch/elasticsearch:7.6.1
# check that the node is up
curl -X GET "localhost:9200/_cat/nodes?v&pretty"
# bulk index into the "movie" index; movies.json can be generated by any of the scripts (Go, JavaScript, Python)
# and must be newline-delimited JSON that ends with a newline
curl -X POST "localhost:9200/movie/_bulk" -H 'Content-Type: application/json' --data-binary @movies.json
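
The bulk endpoint expects newline-delimited JSON: one action line followed by one document line per movie, with a final newline at the end of the body. Below is a minimal Python sketch of how movies.json could be produced from the scraped records. The function names to_bulk_ndjson and write_bulk_file, the sample records, and the choice of sequential "_id" values are assumptions for illustration, not the layout used by the original scripts.

```python
import json


def to_bulk_ndjson(movies):
    """Build a bulk request body: an action line, then the document itself."""
    lines = []
    for doc_id, movie in enumerate(movies, start=1):
        # The index name is taken from the URL (/movie/_bulk), so the action
        # line only carries the document id (an assumed, sequential id).
        lines.append(json.dumps({"index": {"_id": str(doc_id)}}))
        lines.append(json.dumps(movie, ensure_ascii=False))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def write_bulk_file(movies, path="movies.json"):
    """Write the bulk body to disk so curl can send it with --data-binary."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_bulk_ndjson(movies))


# Hypothetical sample records in the shape a scraper might return.
sample_movies = [
    {"title": "The Shawshank Redemption", "year": 1994},
    {"title": "The Godfather", "year": 1972},
]
bulk_body = to_bulk_ndjson(sample_movies)
```

Calling write_bulk_file(sample_movies) would create a movies.json that the curl command above can post as-is. Using --data-binary rather than -d matters here, because -d strips the newlines that separate the lines.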

3. Visualize our search function
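
Whatever the front end looks like, a search box ends up sending a query to the _search endpoint of the "movie" index. The sketch below shows one plausible request in Python, using a match query on an assumed "title" field. The function names build_search_request and search_titles, the field name, and the default result size are illustrative choices, not taken from the demo page.

```python
import json
import urllib.request


def build_search_request(term, host="http://localhost:9200", index="movie", size=10):
    """Prepare (but do not send) a full-text search on the assumed "title" field."""
    body = {"query": {"match": {"title": term}}, "size": size}
    return urllib.request.Request(
        url=f"{host}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search_titles(term):
    """Send the request and return the matching titles (needs a running node)."""
    with urllib.request.urlopen(build_search_request(term)) as response:
        result = json.load(response)
    return [hit["_source"]["title"] for hit in result["hits"]["hits"]]
```

With the container from step 2 running and the movies indexed, search_titles("godfather") should return the titles whose text matches that word. A Vue.js page would send the same JSON body from the browser and render the hits as a list.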
