Implementing Web Scraping In Python Using Scrapy


What Is Scrapy

Scrapy is an application framework for crawling websites and extracting structured data from them. In this post, we’re going to explore Scrapy by implementing web scraping in Python with it.


This blog post covers the following topics:


  1. How To Install Scrapy
  2. Create A Scrapy Project
  3. Export Scraped Data As CSV


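At the time of writing, Scrapy ran on Python 2.7 and Python 3.4 or above; Scrapy 2.0 dropped Python 2 support, and current releases require a recent Python 3. If you’re using Anaconda, you can install the package from the conda-forge channel, which provides packages for Linux, Windows and OS X.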


How To Install Scrapy

You can install Scrapy either with conda or, if you’re familiar with installing Python packages, from PyPI with pip, which also installs its dependencies.


Install Scrapy Using Anaconda


conda install -c conda-forge scrapy


Install Scrapy Using PyPI


pip install Scrapy


Install Scrapy On Ubuntu 14.04 And Above


On Ubuntu 14.04 and above, you need to install these dependencies before installing Scrapy:


sudo apt-get install python-dev python-pip libxml2-dev libxslt1-dev zlib1g-dev libffi-dev libssl-dev


Install Scrapy On Python 3

If you want to install Scrapy on Python 3, you’ll also need the Python 3 development headers:


sudo apt-get install python3 python3-dev


Inside a virtualenv, you can install Scrapy with pip:


pip install scrapy
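After installing, a quick way to confirm which Python and Scrapy versions you ended up with is a short script like the one below. It is a sketch that relies on importlib.metadata (Python 3.8 or later) and assumes Scrapy is installed as a regular package published under the name "Scrapy"; the helper name installed_version is made up for this example.

```python
import importlib.metadata
import sys


def installed_version(dist_name="Scrapy"):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        return None


# Print the interpreter version, then the Scrapy version if one is installed
print("Python", sys.version.split()[0])
version = installed_version()
print("Scrapy", version if version is not None else "is not installed")
```

From the shell, the scrapy version command reports the installed Scrapy version as well.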


Create A Scrapy Project

Before you start scraping, you need to create a Scrapy project. Switch to the directory where you want to keep the project and run:


scrapy startproject project_name


This will create the following directory structure:



project_name/
    scrapy.cfg            # deploy configuration file

    project_name/         # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py


The two most important parts of the project for now are:

settings.py – This file holds all the settings for your project.

spiders/ – This folder stores all the custom spiders used in the project.
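For orientation, here is a trimmed sketch of the kind of values startproject writes into settings.py, assuming the project is literally named project_name. The generated file also contains many commented-out options, and its exact contents vary between Scrapy versions.

```python
# settings.py (trimmed sketch; the generated file contains more options)

# Name of the bot implemented by this project
BOT_NAME = "project_name"

# Modules Scrapy searches for spiders, and where "scrapy genspider" creates new ones
SPIDER_MODULES = ["project_name.spiders"]
NEWSPIDER_MODULE = "project_name.spiders"

# Respect each site's robots.txt rules
ROBOTSTXT_OBEY = True
```

With ROBOTSTXT_OBEY enabled, Scrapy requests a site's robots.txt before crawling it.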






Create A Scrapy Spider

Spiders are the classes which you define and that Scrapy uses to scrape information from a website (or a group of websites).


Here’s the code for a spider that scrapes famous quotes from a website, following the pagination:


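import scrapy


class QuotesSpider(scrapy.Spider):
    # Unique name used to run the spider with "scrapy crawl quotes"
    name = 'quotes'

    # The start URL was lost from the original post; add the quotes
    # site's first page here before running the spider
    start_urls = [
    ]

    def parse(self, response):
        # Each quote on the page sits in a <div class="quote"> block
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.xpath('span/small/text()').get(),
            }

        # Follow the "Next" pagination link; assumes the site wraps it in <li class="next">
        next_page = response.css('li.next a::attr(href)').get()
        if next_page is not None:
            yield response.follow(next_page, self.parse)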


The Spider subclasses scrapy.Spider and defines some attributes and methods:


name: identifies the spider. It must be unique within a project, so you can’t give the same name to two different spiders.


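start_requests(): must return an iterable of requests (a list, or a generator) that the spider begins crawling from; subsequent requests are generated successively from these initial ones. The spider above doesn’t define this method: it sets the start_urls attribute instead, which Scrapy’s default start_requests() turns into the initial requests.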


parse(): the method called to handle the response downloaded for each request. The response argument is an instance of TextResponse that holds the page content and has helpful methods for handling it, such as css() and xpath().


The parse() method usually parses the response, extracting the scraped data as dicts, and also finds new URLs to follow, creating new requests (Request) from them.


How To Run The Spider


Save the spider as a Python file inside the project’s spiders/ directory. Then, to run it, go to the project’s top-level directory and run:


scrapy crawl quotes


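This command runs the spider and produces output similar to the following. Treat it as illustrative: the "Saved file" lines come from a spider that writes each page to an HTML file, while the spider above yields items instead.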


... (omitted for brevity)

2016-12-16 21:24:05 [scrapy.core.engine] INFO: Spider opened

2016-12-16 21:24:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)

2016-12-16 21:24:05 [scrapy.extensions.telnet] DEBUG: Telnet console listening on

2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (404) <GET> (referer: None)

2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET> (referer: None)

2016-12-16 21:24:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET> (referer: None)

2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-1.html

2016-12-16 21:24:05 [quotes] DEBUG: Saved file quotes-2.html

2016-12-16 21:24:05 [scrapy.core.engine] INFO: Closing spider (finished)




Export Scraped Data As CSV

You can read the scraped data in the command-line output, but it is usually better to export it to a file in a format such as CSV or JSON. That saves a lot of time and lets you import the data into other programs, such as Excel or a database. To make this easy, Scrapy provides Feed Exports, which write the scraped items to a file in the format you choose.


To export to CSV, add the following lines to settings.py:


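# Export as CSV feed (older Scrapy versions; deprecated since 2.1)
# Without FEED_FORMAT, older versions default to JSON Lines
FEED_FORMAT = "csv"
FEED_URI = "your csv name.csv"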
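FEED_URI and FEED_FORMAT were deprecated in Scrapy 2.1 in favour of a single FEEDS setting. On newer versions, a roughly equivalent settings.py entry looks like the sketch below; the file name quotes.csv is only an example.

```python
# settings.py (Scrapy 2.1+): one entry per output file
FEEDS = {
    # Output path or URI -> options for that feed
    "quotes.csv": {
        "format": "csv",
        "encoding": "utf8",
    },
}
```

Alternatively, the -o command-line option writes a feed without editing settings.py, for example scrapy crawl quotes -o quotes.csv, where the format is inferred from the file extension.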


That’s all, guys! Run scrapy crawl quotes again and Scrapy writes the scraped quotes to the CSV file. Now you know how to implement web scraping using Scrapy.
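To check the export, or to use it from another Python program, the standard library’s csv module can read the file back. This sketch assumes the feed was written to quotes.csv with the text and author fields the spider yields; load_quotes is a made-up helper name.

```python
import csv


def load_quotes(path="quotes.csv"):
    """Read an exported Scrapy CSV feed into a list of dicts."""
    # The first row holds the column names (e.g. text, author), which
    # DictReader uses as keys for every following row
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

For example, rows = load_quotes("quotes.csv") gives one dict per scraped quote, so rows[0]["author"] is the author of the first quote in the file.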

