Basic web scraping using Goutte and Symfony DomCrawler

In this post, I explain how to perform basic web scraping with Goutte and Symfony DomCrawler, that is, how to extract machine-readable information from web pages. Today, most API documentation is not written by hand; it is generated from source code by dedicated tools, with phpDocumentor and Sami among the more popular and reliable ones.

Interestingly, we will now reverse this process: instead of creating documentation from code, we will generate code from documents!

Required Installation

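Before using DomCrawler, you need to install it. Goutte is published as the Composer package fabpot/goutte, so it is typically installed by running "composer require fabpot/goutte" in the project root, which also pulls in Symfony DomCrawler as a dependency.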


Once the installation succeeds, we can use the Symfony DomCrawler. Goutte itself builds on it, together with other Symfony components such as BrowserKit and CssSelector.

Now, let us start with a simple crawler that finds the links available on a web page.

Add the following lines above the class declaration in the file src/AppBundle/Controller/DefaultController.php


Add the following lines after the last method in the file src/AppBundle/Controller/DefaultController.php


Here, I have created a new route, http://localhost/links, for my application (http://localhost is my local domain name), and created an object of the Client class named “$client”. Using this object, I call its request method to fetch a page and gather information from it.
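The request step described above can be sketched roughly as follows. This is a minimal standalone script rather than the original listing. It assumes Goutte was installed with Composer so that vendor/autoload.php exists, and the target URL is a placeholder chosen for illustration.

```php
<?php
// Minimal sketch; assumes Goutte was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

// The client behaves like a very simple browser.
$client = new Client();

// Hypothetical target page; replace it with the page you want to scrape.
$url = 'https://example.com/';

// request() performs the HTTP request and returns a Symfony DomCrawler
// Crawler object wrapping the response body.
$crawler = $client->request('GET', $url);
```

The returned $crawler is the object on which filter(), count() and links() are called in the rest of this post.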


The expression “$crawler->filter('a')->count()” returns the number of HTML <a> tags on the page.

Similarly, the expression “$crawler->filter('a')->links()” returns all the links from the page as link objects.

Finally, calling “$link->getURI()” on each of those link objects returns the absolute URI of that link.
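To see these three calls in isolation, here is a small sketch that runs them on an inline HTML string instead of a live page. The HTML snippet and the base URI are made up for illustration. It assumes symfony/dom-crawler and symfony/css-selector are available, which should be the case once Goutte is installed.

```php
<?php
// Sketch only: the markup and base URI below are illustrative assumptions.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = <<<'HTML'
<html>
  <body>
    <a href="/about">About</a>
    <a href="https://symfony.com/doc">Docs</a>
    <p>No link here</p>
  </body>
</html>
HTML;

// The second argument is the URI the document is assumed to come from;
// it is used to turn relative hrefs into absolute URIs.
$crawler = new Crawler($html, 'http://localhost/links');

// Number of <a> tags in the document.
$linkCount = $crawler->filter('a')->count();

// Absolute URI of every link.
$uris = [];
foreach ($crawler->filter('a')->links() as $link) {
    $uris[] = $link->getUri();
}

echo $linkCount, PHP_EOL;
print_r($uris);
```

Because the crawler is given a base URI, the relative href "/about" is resolved against http://localhost, while the absolute link is returned unchanged.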


The above example shows how to extract all the links from an HTML document and save them in an array named “$all_links”. In the same way, we can extract many other kinds of data from a web page.
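Putting the pieces together, the controller action could look roughly like the sketch below. This is not the original listing. It assumes a Symfony 3 style AppBundle with annotation routing, and the action name, the target URL and the JSON response are choices made here for illustration.

```php
<?php
// Sketch of src/AppBundle/Controller/DefaultController.php
// (assumes a Symfony 3 style AppBundle with annotation routing).

namespace AppBundle\Controller;

use Goutte\Client;
use Sensio\Bundle\FrameworkExtraBundle\Configuration\Route;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\JsonResponse;

class DefaultController extends Controller
{
    /**
     * @Route("/links", name="links")
     */
    public function linksAction()
    {
        $client = new Client();

        // Hypothetical page to scrape.
        $crawler = $client->request('GET', 'https://example.com/');

        // Collect the absolute URI of every <a> tag on the page.
        $all_links = [];
        foreach ($crawler->filter('a')->links() as $link) {
            $all_links[] = $link->getUri();
        }

        // Returning JSON is an illustrative choice, not part of the original post.
        return new JsonResponse([
            'count' => count($all_links),
            'links' => $all_links,
        ]);
    }
}
```

With this in place, opening http://localhost/links would trigger the scrape and show the collected links.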

In fact, much more powerful extraction is possible. For instance, building on the above example, we can follow each of the collected links, visit those pages in turn, and gather whatever further information we need. I will cover more such extraction techniques, with different examples, in future posts. Try it out for yourself…
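As a rough illustration of following the collected links, the sketch below visits a start page, follows each link on the same host one level deep, and records the title of every page it reaches. The function name crawlOneLevel, the same-host restriction and the page limit are all assumptions made for this example.

```php
<?php
// Sketch only; assumes Goutte was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Goutte\Client;

/**
 * Hypothetical helper: fetch $startUrl, follow its same-host links
 * (one level deep, at most $maxPages of them) and return the page
 * titles keyed by URI.
 */
function crawlOneLevel(Client $client, string $startUrl, int $maxPages = 10): array
{
    $crawler = $client->request('GET', $startUrl);
    $host = parse_url($startUrl, PHP_URL_HOST);

    // Collect unique link URIs that stay on the starting host.
    $uris = [];
    foreach ($crawler->filter('a')->links() as $link) {
        $uri = $link->getUri();
        if (parse_url($uri, PHP_URL_HOST) === $host) {
            $uris[$uri] = true;
        }
    }

    // Visit each collected page and read its <title>, if it has one.
    $titles = [];
    foreach (array_slice(array_keys($uris), 0, $maxPages) as $uri) {
        $page = $client->request('GET', $uri);
        $titleNode = $page->filter('title');
        $titles[$uri] = $titleNode->count() > 0 ? trim($titleNode->text()) : null;
    }

    return $titles;
}
```

Limiting the crawl to one host and a fixed number of pages keeps the example polite toward the target site; a real crawler would also need delays and error handling.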

Vignesh Thandapani

An enthusiastic Tech Lead with over six years of experience in web development, including core PHP, Laravel, Symfony, CakePHP, WordPress, and Joomla. Also an aspiring travel enthusiast who loves spending time in nature.
