Agiratech Com

Basic web scraping using Goutte and Symfony DomCrawler

By Vignesh Thandapani
In this post I explain how to perform basic web scraping with Goutte and Symfony DomCrawler, that is, how to get machine-readable information out of web pages. Most API documentation today is not written by hand; it is generated from source code by tools built for the purpose, such as phpDocumentor or Sami, two of the more popular and reliable options.
Now, interestingly, we will reverse this process: instead of generating documentation from code, we will start from documents and feed what they contain back into code!
Required Installation
Before using DomCrawler you need to install Goutte (https://github.com/FriendsOfPHP/Goutte) with Composer:
    composer require fabpot/goutte
Goutte is built on top of several Symfony components, including DomCrawler, so this one command also installs Symfony DomCrawler as a dependency.
Now let's build a simple crawler that finds all the links available on a web page.
Add the following lines above the class declaration in src/AppBundle/Controller/DefaultController.php:
    use Goutte\Client;
    use Symfony\Component\DomCrawler\Crawler;
    use Symfony\Component\HttpFoundation\Response;
Then add the following action after the existing methods in the same file:
    /**
     * @Route("/links", name="crawler")
     */
    public function crawlerAction()
    {
        $url = "http://www.agiratech.com";
        $client = new Client();
        // Fetch the page; the client returns a DomCrawler instance for it.
        $crawler = $client->request('GET', $url);
        // Count the <a> elements on the page.
        $links_count = $crawler->filter('a')->count();
        $all_links = [];
        if ($links_count > 0) {
            $links = $crawler->filter('a')->links();
            foreach ($links as $link) {
                // getUri() returns the absolute URI of the link.
                $all_links[] = $link->getUri();
            }
            $all_links = array_unique($all_links);
            // Escape the output so the URLs are shown as plain text.
            $body = "All available links from the page $url<pre>"
                . htmlspecialchars(print_r($all_links, true)) . "</pre>";
        } else {
            $body = "No links found";
        }
        // Return a Response instead of calling echo and die.
        return new Response($body);
    }
Here I have created a new route, http://localhost/links, for my application (http://localhost is my local domain name), and created an object of the Client class named $client. Calling its request method fetches the page and returns a crawler for it:
    $crawler = $client->request('GET', $url);
The call $crawler->filter('a')->count() gives the number of HTML <a> tags on the page (http://www.agiratech.com).
Similarly, $crawler->filter('a')->links() returns all the links on the page as Link objects, and $link->getUri() returns the absolute URI of each of those links.
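To see what these calls do without making a network request, the same steps can be tried on a small HTML string. The sketch below is a stand-in, not the Goutte API: it uses PHP's built-in DOM extension (DOMDocument), on which DomCrawler itself is built, and the function name extractLinks and the sample markup are invented for illustration. Unlike getUri(), it returns each href exactly as written and does not resolve relative URLs.

```php
<?php
// Hypothetical helper (not part of Goutte or DomCrawler): collect the href
// of every <a> element in an HTML string using PHP's built-in DOM extension.
function extractLinks(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally; real-world markup is rarely perfect.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    // Roughly what filter('a') selects: every anchor element in the document.
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if ($anchor->hasAttribute('href')) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    // array_unique() keeps the original keys, so reindex with array_values().
    return array_values(array_unique($links));
}

// Sample markup, invented for this example.
$html = '<html><body>'
    . '<a href="/about">About</a>'
    . '<a href="/blog">Blog</a>'
    . '<a href="/about">About again</a>'
    . '</body></html>';

$links = extractLinks($html);
echo count($links), " unique links\n";
print_r($links);
```

The array_values() call is worth copying into the controller as well if sequential indexes matter, because array_unique() on its own leaves gaps in the keys wherever a duplicate was dropped.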
Conclusion
The example above shows how to extract all the links from an HTML document and store them in the $all_links array. Other data can be extracted from a web page in the same way.
Much more is possible. For instance, building on the example above, we could follow each of the extracted links, visit those pages in turn, and collect whatever further information is required. I will cover more extraction techniques with different examples in future blog posts. Try it out for yourself!
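As a rough idea of what following the extracted links could look like, here is a minimal breadth-first sketch. Everything in it is an assumption made for illustration: the function name crawlSite, the $maxPages limit, and the $fetchLinks callable, which stands in for "request this URL with Goutte and return the absolute links found on it". A hard-coded link graph replaces real HTTP requests so the sketch runs on its own.

```php
<?php
/**
 * Breadth-first walk over the pages reachable from $startUrl.
 *
 * $fetchLinks is a hypothetical callable (string $url): string[] that
 * returns the absolute links found on a page.
 */
function crawlSite(string $startUrl, callable $fetchLinks, int $maxPages = 20): array
{
    $host = parse_url($startUrl, PHP_URL_HOST);
    $queue = [$startUrl];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        foreach ($fetchLinks($url) as $link) {
            // Stay on the starting host and skip pages already seen.
            if (parse_url($link, PHP_URL_HOST) === $host && !isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }
    // The keys are the visited URLs, in the order they were visited.
    return array_keys($visited);
}

// Stand-in link graph so the sketch runs without any network access.
$pages = [
    'http://example.com/'  => ['http://example.com/a', 'http://other.test/x'],
    'http://example.com/a' => ['http://example.com/', 'http://example.com/b'],
    'http://example.com/b' => [],
];
$fetch = function (string $url) use ($pages): array {
    return $pages[$url] ?? [];
};

$visitedPages = crawlSite('http://example.com/', $fetch);
print_r($visitedPages);
```

In a real crawler the callable would wrap the request and links() loop from the controller above. The page limit and the same-host check keep the walk from wandering across the whole web.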
Posted by Vignesh Thandapani
copyright 2015 - 2016 Agira Technologies