Web Scraping with PHP – How to Crawl Web Pages Using Open Source Tools

Web scraping lets you collect data from web pages across the internet. It is sometimes also called web crawling or web data extraction. PHP is a widely used back-end scripting language for creating dynamic websites and web applications, and you can implement a web scraper in plain PHP code.
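As a minimal sketch of the plain-PHP approach, the built-in DOMDocument and DOMXPath classes can parse HTML and select elements without any extra library. The markup, the class name crd_lnk, and the function name extractLinksByClass are illustrative assumptions. A real script would first fetch the page, for example with file_get_contents().

```php
<?php
// Made-up markup standing in for a fetched page.
$html = <<<HTML
<html><body>
  <a class="crd_lnk" href="/story-1">First headline</a>
  <a class="crd_lnk" href="/story-2">Second headline</a>
  <a class="other" href="/about">About</a>
</body></html>
HTML;

// Hypothetical helper: returns [text, href] pairs for every <a> with the given class.
function extractLinksByClass(string $html, string $class): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; buffer parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // Match the class name as a whole word inside the class attribute.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $rows = [];
    foreach ($xpath->query($query) as $node) {
        $rows[] = [trim($node->textContent), $node->getAttribute('href')];
    }

    return $rows;
}

foreach (extractLinksByClass($html, 'crd_lnk') as [$title, $href]) {
    echo $title, ' => ', $href, "\n";
}
```

The XPath expression is the usual workaround for selecting by class name, since XPath 1.0 has no CSS-style class selector. Libraries such as Goutte hide this step behind CSS selectors.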

This tutorial uses Goutte, an open source scraping library for PHP. Install it with Composer:

composer require fabpot/goutte

Create a file named scrap.php:

<?php

require 'vendor/autoload.php';

use Goutte\Client;

// Create a new Goutte client
$client = new Client();

// Specify the URL to scrape
$url = 'https://ndtv.in';

// Fetch the HTML content of the page
$crawler = $client->request('GET', $url);

// Extract the text of the first element that matches the .crd_lnk selector
$title = $crawler->filter('.crd_lnk')->text();

// Output the extracted text
echo "Title: $title\n";

?>

Scraping in a loop

<?php

require 'vendor/autoload.php';

use Goutte\Client;

// Create a new Goutte client
$client = new Client();

// Specify the URL to scrape
$url = 'https://ndtv.in';

// Fetch the HTML content of the page
$crawler = $client->request('GET', $url);

// Print the text of every element that matches the .crd_lnk selector
$crawler->filter('.crd_lnk')->each(function ($node) {
    print $node->text()."\n";
});


?>

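// Read the headline text (replace the placeholder with the headline's CSS selector)
$news = $crawler->filter("<headline's selector>")->text();
// Find the link with that text and follow it
$link = $crawler->selectLink($news)->link();
$crawler = $client->click($link);

// Inside an each() callback: read the href of the first link in the current node
$link = $node->filter('a')->attr('href');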

// Print the src attribute of every image that matches the selector
$crawler->filter('.crd_img-full > a > img')->each(function ($node) {
    print $node->attr('src')."\n";
});

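<?php
require 'vendor/autoload.php';

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://github.com/');

// Follow the "Sign in" link, then fill in and submit the login form
$crawler = $client->click($crawler->selectLink('Sign in')->link());
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, ['login' => 'your email', 'password' => 'your password']);

// Print the first <h1> of the page returned after the form is submitted
$h1 = $crawler->filter("h1")->text();
echo $h1 . "\n";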

request(): sends a request to the specified URL and returns a Crawler object that represents the HTML content of the page
selectLink(): selects the links whose text (or image alt text) contains the given string
link(): returns a Link object for the first selected link element
click(): follows a selected link, as a browser click would, and returns the resulting page
text(): returns (does not print) the text content of the first selected HTML element
filter(): selects the HTML elements that match a CSS selector, such as a class name, ID, or tag
selectButton(): selects a button by its label; calling form() on the result returns the form that contains the button
submit(): submits a form object with the given field values
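The selection methods above can be tried without any network access, as in the following sketch. It assumes Composer has installed Goutte together with its symfony/dom-crawler and symfony/css-selector dependencies, and it builds a Crawler directly from a made-up HTML string. The markup and the https://example.com/ base URL are placeholders, not part of the original tutorial.

```php
<?php
require 'vendor/autoload.php'; // assumes "composer require fabpot/goutte" has been run

use Symfony\Component\DomCrawler\Crawler;

// Made-up markup standing in for a fetched page.
$html = <<<HTML
<html><body>
  <h1>Example News</h1>
  <a class="crd_lnk" href="/story-1">First headline</a>
  <a class="crd_lnk" href="/story-2">Second headline</a>
</body></html>
HTML;

// The second argument is the page URL, used to resolve relative links.
$crawler = new Crawler($html, 'https://example.com/');

// filter() + text(): text() returns a string; it prints nothing by itself.
$heading = $crawler->filter('h1')->text();

// filter() + each(): each() returns an array of the callback's return values.
$headlines = $crawler->filter('.crd_lnk')->each(function (Crawler $node) {
    return ['title' => $node->text(), 'href' => $node->attr('href')];
});

// selectLink() + link(): link() returns a Link object; getUri() gives an absolute URL.
$uri = $crawler->selectLink('First headline')->link()->getUri();

// text() throws on an empty selection, so check count() before reading from it.
$missing = $crawler->filter('.does-not-exist');
$fallback = $missing->count() > 0 ? $missing->text() : '(no match)';

echo $heading, "\n";
print_r($headlines);
echo $uri, "\n", $fallback, "\n";
```

The count() check matters in real scrapers, because a site redesign that renames a class would otherwise end the script with an exception at the first text() call.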

Scrape and save to CSV format

<?php 

// Include the required autoload file 
require 'vendor/autoload.php'; 

// Import the Goutte client class 
use Goutte\Client; 

// Create a new instance of the Goutte client 
$client = new Client(); 

// Define the URL of the web page to scrape 
$url = "https://news.ycombinator.com/"; 

// Send a GET request to the URL and retrieve the web page 
$crawler = $client->request('GET', $url); 

// Create an empty array to store the extracted data 
$data = []; 

// Filter the DOM elements with class 'titleline' and perform an action for each matched element
$crawler->filter('.titleline')->each(function ($node) use (&$data) {

    // Take the first link inside the node; it holds both the title and the URL
    // (the node's own text may also contain the site name shown after the title)
    $anchor = $node->filter('a')->first();

    // Extract the title text and the link URL
    $title = $anchor->text();
    $link = $anchor->attr('href');

    // Add the title and link to the data array
    $data[] = [$title, $link];
});

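// Specify the directory path where you want to save the CSV file
$directory = 'data/';

// Create the directory if it does not exist yet; fopen() will not create it
if (!is_dir($directory)) {
    mkdir($directory, 0777, true);
}

// Specify the CSV file path
$filePath = $directory . 'scraped_data.csv';

// Create a CSV file for writing
$csvFile = fopen($filePath, 'w');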

// Write headers to the CSV file 
fputcsv($csvFile, ['Title', 'Link']); 

// Write each row of data to the CSV file
foreach ($data as $row) {
    // Write a row to the CSV file
    fputcsv($csvFile, $row);
}

// Close the CSV file 
fclose($csvFile);
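To check that a saved file can be read back correctly, the rows can be written and then parsed again with fgetcsv(). The sketch below is one way to do that under stated assumptions: the sample rows, the temporary file, and the helper names writeCsv and readCsv are hypothetical and stand in for real scraped data.

```php
<?php
// Hypothetical rows standing in for scraped [title, link] pairs.
// Commas and quotes in titles are the usual trouble spots for CSV output.
$rows = [
    ['First headline, with a comma', 'https://example.com/story-1'],
    ['Second "quoted" headline', 'https://example.com/story-2'],
];

// Hypothetical helper: writes a header line followed by the data rows.
function writeCsv(string $path, array $header, array $rows): void
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    // Delimiter, enclosure, and escape are passed explicitly so the
    // result does not depend on the defaults of the installed PHP version.
    fputcsv($handle, $header, ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    fclose($handle);
}

// Hypothetical helper: reads every line of the file back into an array of rows.
function readCsv(string $path): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for reading");
    }
    $rows = [];
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}

$path = tempnam(sys_get_temp_dir(), 'scrape_');
writeCsv($path, ['Title', 'Link'], $rows);
$readBack = readCsv($path);
unlink($path);

// The first row is the header; the remaining rows should match the input.
echo count($readBack) - 1, " data rows read back\n";
```

Because fputcsv() quotes fields that contain commas or quotation marks, titles with those characters survive the round trip unchanged.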

