This tutorial shows how to scrape some simple data from a web page with PHP.
We will extract all the ‘h1’ headings from the page http://www.ajarunthomas.com/jquery/
First, we fetch the page content and load it into a DOM document:
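// Fetch the raw HTML; file_get_contents() returns false on failure
$html = file_get_contents("http://www.ajarunthomas.com/jquery");
if(!empty($html)){
    // Parse the HTML string into a DOM tree
    $aj_dom = new DOMDocument();
    $aj_dom->loadHTML($html);
}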
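The fetch can fail (bad URL, network error), and an empty string cannot be parsed. As a minimal sketch, the check and the parsing can be grouped into one guarded loader. The helper name aj_load_dom() is hypothetical and not part of the original code, and an inline sample string stands in for the fetched page:

```php
<?php
// Hypothetical helper (not from the original tutorial): parse an HTML
// string into a DOMDocument, or return null when there is nothing to parse.
function aj_load_dom($html) {
    if (!is_string($html) || trim($html) === '') {
        return null; // fetch failed (false) or the page was empty
    }
    $dom = new DOMDocument();
    // loadHTML() returns false if parsing fails outright
    return $dom->loadHTML($html) ? $dom : null;
}

// Inline sample markup stands in for the result of file_get_contents().
$aj_dom = aj_load_dom('<html><body><h1>Hello</h1></body></html>');
var_dump($aj_dom instanceof DOMDocument); // bool(true)
var_dump(aj_load_dom(false));             // NULL
```

Returning null lets the calling code test a single value before it builds the DOMXPath object.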
Next, we define the XPath query. We use ‘//h1’, which matches every h1 element anywhere in the document:
$aj_xpath = new DOMXPath($aj_dom);
$aj_row = $aj_xpath->query('//h1');
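The expression passed to query() can be narrowed or widened. The sketch below runs three queries against an inline sample document; the markup and the class name "title" are invented for illustration and do not come from the target page:

```php
<?php
// Sample markup invented for this illustration.
$sample = '<html><body>'
        . '<h1 class="title">First</h1>'
        . '<div><h1>Second</h1><h2>Sub</h2></div>'
        . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($sample);
$xpath = new DOMXPath($dom);

// Every h1 anywhere in the document.
$all = $xpath->query('//h1');
// Only h1 elements whose class attribute is exactly "title".
$titled = $xpath->query('//h1[@class="title"]');
// h1 and h2 elements together, using the XPath union operator.
$both = $xpath->query('//h1 | //h2');

echo $all->length, "\n";    // 2
echo $titled->length, "\n"; // 1
echo $both->length, "\n";   // 3
```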
Now we store all the headings in an array:
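$arr = array(); // initialise so later code works even when nothing matches
if($aj_row->length > 0){
    foreach($aj_row as $row){
        // saveXML() returns the node's markup, including the <h1> tags
        $arr[] = $aj_dom->saveXML($row);
    }
}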
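saveXML() keeps the surrounding tags. If only the heading text is wanted, each node's textContent property can be read instead. A small sketch comparing the two on inline sample markup (invented for illustration):

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><h1>Intro <em>to</em> jQuery</h1><h1> Selectors </h1></body></html>');
$xpath = new DOMXPath($dom);

$markup = array();
$text = array();
foreach ($xpath->query('//h1') as $node) {
    $markup[] = $dom->saveXML($node);     // full element, tags included
    $text[]   = trim($node->textContent); // text only, nested tags dropped
}

print_r($markup);
print_r($text);
```

Which form to store depends on the use: the markup can be echoed straight into a page, while the plain text is easier to search, compare, or save.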
Finally, we display the headings:
$y = count($arr);
for($i = 0; $i < $y; $i++){
echo $arr[$i];
}
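If you want to keep libxml parse warnings out of the output page, enable internal error handling before calling loadHTML() and clear the collected errors afterwards:
libxml_use_internal_errors(TRUE); // call before loadHTML()
libxml_clear_errors(); // call after loadHTML()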
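Clearing the errors discards them. To inspect what the parser complained about, libxml_get_errors() returns the collected entries before they are cleared. The sketch below uses deliberately unusual sample markup (invented here). Whether a given input produces errors depends on the libxml2 version installed, so the list may be empty:

```php
<?php
libxml_use_internal_errors(true); // collect errors instead of printing them

$dom = new DOMDocument();
// <nosuchtag> is not a standard HTML element; many libxml2 versions
// report it as an error, but some newer ones may not.
$dom->loadHTML('<html><body><nosuchtag>oops</nosuchtag><h1>Title</h1></body></html>');

$errors = libxml_get_errors();    // array of LibXMLError objects
foreach ($errors as $error) {
    echo 'Line ', $error->line, ': ', trim($error->message), "\n";
}
libxml_clear_errors();            // empty the buffer for the next parse

// The document is still usable even if errors were reported.
$count = $dom->getElementsByTagName('h1')->length;
echo $count, "\n";
```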
Putting it all together, the complete code looks like this:
<?php
// Collect parse warnings internally instead of printing them
libxml_use_internal_errors(TRUE);
$html = file_get_contents("http://www.ajarunthomas.com/jquery");
$arr = array();
if(!empty($html)){
    $aj_dom = new DOMDocument();
    $aj_dom->loadHTML($html);
    libxml_clear_errors();
    $aj_xpath = new DOMXPath($aj_dom);
    $aj_row = $aj_xpath->query('//h1');
    if($aj_row->length > 0){
        foreach($aj_row as $row){
            $arr[] = $aj_dom->saveXML($row);
        }
    }
}
// Count once, after the array is complete
$y = count($arr);
for($i = 0; $i < $y; $i++){
    echo $arr[$i];
}
?>
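The same steps can be wrapped in a function so that they work for other tags too. This is only a sketch: the name aj_extract_text() and its parameters are hypothetical, and it uses getElementsByTagName() in place of the XPath query, since the tag name arrives as a parameter. It takes an HTML string, so it can be tried without a network request:

```php
<?php
// Hypothetical helper: return the trimmed text of every <$tag> element in $html.
function aj_extract_text($html, $tag = 'h1') {
    $result = array();
    if (!is_string($html) || trim($html) === '') {
        return $result; // nothing to parse
    }
    $previous = libxml_use_internal_errors(true); // collect parse warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    // getElementsByTagName() avoids building an XPath string from a parameter.
    foreach ($dom->getElementsByTagName($tag) as $node) {
        $result[] = trim($node->textContent);
    }
    return $result;
}

print_r(aj_extract_text('<html><body><h1>A</h1><h2>B</h2><h1>C</h1></body></html>'));
```

With a live page, the first argument would be the string returned by file_get_contents().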