Fetching Data from Another Site with PHP: Covid-19 Data Table

In some cases, you may need to fetch data from another site and publish it in a format of your own. This is quite easy to do with PHP. However, keep the following points in mind:

  • You need to write the data fetching code to match the HTML structure of the site you take the data from.
  • The structure of that website may force you to develop different algorithms.
  • It may also force you to use different features or functions of PHP.
  • The website may not allow its data to be fetched this way.
  • For copyright reasons, it is often necessary to cite the source of the data you capture.

There are many ways to fetch data from another site with PHP. Here we will do it in an extremely simple way.

In this example, we will capture country-based Covid-19 data. The data include the country flag, the country name, the number of confirmed cases, the number of cases per 1 million people, the number of recoveries and the number of deaths. Here is the link we will fetch the data from:

https://news.google.com/covid19/map?hl=en&gl=EN&ceid=En:en

This site publishes the data it receives from Wikipedia in its own format. First we will pull the raw data from this site, and then we will extract the data we want. While doing this, we will work like a surgeon. Let's create the PHP code step by step.

Step 1: Creating a client with cURL and fetching the raw data

cURL is a library that lets you perform a wide range of client operations in PHP through a set of functions. You can also fetch data with functions like file_get_contents(), but the cURL functions give you more control over the request, such as timeouts and headers, when fetching data from another site. So we create a client with cURL using the following block of code. First we assign the complete HTML of the target URL to a variable. In the later steps, we will pick out the data we want with the help of some functions.

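<?php
// Create the cURL client (handle).
$client = curl_init();
// Set the target URL.
curl_setopt($client, CURLOPT_URL, 'https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US:en');
// Return the response as a string instead of printing it directly.
curl_setopt($client, CURLOPT_RETURNTRANSFER, 1);
// Run the request; $raw_data now holds the full HTML of the page.
$raw_data = curl_exec($client);
echo $raw_data;
?>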

With this code, we have defined a client with cURL. Then we set the target URL and set the CURLOPT_RETURNTRANSFER option to 1 so that the incoming data is returned and can be assigned to a variable. By running the client, we stored all the HTML of the target URL in the $raw_data variable. When we run this code, the result is an exact copy of the full HTML at the destination URL. We need to follow the steps below to get the Covid-19 table data.
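The request in Step 1 assumes that the transfer always succeeds. As a minimal sketch, the same request can be wrapped in a function that checks the result of curl_exec(). The function name fetch_html(), the 10 second timeout and the redirect option are assumptions made for this illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the original script): fetch a page with cURL
// and return null instead of false when the transfer fails.
function fetch_html(string $url, int $timeout_seconds = 10): ?string
{
    $client = curl_init();
    curl_setopt($client, CURLOPT_URL, $url);
    curl_setopt($client, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($client, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP redirects
    curl_setopt($client, CURLOPT_CONNECTTIMEOUT, $timeout_seconds);
    curl_setopt($client, CURLOPT_TIMEOUT, $timeout_seconds);

    $body = curl_exec($client);    // string on success, false on failure
    $error = curl_error($client);  // empty string when there was no error

    if ($body === false) {
        error_log("cURL request failed: $error");
        return null;
    }
    return $body;
}

$raw_data = fetch_html('https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US:en');
if ($raw_data === null) {
    echo "Could not fetch the page.\n";
} else {
    echo strlen($raw_data) . " bytes received\n";
}
```

Checking for false before continuing means the later steps never run a pattern against an empty or failed response.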

Step 2: Extracting the Covid-19 table rows with the preg_match_all() function

Now it's time for the surgical operation. We will use the preg_match_all() function to pull the rows of the Covid-19 table out of the raw data. For this we need to determine the pattern of the HTML table. Right-click the page at the target URL and view its source to find the pattern.


When the source code is examined, the pattern of the rows of the Covid-19 table looks like this:

<tr class="sgXwHf  YvL7re">...</tr>

We will use this pattern in the preg_match_all() function to find the rows of the Covid-19 table in the $raw_data variable. The matched data will be stored in the $rows variable.

<?php
  $client = curl_init();
  curl_setopt($client, CURLOPT_URL, 'https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US:en');
  curl_setopt($client, CURLOPT_RETURNTRANSFER, 1);
  $raw_data = curl_exec($client);
  // Collect every table row that matches the pattern into $rows.
  preg_match_all('#<tr class="sgXwHf  YvL7re">(.*?)</tr>#si', $raw_data, $rows);
  // $rows is an array, so echo cannot print its contents directly.
  echo $rows;
?>

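In the preg_match_all() function, we place the pattern between #...#si. The # characters mark where the pattern starts and ends, the s modifier lets the dot match line breaks as well, and the i modifier makes the match case-insensitive. The (.*?) part stands for everything between the opening and closing tags, up to the first closing tag that follows.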
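To see what the delimiters and modifiers do, here is a small sketch that runs the same kind of pattern on a made-up HTML string. The sample markup and the class name "row" are assumptions for this demo only, and the greedy pattern is shown purely for comparison.

```php
<?php
// Hypothetical sample markup; the class name "row" is made up for this demo.
$html = "<tr class=\"row\">\n<td>Alpha</td>\n</tr>\n<TR class=\"row\">\n<td>Beta</td>\n</TR>";

// '#' opens and closes the pattern, so '/' inside </tr> needs no escaping.
// s: '.' also matches line breaks, so a row may span several lines.
// i: letter case is ignored, so <TR> matches as well as <tr>.
// (.*?) is lazy: it stops at the first </tr> instead of the last one.
$lazy_count = preg_match_all('#<tr class="row">(.*?)</tr>#si', $html, $lazy);

// Greedy variant for comparison: (.*) runs to the LAST closing tag,
// so both rows collapse into a single match.
$greedy_count = preg_match_all('#<tr class="row">(.*)</tr>#si', $html, $greedy);

echo "lazy: $lazy_count, greedy: $greedy_count\n";  // lazy: 2, greedy: 1
```

The script in this step applies the same delimiters and modifiers to the full page stored in $raw_data.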

If you run the script from this step, you will see that only the text "Array" is written on the screen, together with an "Array to string conversion" notice or warning, depending on the PHP version. This is because the preg_match_all() function fills the $rows variable with a 2-dimensional array. When you print such a variable with echo, you only see the text "Array". In the next steps, we will fetch these values from the array and print them separately.
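To inspect what preg_match_all() actually stored, print_r() can be used instead of echo. The sketch below uses a made-up two-row string in place of the fetched page; the class name "r" is an assumption for this demo.

```php
<?php
// Hypothetical two-row sample standing in for the fetched page.
$raw_data = '<tr class="r"><td>1</td></tr><tr class="r"><td>2</td></tr>';

preg_match_all('#<tr class="r">(.*?)</tr>#si', $raw_data, $rows);

// $rows[0] holds the full matches, including the <tr> tags.
// $rows[1] holds only what the (.*?) group captured.
echo count($rows[0]) . " rows matched\n";

// print_r() with true as second argument returns the dump as a string;
// htmlspecialchars() keeps the tags visible when viewed in a browser.
echo '<pre>' . htmlspecialchars(print_r($rows, true)) . '</pre>';
```

The first index selects the kind of match (0 for the full match, 1 for the captured group) and the second index selects the row.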

Step 3: Fetching the values from the array separately with a foreach loop

So far, we have passed the rows we wanted from $raw_data to the $rows array. But we're not done. To get the data in the $rows array separately and process it as we want, we will pass the $rows array to a foreach loop. This way, we can process each item in the array as we want.

You can use other types of loops for this. But the foreach loop is a natural fit for this kind of array operation and lets us do the job quite practically. Now let's do this with the code below.

<?php
  $client = curl_init();
  curl_setopt($client, CURLOPT_URL, 'https://news.google.com/covid19/map?hl=en-US&gl=US&ceid=US:en');
  curl_setopt($client, CURLOPT_RETURNTRANSFER, 1);
  $raw_data = curl_exec($client);

  // $rows[0] holds every matched table row, including the <tr> tags.
  preg_match_all('#<tr class="sgXwHf  YvL7re">(.*?)</tr>#si', $raw_data, $rows);
  foreach ($rows[0] as $row) {
     // First cell of the row: country flag and name.
     preg_match('#<th class="l3HOY" scope="row" role="rowheader" data-n-hod="1">(.*?)</th>#si', $row, $header);
     // Remaining cells: the numeric columns.
     preg_match_all('#<td class="l3HOY">(.*?)</td>#si', $row, $datas);
     // Keep only the <img> tag (the flag) in the header cell.
     echo strip_tags($header[0],'<img>')."-";
     echo $datas[0][0]."-".$datas[0][1]."-".$datas[0][2]."-".$datas[0][3]."<br>";
  }

  curl_close($client);
?>

In the code above, a foreach loop is used to walk through the rows stored in the 2-dimensional $rows array.

The preg_match() function in the loop is used to fetch the country flag and name at the beginning of each row.

The preg_match() function only gets the first match. It is sufficient to use preg_match() instead of preg_match_all() here, since there is only one flag and country name cell in each row we receive. The cell data extracted with preg_match() is passed through the strip_tags() function to free it from HTML tags. The <img> tag is listed as allowed in strip_tags() so that the country flags are kept.
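The effect of the second argument of strip_tags() can be sketched on a single made-up header cell. The image file name and the country name "Exampleland" are placeholders, not data from the target page.

```php
<?php
// Hypothetical header cell modelled on the pattern used in this step.
$cell = '<th class="l3HOY" scope="row"><img src="flag.png" alt=""><div>Exampleland</div></th>';

// Without a second argument every tag is removed, including the flag image.
$name_only = strip_tags($cell);

// Listing <img> as allowed keeps the image tag and drops everything else.
$flag_and_name = strip_tags($cell, '<img>');

echo $name_only . "\n";      // Exampleland
echo $flag_and_name . "\n";  // <img src="flag.png" alt="">Exampleland
```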

The other cells in a row hold numeric data: confirmed cases, number of cases per 1 million people, number of recoveries and number of deaths. Therefore, the preg_match_all() function is used again inside the foreach loop to pull them. This data is stored in the 2-dimensional $datas variable, and each value is printed with echo as above. You can use the strip_tags() function here as well to simplify the data.
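As a sketch of that last suggestion, the captured group in $datas[1] can be cleaned with strip_tags() and trim() before printing. The row below is made up for illustration and its numbers are placeholders, not real statistics.

```php
<?php
// Hypothetical row modelled on the cell pattern used in this step;
// the numbers are placeholders, not real statistics.
$row = '<td class="l3HOY"><span>1,000</span></td><td class="l3HOY"><b>250</b></td>';

preg_match_all('#<td class="l3HOY">(.*?)</td>#si', $row, $datas);

// $datas[1] holds the captured cell contents without the <td> wrapper.
// strip_tags() removes any inner markup and trim() removes stray whitespace.
$values = array_map(
    function ($cell) { return trim(strip_tags($cell)); },
    $datas[1]
);

echo implode(' - ', $values) . "\n";  // 1,000 - 250
```

The script in Step 3 prints the cells without this cleanup, using the full matches in $datas[0].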

When we ran the Step 3 script, we got the Covid-19 table data in the following simple form:


WARNING: If the URL or the HTML structure of the destination page changes, you need to update your PHP code.
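One way to notice such a change early is to check how many rows the pattern matched. The following is a minimal sketch; the function name extract_rows() and the exception message are assumptions made for this illustration.

```php
<?php
// Hypothetical helper: extract rows and report when the pattern finds nothing,
// which usually means the page markup has changed.
function extract_rows(string $html, string $pattern): array
{
    $count = preg_match_all($pattern, $html, $rows);
    if ($count === false || $count === 0) {
        throw new RuntimeException('No table rows matched; the page markup may have changed.');
    }
    return $rows[0];
}

$pattern = '#<tr class="sgXwHf  YvL7re">(.*?)</tr>#si';
try {
    // A page whose rows use a different (made-up) class name.
    $rows = extract_rows('<tr class="renamed"><td>1</td></tr>', $pattern);
    echo count($rows) . " rows found\n";
} catch (RuntimeException $e) {
    echo 'Scraper needs an update: ' . $e->getMessage() . "\n";
}
```

With a check like this, a changed page produces a clear message instead of an empty table.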
