html – Web scraping in PHP

Exception or error:

I’m looking for a way to make a small preview of another page from a URL given by the user in PHP.

I’d like to retrieve only the title of the page, an image (like the logo of the website) and a bit of text or a description if it’s available. Is there any simple way to do this without any external libraries/classes? Thanks

So far I’ve tried using the DOMDocument class, loading the HTML and displaying it on the screen, but I don’t think that’s the proper way to do it.

How to solve:

I recommend you consider simple_html_dom for this. It will make it very easy.

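Here is a working example that pulls the title and the first image.

require 'simple_html_dom.php';

$html = file_get_html('');
$title = $html->find('title', 0);
$image = $html->find('img', 0);

// find() with an index returns null when nothing matches, so guard before reading properties
echo ($title ? $title->plaintext : '')."<br>\n";
echo $image ? $image->src : '';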

Here is a second example that will do the same without an external library. I should note that using regex on HTML is NOT a good idea.

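$data = file_get_contents('');

// Fall back to an empty string when a pattern does not match,
// so $matches[1] is never read from an empty array.
// [^>]* and the "s" flag allow attributes on <title> and titles that span lines.
$title = preg_match('/<title[^>]*>(.*?)<\/title>/is', (string) $data, $matches) ? trim($matches[1]) : '';

$img = preg_match('/<img[^>]*src=[\'"]([^\'"]+)[\'"][^>]*>/i', (string) $data, $matches) ? $matches[1] : '';

echo $title."<br>\n";
echo $img;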
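
Since the question mentions DOMDocument and asks to avoid external libraries, the same extraction can also be sketched with PHP's bundled DOM extension instead of regex. This is a minimal sketch, not a definitive implementation: the function name extract_preview and the returned array keys are hypothetical, and it assumes the description lives in a lowercase <meta name="description"> tag, which many pages omit.

```php
<?php
// Hypothetical helper: the name and the array keys are illustrative only.
function extract_preview(string $html): array
{
    $empty = ['title' => null, 'description' => null, 'image' => null];
    if (trim($html) === '') {
        // loadHTML() rejects an empty string, so bail out early.
        return $empty;
    }

    $doc = new DOMDocument();
    // Real-world markup is rarely valid; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // string(...) returns the value of the first matching node,
    // or an empty string when nothing matches.
    $title = trim($xpath->evaluate('string(//title)'));
    // Assumption: the attribute value is exactly "description" (XPath is case-sensitive).
    $description = trim($xpath->evaluate('string(//meta[@name="description"]/@content)'));
    // First src attribute found on any <img>, in document order.
    $image = trim($xpath->evaluate('string(//img/@src)'));

    return [
        'title'       => $title !== '' ? $title : null,
        'description' => $description !== '' ? $description : null,
        'image'       => $image !== '' ? $image : null,
    ];
}
```

To use it, pass in the string returned by file_get_contents() for the user's URL, after checking that the fetch did not return false. The image value is returned exactly as written in the page, so a relative path such as /logo.png still has to be resolved against the page URL before it can be displayed.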


You may use any of these libraries. Each one has pros and cons, so read the notes about each one or take the time to try it yourself:

  • Guzzle: A standalone HTTP client, so you do not have to work with cURL directly.
  • Goutte: Built on Guzzle and some Symfony components by a Symfony developer.
  • hQuery: A fast scraper with caching capabilities. High performance when scraping large documents.
  • Requests: Known for its user-friendly API.
  • Buzz: A lightweight client, ideal for beginners.
  • ReactPHP: An asynchronous, event-driven library suited to concurrent scraping, with comprehensive tutorials and examples.

It is worth checking them all and using each one where it fits best.


You can use SimpleHtmlDom for this, and then look for the title and img tags or whatever else you need.


I like the Dom Crawler library. It is very easy to use and has lots of options, like:

use Symfony\Component\DomCrawler\Crawler;

$crawler = $crawler
->filter('body > p')
->reduce(function (Crawler $node, $i) {
    // filters every other node
    return ($i % 2) == 0;
});
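
For readers who want to see what the filter('body > p') and reduce() chain selects without installing the library, the same idea can be approximated with PHP's bundled DOM extension. This is only a rough sketch: the function name every_other_paragraph is hypothetical, and it returns the paragraph text rather than Crawler nodes.

```php
<?php
// Hypothetical helper: approximates filter('body > p') followed by
// reduce() keeping the nodes at even positions (0, 2, 4, ...).
function every_other_paragraph(string $html): array
{
    if (trim($html) === '') {
        // loadHTML() rejects an empty string.
        return [];
    }

    $doc = new DOMDocument();
    // Keep parser warnings for imperfect markup out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $texts = [];
    $i = 0;
    // /html/body/p matches only <p> elements that are direct children of <body>,
    // like the CSS selector 'body > p'.
    foreach ($xpath->query('/html/body/p') as $node) {
        if ($i % 2 === 0) {
            $texts[] = trim($node->textContent);
        }
        $i++;
    }

    return $texts;
}
```

Paragraphs nested inside other elements, such as a <p> inside a <div>, are skipped because they are not direct children of <body>.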
