php – Clean OO-structure vs. SQL performance-ThrowExceptions

Exception or error:

When programming in PHP I always try to create meaningful ‘models’ (classes) that correspond to tables in the database. I often encounter the following problem:

Assuming that I’ve created a database with two tables: authors and blogs, which both have a corresponding model in my application.

Let’s say I want to print all the blogs along with information about the author, I’d have to do something like this:

<?php
foreach ($app->getBlogs() as $blog) {
  echo "<h1>" . $blog->title . "</h1>";
  echo "Written by" . $blog->getAuthor()->name . "</p>";
  // ... et cetera
}
?>

The problem is that the application will now fire 1 SQL query to get all the blog items, and [number of blog items] queries to get the information for every author. Having used straightforward SQL I could have retrieved this information using a simple query:

SELECT * FROM blogs
JOIN authors ON authors.id = blogs.author

What’s the best way of dealing with such issues: developing an object-oriented application without executing too many useless SQL queries.

How to solve:

IMO, I think you should just write another class that will encapsulate what you have. Does it always make sense for a blog to have an author? Does every author have a blog? Can an author have multiple blogs? Think about these issues, then design a class that will encapsulate this. Remember, typical database schemas are not OO… they are relational. Yes, they are close, but there are subtle differences.

So if an author can have multiple blogs, you can have a key to multivalued class of some sort (with the key based on author id) and you can initialize or load this class with one SQL call. Just some things to think about.

Answer:

Unless you know for a fact that the inefficient sql operations will have no real impact, such as the number of redundant iterations or the rows affected will always be small (e.g. iterating an operation over the number of children in a family, which, short of very rare cases like the Duggars, can be relied on to be less than 10), I have always favored efficiency of the relational query over beauty of the OO code.

Although ugly OO code can make maintenance a pain, inefficient data access can bring a system to its knees, usually when you’re on vacation or trying to sleep. And most of the time, you can find a good compromise that makes the most efficient SQL operations have a reasonably “objecty” interface. It may cost you a bit more time when it comes to refactoring or adding features if your object model isn’t beautiful, but it is costing your customers time every single time they push that button (or money in terms of bigger hardware to run the app – never a good method of optimization), and the man hours spent using the app should far outstrip the man hours put into developing it (one would hope).

As far as your concerns about whether an interface will be needed (forcing you to figure out all possible consumption patterns), I have dealt with this by doing all my data modification via stored procedures, but allowing data access to go straight against the tables & views by giving all users select privileges only. This is a semi-controversial position, as many people would like to lock out all data access operations from downstream consumers in the name of ensuring that all sql being run is to their standards. But new ways of looking at the data are always coming up, and if you have to add a new stored proc, update your core class libraries and update your client code every time someone wants to implement a new feature, deployment and qualification can grow to be a real burden – far more than having to deal with an object model that doesn’t fit a religious ideal. And it’s a lot easier to implement a code inspection process that verifies that new select statements written by downstream consumers are kosher.

Answer:

I’m a huge ORM advocate, and here’s my weigh-in:

It’s okay to trade an inperceptible amount of application performance for a ton of developer performance. Servers are extremely powerful these days and that extra iron gives us some new flexibility.

That said, if you do something silly that obliterates the user experience by bringing the server to its knees, that’s no longer okay. If you had a million authors in your example, pulling them all down along with all of their fields and iterating through them would be unwise. If you had only 20 authors, then it’s no big deal.

In the case of huge datasets and expensive batch operations, even as an ORM guy, I have to optimize and write special sprocs or SQL statements just for that case. And I have to be careful not to write my code in such a way that I hammer the database would it would be better to use a caching pattern where I pull down a large data set and then work off of that.

This is a big on-going debate, but to me it’s just a matter of understanding that you can’t solve every problem with a single tool.

Answer:

Things like this are exactly what creating your own data layer should solve for you. In your model for your blogs, there should be a function like getBlogList() that will return the blog titles and author’s name all in one query.

Answer:

Whatever solution you use, ORM or not, should be capable if issuing a single select in this case, and also it should be capable of selecting only the necessary columns. Then from that join it should be capable of populating the authors objects with corresponding lists of blogs per author. Having to issue multiple SQL is wasteful.

Answer:

Propel is an example of a PHP ORM which can cope with this. I’m sure Doctrine must be able to do so, though I’ve never looked at it.

Why reinvent the wheel?

Answer:

I used some other bindings.. Example:

<?php
$blogs = $app->getBlogs();
$blogs->getAuthor();
foreach ($blogs as $blog) {
  echo "<h1>" . $blog->title . "</h1>";
  echo "Written by" . $blog->getAuthor()->name . "</p>";
  // ... et cetera
}
?>

-> getAuthor() call on $blog queries the DB only once, and using special object for array, the getAuthor() call is called on each (but, is somehow optimized to run only as one query).

Answer:

You’ve already answered the question:

Having used straightforward SQL I could have retrieved this information using a simple query

You have a choice between SQL that fetches only blog posts and SQL that fetches blog posts and authors. Likewise you have a choice between some PHP code that fetches just blog posts or PHP code that fetches blog posts and authors. You have to make a choice about your PHP code, just as you have to make a choice about your SQL.

There are plenty of examples above which demonstrate how this would work in practice. The recommendation to use Doctrine or Propel is also a good one.

Answer:

Consider Command/Query Separation as described by Greg Young and Martin Fowler. Your query model can have Blog and Author de-normalized into a single table optimized for retrieving DTOs for your presentation layer.

Greg Young has a great presentation on CQS on InfoQ.

Answer:

This is the classic ORM problem. Many many schools of thought. Not sure of php specifics, but there are several strategies to resolve this ‘impedence mismatch.’ Google orm.

Answer:

one way to do this is to create a view qith your join in it and map view results to another class which contains data for blog and author.

Answer:

Honestly, just create a method on your Blog class called getBlogsWithAuthors() and then run

SELECT  *
FROM    blogs
JOIN    authors 
        ON authors.id = blogs.author

I know it may seem like a pain to write stuff like this for each model class, but there is really no other way. You could make it a bit more dynamic, however:

//this is a method of a model class. 
//Assume $this->table is the table name of the model (ie, Blog)
public function getWith($joinTable, $pivot1, $pivot2)
{
    $sql="SELECT    *
            FROM    {$this->table}
            JOIN    $joinTable 
                    ON $pivot1 = $pivot2";

    return executeQuery($sql);    
}

$blog=new Blog();
$result=$blog->getWith('authors', 'authors.id', 'blogs.author');
[play with results here]

Answer:

You could always use memcached as an intermediate layer. Each query would be purely RAM-based, which means you can run as many as you want.

Leave a Reply

Your email address will not be published. Required fields are marked *