Blog entries tagged doctrine :: mwop.net

SQL Nested Queries or Sub Queries with Doctrine DBAL

contact@mwop.net (Matthew Weier O'Phinney) — Mon, 10 Mar 2025 08:24:34 -0500

I recently ran into a problem with my website for which the solution was a nested query (sometimes termed a subquery). However, I use Doctrine DBAL for creating my dynamic queries, and there's no documentation on how to do them.

The problem

For my art gallery, the raw SQL looks something like the following:

SELECT
    p.filename,
    p.description,
    p.created,
    array_agg(DISTINCT t.tag) as tags
FROM
    photos p
LEFT JOIN tags t ON p.filename = t.content_id
WHERE
    p.filename = :filename
    AND t.content_type = :content_type
GROUP BY
    p.filename

I started noticing an odd issue where, post insertion of an image into my gallery, I'd get no error, but the form would not appear to have processed. On further inspection, using Z-Ray from ZendHQ, I realized that the insertion was successful, but that the redirect to view the inserted image was returning a 404. I grabbed the executed SQL from Z-Ray for retrieving the image, and it returned no rows.

I started thinking about why this image wasn't posting, when others were, and realized there was one trivial difference: I wasn't including any hashtags in my description, which meant no tags.

Hopefully you can see where this is leading.

A LEFT JOIN normally will not cause the entire query to come back empty if it finds no matching rows on the joined table; that's the behavior of INNER JOIN. However, if you put a condition that is based on a joined table outside the join itself, it essentially acts like an INNER JOIN: when nothing matches, the joined columns are NULL, so a WHERE condition such as t.content_type = :content_type can never be true, and the row is filtered out.

Sure enough, when I removed the LEFT JOIN and the array_agg column, I got a hit.
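To make the difference concrete, here is a minimal sketch of the two placements side by side. It is not the gallery's real code: SQLite (in memory, through PDO) stands in for PostgreSQL, the schema is trimmed to the columns that matter, and the file name is made up.

```php
<?php
// Sketch only: SQLite in memory stands in for PostgreSQL, and the schema
// is reduced to what the join needs. The photo has no tags at all.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE photos (filename TEXT PRIMARY KEY, description TEXT)');
$pdo->exec('CREATE TABLE tags (content_id TEXT, content_type TEXT, tag TEXT)');
$pdo->exec("INSERT INTO photos VALUES ('untagged.jpg', 'No hashtags here')");

$params = ['filename' => 'untagged.jpg', 'content_type' => 'photo'];

// Condition on the joined table in WHERE: the NULL-extended row is discarded.
$inWhere = $pdo->prepare(
    'SELECT p.filename, t.tag
       FROM photos p
       LEFT JOIN tags t ON p.filename = t.content_id
      WHERE p.filename = :filename AND t.content_type = :content_type'
);
$inWhere->execute($params);
$whereRows = $inWhere->fetchAll(PDO::FETCH_ASSOC);

// Same condition inside the ON clause: the photo row survives, with a NULL tag.
$inJoin = $pdo->prepare(
    'SELECT p.filename, t.tag
       FROM photos p
       LEFT JOIN tags t ON p.filename = t.content_id AND t.content_type = :content_type
      WHERE p.filename = :filename'
);
$inJoin->execute($params);
$joinRows = $inJoin->fetchAll(PDO::FETCH_ASSOC);

printf("condition in WHERE: %d row(s)\n", count($whereRows));
printf("condition in ON:    %d row(s)\n", count($joinRows));
```

The first query returns nothing for an untagged photo, which matches the 404 described above; the second returns the photo with a NULL in the tag column.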

"Obvious" solution: move the condition

As a commenter on this post noted, the immediate solution is to move the AND t.content_type = :content_type clause to the JOIN:

SELECT
    p.filename,
    p.description,
    p.created,
    array_agg(DISTINCT t.tag) as tags
FROM
    photos p
LEFT JOIN tags t ON p.filename = t.content_id AND t.content_type = :content_type
WHERE
    p.filename = :filename
GROUP BY
    p.filename

This does work, and requires no huge changes to the DBAL query builder; I just move the condition into the leftJoin() call, and carry on.
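With the query builder, that change might look roughly like the following. This is a sketch under assumptions: doctrine/dbal 3.x or 4.x installed through Composer, the autoloader at vendor/autoload.php, and an in-memory SQLite connection that exists only so a builder can be created. The PostgreSQL-specific query is generated but never executed, and the file name is a placeholder.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\ParameterType;

// Assumption: any connection will do here, since we only generate SQL.
$dbal = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);

$select = $dbal->createQueryBuilder();
$select
    ->select(
        'p.filename',
        'p.description',
        'p.created',
        'array_agg(DISTINCT t.tag) AS tags',
    )
    ->from('photos', 'p')
    // The content_type condition now lives in the join condition, not in where().
    ->leftJoin('p', 'tags', 't', 'p.filename = t.content_id AND t.content_type = :content_type')
    ->where('p.filename = :filename')
    ->groupBy('p.filename')
    ->setParameter('filename', 'example.jpg', ParameterType::STRING)
    ->setParameter('content_type', 'photo', ParameterType::STRING);

$sql = $select->getSQL();
echo $sql, "\n";
```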

Preferred solution: nested query

The solution I chose was to do a nested query, and to aggregate those results as an array. This makes the intent clearer when reading the query: I want to select all distinct tags for this image and assign them as an array to a column. I'm using PostgreSQL, so the query looks like this:

SELECT
    p.filename,
    p.description,
    p.created,
    (SELECT ARRAY(
        SELECT DISTINCT tag FROM tags WHERE p.filename = content_id AND content_type = :content_type
        )) as tags
FROM
    photos p
WHERE
    p.filename = :filename
GROUP BY
    p.filename

With this approach, if no rows are returned from the tags table, an empty array is created; otherwise an array of the tag values that match is returned.

However, I didn't know how to create this query using the Doctrine DBAL query builder.

DBAL solution

When creating a SELECT using the DBAL query builder, you do something like this:

$select = $dbal->createQueryBuilder();
$select
    ->select(
        // one argument per column to select
    )
    ->from('table_name', 't'); // alias the table

The arguments to select() are expected to be strings, and any given string can be an arbitrary SQL expression.

Creating the subselect is easy; you do it like any other query:

$tags = $dbal->createQueryBuilder();
$tags
    ->select('tag')
    ->distinct()
    ->from('tags')
    ->where('content_id = p.filename')
    ->andWhere('content_type = :content_type');

Now, how do I get that into a column string for a select?

The getSQL() method of a query builder will spit out the SQL sent. Moreover, it does not replace placeholders, so even if you set a bound parameter, it won't be injected into the generated SQL.
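A quick way to see this for yourself is to bind a parameter and then inspect the builder. The sketch below assumes doctrine/dbal 3.x or 4.x installed through Composer and uses a throwaway in-memory SQLite connection purely to obtain a builder; nothing is executed.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;

// Assumption: a throwaway connection, used only to create the builder.
$dbal = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);

$tags = $dbal->createQueryBuilder();
$tags
    ->select('tag')
    ->distinct()
    ->from('tags')
    ->where('content_type = :content_type')
    ->setParameter('content_type', 'photo');

$sql    = $tags->getSQL();        // still contains :content_type
$params = $tags->getParameters(); // the bound value is tracked separately

echo $sql, "\n";
print_r($params);
```

One consequence worth keeping in mind: a value bound on the inner builder stays with that builder. When its SQL is embedded as a string in another query, the placeholder has to be bound on the outer builder.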

Knowing all this, I did the following:

$tags = $dbal->createQueryBuilder();
$tags
    ->select('tag')
    ->distinct()
    ->from('tags')
    ->where('content_id = p.filename')
    ->andWhere('content_type = :content_type');

$select = $dbal->createQueryBuilder();
$select
    ->select(
        'p.filename',
        'p.description',
        'p.created',
        sprintf('(SELECT ARRAY(%s)) as tags', $tags->getSQL()),
    )
    ->from('photos', 'p')
    ->where('p.filename = :filename')
    ->groupBy('p.filename')
    ->setParameter('filename', $filename, ParameterType::STRING)
    ->setParameter('content_type', 'photo', ParameterType::STRING);

(Where ParameterType is imported from the namespace Doctrine\DBAL.)

This approach worked immediately, and generated exactly the same result as the raw SQL I had tested.

Changelog

2025-03-07: clarified that the solution was targeting the DBAL query builder. DBAL can consume raw SQL as well, and does not require usage of the query builder.
2025-03-10: noted that a LEFT JOIN will still work, as long as the t.content_type = :content_type condition is moved from the SELECT to the LEFT JOIN.

SQL Nested Queries or Sub Queries with Doctrine DBAL was originally published 6 March 2025 on https://mwop.net by Matthew Weier O'Phinney.

Advent 2023: Doctrine DBAL

contact@mwop.net (Matthew Weier O'Phinney) — Mon, 11 Dec 2023 09:21:00 -0600

I've mostly taken database abstraction for granted since I started at Zend. We had a decent abstraction layer in ZF1, and improved it for ZF2. There were a lot of quirks to it (you really had to dive in and look at the various SQL abstraction classes to understand how to do more complex stuff), but it worked, and was always right there and available in the projects I worked on.

In the last couple of years, though, we came to the realization in the Laminas Project that we didn't really have anybody with the expertise or time to maintain it. We've marked it security-only twice now, and while we've managed to keep it updated to each new PHP version, it's becoming harder and harder, and whenever there's a CI issue, it's anybody's guess as to whether or not we'll be able to get it resolved.

My alternatives have been straight PDO, or Doctrine DBAL, with the latter being my preference.

Doctrine what?

When most folks who use PHP hear "Doctrine", they immediately think "ORM"; it's how most folks use it, and what it's best known for.

Underlying the ORM is its database abstraction layer (hence "DBAL"). This library exposes an API that will work across any database it supports; this is essentially what zend-db, and later laminas-db, were doing as well. What most folks don't realize is that you can use the DBAL by itself, without the ORM.

Why no ORM?

ORMs are fine. Really. But they add an additional layer of complexity to understanding what you are actually doing. Additionally, if you want to do something that doesn't quite fit how the ORM works, you'll need to drop down to the DBAL anyways. So my take has always been: why not just use the DBAL from the beginning?

So, how does Matthew write code that interacts with the database?

I start by writing value objects that represent discrete aspects of the application. Most of my work will be in consuming or creating these. From there, I write a repository class that I use for purposes of persisting and retrieving them. I can usually extract an interface from this, which aids in my testing, or if I decide I need a different approach to persistence later.

I push the work of mapping the data from the database to these objects, and vice versa, either in the repository, or in the value objects themselves (often via a named constructor). Using these approaches creates lean code that can be easily tested, and for which there's no real need to understand the underlying system; it's all right there in what I've written for the application.
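As an illustration of that shape, here is a small sketch in plain PHP (8.1 or later). Every name in it (Photo, PhotoRepository, InMemoryPhotoRepository) is hypothetical and not taken from any real project; the in-memory repository is the kind of test double the extracted interface makes possible, and a DBAL-backed class would implement the same interface.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; names are illustrative only.
class Photo
{
    public function __construct(
        public readonly string $filename,
        public readonly string $description,
        public readonly DateTimeImmutable $created,
    ) {
    }

    /** Named constructor: maps a database row (associative array) to the object. */
    public static function fromRow(array $row): self
    {
        return new self(
            $row['filename'],
            $row['description'],
            new DateTimeImmutable($row['created']),
        );
    }

    /** Reverse mapping, used when persisting. */
    public function toRow(): array
    {
        return [
            'filename'    => $this->filename,
            'description' => $this->description,
            'created'     => $this->created->format('Y-m-d H:i:s'),
        ];
    }
}

interface PhotoRepository
{
    public function save(Photo $photo): void;

    public function find(string $filename): ?Photo;
}

/** Stand-in for tests; stores rows in an array instead of a database. */
class InMemoryPhotoRepository implements PhotoRepository
{
    /** @var array<string, array<string, string>> */
    protected array $rows = [];

    public function save(Photo $photo): void
    {
        $this->rows[$photo->filename] = $photo->toRow();
    }

    public function find(string $filename): ?Photo
    {
        return isset($this->rows[$filename]) ? Photo::fromRow($this->rows[$filename]) : null;
    }
}

$repository = new InMemoryPhotoRepository();
$repository->save(Photo::fromRow([
    'filename'    => 'sunset.jpg',
    'description' => 'Sunset over the lake',
    'created'     => '2023-12-01 18:30:00',
]));

$found = $repository->find('sunset.jpg');
echo $found->description, "\n";
```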

Some gripes about the documentation, and some tips

The Doctrine DBAL docs are a bit sparse, particularly when it comes to its SQL abstraction. And there's no "getting started" or "basic usage" guide. In fact, it's not until the third page within the docs that you get any code examples; thankfully, at that point they give you information on how to get a database connection:

use Doctrine\DBAL\DriverManager;

$connectionParams = [
    'dbname'   => 'mydb',
    'user'     => 'user',
    'password' => 'secret',
    'host'     => 'localhost',
    'driver'   => 'pdo_mysql',
];
$conn = DriverManager::getConnection($connectionParams);

They also provide a number of other approaches, including using a DSN (an acronym they never explain, but based on using PDO, likely means "data source name").
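For reference, a DSN packs the same connection details into a single URL-like string. The sketch below assumes a DBAL release that ships Doctrine\DBAL\Tools\DsnParser (3.6 or later, to the best of my knowledge) and a Composer autoloader at vendor/autoload.php; the credentials are the same placeholders as above, and parsing does not open a connection.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\Tools\DsnParser;

// Map the "mysql" URL scheme to the pdo_mysql driver used earlier.
$parser = new DsnParser(['mysql' => 'pdo_mysql']);

// Turns the DSN into the same kind of array shown above.
$connectionParams = $parser->parse('mysql://user:secret@localhost/mydb');

print_r($connectionParams);
```

The resulting array can then be handed to DriverManager::getConnection() like any other set of connection parameters.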

Once you have a connection, what do you do? Well the DBAL connection allows you to prepare and execute queries, including via the use of prepared statements. It provides a variety of methods for fetching individual or multiple rows, with a variety of options for how the data is returned (indexed arrays, associative arrays, individual columns, individual values, etc.). These retrieval methods are mirrored in the result instances returned when executing prepared statements as well.
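A short sketch of a few of those connection-level methods follows. It assumes doctrine/dbal 3.x or 4.x installed through Composer and swaps in an in-memory SQLite database so that it can run without a server; the users table and its contents are made up.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;

// Assumption: in-memory SQLite instead of the MySQL connection shown earlier.
$conn = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);

$conn->executeStatement('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');
$conn->executeStatement('INSERT INTO users (name, email) VALUES (?, ?)', ['Ada', 'ada@example.com']);
$conn->executeStatement('INSERT INTO users (name, email) VALUES (?, ?)', ['Grace', 'grace@example.com']);

// One row as an associative array (false when nothing matches).
$row = $conn->fetchAssociative('SELECT name, email FROM users WHERE email = ?', ['ada@example.com']);

// All rows as associative arrays.
$rows = $conn->fetchAllAssociative('SELECT name FROM users ORDER BY name');

// A single column across all rows.
$names = $conn->fetchFirstColumn('SELECT name FROM users ORDER BY name');

// A single value.
$count = (int) $conn->fetchOne('SELECT COUNT(*) FROM users');

// The same fetch*() methods are available on the result of a prepared statement.
$stmt = $conn->prepare('SELECT name FROM users WHERE email = :email');
$stmt->bindValue('email', 'grace@example.com');
$name = $stmt->executeQuery()->fetchOne();
```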

And that brings me to the SQL abstraction.

First, it's really, really good. It's minimal, but it covers just about anything you need to do. If you need to write something complex, you probably can; the beauty is that if you can't, you can always fall back to a SQL query, and use the connection's API for binding values.

But the documentation could be better.

It felt like it was written by a database admin who has forgotten more than most people ever learn about databases, and never considered that others might not know as much as them. The fact that it starts with architecture and not usage feels hugely antagonistic for somebody coming in just wanting to know how to connect to the database, build a query, and fetch some results. (The irony is not lost on me that this is almost exactly how Laminas and Mezzio docs are written, and, yes, I recognize we could all do better!)

Before folks start grousing, yes, I have on my TODO list an item for contributing to the DBAL docs. I'm trying to work up an outline of what I would have found useful, what acronyms need explanation, and some examples of common patterns before I make any suggestions, however.

First, they have a whole documentation page related to the SQL query builder, and a lot of examples. But not a single one details how to actually execute the query! So, for those wondering:

$sql = $conn->createQueryBuilder();

// ... build your query ...

// Execute a query that will retrieve results (generally SELECT queries):
$result = $sql->executeQuery();

// Execute a query that produces changes (INSERT, UPDATE, DELETE, etc.):
$count = $sql->executeStatement();

Query results have a variety of fetch*() operations on them, while executing a statement returns an integer indicating the number of rows affected (assuming the database supports this).
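Put together, a round trip through the query builder might look like this. Again a sketch with assumptions: doctrine/dbal 3.x or 4.x from Composer, an in-memory SQLite database, and a made-up users table.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;

$conn = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);
$conn->executeStatement('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$conn->executeStatement("INSERT INTO users (name) VALUES ('Ada'), ('Grace'), ('Linus')");

// SELECT: executeQuery() returns a result object with fetch*() methods.
$result = $conn->createQueryBuilder()
    ->select('name')
    ->from('users')
    ->orderBy('name', 'ASC')
    ->executeQuery();

$names = [];
while (($row = $result->fetchAssociative()) !== false) {
    $names[] = $row['name'];
}

// DELETE: executeStatement() returns the number of affected rows.
$deleted = (int) $conn->createQueryBuilder()
    ->delete('users')
    ->where('name = :name')
    ->setParameter('name', 'Linus')
    ->executeStatement();

printf("%d name(s) fetched, %d row(s) deleted\n", count($names), $deleted);
```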

Second, when I started doing joins, the argument names were confusing, and made it harder to understand what was needed. I eventually figured it out, but it was really easy to flip the arguments for the different tables being joined. The usage below illustrates names that would better describe how to use it:

$sql->innerJoin(
    $primaryTableOrItsAliasIfYouSpecifiedOne, // e.g. "user" or "u"
    $newTableToJoin,                          // e.g. "address"
    $aliasForNewTableToJoin,                  // e.g. "a"
    $conditionToJoinOn                        // e.g. "u.id = a.uid"
);
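Filled in with the example values from those comments, a call might look like the sketch below. It assumes doctrine/dbal 3.x or 4.x from Composer; the user and address tables are hypothetical, and the in-memory SQLite connection is only there to create the builder, since the query is generated but not run.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;

$conn = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);

$sql = $conn->createQueryBuilder()
    ->select('u.name', 'a.city')
    ->from('user', 'u')       // primary table, aliased as "u"
    ->innerJoin(
        'u',                  // alias of the table already in the query
        'address',            // new table to join
        'a',                  // alias for the new table
        'u.id = a.uid'        // join condition
    );

echo $sql->getSQL(), "\n";
```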

Third, there are some odd differences in the API between INSERT and UPDATE operations. When setting a value, one takes setValue(), while the other takes set(), and only one of these is valid for a given operation (it's setValue() for INSERT operations, and set() for UPDATE operations, in case you were wondering). This is especially confusing when using bound parameters, because both can use the setParameter() method for binding positional placeholder values.
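Side by side, the two operations look like this. The sketch assumes doctrine/dbal 3.x or 4.x from Composer, an in-memory SQLite database, and a made-up users table; it uses named placeholders rather than positional ones for readability.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\DriverManager;

$conn = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);
$conn->executeStatement('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)');

// INSERT uses setValue(); the second argument is a SQL expression (here a placeholder).
$inserted = (int) $conn->createQueryBuilder()
    ->insert('users')
    ->setValue('name', ':name')
    ->setValue('email', ':email')
    ->setParameter('name', 'Ada')
    ->setParameter('email', 'ada@example.com')
    ->executeStatement();

// UPDATE uses set(); binding works the same way through setParameter().
$updated = (int) $conn->createQueryBuilder()
    ->update('users')
    ->set('email', ':email')
    ->where('name = :name')
    ->setParameter('email', 'ada@example.org')
    ->setParameter('name', 'Ada')
    ->executeStatement();

$email = $conn->fetchOne('SELECT email FROM users WHERE name = ?', ['Ada']);
```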

Speaking of placeholders, the docs don't do a great job of detailing how to handle placeholders gracefully.

The documentation suggests patterns like this:

$queryBuilder
    ->select('id', 'name')
    ->from('users')
    ->where('email = ?')
    ->setParameter(0, $userInputEmail);

Which is fine when there's only one parameterized value, but what if you have several, or if you're dynamically building the query (e.g., looping through user-supplied sorting or criteria, etc.), and you don't know their exact position in the final query? And what if you want to use named parameters instead of positional parameters, but you're not sure if your database supports them?

The answer is in the docs, but the various examples don't use the pattern (other than in the discussion of the methods), which is infuriating. The above can also be written as follows:

$queryBuilder
    ->select('id', 'name')
    ->from('users')
    ->where('email = ' . $queryBuilder->createNamedParameter($userInputEmail));

There's also a createPositionalParameter() method. Both accept an optional second argument, where you can specify the value type, which can help ensure that values are quoted correctly for the SQL type they will map to. This also allows you to do IN() operations, and each value will be quoted correctly, with the appropriate list separator for the database.

Once you know this approach, it's easy to remember and use, but it took me a few times through the docs before I stumbled across it.
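Here is a sketch of both uses: building criteria in a loop without tracking placeholder positions, and an IN() list. It assumes doctrine/dbal 3.6 or later (for ArrayParameterType) installed through Composer, an in-memory SQLite database, and made-up table contents. The $criteria and $allowedColumns arrays are hypothetical; note that only values are parameterized, so column names still need to be checked against a known list.

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // assumed Composer autoloader location

use Doctrine\DBAL\ArrayParameterType;
use Doctrine\DBAL\DriverManager;
use Doctrine\DBAL\ParameterType;

$conn = DriverManager::getConnection(['driver' => 'pdo_sqlite', 'memory' => true]);
$conn->executeStatement('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, status TEXT)');
$conn->executeStatement(
    "INSERT INTO users (name, status) VALUES ('Ada', 'active'), ('Grace', 'active'), ('Linus', 'inactive')"
);

// Hypothetical user-supplied criteria (column => value) and a column whitelist.
$criteria       = ['status' => 'active', 'name' => 'Grace'];
$allowedColumns = ['status', 'name'];

$qb = $conn->createQueryBuilder()->select('id', 'name')->from('users');
foreach ($criteria as $column => $value) {
    if (! in_array($column, $allowedColumns, true)) {
        continue; // never interpolate an unchecked column name
    }
    // Each call registers the value and returns a unique placeholder.
    $qb->andWhere($column . ' = ' . $qb->createNamedParameter($value, ParameterType::STRING));
}
$matches = $qb->executeQuery()->fetchAllAssociative();

// IN(): pass the whole list with an array type; it is expanded at execution time.
$in = $conn->createQueryBuilder();
$in
    ->select('name')
    ->from('users')
    ->where('id IN (' . $in->createNamedParameter([1, 3], ArrayParameterType::INTEGER) . ')')
    ->orderBy('id', 'ASC');
$names = $in->executeQuery()->fetchFirstColumn();
```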

The SQL it generates, though, is great, and when I've used tools like ZendHQ's Z-Ray to introspect queries, I'm always impressed by what was actually sent over the wire.

2023-12-11 Update

Alexander Kim pointed out to me that you can use named parameters within the query builder, along with the setParameter() method. That usage looks like this:

use Doctrine\DBAL\ParameterType;

$queryBuilder
    ->select('id', 'name')
    ->from('users')
    ->where('email = :email')
    ->setParameter('email', $userInputEmail, ParameterType::STRING);

You can also specify named parameters when using set() and setValue(), though I'd argue that using createNamedParameter() is easier in those contexts.

But for all these issues, the fact is that the docs generally give you enough, and the API is so clean and reasonably documented that you can generally figure out how things work just from your IDE hints and autocompletion. Yes, I have gripes, but the library is very solid, very well written, and absolutely something I can depend on.

Final Thoughts

I've often used straight PDO for projects, and it works fine. However, having a tool available like Doctrine DBAL has been a huge boon in ensuring I can switch from SQLite while prototyping to MySQL for production, and know that things will "just work".

I also find the way it juggles types to be really useful. I know that if a value is typed in the database as a NULL or as text or as a float or integer, I'll actually get those types back when I query; the same is true for when I send data to the database. There's no magic involved, and I don't have to remember to do type conversions to and from the database. That's exactly the type of functionality I want from a DBAL.

Yes, writing database-centric code is cumbersome, and there's a reason folks use ORMs, ActiveRecord, and the like. However, it generally only needs to be written once, with occasional updates. Having a good DBAL available helps keep complexity of your application down and gives you the tools to communicate securely with your database.

Advent 2023: Doctrine DBAL was originally published 10 December 2023 on https://mwop.net by Matthew Weier O'Phinney.

On Visibility in OOP

contact@mwop.net (Matthew Weier O'Phinney) — Sat, 30 Jun 2012 10:00:00 -0500

I'm a big proponent of object oriented programming. OOP done right helps ease code maintenance and enables code re-use.

Starting in PHP 5, OOP enthusiasts got a whole bunch of new tools, and new tools keep coming into the language for us with each minor release. One feature that has had a huge impact on frameworks and libraries has been available since the earliest PHP 5 versions: visibility.

Theory

The visibility keywords include private, protected, and public, often referred to as PPP. There's an additional keyword I often lump in with them, final.

Public visibility is the default, and equivalent to the only visibility available to PHP prior to version 5: any member declared public is accessible from any scope. This means the following:

class Foo
{
    public $bar = 'bar';

    public function baz() 
    {
        // I can access within my own scope
        return $this->bar;
    }
}

class FooBar extends Foo
{
    public function doThat()
    {
        // I have access to members in my parent
        return $this->bar . $this->baz();
    }
}

$foo = new Foo();

// I can access public members from an instance
echo $foo->bar . $foo->baz();

Basically, public visibility means that I can access the member from within the object, within an extending class, or simply from an instance.

Protected visibility starts to tighten things down a little. With protected visibility, only the class itself, or an extending class, can access the member:

class Foo
{
    protected $bar = 'bar';

    protected function baz() 
    {
        // I can access within my own scope
        return $this->bar;
    }
}

class FooBar extends Foo
{
    public function doThat()
    {
        // I can access protected members in my parent
        return $this->bar . $this->baz();
    }
}

$foo = new FooBar();

// This works, as I'm calling a public member of an extending class:
$foo->doThat();

// But these are both illegal:
echo $foo->bar . $foo->baz();

Protected visibility is nice for hiding things from those consuming your class. It can be used to hide implementation details, and to prevent direct modification of properties, which is important to consider if a property may be the product of calculation, or if a particular type is required.
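As a small illustration of guarding a property this way, consider the sketch below. The Temperature class is hypothetical and written for PHP 7.4 or later. In PHP 7 and later, touching a protected member from outside the class hierarchy raises an Error at runtime, which is what "illegal" amounts to in practice.

```php
<?php
declare(strict_types=1);

// Hypothetical class: the value can only change through a validating setter.
class Temperature
{
    protected float $celsius = 0.0;

    public function setCelsius(float $celsius): void
    {
        if ($celsius < -273.15) {
            throw new InvalidArgumentException('Temperature is below absolute zero');
        }
        $this->celsius = $celsius;
    }

    // A calculated value derived from the protected property.
    public function fahrenheit(): float
    {
        return $this->celsius * 9 / 5 + 32;
    }
}

$temperature = new Temperature();
$temperature->setCelsius(100.0);
echo $temperature->fahrenheit(), "\n";

try {
    // Direct modification from outside the class bypasses validation...
    $temperature->celsius = -500.0;
    $blocked = false;
} catch (Error $e) {
    // ...so PHP refuses it because the property is protected.
    $blocked = true;
}
```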

Private visibility locks things down further. With private visibility, the object member is only directly modifiable or callable within the declaring class.

class Foo
{
    private $bar = 'bar';

    private function baz() 
    {
        // I can access within my own scope
        return $this->bar;
    }
}

class FooBar extends Foo
{
    public function doThat()
    {
        // These are both illegal
        return $this->bar . $this->baz();
    }
}

$foo = new FooBar();

// These are also both illegal:
echo $foo->bar . $foo->baz();

Private visibility is generally of interest for locking down algorithms. For instance, if you know that a particular value or operation must not change, even in extending classes, declaring the member private ensures that extending classes cannot directly call it.

At any point, you can redeclare a property in an extending class using equal or more public visibility. The effect of doing so depends on what the visibility of the member was in the parent class.

In the case of a public property, if an extending class re-declares with public visibility, any access to the member within the extending class or an instance of the extending class will see only the new declaration.

class Foo
{
    public $bar = 'bar';

    public function baz() 
    {
        return $this->bar;
    }
}

class FooBar extends Foo
{
    public $bar = 'foobar';
}

$foo = new FooBar();
echo $foo->bar;   // "foobar"
echo $foo->baz(); // "foobar"

In the instance of a protected property, if the extending class re-declares with either public or protected visibility, you get the same behavior as public -> public.

class Foo
{
    protected $bar = 'bar';

    public function baz() 
    {
        return $this->bar;
    }
}

class FooBar extends Foo
{
    public $bar = 'foobar';
}

$foo = new FooBar();
echo $foo->bar;   // "foobar"
echo $foo->baz(); // "foobar"

In the case of a private property, things get interesting. Code declared in the parent class continues to use the parent's private member, whereas code declared in the child class uses the child's re-declared member. This is far easier to understand via an example.

class Foo
{
    private $bar = 'bar';
    private $baz = 'baz';

    public function baz() 
    {
        return $this->bar;
    }
}

class FooBar extends Foo
{
    protected $bar = 'foobar';
    private $baz = 'foobaz';

    public function myBaz() 
    {
        return $this->bar;
    }

    public function myBaz2()
    {
        return $this->baz;
    }
}

$foo = new FooBar();
echo $foo->baz();    // "bar"
echo $foo->myBaz();  // "foobar"
echo $foo->myBaz2(); // "foobaz"

My personal takeaway from this is:

Use public for members that are safe for anything to call.
Use protected for anything you don't want called directly on an instance, anything not important to the public API (implementation details), and anything you feel is safe for extending classes to muck about with.
Use private for any important implementation details that could adversely affect execution if overridden by an extending class.

Those paying attention will note that I skipped final. Actually, I saved that for last. Marking a class or method final tells PHP that the class or method may not be extended or re-declared/overridden. At all. I lump this with visibility, because it's another way of locking down access to an API; marking something final is saying, "you cannot extend this", similar to using private, but without even the possibility of redeclaring.
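A brief sketch of both forms follows, using hypothetical classes. Attempting to extend a final class or override a final method is a fatal error when the code is compiled, so the offending declarations are left as comments and reflection is used to confirm the flags instead.

```php
<?php
declare(strict_types=1);

// Hypothetical classes for illustration only.
final class Locked
{
}

class Base
{
    final public function run(): string
    {
        return 'base';
    }
}

class Child extends Base
{
    // Declaring run() here would be a fatal error, because Base::run() is final.
    public function other(): string
    {
        return $this->run(); // calling it is still fine
    }
}

// class Broken extends Locked {} // fatal error: Locked is final

$classIsFinal  = (new ReflectionClass(Locked::class))->isFinal();
$methodIsFinal = (new ReflectionMethod(Base::class, 'run'))->isFinal();
$result        = (new Child())->other();
```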

Applied

What got me to thinking about all this was a turn of events with Zend Framework 2. We've had an annotation parser since last summer. Ralph Schindler developed it in order to facilitate automatic discovery of injection points for our Dependency Injection container. Classes could mark a method with the @Inject annotation, and the various DI compilers would know that that method needed to be injected.

use Zend\Di\Definition\Annotation\Inject;

class Foo
{
    protected $bar;

    /**
     * @Inject()
     * @param  Bar $bar
     * @return void
     */
    public function setBar(Bar $bar)
    {
        $this->bar = $bar;
    }
}

class Bar {}

Recently, part of our Forms RFC included a feature to allow creating forms and their related input filters by using annotations. Basically, this allows developers to hint on their domain entities how specific properties should be filtered, validated, and potentially represented at the form level.

use Zend\Form\Annotation;

class Foo
{
    /**
     * @Annotation\Filter({"name":"StringTrim"})
     * @Annotation\Validator({"name":"Between","options":{"min":5,"max":20}})
     * @Annotation\Attributes({"type":"range"})
     */
    protected $bar;
}

One developer testing the support wanted to use a combination of Doctrine annotations and ZF2 form annotations — that way his entities could also describe validation and representation.

I did some work to make this happen, and everybody was happy. Except then that same developer went to use that entity with Doctrine, and Doctrine's annotation parser started raising exceptions on all the ZF2 annotations.

After some debate, I realized: (a) we were basically just making up syntax for our annotations; it'd be better to use an established syntax; but (b) we should still retain the ability to use arbitrary syntax, as we can't really know what sorts of annotations developers may already be using.

So, we decided to make our annotation component depend on the annotations support in Doctrine\Common, and to use the annotation syntax they utilize. ZF2 would provide some code to make it possible to plug in arbitrary parsers, and use the Doctrine\Common annotation parser to parse annotations officially supported by ZF2.

However, when I went to start making this happen, I ran into immediate issues.

Remember how this post is about visibility? Well, the class I was directly interested in, Doctrine\Common\Annotations\DocParser, not only contains private members, but is marked final.

My immediate response was to start dissecting the class, cutting and pasting the bits interesting to my solution into a new class in ZF2. I went down this route for several hours, gradually pulling in more and more methods as I discovered how far down the rabbit hole I needed to go to accomplish my task.

But at the back of my head, I kept thinking this was a bad idea. If any patches ever came in for the original class, I'd need to port them into our ZF2 solution. And I couldn't help but think that I'd miss a crucial piece.

So I started playing with its public API, to see if there were any shortcuts I might be able to take. And there were.

The class has a public parse() method. Based on how Doctrine uses the code, I assumed I needed to pass a full PHP docblock in — which ran counter to how I wanted to use the code. I wanted to pass in an annotation at a time. But when I looked closer, I realized that the parser didn't require a full docblock; any fragment would do.

To make a long story short: I was able to feed the parser a single annotation at a time from ZF2's AnnotationScanner. This allowed me to build a very simple class that allows registering a set of annotations it can handle, and feeding it a single annotation string at a time to decide (a) if it supports it, and (b) to parse it and return the associated annotation object.

In sum: because the class in question was marked final and had private members, I found myself forced to think critically about what I wanted to accomplish, and then thoroughly understand the public API to see how I might accomplish that task without the ability to extend.
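The general shape of that approach, composing around a final class through its public API rather than extending it, can be sketched as follows. Everything here is hypothetical and written for PHP 8.0 or later: FragmentParser is a toy stand-in for a final third-party parser, and AnnotationRegistry is not ZF2's actual implementation.

```php
<?php
declare(strict_types=1);

// Toy stand-in for a final third-party class with a public parse() method.
final class FragmentParser
{
    /** Parses "@Name(args)" fragments; works on a full docblock or a single annotation. */
    public function parse(string $input): array
    {
        preg_match_all('/@(\w+)\(([^)]*)\)/', $input, $matches, PREG_SET_ORDER);

        return array_map(
            static fn (array $m): array => ['name' => $m[1], 'args' => $m[2]],
            $matches
        );
    }
}

// Wraps the parser instead of extending it.
class AnnotationRegistry
{
    /** @var array<string, true> */
    protected array $supported = [];

    public function __construct(protected FragmentParser $parser)
    {
    }

    public function register(string $name): void
    {
        $this->supported[$name] = true;
    }

    public function supports(string $annotation): bool
    {
        return preg_match('/^@(\w+)/', $annotation, $m) === 1
            && isset($this->supported[$m[1]]);
    }

    /** Feeds one annotation string at a time to the wrapped parser. */
    public function parseOne(string $annotation): ?array
    {
        if (! $this->supports($annotation)) {
            return null;
        }

        return $this->parser->parse($annotation)[0] ?? null;
    }
}

$registry = new AnnotationRegistry(new FragmentParser());
$registry->register('Inject');

$parsed  = $registry->parseOne('@Inject()');
$skipped = $registry->parseOne('@Unknown("x")');
```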

Conclusions

Doctrine has a policy that encourages poka-yoke solutions: code should be executable in a specific way. The policy was developed both to aid users (having multiple ways of doing something is often confusing) and to ease maintenance (fewer extension points means less likelihood of developers doing hard-to-debug things in extending code and reporting it back to the project). These have led them to heavily use private and final visibility.

I've said it before, and I'll say it again: I feel that frameworks and libraries should use private and final sparingly. Over the years, I've seen code repurposed in simply wondrous ways — largely due to keeping the code as open as possible to extension. I like to enable my users as much as possible.

That said, I can also see Doctrine's argument — and can see where, while it can often be frustrating, it can also lead to potentially more sound and elegant solutions.

I'll probably continue shying away from private and final visibility, but I do plan to experiment with it more in the future. What about you?

On Visibility in OOP was originally published 28 June 2012 on https://mwop.net by Matthew Weier O'Phinney.