<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Blog entries tagged spl :: mwop.net</title>
  <updated>2011-01-21T17:07:25-06:00</updated>
  <generator uri="https://getlaminas.org" version="2">Laminas_Feed_Writer</generator>
  <link rel="alternate" type="text/html" href="https://mwop.net/blog/tag/spl"/>
  <link rel="self" type="application/atom+xml" href="https://mwop.net/blog/tag/spl/atom.xml"/>
  <id>https://mwop.net/blog/tag/spl</id>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[Taming SplPriorityQueue]]></title>
    <published>2011-01-17T10:02:00-06:00</published>
    <updated>2011-01-21T17:07:25-06:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/253-Taming-SplPriorityQueue.html"/>
    <id>https://mwop.net/blog/253-Taming-SplPriorityQueue.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p><xhtml:a href="http://php.net/SplPriorityQueue">SplPriorityQueue</xhtml:a>
is a fantastic new feature of PHP 5.3. However, in trying to
utilize it in a few projects recently, I've run into some behavior
that's (a) non-intuitive, and (b) in some cases at least,
undesired. In this post, I'll present my solutions.</xhtml:p>
<xhtml:h2>Of Heaps and Queues</xhtml:h2>
<xhtml:p>A <xhtml:em>queue</xhtml:em> in programming is any data structure that, when
iterated, returns values in "first-in-first-out" (FIFO) order. For
"last-in-first-out" (LIFO) iteration, you use a
<xhtml:em>stack</xhtml:em>.</xhtml:p>
<xhtml:p>A <xhtml:em>heap</xhtml:em> is a tree-shaped data structure where, given a
specific node, all nodes beneath it have a value less than or equal
to it. (Technically, this is a "max-heap"; the variant where all
child nodes have a value greater than or equal to their parent is
called a "min-heap.")</xhtml:p>
<xhtml:p>A <xhtml:em>priority queue</xhtml:em> is a specialized version of a
max-heap. Typically, data is registered with a specific priority —
so the max-heap is looking at only the priority value, not the data
itself. This allows inserting data into the queue in any order
desired, while ensuring that they are iterated in the order
specified by the priorities provided.</xhtml:p>
<xhtml:p>PHP offers SPL data structures corresponding to each:</xhtml:p>
<xhtml:ul>
<xhtml:li><xhtml:a href="http://php.net/SplQueue">SplQueue</xhtml:a>, corresponding
to a <xhtml:em>queue</xhtml:em>.</xhtml:li>
<xhtml:li><xhtml:a href="http://php.net/SplStack">SplStack</xhtml:a>, corresponding
to a <xhtml:em>stack</xhtml:em>.</xhtml:li>
<xhtml:li><xhtml:a href="http://php.net/SplHeap">SplHeap</xhtml:a>, corresponding to
a <xhtml:em>heap</xhtml:em>.</xhtml:li>
<xhtml:li><xhtml:a href="http://php.net/SplMaxHeap">SplMaxHeap</xhtml:a>,
corresponding to a <xhtml:em>max-heap</xhtml:em>.</xhtml:li>
<xhtml:li><xhtml:a href="http://php.net/SplMinHeap">SplMinHeap</xhtml:a>,
corresponding to a <xhtml:em>min-heap</xhtml:em>.</xhtml:li>
<xhtml:li><xhtml:a href="http://php.net/SplPriorityQueue">SplPriorityQueue</xhtml:a>,
corresponding to a <xhtml:em>priority queue</xhtml:em>.</xhtml:li>
</xhtml:ul>
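<xhtml:p>As a quick illustration of how these differ, here is a minimal
sketch (the sample values are arbitrary) that feeds the same three
integers to a queue, a stack, and a max-heap, then collects what
iteration returns:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$queue = new SplQueue();
$stack = new SplStack();
$heap  = new SplMaxHeap();

// Insert the same values, in the same order, into each structure
foreach (array(3, 1, 2) as $value) {
    $queue-&gt;enqueue($value);
    $stack-&gt;push($value);
    $heap-&gt;insert($value);
}

// Passing false discards the iterator keys and keeps only the order
$fifo    = iterator_to_array($queue, false);
$lifo    = iterator_to_array($stack, false);
$largest = iterator_to_array($heap, false);

echo implode(' ', $fifo), "\n";    // 3 1 2 (insertion order)
echo implode(' ', $lifo), "\n";    // 2 1 3 (reverse insertion order)
echo implode(' ', $largest), "\n"; // 3 2 1 (largest value first)
</xhtml:code></xhtml:pre>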
<xhtml:h2>Problems</xhtml:h2>
<xhtml:p>The first problem I ran into was really a lapse of reasoning on
my part, and is simply this:</xhtml:p>
<xhtml:blockquote>
<xhtml:p><xhtml:em>Iterating over a heap removes the values from the
heap.</xhtml:em></xhtml:p>
</xhtml:blockquote>
<xhtml:p>Basically, in order to satisfy the <xhtml:em>heap</xhtml:em> contract, which
is that the root node is always the maximum value (or minimum, in
the case of a min-heap), each node must be removed as it is
returned, so that the next value in line can become the root.</xhtml:p>
<xhtml:p>The problem with this, obviously, is that if you want to iterate
over a heap of any sort multiple times, well, you can't with the
same instance.</xhtml:p>
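<xhtml:p>A small sketch makes the effect visible; counting the heap
before and after a <xhtml:code>foreach</xhtml:code> shows that iteration
empties it (the values here are arbitrary):</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$heap = new SplMaxHeap();
$heap-&gt;insert(1);
$heap-&gt;insert(3);
$heap-&gt;insert(2);

$before = count($heap);          // 3

$seen = array();
foreach ($heap as $value) {
    $seen[] = $value;            // each step extracts the current root
}

$after = count($heap);           // 0: nothing is left to iterate again

echo $before, ' -&gt; ', implode(' ', $seen), ' -&gt; ', $after, "\n";
</xhtml:code></xhtml:pre>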
<xhtml:p>The next problem I ran into was with SplPriorityQueue
specifically: when items of equal priority are enqueued, the
iteration order of these items is… unexpected. While the <xhtml:a href="http://php.net/splpriorityqueue.compare">documentation</xhtml:a> notes
that "multiple elements with the same priority will get dequeued in
no particular order," the fact is that it <xhtml:em>is</xhtml:em> predictable,
and unintuitive. For example, given the following:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$queue-&gt;insert(<xhtml:span class="hljs-string">'foo'</xhtml:span>, <xhtml:span class="hljs-number">1000</xhtml:span>);
$queue-&gt;insert(<xhtml:span class="hljs-string">'bar'</xhtml:span>, <xhtml:span class="hljs-number">1000</xhtml:span>);
$queue-&gt;insert(<xhtml:span class="hljs-string">'baz'</xhtml:span>, <xhtml:span class="hljs-number">1000</xhtml:span>);
$queue-&gt;insert(<xhtml:span class="hljs-string">'bat'</xhtml:span>, <xhtml:span class="hljs-number">1000</xhtml:span>);

<xhtml:span class="hljs-keyword">foreach</xhtml:span> ($queue <xhtml:span class="hljs-keyword">as</xhtml:span> $data) <xhtml:span class="hljs-keyword">echo</xhtml:span> $data, <xhtml:span class="hljs-string">" "</xhtml:span>;
</xhtml:code></xhtml:pre>
<xhtml:p>I'd expect a result of "foo bar baz bat", assuming FIFO order
(which is expected in a <xhtml:em>queue</xhtml:em>) for equal priorities, or
"foo baz bat bar", assuming ordering by data (which might be
expected in a max-heap). In fact, neither is true: the first item
is emitted first, followed by the remaining items in reverse order
of when they were enqueued: "foo bat baz bar".</xhtml:p>
<xhtml:p>While this may be somewhat predictable, I find I don't want to
assume such order, nor try and write code around it.</xhtml:p>
<xhtml:h2>Solutions</xhtml:h2>
<xhtml:h3>Allowing multiple iterations</xhtml:h3>
<xhtml:p>Allowing multiple iterations of a queue is as easy as cloning it
prior to iteration:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-keyword">foreach</xhtml:span> (<xhtml:span class="hljs-keyword">clone</xhtml:span> $queue <xhtml:span class="hljs-keyword">as</xhtml:span> $datum) <xhtml:span class="hljs-keyword">echo</xhtml:span> $datum, <xhtml:span class="hljs-string">" "</xhtml:span>;
</xhtml:code></xhtml:pre>
<xhtml:p>The problem is automating this — there are cases where I don't
want users to really have to understand the internal
implementation.</xhtml:p>
<xhtml:p>My solution to this was to use the idea of inner and outer
iterators. In this particular case, I created a
<xhtml:code>PriorityQueue</xhtml:code> class that composes an
<xhtml:code>SplPriorityQueue</xhtml:code> instance, and which also implements
<xhtml:code>IteratorAggregate</xhtml:code>. This allows the following:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-keyword">namespace</xhtml:span> <xhtml:span class="hljs-title">Foo</xhtml:span>;

<xhtml:span class="hljs-class"><xhtml:span class="hljs-keyword">class</xhtml:span> <xhtml:span class="hljs-title">PriorityQueue</xhtml:span> <xhtml:span class="hljs-keyword">implements</xhtml:span> \<xhtml:span class="hljs-title">Countable</xhtml:span>, \<xhtml:span class="hljs-title">IteratorAggregate</xhtml:span>
</xhtml:span>{
    <xhtml:span class="hljs-keyword">protected</xhtml:span> $innerQueue;
    
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">__construct</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-comment">// I'll explain the lack of global namespacing later...</xhtml:span>
        <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;innerQueue = <xhtml:span class="hljs-keyword">new</xhtml:span> SplPriorityQueue;
    }

    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">count</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-keyword">return</xhtml:span> count(<xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;innerQueue);
    }

    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">insert</xhtml:span><xhtml:span class="hljs-params">($datum, $priority)</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;innerQueue-&gt;insert($datum, $priority);
    }
    
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">getIterator</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">clone</xhtml:span> <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;innerQueue;
    }
}
</xhtml:code></xhtml:pre>
<xhtml:p>This approach means that as I consume
<xhtml:code>PriorityQueue</xhtml:code>, I can be assured that I can count and
iterate over it… again and again.</xhtml:p>
<xhtml:p>I mention in the code comments that I'm not importing
<xhtml:code>SplPriorityQueue</xhtml:code> into the namespace. The reason is
that I want to also solve the problem of predictable queue
order.</xhtml:p>
<xhtml:h3>Enforcing predictable queue order</xhtml:h3>
<xhtml:p>The solution to the queue order problem with equal priorities is
actually quite simple. While I found it on <xhtml:a href="http://php.net/splpriorityqueue.compare">the
SplPriorityQueue::compare manual page</xhtml:a>, <xhtml:a href="http://twitter.com/elazar">Matthew Turland</xhtml:a> also <xhtml:a href="http://www.slideshare.net/tobias382/new-spl-features-in-php-53">discusses
it in a presentation on SPL</xhtml:a>, and it hinges on one, simple fact:
<xhtml:em>priorities do not need to be integers</xhtml:em>.</xhtml:p>
<xhtml:p>What does this mean? It means that the following priorities are
no longer equal to one another, which leads to the expected sort
order:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$queue-&gt;insert(<xhtml:span class="hljs-string">'foo'</xhtml:span>, <xhtml:span class="hljs-keyword">array</xhtml:span>(<xhtml:span class="hljs-number">1000</xhtml:span>, <xhtml:span class="hljs-number">1000</xhtml:span>));
$queue-&gt;insert(<xhtml:span class="hljs-string">'bar'</xhtml:span>, <xhtml:span class="hljs-keyword">array</xhtml:span>(<xhtml:span class="hljs-number">1000</xhtml:span>, <xhtml:span class="hljs-number">100</xhtml:span>));
$queue-&gt;insert(<xhtml:span class="hljs-string">'baz'</xhtml:span>, <xhtml:span class="hljs-keyword">array</xhtml:span>(<xhtml:span class="hljs-number">1000</xhtml:span>, <xhtml:span class="hljs-number">10</xhtml:span>));
$queue-&gt;insert(<xhtml:span class="hljs-string">'bat'</xhtml:span>, <xhtml:span class="hljs-keyword">array</xhtml:span>(<xhtml:span class="hljs-number">1000</xhtml:span>, <xhtml:span class="hljs-number">1</xhtml:span>));

<xhtml:span class="hljs-keyword">foreach</xhtml:span> ($queue <xhtml:span class="hljs-keyword">as</xhtml:span> $data) <xhtml:span class="hljs-keyword">echo</xhtml:span> $data, <xhtml:span class="hljs-string">" "</xhtml:span>;
</xhtml:code></xhtml:pre>
<xhtml:p>This results in "foo bar baz bat"!</xhtml:p>
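<xhtml:p>The reason this works is that PHP compares arrays of equal
length element by element, so the second value only matters when
the first values tie. As a rough check (a sketch, using the public
<xhtml:code>compare()</xhtml:code> method directly rather than anything from
my projects):</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$queue = new SplPriorityQueue();

// Same first element: the second element breaks the tie
$tieBroken = $queue-&gt;compare(array(1000, 100), array(1000, 10)) &gt; 0;

// Different first element: the second element is never consulted
$firstWins = $queue-&gt;compare(array(2000, 1), array(1000, 1000)) &gt; 0;

// Plain equal integers compare as equal, hence the ambiguity
$plainTie = $queue-&gt;compare(1000, 1000) === 0;

var_dump($tieBroken, $firstWins, $plainTie); // bool(true) three times
</xhtml:code></xhtml:pre>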
<xhtml:p>The trick, then, is automating the solution. I achieved this in
a custom <xhtml:code>SplPriorityQueue</xhtml:code> extension:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-keyword">namespace</xhtml:span> <xhtml:span class="hljs-title">Foo</xhtml:span>;

<xhtml:span class="hljs-class"><xhtml:span class="hljs-keyword">class</xhtml:span> <xhtml:span class="hljs-title">SplPriorityQueue</xhtml:span> <xhtml:span class="hljs-keyword">extends</xhtml:span> \<xhtml:span class="hljs-title">SplPriorityQueue</xhtml:span>
</xhtml:span>{
    <xhtml:span class="hljs-keyword">protected</xhtml:span> $queueOrder = PHP_INT_MAX;

    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">insert</xhtml:span><xhtml:span class="hljs-params">($datum, $priority)</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-keyword">if</xhtml:span> (is_int($priority)) {
            $priority = <xhtml:span class="hljs-keyword">array</xhtml:span>($priority, <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;queueOrder--);
        }
        <xhtml:span class="hljs-keyword">parent</xhtml:span>::insert($datum, $priority);
    }
}
</xhtml:code></xhtml:pre>
<xhtml:p>As each datum is added to the queue, if the priority is an
integer, <xhtml:code>insert()</xhtml:code> wraps it in an array, using the
current value of <xhtml:code>$queueOrder</xhtml:code> as the second element,
and then decrements <xhtml:code>$queueOrder</xhtml:code>. The new array
priority is then used to insert the value, so earlier insertions
win ties.</xhtml:p>
<xhtml:p>Using this extension ensures that order in the priority queue is
now predictable.</xhtml:p>
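<xhtml:p>To see the effect end to end, here is a self-contained sketch
of the same idea. The class name <xhtml:code>StableSplPriorityQueue</xhtml:code>
is hypothetical, and it sits in the global namespace only so the
example can run on its own; the attribute line is an assumption
aimed at newer PHP versions and is parsed as a comment on older
ones:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical name; mirrors the extension above outside a namespace
class StableSplPriorityQueue extends SplPriorityQueue
{
    protected $queueOrder = PHP_INT_MAX;

    // Assumption: silences any return-type deprecation notice on newer
    // PHP versions; older versions read this line as a comment
    #[\ReturnTypeWillChange]
    public function insert($datum, $priority)
    {
        if (is_int($priority)) {
            // Earlier insertions get the larger tie-breaker
            $priority = array($priority, $this-&gt;queueOrder--);
        }
        parent::insert($datum, $priority);
    }
}

$queue = new StableSplPriorityQueue();
$queue-&gt;insert('foo', 1000);
$queue-&gt;insert('bar', 1000);
$queue-&gt;insert('baz', 1000);
$queue-&gt;insert('bat', 1000);
$queue-&gt;insert('high', 2000); // higher priority still comes out first

// Clone before iterating so the original queue keeps its contents
$order = array();
foreach (clone $queue as $datum) {
    $order[] = $datum;
}

echo implode(' ', $order), "\n"; // high foo bar baz bat
echo count($queue), "\n";        // 5
</xhtml:code></xhtml:pre>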
<xhtml:h2>Conclusions</xhtml:h2>
<xhtml:p><xhtml:code>SplPriorityQueue</xhtml:code> is indeed powerful, and saves me a
ton of programming time, and likely also CPU cycles and memory
when using larger data sets. While it may not always meet my use
cases, the fact is that, particularly with namespacing available, I
can easily override the class to meet my needs.</xhtml:p>
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/253-Taming-SplPriorityQueue.html">Taming
SplPriorityQueue</xhtml:a> was originally published <xhtml:time class="dt-published" datetime="2011-01-17T10:02:00-06:00">17 January
2011</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a> by
<xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew
Weier O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[Applying FilterIterator to Directory Iteration]]></title>
    <published>2010-08-16T10:30:00-05:00</published>
    <updated>2010-08-20T15:45:21-05:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/244-Applying-FilterIterator-to-Directory-Iteration.html"/>
    <id>https://mwop.net/blog/244-Applying-FilterIterator-to-Directory-Iteration.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p>I'm currently doing research and prototyping for autoloading
alternatives in <xhtml:a href="http://framework.zend.com/">Zend
Framework</xhtml:a> 2.0. One approach I'm looking at involves creating
explicit class/file maps; these tend to be much faster than using
the <xhtml:code>include_path</xhtml:code>, but do require some additional
setup.</xhtml:p>
<xhtml:p>My algorithm for generating the maps was absurdly simple:</xhtml:p>
<xhtml:ul>
<xhtml:li>Scan the filesystem for PHP files.</xhtml:li>
<xhtml:li>If the file does not contain an interface, class, or abstract
class, skip it.</xhtml:li>
<xhtml:li>If it does, get its declared namespace and classname.</xhtml:li>
</xhtml:ul>
<xhtml:p>The question was what implementation approach to use.</xhtml:p>
<xhtml:p>I'm well aware of <xhtml:code>RecursiveDirectoryIterator</xhtml:code>, and
planned to use that. However, I also had heard of
<xhtml:code>FilterIterator</xhtml:code>, and wondered if I could tie that in
somehow. In the end, I could, but the solution was non-obvious.</xhtml:p>
<xhtml:h2>What I Thought I'd Be Able To Do</xhtml:h2>
<xhtml:p><xhtml:code>FilterIterator</xhtml:code> is an abstract class. When extending
it, you must define an <xhtml:code>accept()</xhtml:code> method.</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-class"><xhtml:span class="hljs-keyword">class</xhtml:span> <xhtml:span class="hljs-title">FooFilter</xhtml:span> <xhtml:span class="hljs-keyword">extends</xhtml:span> <xhtml:span class="hljs-title">FilterIterator</xhtml:span>
</xhtml:span>{
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">accept</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
    }
}
</xhtml:code></xhtml:pre>
<xhtml:p>In that method, you typically will inspect whatever is returned
by <xhtml:code>$this-&gt;current()</xhtml:code>, and then return a boolean
<xhtml:code>true</xhtml:code> or <xhtml:code>false</xhtml:code>, depending on whether you
want to keep it or not.</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-class"><xhtml:span class="hljs-keyword">class</xhtml:span> <xhtml:span class="hljs-title">FooFilter</xhtml:span> <xhtml:span class="hljs-keyword">extends</xhtml:span> <xhtml:span class="hljs-title">FilterIterator</xhtml:span>
</xhtml:span>{
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">accept</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
        $item = <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;current();

        <xhtml:span class="hljs-keyword">if</xhtml:span> ($someCriteriaIsMet) {
            <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">true</xhtml:span>;
        }

        <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">false</xhtml:span>;
    }
}
</xhtml:code></xhtml:pre>
<xhtml:p>I'll go into the mechanics of my criteria later; what's
important now is knowing that a <xhtml:code>FilterIterator</xhtml:code> allows
you to limit the results returned by your iterator.</xhtml:p>
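<xhtml:p>For a concrete, if contrived, illustration: the following
sketch filters a plain <xhtml:code>ArrayIterator</xhtml:code> down to its even
values. The <xhtml:code>EvenFilter</xhtml:code> name is made up for this
example, and the attribute line is an assumption aimed at newer PHP
versions (older versions treat it as a comment):</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical filter: keeps only even integers
class EvenFilter extends FilterIterator
{
    #[\ReturnTypeWillChange]
    public function accept()
    {
        // current() proxies to the wrapped iterator's current item
        return $this-&gt;current() % 2 === 0;
    }
}

$numbers  = new ArrayIterator(array(1, 2, 3, 4, 5, 6));
$filtered = new EvenFilter($numbers);

// Discard keys so the result is a simple list
$evens = iterator_to_array($filtered, false);

echo implode(' ', $evens), "\n"; // 2 4 6
</xhtml:code></xhtml:pre>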
<xhtml:p>I originally thought I'd be able to simply pass a
<xhtml:code>DirectoryIterator</xhtml:code> or
<xhtml:code>RecursiveDirectoryIterator</xhtml:code> to my filtering instance.
This worked in the former case, as it's only one level deep.
However, for the latter, it would only return the first directory
level for all classes that matched — i.e., if I ran it over
<xhtml:code>Zend/Controller</xhtml:code>, I'd get a match for each class under
<xhtml:code>Zend/Controller/Action/Helper/</xhtml:code>, but it would return
simply <xhtml:code>Zend/Controller/Action</xhtml:code> as the match. This
certainly wasn't useful.</xhtml:p>
<xhtml:p>I then discovered <xhtml:code>RecursiveFilterIterator</xhtml:code>, which
looked like it would solve the recursion problem. However, I found
one of two results occurred: either I'd receive an entire subtree
if at least one item matched, or it would skip an entire subtree if
the first item found failed the criteria. There was no middle
ground.</xhtml:p>
<xhtml:h2>The Solution</xhtml:h2>
<xhtml:p>The solution was incredibly simple and elegant, once I stumbled
upon it: wrap the <xhtml:code>RecursiveDirectoryIterator</xhtml:code> in a
<xhtml:code>RecursiveIteratorIterator</xhtml:code>, and pass that instance to
the <xhtml:code>FilterIterator</xhtml:code>.</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$rdi      = <xhtml:span class="hljs-keyword">new</xhtml:span> RecursiveDirectoryIterator($somePath);
$rii      = <xhtml:span class="hljs-keyword">new</xhtml:span> RecursiveIteratorIterator($rdi);
$filtered = <xhtml:span class="hljs-keyword">new</xhtml:span> FooFilter($rii);
</xhtml:code></xhtml:pre>
<xhtml:p>Really. It was that simple — but, as noted, non-obvious. It also
required a slight change within my filter — instead of using
<xhtml:code>current()</xhtml:code>, I'd need to first pull the "inner" iterator
instance: <xhtml:code>$this-&gt;getInnerIterator()-&gt;current()</xhtml:code>.
I show an example of that below when I go over the filter
implementation.</xhtml:p>
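<xhtml:p>Here is a stripped-down sketch of that wiring, using a much
simpler criterion (a <xhtml:code>.php</xhtml:code> suffix) than the one I
actually needed. The <xhtml:code>PhpFileFilter</xhtml:code> class and the
throwaway directory tree are invented for the example, and the
temporary files are not cleaned up afterwards:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical filter: accepts regular files ending in ".php"
class PhpFileFilter extends FilterIterator
{
    #[\ReturnTypeWillChange] // assumption: only relevant on newer PHP
    public function accept()
    {
        $file = $this-&gt;getInnerIterator()-&gt;current();

        return $file instanceof SplFileInfo
            &amp;&amp; $file-&gt;isFile()
            &amp;&amp; substr($file-&gt;getFilename(), -4) === '.php';
    }
}

// Build a small nested tree so the sketch is self-contained
$root = sys_get_temp_dir() . '/filter-demo-' . uniqid();
mkdir($root . '/Action/Helper', 0777, true);
touch($root . '/Front.php');
touch($root . '/Action/README.txt');
touch($root . '/Action/Helper/Redirector.php');

$rdi      = new RecursiveDirectoryIterator($root);
$rii      = new RecursiveIteratorIterator($rdi); // flattens the tree
$filtered = new PhpFileFilter($rii);

$found = array();
foreach ($filtered as $file) {
    // Store paths relative to $root, with forward slashes
    $relative = substr($file-&gt;getPathname(), strlen($root) + 1);
    $found[]  = str_replace('\\', '/', $relative);
}
sort($found);

echo implode("\n", $found), "\n";
</xhtml:code></xhtml:pre>
<xhtml:p>Files at every depth are tested individually, so a
non-matching file no longer hides or drags along the rest of its
subtree.</xhtml:p>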
<xhtml:p>As for my criteria, I had several options. I could
<xhtml:code>require_once</xhtml:code> the file, and use the Reflection API to
inspect the class to determine if it was an interface, abstract
class, or class, as well as to determine the namespace. However, I
couldn't be 100% sure the file would contain a class, so this
seemed like overkill. It would also perform horribly, due to using
reflection.</xhtml:p>
<xhtml:p>The next option was to simply slurp the file contents into a
variable, and use regular expressions. I love regular expressions,
but in this case, it felt like I could end up with some false
positives. Also, since some of these files could be quite large, I
was again worried about performance implications; I don't want to
have to wait forever to generate these maps.</xhtml:p>
<xhtml:p>The solution I went with was to use the <xhtml:a href="http://php.net/tokenizer">tokenizer</xhtml:a> to inspect the file.
Tokenizing is incredibly fast, and it's also incredibly simple to
analyze the tokens.</xhtml:p>
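<xhtml:p>To give a feel for what analyzing tokens involves, here is a
minimal sketch. <xhtml:code>token_get_all()</xhtml:code> returns a list in
which each entry is either a single-character string or an array of
token id, text, and line number. The <xhtml:code>firstClassName()</xhtml:code>
helper is hypothetical and far less thorough than what I ended up
writing:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical helper: name of the first class declared in $source
function firstClassName($source)
{
    $tokens = token_get_all($source);
    $count  = count($tokens);

    for ($i = 0; $i &lt; $count; $i++) {
        // Array tokens are array(id, text, line); strings are punctuation
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }

        // The class name is the next T_STRING after the class keyword
        for ($j = $i + 1; $j &lt; $count; $j++) {
            if (is_array($tokens[$j]) &amp;&amp; $tokens[$j][0] === T_STRING) {
                return $tokens[$j][1];
            }
        }
    }

    return null; // no class declaration found
}

$name = firstClassName('&lt;?php namespace Foo; class Bar {}');
$none = firstClassName('&lt;?php echo "no classes here";');

echo $name, "\n";     // Bar
var_dump($none);      // NULL
</xhtml:code></xhtml:pre>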
<xhtml:p>I decided to store the detected namespace and classnames as
public properties of the <xhtml:code>SplFileInfo</xhtml:code> objects returned;
this makes it simple to iterate over the entire collection and
utilize that information. Also, because I have
<xhtml:code>SplFileInfo</xhtml:code> objects, I already have the paths I
need.</xhtml:p>
<xhtml:p>My implementation looks like this:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-comment">/** <xhtml:span class="hljs-doctag">@namespace</xhtml:span> */</xhtml:span>
<xhtml:span class="hljs-keyword">namespace</xhtml:span> <xhtml:span class="hljs-title">Zend</xhtml:span>\<xhtml:span class="hljs-title">File</xhtml:span>;

<xhtml:span class="hljs-comment">// import SPL classes/interfaces into local scope</xhtml:span>
<xhtml:span class="hljs-keyword">use</xhtml:span> <xhtml:span class="hljs-title">DirectoryIterator</xhtml:span>,
    <xhtml:span class="hljs-title">FilterIterator</xhtml:span>,
    <xhtml:span class="hljs-title">RecursiveIterator</xhtml:span>,
    <xhtml:span class="hljs-title">RecursiveDirectoryIterator</xhtml:span>,
    <xhtml:span class="hljs-title">RecursiveIteratorIterator</xhtml:span>;

<xhtml:span class="hljs-comment">/**
 * Locate files containing PHP classes, interfaces, or abstracts
 * 
 * <xhtml:span class="hljs-doctag">@package</xhtml:span>    Zend_File
 * <xhtml:span class="hljs-doctag">@license</xhtml:span>    New BSD {<xhtml:span class="hljs-doctag">@link</xhtml:span> http://framework.zend.com/license/new-bsd}
 */</xhtml:span>
<xhtml:span class="hljs-class"><xhtml:span class="hljs-keyword">class</xhtml:span> <xhtml:span class="hljs-title">ClassFileLocater</xhtml:span> <xhtml:span class="hljs-keyword">extends</xhtml:span> <xhtml:span class="hljs-title">FilterIterator</xhtml:span>
</xhtml:span>{
    <xhtml:span class="hljs-comment">/**
     * Create an instance of the locater iterator
     * 
     * Expects either a directory, or a DirectoryIterator (or its recursive variant) 
     * instance.
     * 
     * <xhtml:span class="hljs-doctag">@param</xhtml:span>  string|DirectoryIterator $dirOrIterator 
     * <xhtml:span class="hljs-doctag">@return</xhtml:span> void
     */</xhtml:span>
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">__construct</xhtml:span><xhtml:span class="hljs-params">($dirOrIterator = <xhtml:span class="hljs-string">'.'</xhtml:span>)</xhtml:span>
    </xhtml:span>{
        <xhtml:span class="hljs-keyword">if</xhtml:span> (is_string($dirOrIterator)) {
            <xhtml:span class="hljs-keyword">if</xhtml:span> (!is_dir($dirOrIterator)) {
                <xhtml:span class="hljs-keyword">throw</xhtml:span> <xhtml:span class="hljs-keyword">new</xhtml:span> InvalidArgumentException(<xhtml:span class="hljs-string">'Expected a valid directory name'</xhtml:span>);
            }

            $dirOrIterator = <xhtml:span class="hljs-keyword">new</xhtml:span> RecursiveDirectoryIterator($dirOrIterator);
        }
        <xhtml:span class="hljs-keyword">if</xhtml:span> (!$dirOrIterator <xhtml:span class="hljs-keyword">instanceof</xhtml:span> DirectoryIterator) {
            <xhtml:span class="hljs-keyword">throw</xhtml:span> <xhtml:span class="hljs-keyword">new</xhtml:span> InvalidArgumentException(<xhtml:span class="hljs-string">'Expected a DirectoryIterator'</xhtml:span>);
        }

        <xhtml:span class="hljs-keyword">if</xhtml:span> ($dirOrIterator <xhtml:span class="hljs-keyword">instanceof</xhtml:span> RecursiveIterator) {
            $iterator = <xhtml:span class="hljs-keyword">new</xhtml:span> RecursiveIteratorIterator($dirOrIterator);
        } <xhtml:span class="hljs-keyword">else</xhtml:span> {
            $iterator = $dirOrIterator;
        }

        <xhtml:span class="hljs-keyword">parent</xhtml:span>::__construct($iterator);
        <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;rewind();
    }

    <xhtml:span class="hljs-comment">/**
     * Filter for files containing PHP classes, interfaces, or abstracts
     * 
     * <xhtml:span class="hljs-doctag">@return</xhtml:span> bool
     */</xhtml:span>
    <xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">accept</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span>
    </xhtml:span>{
        $file = <xhtml:span class="hljs-keyword">$this</xhtml:span>-&gt;getInnerIterator()-&gt;current();

        <xhtml:span class="hljs-comment">// If we somehow have something other than an SplFileInfo object, just </xhtml:span>
        <xhtml:span class="hljs-comment">// return false</xhtml:span>
        <xhtml:span class="hljs-keyword">if</xhtml:span> (!$file <xhtml:span class="hljs-keyword">instanceof</xhtml:span> \SplFileInfo) {
            <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">false</xhtml:span>;
        }

        <xhtml:span class="hljs-comment">// If we have a directory, it's not a file, so return false</xhtml:span>
        <xhtml:span class="hljs-keyword">if</xhtml:span> (!$file-&gt;isFile()) {
            <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">false</xhtml:span>;
        }

        <xhtml:span class="hljs-comment">// If not a PHP file, skip</xhtml:span>
        <xhtml:span class="hljs-keyword">if</xhtml:span> ($file-&gt;getBasename(<xhtml:span class="hljs-string">'.php'</xhtml:span>) == $file-&gt;getBasename()) {
            <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">false</xhtml:span>;
        }

        $contents = file_get_contents($file-&gt;getRealPath());
        $tokens   = token_get_all($contents);
        $count    = count($tokens);
        $i        = <xhtml:span class="hljs-number">0</xhtml:span>;
        <xhtml:span class="hljs-keyword">while</xhtml:span> ($i &lt; $count) {
            $token = $tokens[$i];

            <xhtml:span class="hljs-keyword">if</xhtml:span> (!is_array($token)) {
                <xhtml:span class="hljs-comment">// single character token found; skip</xhtml:span>
                $i++;
                <xhtml:span class="hljs-keyword">continue</xhtml:span>;
            }

            <xhtml:span class="hljs-keyword">list</xhtml:span>($id, $content, $line) = $token;

            <xhtml:span class="hljs-keyword">switch</xhtml:span> ($id) {
                <xhtml:span class="hljs-keyword">case</xhtml:span> T_NAMESPACE:
                    <xhtml:span class="hljs-comment">// Namespace found; grab it for later</xhtml:span>
                    $namespace = <xhtml:span class="hljs-string">''</xhtml:span>;
                    $done      = <xhtml:span class="hljs-keyword">false</xhtml:span>;
                    <xhtml:span class="hljs-keyword">do</xhtml:span> {
                        ++$i;
                        $token = $tokens[$i];
                        <xhtml:span class="hljs-keyword">if</xhtml:span> (is_string($token)) {
                            <xhtml:span class="hljs-keyword">if</xhtml:span> (<xhtml:span class="hljs-string">';'</xhtml:span> === $token) {
                                $done = <xhtml:span class="hljs-keyword">true</xhtml:span>;
                            }
                            <xhtml:span class="hljs-keyword">continue</xhtml:span>;
                        }
                        <xhtml:span class="hljs-keyword">list</xhtml:span>($type, $content, $line) = $token;
                        <xhtml:span class="hljs-keyword">switch</xhtml:span> ($type) {
                            <xhtml:span class="hljs-keyword">case</xhtml:span> T_STRING:
                            <xhtml:span class="hljs-keyword">case</xhtml:span> T_NS_SEPARATOR:
                                $namespace .= $content;
                                <xhtml:span class="hljs-keyword">break</xhtml:span>;
                        }
                    } <xhtml:span class="hljs-keyword">while</xhtml:span> (!$done &amp;&amp; $i &lt; $count);

                    <xhtml:span class="hljs-comment">// Set the namespace of this file in the object</xhtml:span>
                    $file-&gt;namespace = $namespace;
                    <xhtml:span class="hljs-keyword">break</xhtml:span>;
                <xhtml:span class="hljs-keyword">case</xhtml:span> T_ABSTRACT:
                <xhtml:span class="hljs-keyword">case</xhtml:span> T_CLASS:
                <xhtml:span class="hljs-keyword">case</xhtml:span> T_INTERFACE:
                    <xhtml:span class="hljs-comment">// Abstract class, class, or interface found</xhtml:span>

                    <xhtml:span class="hljs-comment">// Get the classname</xhtml:span>
                    $class = <xhtml:span class="hljs-string">''</xhtml:span>;
                    <xhtml:span class="hljs-keyword">do</xhtml:span> {
                        ++$i;
                        $token = $tokens[$i];
                        <xhtml:span class="hljs-keyword">if</xhtml:span> (is_string($token)) {
                            <xhtml:span class="hljs-keyword">continue</xhtml:span>;
                        }
                        <xhtml:span class="hljs-keyword">list</xhtml:span>($type, $content, $line) = $token;
                        <xhtml:span class="hljs-keyword">switch</xhtml:span> ($type) {
                            <xhtml:span class="hljs-keyword">case</xhtml:span> T_STRING:
                                $class = $content;
                                <xhtml:span class="hljs-keyword">break</xhtml:span>;
                        }
                    } <xhtml:span class="hljs-keyword">while</xhtml:span> (<xhtml:span class="hljs-keyword">empty</xhtml:span>($class) &amp;&amp; $i &lt; $count);

                    <xhtml:span class="hljs-comment">// If a classname was found, set it in the object, and </xhtml:span>
                    <xhtml:span class="hljs-comment">// return boolean true (found)</xhtml:span>
                    <xhtml:span class="hljs-keyword">if</xhtml:span> (!<xhtml:span class="hljs-keyword">empty</xhtml:span>($class)) {
                        $file-&gt;classname = $class;
                        <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">true</xhtml:span>;
                    }
                    <xhtml:span class="hljs-keyword">break</xhtml:span>;
                <xhtml:span class="hljs-keyword">default</xhtml:span>:
                    <xhtml:span class="hljs-keyword">break</xhtml:span>;
            }
            ++$i;
        }

        <xhtml:span class="hljs-comment">// No class-type tokens found; return false</xhtml:span>
        <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">false</xhtml:span>;
    }
}
</xhtml:code></xhtml:pre>
<xhtml:p><xhtml:em>Note: the Exceptions thrown in this class are defined in the
same namespace; I'll leave how they're implemented to your
imagination.</xhtml:em></xhtml:p>
<xhtml:h2>Iterating Faster</xhtml:h2>
<xhtml:p>The next trick I discovered was in the form of
<xhtml:code>iterator_apply()</xhtml:code>. Normally when I use iterators, I use
<xhtml:code>foreach</xhtml:code>, because, well, that's what you do. But in
looking through the various iterators for this exercise, I stumbled
across this gem.</xhtml:p>
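<xhtml:p>Basically, you pass the iterator, a callback, and an array of
the argument(s) you want passed to the callback. The callback must
return <xhtml:code>true</xhtml:code> for iteration to continue; any other
return value stops the loop. Like <xhtml:code>FilterIterator</xhtml:code>,
the callback isn't handed the current item from the iterator, so in
most use cases, you pass the iterator itself:</xhtml:p>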
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">iterator_apply($it, $callback, <xhtml:span class="hljs-keyword">array</xhtml:span>($it));
</xhtml:code></xhtml:pre>
<xhtml:p>You can then grab the current value and/or key from the iterator
itself:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php"><xhtml:span class="hljs-keyword">public</xhtml:span> <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span> <xhtml:span class="hljs-title">process</xhtml:span><xhtml:span class="hljs-params">(Iterator $it)</xhtml:span>
</xhtml:span>{
    $value = $it-&gt;current();
    $key   = $it-&gt;key();
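    <xhtml:span class="hljs-comment">// ...</xhtml:span>

    <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">true</xhtml:span>; <xhtml:span class="hljs-comment">// required, or iteration stops after the first item</xhtml:span>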
}
</xhtml:code></xhtml:pre>
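<xhtml:p>To see the pieces working together, here's a minimal, self-contained
sketch. The sample data, the <xhtml:code>collect()</xhtml:code> function, and the
<xhtml:code>$seen</xhtml:code> accumulator are made up for illustration; the one
detail that isn't optional is returning <xhtml:code>true</xhtml:code>, since
<xhtml:code>iterator_apply()</xhtml:code> stops as soon as the callback returns
anything else:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical sample data; any Iterator will do
$it   = new ArrayIterator(array('a' =&gt; 'apple', 'b' =&gt; 'banana', 'c' =&gt; 'cherry'));
$seen = new ArrayObject(); // objects are passed by handle, so the callback can fill it

function collect(Iterator $it, ArrayObject $seen)
{
    // Pull the current key and value from the iterator itself
    $seen[$it-&gt;key()] = strtoupper($it-&gt;current());
    return true; // keep iterating
}

// The third argument is the list of arguments handed to the callback;
// the return value is the number of iterations performed
$count = iterator_apply($it, 'collect', array($it, $seen));
</xhtml:code></xhtml:pre>
<xhtml:p>After this runs, <xhtml:code>$seen</xhtml:code> holds the upper-cased values
under the same keys as the original iterator.</xhtml:p>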
<xhtml:p>While you can use any valid PHP callback, I found the most
interesting solution was to use a closure, as it allows you to
define everything up front:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">iterator_apply($it, <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span> <xhtml:span class="hljs-title">use</xhtml:span> <xhtml:span class="hljs-params">($it)</xhtml:span> </xhtml:span>{
    $value = $it-&gt;current();
    $key   = $it-&gt;key();
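    <xhtml:span class="hljs-comment">// ...</xhtml:span>

    <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">true</xhtml:span>; <xhtml:span class="hljs-comment">// required, or iteration stops after the first item</xhtml:span>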
});
</xhtml:code></xhtml:pre>
<xhtml:p>If you pass in a local value via a <xhtml:code>use</xhtml:code> statement,
you can do some aggregation:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$map = <xhtml:span class="hljs-keyword">new</xhtml:span> \stdClass;
iterator_apply($it, <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">function</xhtml:span><xhtml:span class="hljs-params">()</xhtml:span> <xhtml:span class="hljs-title">use</xhtml:span> <xhtml:span class="hljs-params">($it, $map)</xhtml:span> </xhtml:span>{
    $file = $it-&gt;current();
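    $namespace = !<xhtml:span class="hljs-keyword">empty</xhtml:span>($file-&gt;namespace) ? $file-&gt;namespace . <xhtml:span class="hljs-string">'\\'</xhtml:span> : <xhtml:span class="hljs-string">''</xhtml:span>;
    $classname = $namespace . $file-&gt;classname;
    $map-&gt;{$classname} = $file-&gt;getPathname();
    <xhtml:span class="hljs-keyword">return</xhtml:span> <xhtml:span class="hljs-keyword">true</xhtml:span>; <xhtml:span class="hljs-comment">// required, or iteration stops after the first item</xhtml:span>
});
</xhtml:code></xhtml:pre>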
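<xhtml:p>The example above uses an object for <xhtml:code>$map</xhtml:code> because
closures capture variables by value, and a copied object handle still points
at the same object. If you'd rather aggregate into a plain array, capture it
by reference instead. A small sketch, with made-up class names and paths
standing in for real files:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">// Hypothetical class name =&gt; path pairs
$it = new ArrayIterator(array(
    'Foo\Bar' =&gt; '/lib/Foo/Bar.php',
    'Foo\Baz' =&gt; '/lib/Foo/Baz.php',
));

$map = array();
iterator_apply($it, function () use ($it, &amp;$map) {
    // &amp;$map is captured by reference, so writes are visible outside the closure
    $map[$it-&gt;key()] = $it-&gt;current();
    return true; // keep iterating
});
</xhtml:code></xhtml:pre>
<xhtml:p>Without the <xhtml:code>&amp;</xhtml:code>, the closure would fill its own
copy of the array and <xhtml:code>$map</xhtml:code> would stay empty.</xhtml:p>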
<xhtml:p>Not only is this a nice, concise technique, it's also
tremendously fast: in my tests, it ran 200%–300% faster than
a traditional <xhtml:code>foreach</xhtml:code> loop. Clearly it cannot be
used in all situations, but if you <xhtml:em>can</xhtml:em> use it, you
probably should.</xhtml:p>
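<xhtml:p>Timings like these depend heavily on the PHP version and on what the
loop body does, so it's worth measuring against your own workload. Here's a
rough harness that assumes nothing beyond core PHP; the data set and variable
names are made up for illustration:</xhtml:p>
<xhtml:pre><xhtml:code class="language-php hljs php" data-lang="php">$it = new ArrayIterator(range(1, 100000));

// Time a plain foreach loop
$start      = microtime(true);
$sumForeach = 0;
foreach ($it as $value) {
    $sumForeach += $value;
}
$foreachTime = microtime(true) - $start;

// Time the same work via iterator_apply()
$start    = microtime(true);
$sumApply = 0;
iterator_apply($it, function () use ($it, &amp;$sumApply) {
    $sumApply += $it-&gt;current();
    return true; // keep iterating over every element
});
$applyTime = microtime(true) - $start;

printf("foreach: %.4fs, iterator_apply: %.4fs\n", $foreachTime, $applyTime);
</xhtml:code></xhtml:pre>
<xhtml:p>Both loops should visit every element and arrive at the same total; if
they don't, the comparison isn't measuring the same work.</xhtml:p>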
<xhtml:p>So, start playing with <xhtml:code>FilterIterator</xhtml:code> and
<xhtml:code>iterator_apply()</xhtml:code> if you haven't already — the two
offer tremendous possibilities and capabilities for your
applications.</xhtml:p>
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/244-Applying-FilterIterator-to-Directory-Iteration.html">
Applying FilterIterator to Directory Iteration</xhtml:a> was originally
published <xhtml:time class="dt-published" datetime="2010-08-16T10:30:00-05:00">16 August 2010</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a> by <xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew Weier
O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
</feed>
