<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title type="text">Blog entries tagged git :: mwop.net</title>
  <updated>2023-12-18T15:34:00-06:00</updated>
  <generator uri="https://getlaminas.org" version="2">Laminas_Feed_Writer</generator>
  <link rel="alternate" type="text/html" href="https://mwop.net/blog/tag/git"/>
  <link rel="self" type="application/atom+xml" href="https://mwop.net/blog/tag/git/atom.xml"/>
  <id>https://mwop.net/blog/tag/git</id>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[Advent 2023: (n)vim Plugins: vim-fugitive]]></title>
    <published>2023-12-18T15:34:00-06:00</published>
    <updated>2023-12-18T15:34:00-06:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/2023-12-18-advent-vim-fugitive.html"/>
    <id>https://mwop.net/blog/2023-12-18-advent-vim-fugitive.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p>Because I've spent most of my professional life coding, I've
also spent a lot of time using source control. Specifically, I've been using
<xhtml:a href="https://git-scm.com/">git</xhtml:a> for many years
(even pre-dating the Zend Framework migration from <xhtml:a href="https://subversion.apache.org">Subversion</xhtml:a>). While I typically
use a terminal multiplexer (for me, that's <xhtml:a href="https://github.com/tmux/tmux/wiki">tmux</xhtml:a>; for others, that
might be <xhtml:a href="https://www.gnu.org/software/screen/">screen</xhtml:a>), and can move to
another pane or create one quickly in order to run source control
commands, doing so interrupts flow.</xhtml:p>
<xhtml:p>That's where <xhtml:a href="https://github.com/tpope/vim-fugitive">vim-fugitive</xhtml:a> comes into
play.</xhtml:p>
<xhtml:h3>What does it solve?</xhtml:h3>
<xhtml:p>Fugitive integrates with git, plain and simple. It exposes a
number of commands and functions that allow you to do common
operations quickly, but also has some deeper bindings to allow
doing more complex things such as viewing a file from previous
commits, or performing a diff between the staged and working
version, or using <xhtml:code>git blame</xhtml:code> within vim.</xhtml:p>
<xhtml:h3>How do I use it?</xhtml:h3>
<xhtml:p>Admittedly, I use a very small subset of what Fugitive
provides.</xhtml:p>
<xhtml:p>On a daily basis, I use <xhtml:code>:Gwrite</xhtml:code> to stage changes,
and <xhtml:code>:G</xhtml:code> to view the status of the working tree. When in
the status view, I often use <xhtml:code>cc</xhtml:code> to
<xhtml:strong>c</xhtml:strong>ommit <xhtml:strong>c</xhtml:strong>hanges, which splits open
a pane for writing the commit message. I also use
<xhtml:code>:GRemove</xhtml:code> when I want to remove a file from the
tree.</xhtml:p>
<xhtml:p>Something else that has come in handy when reviewing code with
others: <xhtml:code>:GBrowse</xhtml:code> can open the file in the canonical
repository, using the visual selection as the line range, allowing
you to quickly share a link to specific code to review.</xhtml:p>
<xhtml:h3>Final Thoughts</xhtml:h3>
<xhtml:p>This plugin does exactly what it says on the tin. I love the
fact that it integrates with the underlying <xhtml:code>git</xhtml:code>
command, as that follows the Unix Philosophy of doing one thing
well, and piping out to other processes to perform complex
behavior. For me, the fact that I can stay directly within my
editor and still get full access to git when needed is tremendously
powerful.</xhtml:p>
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/2023-12-18-advent-vim-fugitive.html">Advent
2023: (n)vim Plugins: vim-fugitive</xhtml:a> was originally published
<xhtml:time class="dt-published" datetime="2023-12-18T15:34:00-06:00">18
December 2023</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a> by <xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew Weier
O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[Splitting the ZF2 Components]]></title>
    <published>2015-05-15T19:30:00-05:00</published>
    <updated>2015-05-15T19:30:00-05:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/2015-05-15-splitting-components-with-git.html"/>
    <id>https://mwop.net/blog/2015-05-15-splitting-components-with-git.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p>Today we accomplished one of the major goals towards Zend
Framework 3: splitting the various components into their own
repositories. This proved to be a huge challenge, due to the amount
of history in our repository (the git repository has history going
back to 2009, around the time ZF 1.8 was released!), and the goals
we had for what component repositories should look like. This is
the story of how we made it happen.</xhtml:p>
<xhtml:h2>Why split them at all?</xhtml:h2>
<xhtml:p>"But you can already install components individually!"</xhtml:p>
<xhtml:p>True, but if you knew how that occurs, you'd cringe. We've tried
a variety of solutions, and every single one has failed us at some
point or another, typically when we move to a new minor version of
the framework, but occasionally even on trivial bugfix releases.
We've tried <xhtml:code>filter-branch</xhtml:code> with
<xhtml:code>subdirectory-filter</xhtml:code>, we've tried <xhtml:code>subtree
split</xhtml:code>, and even <xhtml:a href="https://github.com/dflydev/git-subsplit">subsplit</xhtml:a>. We've used
manual scripts that rsync the contents of each commit and create a
reference commit. Our current version is a combination of several
approaches, but we've found we must run it manually and verify the
results before pushing, as we've had a number of situations, as
recently as the 2.4.0 release, where contents were not correct.</xhtml:p>
<xhtml:p>On top of all this, there's another concern: why do all
components get bumped in version, even when no changes are present?
As an example, a number of components have had zero new
<xhtml:em>features</xhtml:em> since the 2.0 release; they're either stable, or
have smaller user bases. It doesn't make sense to bump their
versions, but they get bumped regardless whenever we do a new
release of the framework. When we start considering a new major
version of the framework, it doesn't necessarily make sense to bump
such components, as there will be literally zero breaking changes,
and, in many cases, no new features.</xhtml:p>
<xhtml:p>In other cases, such as the <xhtml:code>EventManager</xhtml:code>,
<xhtml:code>ServiceManager</xhtml:code>, and a handful of other components, we
know that these will require major versions due to necessary
architectural changes. However, as long as we're still developing
minor release branches of the framework, we cannot have meaningful
development on those features due to the complexities of keeping
changes in sync between branches.</xhtml:p>
<xhtml:p>In short, we'd like to be able to version the individual
components separately, in their own cycles.</xhtml:p>
<xhtml:p>On top of that, when we look at maintenance, having a monolithic
repository poses a challenge: we have to limit the number of
developers with commit rights to ensure that those who <xhtml:em>can</xhtml:em>
commit are aware of the impact a change might have across the
framework. This means that a number of developers with time and
energy to spend on improving a single component or small subset of
components are hampered by how quickly their changes can be
reviewed by the maintainers.</xhtml:p>
<xhtml:p>Splitting the components gives us the opportunity to expand the
number of contributors with commit access. The framework itself can
pin to specific versions of components, and maintainers with commit
access to the <xhtml:em>framework</xhtml:em> can review and change those
versions based on integration and smoke tests. In the meantime, a
larger set of contributors can be gradually improving the
individual components, and <xhtml:em>users</xhtml:em> can selectively adopt
those new versions into their applications, on their own review
cycles.</xhtml:p>
<xhtml:p>In the end:</xhtml:p>
<xhtml:ul>
<xhtml:li>We get components that follow <xhtml:a href="http://semver.org">Semantic Versioning</xhtml:a> properly.</xhtml:li>
<xhtml:li>We get accelerated development in components that need it.</xhtml:li>
<xhtml:li>We expand the number of active, able maintainers.</xhtml:li>
<xhtml:li>We enable users to adopt new features at their own pace.</xhtml:li>
<xhtml:li>We retain framework stability.</xhtml:li>
</xhtml:ul>
<xhtml:h2>The Goal</xhtml:h2>
<xhtml:p>Since we branched ZF2 development, our repository has looked
something like the following:</xhtml:p>
<xhtml:pre><xhtml:code class="language-asciidoc hljs asciidoc" data-lang="asciidoc"><xhtml:span class="hljs-title">.coveralls.yml</xhtml:span>
<xhtml:span class="hljs-title">.gitattributes</xhtml:span>
<xhtml:span class="hljs-title">.gitignore</xhtml:span>
<xhtml:span class="hljs-title">.php_cs</xhtml:span>
<xhtml:span class="hljs-title">.travis.yml</xhtml:span>
bin/
CHANGELOG.md
composer.json
CONTRIBUTING.md
demos/
INSTALL.md
library/
<xhtml:span class="hljs-code">    Zend/</xhtml:span>
<xhtml:span class="hljs-code">        {component directories}</xhtml:span>
LICENSE.txt
README-GIT.md
README.md
resources/
tests/
<xhtml:span class="hljs-code">    _autoload.php</xhtml:span>
<xhtml:span class="hljs-code">    Bootstrap.php</xhtml:span>
<xhtml:span class="hljs-code">    phpunit.xml.dist</xhtml:span>
<xhtml:span class="hljs-code">    run-tests.php</xhtml:span>
<xhtml:span class="hljs-code">    run-tests.sh</xhtml:span>
<xhtml:span class="hljs-code">    TestConfiguration.php.dist</xhtml:span>
<xhtml:span class="hljs-code">    TestConfiguration.php.travis</xhtml:span>
<xhtml:span class="hljs-code">    ZendTest/</xhtml:span>
<xhtml:span class="hljs-code">        {component directories}</xhtml:span>
</xhtml:code></xhtml:pre>
<xhtml:p>The structure follows <xhtml:a href="http://www.php-fig.org/psr/psr-0/">PSR-0</xhtml:a>, with each component
below the <xhtml:code>library/Zend/</xhtml:code> directory.</xhtml:p>
<xhtml:p>The goal is to have individual component repositories, each with
the following structure:</xhtml:p>
<xhtml:pre><xhtml:code class="language-asciidoc hljs asciidoc" data-lang="asciidoc"><xhtml:span class="hljs-title">.coveralls.yml</xhtml:span>
<xhtml:span class="hljs-title">.gitattributes</xhtml:span>
<xhtml:span class="hljs-title">.gitignore</xhtml:span>
<xhtml:span class="hljs-title">.php_cs</xhtml:span>
<xhtml:span class="hljs-title">.travis.yml</xhtml:span>
composer.json
CONTRIBUTING.md
src/
LICENSE.txt
phpunit.xml.dist
phpunit.xml.travis
README.md
test/
<xhtml:span class="hljs-code">    bootstrap.php</xhtml:span>
<xhtml:span class="hljs-code">    {component test cases}</xhtml:span>
</xhtml:code></xhtml:pre>
<xhtml:p>In the above structure, note the following differences:</xhtml:p>
<xhtml:ul>
<xhtml:li>Source and unit test files now follow <xhtml:a href="http://www.php-fig.org/psr/psr-4/">PSR-4</xhtml:a>, and can be found
directly beneath the new <xhtml:code>src/</xhtml:code> and <xhtml:code>test/</xhtml:code>
directories (which replace <xhtml:code>library/</xhtml:code> and
<xhtml:code>tests/</xhtml:code>, respectively), without any directory nesting
based on namespace (unless any subnamespaces are present).</xhtml:li>
<xhtml:li>The <xhtml:code>README.md</xhtml:code> file will need to be specific to the
component. Additionally, it can incorporate what was in the
<xhtml:code>INSTALL.md</xhtml:code> file originally.</xhtml:li>
<xhtml:li>The <xhtml:code>composer.json</xhtml:code> file will need to be for the
component, not the framework. Additionally, we don't currently list
dev/testing dependencies in our component repos, so those will need
to be added.</xhtml:li>
<xhtml:li>The <xhtml:code>TestConfiguration.php.*</xhtml:code> files define constants
referenced by the unit tests; those can be migrated to the
<xhtml:code>phpunit.xml.*</xhtml:code> files — which we can move to the project
root to simplify testing.</xhtml:li>
<xhtml:li>The <xhtml:code>.travis.yml</xhtml:code> file can be streamlined, as we're
now only testing one component.</xhtml:li>
<xhtml:li>Most testing infrastructure can be removed, as it exists only to
simplify running tests for individual components within the
larger framework. The <xhtml:code>Bootstrap.php</xhtml:code> gets renamed to
<xhtml:code>bootstrap.php</xhtml:code> to avoid being confused with unit test
files.</xhtml:li>
<xhtml:li><xhtml:code>README-GIT.md</xhtml:code> gets replaced with a lengthier
<xhtml:code>CONTRIBUTING.md</xhtml:code> file.</xhtml:li>
</xhtml:ul>
<xhtml:p>On top of all this, we had the following requirements:</xhtml:p>
<xhtml:ul>
<xhtml:li>The components <xhtml:strong>MUST</xhtml:strong> have full history from
2.0.0rc7 forward. This is so those working on the components can
see the <xhtml:em>why</xhtml:em> and <xhtml:em>who</xhtml:em> behind commits.</xhtml:li>
<xhtml:li>Commit messages <xhtml:strong>MUST</xhtml:strong> reference original issues
and pull requests on the ZF2 repository; again, this is to
facilitate the <xhtml:em>why</xhtml:em> behind changes.</xhtml:li>
<xhtml:li>Ideally, history should contain <xhtml:em>only</xhtml:em> history for the
given component.</xhtml:li>
<xhtml:li>The directory structure in <xhtml:em>each</xhtml:em> commit, including (and
especially!) tags, <xhtml:strong>MUST</xhtml:strong> follow the proposed
structure.</xhtml:li>
</xhtml:ul>
<xhtml:h2>How we got there</xhtml:h2>
<xhtml:p>One of the huge benefits to using Git is the ability to rewrite
history. (It's also one of its scariest features.) It provides a
number of facilities for doing so, from <xhtml:code>rebase</xhtml:code> to
grafts to <xhtml:code>subtree</xhtml:code> to <xhtml:code>filter-branch</xhtml:code>. In
our component split research, we evaluated several solutions.</xhtml:p>
<xhtml:h3>Grafts</xhtml:h3>
<xhtml:p><xhtml:a href="https://git.wiki.kernel.org/index.php/GraftPoint">Grafts</xhtml:a>
provide a way to merge two different lines of history together,
but, for our purposes, also allow us to <xhtml:em>prune</xhtml:em> history. Why
would we do this? Because we don't really need history prior to
2.0.0 development at this point. In large part, this is because
it's irrelevant; files were moved around and changed so much
between forking from the 1.X tree and 2.0 that tracing the history
is quite difficult.</xhtml:p>
<xhtml:p>I eventually found a methodology for pruning that looks like
this:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">$ <xhtml:span class="hljs-built_in">echo</xhtml:span> bb50be26b24a9e0e62a8f4abecce53259d707b61 &gt; .git/info/grafts
$ git filter-branch --tag-name-filter cat -- --all
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
$ rm .git/info/grafts
</xhtml:code></xhtml:pre>
<xhtml:p>It's supposed to remove all history before the given
sha1. By itself, however, it made little to no difference to the
repository other than its size; I could still reach earlier
commits. When coupled with the final techniques we used, though, it
meant that we effectively saw no commits prior to this
point.</xhtml:p>
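<xhtml:p>Newer Git releases deprecate the <xhtml:code>.git/info/grafts</xhtml:code> file in favor of replace refs. The following is a minimal sketch of the same kind of pruning using <xhtml:code>git replace --graft</xhtml:code>, run against a throwaway three-commit repository; the repository, identity, and cutoff commit are all hypothetical, and this is not the exact procedure used for the split.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: prune history older than a chosen commit using a replace ref
# (git replace --graft) instead of .git/info/grafts. Throwaway repo.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

# Three empty commits; history should start at the second one.
for n in 1 2 3; do
    git commit -q --allow-empty -m "commit $n"
done
cutoff=$(git rev-parse HEAD~1)

# Replace the cutoff commit with a parentless copy of itself.
git replace --graft "$cutoff"

# filter-branch honors replace refs, so recommitting every branch and
# tag with no filters makes the new root permanent.
git filter-branch -f -- --branches --tags

# The replace ref has served its purpose.
git replace -d "$cutoff"
git rev-list --count HEAD
</xhtml:code></xhtml:pre>
<xhtml:p>The final command should report two commits. As with the grafts recipe, the old objects only disappear once nothing references them any more, which is what the <xhtml:code>reflog expire</xhtml:code> and <xhtml:code>gc</xhtml:code> steps are for.</xhtml:p>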
<xhtml:h3>subtree</xhtml:h3>
<xhtml:p><xhtml:code>git subtree</xhtml:code> is a "contributed" git command; it's
not available in default distributions of git, but often available
as an add-on package; if you install git from source, it's in the
<xhtml:code>contrib</xhtml:code> tree, where you can compile and install it.
Subtree provides a rich set of functionality around dealing with
repository subtrees, allowing you to split them off, add subtrees
from other projects, and even push commits back and forth between
them.</xhtml:p>
<xhtml:p>At first blush, it seems like an ideal, simple solution:</xhtml:p>
<xhtml:ul>
<xhtml:li>Split each of the <xhtml:code>library/</xhtml:code> and <xhtml:code>tests/</xhtml:code>
component subtrees into their own branches.</xhtml:li>
<xhtml:li>Create a new repository, and add each of the above as
subtrees.</xhtml:li>
</xhtml:ul>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">$ git <xhtml:span class="hljs-built_in">clone</xhtml:span> zendframework/zf2
$ git init zend-http
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> zf2
$ git subtree split --prefix=library/Zend/Http -b src
$ git subtree split --prefix=tests/ZendTest/Http -b <xhtml:span class="hljs-built_in">test</xhtml:span>
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> ../zend-http
$ <xhtml:span class="hljs-comment"># add in basic assets, and create initial commit</xhtml:span>
$ git remote add zf2 ../zf2
$ git subtree add --prefix=src/ zf2 src
$ git subtree add --prefix=<xhtml:span class="hljs-built_in">test</xhtml:span>/ zf2 <xhtml:span class="hljs-built_in">test</xhtml:span>
</xhtml:code></xhtml:pre>
<xhtml:p>Indeed, if you do the above, when done, the directory looks
exactly like it should! <xhtml:strong>However</xhtml:strong>, the history is
all wrong; if you check out any tags, you get the full ZF2 tree for
the tag. As such, subtree fails one of the most important criteria
right off the bat: that each commit and tag represent <xhtml:em>only</xhtml:em>
the component.</xhtml:p>
<xhtml:h3>subdirectory-filter</xhtml:h3>
<xhtml:p><xhtml:code>subdirectory-filter</xhtml:code> is one of the <xhtml:code>git
filter-branch</xhtml:code> strategies. It operates similarly to
<xhtml:code>subtree</xhtml:code>, but also rewrites history. We used <xhtml:a href="https://gist.github.com/ralphschindler/9494556">this approach</xhtml:a>
when splitting the various "service" (API wrapper) components from
the main repository prior to the first ZF2 stable release.</xhtml:p>
<xhtml:p>The basic idea is similar to that of <xhtml:code>subtree</xhtml:code>; the
difference is that you have to begin with separate checkouts for
each of the source and tests.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">$ git <xhtml:span class="hljs-built_in">clone</xhtml:span> zendframework/zf2 zend-http-src
$ git <xhtml:span class="hljs-built_in">clone</xhtml:span> zendframework/zf2 zend-http-test
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> zend-http-src
$ git filter-branch --subdirectory-filter library/Zend/Http --tag-name-filter cat -- --all
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> ../zend-http-test
$ git filter-branch --subdirectory-filter tests/ZendTest/Http --tag-name-filter cat -- --all
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> ..
$ git init zend-http
$ <xhtml:span class="hljs-built_in">cd</xhtml:span> zend-http
<xhtml:span class="hljs-comment"># add in basic assets, and create initial commit</xhtml:span>
$ git remote add -f src ../zend-http-src
$ git remote add -f <xhtml:span class="hljs-built_in">test</xhtml:span> ../zend-http-test
$ git merge -s ours --no-commit src/master
$ git <xhtml:span class="hljs-built_in">read</xhtml:span>-tree -u --prefix=src/ src/master
$ git commit -m <xhtml:span class="hljs-string">'Merging src tree'</xhtml:span>
$ git merge -s ours --no-commit <xhtml:span class="hljs-built_in">test</xhtml:span>/master
$ git <xhtml:span class="hljs-built_in">read</xhtml:span>-tree -u --prefix=<xhtml:span class="hljs-built_in">test</xhtml:span>/ <xhtml:span class="hljs-built_in">test</xhtml:span>/master
$ git commit -m <xhtml:span class="hljs-string">'Merging test tree'</xhtml:span>
</xhtml:code></xhtml:pre>
<xhtml:p>Again, this looks great at first blush; all the contents for the
given component are rewritten perfectly. But when you start looking
at previous tags and commits, you see an interesting picture: based
on the commit and which remote you added first, you'll see a
completely different directory structure. Like
<xhtml:code>subtree</xhtml:code>, this fails our criterion that the repo be in a
usable state at any given commit.</xhtml:p>
<xhtml:h3>tree-filter</xhtml:h3>
<xhtml:p>Like <xhtml:code>subdirectory-filter</xhtml:code>, <xhtml:code>tree-filter</xhtml:code>
is a <xhtml:code>filter-branch</xhtml:code> strategy. <xhtml:code>tree-filter</xhtml:code>
allows you to rewrite the tree contents <xhtml:em>any way you want</xhtml:em>,
while retaining the commit message and metadata. This turned out to
be what we were looking for!</xhtml:p>
<xhtml:p>However, there were a few more pieces we needed to address:</xhtml:p>
<xhtml:ul>
<xhtml:li>Rewriting commit messages referencing issues and pull requests
to link to the main ZF2 repository.</xhtml:li>
<xhtml:li>Pruning empty commits.</xhtml:li>
<xhtml:li>Ensuring tags contain the expected tree.</xhtml:li>
</xhtml:ul>
<xhtml:p>Fortunately, <xhtml:code>filter-branch</xhtml:code> has other strategies for
just these purposes:</xhtml:p>
<xhtml:ul>
<xhtml:li><xhtml:code>msg-filter</xhtml:code> allows you to rewrite commit
messages.</xhtml:li>
<xhtml:li><xhtml:code>commit-filter</xhtml:code> provides tools for detecting and
removing empty commits.</xhtml:li>
<xhtml:li><xhtml:code>tag-name-filter</xhtml:code> ensures that tag references are
rewritten to follow their commits when those commits are rewritten or removed.</xhtml:li>
</xhtml:ul>
<xhtml:p>So, what we ended up with was something like the following:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">git filter-branch -f \
    --tree-filter <xhtml:span class="hljs-string">"php /path/to/tree-filter.php"</xhtml:span> \
    --msg-filter <xhtml:span class="hljs-string">"sed -re 's/(^|[^a-zA-Z])(\#[1-9][0-9]*)/\1zendframework\/zf2\2/g'"</xhtml:span> \
    --commit-filter <xhtml:span class="hljs-string">'git_commit_non_empty_tree "$@"'</xhtml:span> \
    --tag-name-filter cat \
    -- --all
</xhtml:code></xhtml:pre>
<xhtml:p><xhtml:code>/path/to/tree-filter.php</xhtml:code> is a script that contains
the logic for re-arranging the directory structure, as well as
rewriting the contents of files as necessary (e.g., rewriting the
contents of <xhtml:code>composer.json</xhtml:code>, or filling in the name of
the component in the <xhtml:code>CONTRIBUTING.md</xhtml:code>). The
<xhtml:code>msg-filter</xhtml:code> looks for issue and pull request
identifiers (a <xhtml:code>#</xhtml:code> character followed by a number, not
preceded by a letter), and rewrites them as
<xhtml:code>zendframework/zf2#NNN</xhtml:code> so that they reference the
original repository. The
<xhtml:code>commit-filter</xhtml:code> checks to see if the repository contents
have changed in this commit, and, if not, instructs
<xhtml:code>git</xhtml:code> to ignore the commit (and, since
<xhtml:code>tree-filter</xhtml:code> always executes before
<xhtml:code>commit-filter</xhtml:code>, the comparison is always between
rewritten trees). The <xhtml:code>tag-name-filter</xhtml:code>
<xhtml:strong>MUST</xhtml:strong> be present, and essentially just ensures that
the tag is rewritten; if absent, tags are not rewritten, and refer
to the original contents!</xhtml:p>
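<xhtml:p>To see how these filters fit together, here is a small, self-contained sketch run against a throwaway repository that mimics the ZF2 layout. It is not the component-split tooling: the tree filter is a shell stand-in for <xhtml:code>tree-filter.php</xhtml:code> (whose contents aren't shown here), and the component, file names, and commit messages are hypothetical.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: tree-filter + msg-filter + commit-filter + tag-name-filter on a
# throwaway repo shaped like ZF2. All names here are hypothetical.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip filter-branch's warning pause

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

mkdir -p library/Zend/Http library/Zend/Json tests/ZendTest/Http
touch library/Zend/Http/Client.php tests/ZendTest/Http/ClientTest.php README-GIT.md
git add -A
git commit -q -m "Initial import"
touch library/Zend/Json/Json.php          # touches another component only
git add -A
git commit -q -m "Fix Json encoding (#12)"
touch library/Zend/Http/Request.php
git add -A
git commit -q -m "Http: add Request (#34)"
git tag release-2.0.0

# Runs inside a checkout of each commit: move the Http source and tests
# to src/ and test/, and delete everything else.
tree_filter='
mkdir .split
if [ -d library/Zend/Http ]; then mv library/Zend/Http .split/src; fi
if [ -d tests/ZendTest/Http ]; then mv tests/ZendTest/Http .split/test; fi
find . -mindepth 1 -maxdepth 1 ! -name .split -exec rm -rf {} +
for d in src test; do
    if [ -d ".split/$d" ]; then mv ".split/$d" .; fi
done
rmdir .split
'
# Reads the message on stdin; turns "#34" into "zendframework/zf2#34".
msg_filter='sed -E "s/(^|[^a-zA-Z])(#[1-9][0-9]*)/\1zendframework\/zf2\2/g"'

# git_commit_non_empty_tree skips commits whose rewritten tree matches
# their parent's; --tag-name-filter cat keeps tag names while moving them.
git filter-branch -f \
    --tree-filter "$tree_filter" \
    --msg-filter "$msg_filter" \
    --commit-filter 'git_commit_non_empty_tree "$@"' \
    --tag-name-filter cat \
    -- --all

git rev-list --count HEAD
git log -1 --format=%s
git ls-tree -r --name-only release-2.0.0
</xhtml:code></xhtml:pre>
<xhtml:p>Running it should report two commits (the commit that only touched <xhtml:code>Zend/Json</xhtml:code> is dropped), show <xhtml:code>zendframework/zf2#34</xhtml:code> in the latest commit message, and list only <xhtml:code>src/</xhtml:code> and <xhtml:code>test/</xhtml:code> paths for the rewritten tag.</xhtml:p>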
<xhtml:h3>Stumbling blocks</xhtml:h3>
<xhtml:p>We had a few stumbling blocks getting the above to work. The
first was that, for purposes of testing, we had to specify a
<xhtml:em>commit range</xhtml:em>, instead of <xhtml:code>-- --all</xhtml:code>. This was
necessary because of the size of the repo; at ~27k commits, running
over every single commit can take between 5 and 12 hours, depending
on git version, HDD vs ramdisk, speed of I/O, etc. For small
subsets, we could get consistent results. When we expanded the
range, we started seeing strange errors, such as some tags not
getting written.</xhtml:p>
<xhtml:p>To compound the situation, we also made a last minute change to
only do history from the 2.0.0rc7 tag forward, and this is when
things completely fell apart. A large number of tags would not get
rewritten, the set of malformed tags varied between components, and
we couldn't figure out why.</xhtml:p>
<xhtml:p>At a certain point, I recalled that <xhtml:code>git</xhtml:code> stores
commits as a graph, not a single line of history, and that's when I realized what was happening:
when we specified a commit range, we were essentially specifying a
specific path through the commits. If a tag was made on a branch
falling outside that path, it would not get rewritten.</xhtml:p>
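<xhtml:p>The effect is easy to reproduce in miniature. In this hypothetical sketch, only the commits reachable from the trunk branch are rewritten (standing in for a commit range), so a tag made on a side branch keeps the old <xhtml:code>library/</xhtml:code> layout. The loop at the end is one possible way to check every tag for leftovers before pushing; branch, tag, and file names are made up for illustration.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: rewriting only part of the commit graph leaves some tags behind.
# Throwaway repo; branch, tag, and file names are hypothetical.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
trunk=$(git symbolic-ref --short HEAD)

mkdir library
touch library/a.php
git add -A
git commit -q -m "c1"
touch library/b.php
git add -A
git commit -q -m "c2"
git tag v1.0

# A maintenance branch forked from c1, with its own tagged release.
git checkout -q -b side HEAD~1
touch library/c.php
git add -A
git commit -q -m "c3"
git tag v0.9
git checkout -q "$trunk"

# Rewrite only the commits reachable from trunk (a stand-in for a range).
git filter-branch -f \
    --tree-filter 'if [ -d library ]; then mv library src; fi' \
    --tag-name-filter cat \
    -- "$trunk"

# Flag any tag whose tree still has the old top-level library/ directory.
for t in $(git tag); do
    if [ -n "$(git ls-tree --name-only "$t" library)" ]; then
        echo "not rewritten: $t"
    fi
done
</xhtml:code></xhtml:pre>
<xhtml:p>The sketch should print <xhtml:code>not rewritten: v0.9</xhtml:code>: <xhtml:code>v1.0</xhtml:code> points at a rewritten commit, while <xhtml:code>v0.9</xhtml:code> sits on a commit the rewrite never visited. Running the same filter over <xhtml:code>-- --all</xhtml:code> on a fresh copy of the repository rewrites both tags.</xhtml:p>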
<xhtml:p>This meant that the only way to get consistent results that met
our criteria was to run a test over the full history. Fortunately,
sometime around that point, a community member, <xhtml:a href="http://www.renatomefi.com.br/">Renato</xhtml:a>, suggested I try a run
using a <xhtml:a href="http://en.wikipedia.org/wiki/Tmpfs">tmpfs
filesystem</xhtml:a> — essentially a ramdisk. This sped up runs by a
factor of 2, and I was able to validate my hypothesis within an
evening.</xhtml:p>
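<xhtml:p><xhtml:code>filter-branch</xhtml:code> can use a ramdisk directly through its <xhtml:code>-d</xhtml:code> option, which sets the temporary directory it uses, including the per-commit checkouts that <xhtml:code>--tree-filter</xhtml:code> works in. The sketch below assumes a Linux host where <xhtml:code>/dev/shm</xhtml:code> is a tmpfs mount; it is one way to do this, not necessarily how our runs were configured (placing the whole clone on tmpfs also works), and all paths are hypothetical.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/usr/bin/env bash
# Sketch: put filter-branch's scratch checkout on a RAM-backed filesystem.
# Assumes /dev/shm is a tmpfs mount (common on Linux); otherwise falls
# back to $TMPDIR or /tmp, which works but gains nothing.
set -euo pipefail
export FILTER_BRANCH_SQUELCH_WARNING=1

scratch_base=/dev/shm
[ -d "$scratch_base" ] || scratch_base=${TMPDIR:-/tmp}
scratch=$(mktemp -d "$scratch_base/split.XXXXXX")

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
mkdir library
touch library/Client.php
git add -A
git commit -q -m "Initial import"

# -d names the temporary directory filter-branch uses, including the
# per-commit checkouts that --tree-filter operates on.
git filter-branch -f -d "$scratch/rewrite" \
    --tree-filter 'if [ -d library ]; then mv library src; fi' \
    -- --all

git ls-tree -r --name-only HEAD
</xhtml:code></xhtml:pre>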
<xhtml:p>Another stumbling block was empty commits. We originally used
<xhtml:code>filter-branch</xhtml:code>'s <xhtml:code>--prune-empty</xhtml:code> switch, but
found it was generally unreliable when used with
<xhtml:code>tree-filter</xhtml:code>. The solution to this problem is the
<xhtml:code>commit-filter</xhtml:code> as listed above; it did a stellar
job.</xhtml:p>
<xhtml:h3>Empty merge commits</xhtml:h3>
<xhtml:p>There was one lingering issue, however: when inspecting the
filtered repository, we still had a large number of empty merge
commits that had nothing to do with the component. After a lot of
searching, I found this gem:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">$ git filter-branch -f \
&gt; --commit-filter <xhtml:span class="hljs-string">'
&gt;    if [ z$1 = z`git rev-parse $3^{tree}` ];then
&gt;        skip_commit "$@";
&gt;    else
&gt;        git commit-tree "$@";
&gt; fi'</xhtml:span> \
&gt; --tag-name-filter cat -- --all
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
</xhtml:code></xhtml:pre>
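<xhtml:p>The above uses a <xhtml:code>commit-filter</xhtml:code> that compares
the tree of the commit being written (<xhtml:code>$1</xhtml:code>) with the tree
of its first parent (<xhtml:code>$3</xhtml:code>, resolved via
<xhtml:code>rev-parse</xhtml:code>); if they are identical, the commit
introduces no changes, and <xhtml:code>skip_commit</xhtml:code> removes it.
Unlike <xhtml:code>git_commit_non_empty_tree</xhtml:code>, which only
considers commits with exactly one parent, this check also applies
to merge commits. The <xhtml:code>reflog expire</xhtml:code> and
<xhtml:code>gc</xhtml:code> commands clean up and remove any objects in the
repository that are now no longer reachable.</xhtml:p>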
<xhtml:h2>Final Solution</xhtml:h2>
<xhtml:p>With a working <xhtml:code>graft</xhtml:code>, <xhtml:code>tree-filter</xhtml:code>, and
<xhtml:code>commit-filter</xhtml:code> in place, we could finally proceed. We
created a repository containing all scripts we needed, as well as
the assets necessary for rewriting the component repository trees.
We then had a tool that could be executed as simply as:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">$ ./bin/split.sh -c Authentication 2&gt;&amp;1 | tee authentication.log
</xhtml:code></xhtml:pre>
<xhtml:p>And with that, we could sit back and watch the component get
split, and push the results when done.</xhtml:p>
<xhtml:p>You can see the work in our <xhtml:a href="https://github.com/zendframework/component-split">component-split</xhtml:a>
repository.</xhtml:p>
<xhtml:h2>But what about the speed?</xhtml:h2>
<xhtml:p>"But didn't you say it takes between 5 and 12 hours to run per
component? And aren't there something like 50 components? That
would take weeks!"</xhtml:p>
<xhtml:p>You're quite astute! And for that, we had a secret weapon: a
community contributor, <xhtml:a href="https://github.com/gianarb">Gianluca Arbezzano</xhtml:a>, working for an
AWS partner, <xhtml:a href="http://www.corley.it">Corley</xhtml:a>, which
sponsored splitting all components in parallel at once, allowing us
to complete the entire effort in a single day. I'll let others tell
that story, though!</xhtml:p>
<xhtml:h2>The results</xhtml:h2>
<xhtml:p>I'm quite pleased with the results. The ZF2 repository has ~27k
commits, 67 releases, and over 700 contributors; a clean checkout
is around 150MB. As a contrast, the rewritten
<xhtml:code>zend-http</xhtml:code> component repository ended up with ~1.7k
commits, 50 releases, ~160 contributors, and a clean checkout
clocks in at 5.4MB! So the individual components are substantially
leaner! Additionally, they contain all the QA tooling necessary for
those wanting to patch issues or create features to start developing
against them, making development a simpler process.</xhtml:p>
<xhtml:p>The lessons learned:</xhtml:p>
<xhtml:ul>
<xhtml:li><xhtml:code>tree-filter</xhtml:code> is your friend, if your restructuring
involves more than one directory and/or adding or removing
files.</xhtml:li>
<xhtml:li><xhtml:code>tag-name-filter</xhtml:code> <xhtml:strong>MUST</xhtml:strong> be used
anytime you use <xhtml:code>filter-branch</xhtml:code>; otherwise your tags may
end up invalid!</xhtml:li>
<xhtml:li><xhtml:code>filter-branch</xhtml:code> should be used on ranges
<xhtml:em>sparingly</xhtml:em>, and ideally only if you're not worried about
tags. In most cases, you want to run over the entire history.</xhtml:li>
<xhtml:li><xhtml:code>commit-filter</xhtml:code> is your best option for ensuring
empty commits of any type are stripped, particularly if you're
using <xhtml:code>tree-filter</xhtml:code>; the <xhtml:code>--prune-empty</xhtml:code> flag
is not terribly reliable.</xhtml:li>
<xhtml:li>Always do a full test run. It's tempting to use a commit range
to verify that your filters work, but the results will differ from
running over the entire history. Which leads to:</xhtml:li>
<xhtml:li>Schedule plenty of time, particularly if your repository is
large. Those full test runs will take time, and, if you follow the
scientific process and make one change at a time, you may need
quite a few iterations to get your scripts right.</xhtml:li>
</xhtml:ul>
<xhtml:p>All-in-all, this was a stressful, time-consuming, thankless
task. But I <xhtml:em>am</xhtml:em> quite happy with the results; our
components look like they are and were always developed as
first-class components, and have a rich history referencing their
original development as part of the encompassing framework.</xhtml:p>
<xhtml:h2>Kudos!</xhtml:h2>
<xhtml:p>I cannot thank Gianluca and Corley enough for their generous
efforts! What looked like a task that would take days and/or weeks
happened literally overnight, allowing us to complete a major task
in Zend Framework 3 development, and setting the stage for a ton of
new features. Grazie!</xhtml:p>
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/2015-05-15-splitting-components-with-git.html">
Splitting the ZF2 Components</xhtml:a> was originally published
<xhtml:time class="dt-published" datetime="2015-05-15T19:30:00-05:00">15
May 2015</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a>
by <xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew
Weier O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[Automatic deployment with git and gitolite]]></title>
    <published>2012-06-24T21:50:00-05:00</published>
    <updated>2012-06-24T21:50:00-05:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/2012-06-24-git-deploy.html"/>
    <id>https://mwop.net/blog/2012-06-24-git-deploy.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p>I recently read a <xhtml:a href="http://seancoates.com/blogs/deploy-on-push-from-github">post
by Sean Coates about deploy on push</xhtml:a>. The concept is
nothing new: you set up a hook that listens for pushes to specific
branches or tags, and it then deploys your site from that
revision.</xhtml:p>
<xhtml:p>Except I'd not done it myself. This is how I got there.</xhtml:p>
<xhtml:p>Sean's approach uses <xhtml:a href="https://help.github.com/articles/post-receive-hooks">GitHub
webhooks</xhtml:a>, which are a fantastic concept. Basically, once you
push, GitHub sends a JSON-encoded payload to a specific URI. Sean
uses this to trigger an API call to a specific page in his website,
which then triggers a deployment.</xhtml:p>
<xhtml:p>Awesome, this should be easy; I already have a deploy script
written that I trigger manually.</xhtml:p>
<xhtml:p>One small problem: my site, while in Git, is not on GitHub. I
host it in my own <xhtml:a href="https://github.com/sitaramc/gitolite">Gitolite</xhtml:a> repository,
which means I needed to write my own hooks.</xhtml:p>
<xhtml:p>I originally went down the route of using a
<xhtml:code>post-receive</xhtml:code> hook. However, I had problems
determining which branch the pushed commits were on, despite a
variety of advice I found on the subject on
<xhtml:a href="http://stackoverflow.com/">Stack Overflow</xhtml:a> and git
mailing lists. I ended up finding a great example using
<xhtml:code>post-update</xhtml:code>, which turned out to be perfect for my
needs.</xhtml:p>
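<xhtml:p>For comparison, a <xhtml:code>post-receive</xhtml:code> hook does get the branch information, just not as arguments: git writes one line per updated ref to the hook's standard input, in the form <xhtml:code>old-sha new-sha refname</xhtml:code>. The minimal sketch below is not the approach used in this post, and the helper name <xhtml:code>pushed_to_master</xhtml:code> is hypothetical; it reads those lines and reports whether master was among them.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Sketch only: git feeds post-receive one line per updated ref on stdin,
# formatted as "old-sha new-sha refname".
# pushed_to_master is a hypothetical helper name.
pushed_to_master() {
    local oldrev newrev refname
    while read -r oldrev newrev refname; do
        if [[ "$refname" == "refs/heads/master" ]]; then
            echo "yes"
            return 0
        fi
    done
    echo "no"
}

# Example use inside the hook (stdin is the list of updated refs):
#   if [[ "$(pushed_to_master)" == "yes" ]]; then echo "1" > /var/local/mwop.net.update; fi
</xhtml:code></xhtml:pre>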
<xhtml:p>To keep the <xhtml:code>post-update</xhtml:code> script from
blocking when I push, I made it do very little. Git passes the name
of each updated ref (e.g., <xhtml:code>refs/heads/master</xhtml:code>) to
<xhtml:code>post-update</xhtml:code> as an argument; the hook checks the
first one, and if it is the master branch, it writes a specific file
on the filesystem and finishes. The entire hook looks like this:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash"><xhtml:span class="hljs-meta">#!/bin/bash</xhtml:span>
branch=$(git rev-parse --symbolic --abbrev-ref <xhtml:span class="hljs-variable">$1</xhtml:span>)
<xhtml:span class="hljs-built_in">echo</xhtml:span> <xhtml:span class="hljs-string">"Commit was for branch <xhtml:span class="hljs-variable">$branch</xhtml:span>"</xhtml:span>
<xhtml:span class="hljs-keyword">if</xhtml:span> [[ <xhtml:span class="hljs-string">"<xhtml:span class="hljs-variable">$branch</xhtml:span>"</xhtml:span> == <xhtml:span class="hljs-string">"master"</xhtml:span> ]];<xhtml:span class="hljs-keyword">then</xhtml:span>
    <xhtml:span class="hljs-built_in">echo</xhtml:span> <xhtml:span class="hljs-string">"Preparing to deploy"</xhtml:span>
    <xhtml:span class="hljs-built_in">echo</xhtml:span> <xhtml:span class="hljs-string">"1"</xhtml:span> &gt; /var/<xhtml:span class="hljs-built_in">local</xhtml:span>/mwop.net.update
<xhtml:span class="hljs-keyword">fi</xhtml:span>
</xhtml:code></xhtml:pre>
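<xhtml:p>The hook above only inspects <xhtml:code>$1</xhtml:code>, the first updated ref, so if a single push updates several branches at once, master may not be the one checked. A minimal variant that checks every ref git passes in might look like the following. It compares full ref names rather than calling <xhtml:code>git rev-parse</xhtml:code>, and the <xhtml:code>flag_deploy_if_master</xhtml:code> function and <xhtml:code>DEPLOY_FLAG</xhtml:code> override are hypothetical additions (the override exists only to make the sketch easy to test).</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Sketch of a post-update hook that checks every updated ref, not just $1.
# git passes each updated ref name (e.g. refs/heads/master) as an argument.
# DEPLOY_FLAG is a hypothetical override; it defaults to the post's path.
flag_deploy_if_master() {
    local flag="${DEPLOY_FLAG:-/var/local/mwop.net.update}"
    local ref
    for ref in "$@"; do
        echo "Push updated $ref"
        if [[ "$ref" == "refs/heads/master" ]]; then
            echo "Preparing to deploy"
            echo "1" > "$flag"
        fi
    done
}

flag_deploy_if_master "$@"
</xhtml:code></xhtml:pre>
<xhtml:p>Anything the hook echoes is relayed back to the pusher as <xhtml:code>remote:</xhtml:code> output, which makes it a handy confirmation that a deploy was queued.</xhtml:p>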
<xhtml:p>Now I needed something to detect that file and act on it.</xhtml:p>
<xhtml:p>I considered using cron for this; it would be relatively easy to
have it fire once a minute and deploy whenever that file exists. But
I decided instead to write a simple little daemon in Perl. Perl
daemons are easy to write, and if you use a module such as
<xhtml:code>Proc::Daemon</xhtml:code> and follow a few basic defensive coding
practices, you can keep memory leaks contained (or at least
minimal). Besides, it gave me a chance to dust off my Perl
chops.</xhtml:p>
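<xhtml:p>For comparison, the cron approach could be sketched as a one-shot check that does the same work as a single iteration of the daemon's loop. The function name <xhtml:code>deploy_if_flagged</xhtml:code> is hypothetical; it takes the flag file, the deploy script, and the log file as arguments, and logs the deploy script's exit code on failure.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Sketch of the cron alternative: one check per run instead of a loop.
# deploy_if_flagged is a hypothetical name; arguments are the flag file,
# the deploy script, and the log file.
deploy_if_flagged() {
    local update_file="$1" update_script="$2" log_file="$3"
    local status=0

    # Nothing to do unless the post-update hook left the flag file
    [[ -e "$update_file" ]] || return 0

    # Remove the flag first so one push triggers only one deploy
    if ! rm -f "$update_file"; then
        echo "$(date +%s): Failed to REMOVE $update_file" >> "$log_file"
        return 1
    fi

    # Run the deploy and record the outcome, including the exit code
    "$update_script" || status=$?
    if [[ $status -eq 0 ]]; then
        echo "$(date +%s): Successfully DEPLOYED" >> "$log_file"
    else
        echo "$(date +%s): FAILED to deploy: exit code $status" >> "$log_file"
    fi
}
</xhtml:code></xhtml:pre>
<xhtml:p>A small wrapper calling <xhtml:code>deploy_if_flagged /var/local/mwop.net.update /home/matthew/bin/deploy-mwop /var/local/mwop.net-deploy.log</xhtml:code> could then be scheduled with a crontab entry such as <xhtml:code>* * * * * /path/to/wrapper</xhtml:code> (the wrapper path is a placeholder). The trade-off is latency: cron's one-minute granularity is coarser than a 30-second poll.</xhtml:p>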
<xhtml:p>I decided to have it check for the file at 30-second intervals,
simply sleeping if the file was not present. If the file was found,
however, it should attempt to deploy. Additionally, I wanted it to
quit if it was unable to remove the file (as this could lead to
multiple deploy attempts), and to log whether each deploy succeeded
or failed. The full script looks like this:</xhtml:p>
<xhtml:pre><xhtml:code class="language-perl hljs perl" data-lang="perl"><xhtml:span class="hljs-comment">#!/usr/bin/perl</xhtml:span>
<xhtml:span class="hljs-keyword">use</xhtml:span> strict;
<xhtml:span class="hljs-keyword">use</xhtml:span> warnings;
<xhtml:span class="hljs-keyword">use</xhtml:span> Proc::Daemon;

Proc::Daemon::Init;

<xhtml:span class="hljs-keyword">my</xhtml:span> $continue = <xhtml:span class="hljs-number">1</xhtml:span>;
$SIG<xhtml:span class="hljs-string">{TERM}</xhtml:span> = <xhtml:span class="hljs-function"><xhtml:span class="hljs-keyword">sub</xhtml:span> </xhtml:span>{ $continue = <xhtml:span class="hljs-number">0</xhtml:span> };

<xhtml:span class="hljs-keyword">my</xhtml:span> $updateFile   = <xhtml:span class="hljs-string">"/var/local/mwop.net.update"</xhtml:span>;
<xhtml:span class="hljs-keyword">my</xhtml:span> $updateScript = <xhtml:span class="hljs-string">"/home/matthew/bin/deploy-mwop"</xhtml:span>;
<xhtml:span class="hljs-keyword">my</xhtml:span> $logFile      = <xhtml:span class="hljs-string">"/var/local/mwop.net-deploy.log"</xhtml:span>;
<xhtml:span class="hljs-keyword">while</xhtml:span> ($continue) {
    <xhtml:span class="hljs-comment"># 30s intervals between iterations</xhtml:span>
    <xhtml:span class="hljs-keyword">sleep</xhtml:span> <xhtml:span class="hljs-number">30</xhtml:span>;

    <xhtml:span class="hljs-comment"># Check for update file, and restart loop if not found</xhtml:span>
    <xhtml:span class="hljs-keyword">unless</xhtml:span> (-e $updateFile) {
        <xhtml:span class="hljs-keyword">next</xhtml:span>;
    }

    <xhtml:span class="hljs-comment"># Remove update file</xhtml:span>
    <xhtml:span class="hljs-keyword">if</xhtml:span> (!<xhtml:span class="hljs-keyword">unlink</xhtml:span>($updateFile)) {
        <xhtml:span class="hljs-comment"># If unable to unlink, we need to quit</xhtml:span>
        <xhtml:span class="hljs-keyword">system</xhtml:span>(<xhtml:span class="hljs-string">'echo "'</xhtml:span> . <xhtml:span class="hljs-keyword">time</xhtml:span>() . <xhtml:span class="hljs-string">': Failed to REMOVE '</xhtml:span> . $updateFile . <xhtml:span class="hljs-string">'" &gt;&gt; '</xhtml:span> . $logFile);
        $continue = <xhtml:span class="hljs-number">0</xhtml:span>;
        <xhtml:span class="hljs-keyword">next</xhtml:span>;
    }

    <xhtml:span class="hljs-comment"># Deploy</xhtml:span>
    <xhtml:span class="hljs-keyword">system</xhtml:span>($updateScript);
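    <xhtml:span class="hljs-keyword">if</xhtml:span> ( $? == -<xhtml:span class="hljs-number">1</xhtml:span> ) {
        <xhtml:span class="hljs-comment"># The deploy script could not be started at all</xhtml:span>
        <xhtml:span class="hljs-keyword">system</xhtml:span>(<xhtml:span class="hljs-string">'echo "'</xhtml:span> . <xhtml:span class="hljs-keyword">time</xhtml:span>() . <xhtml:span class="hljs-string">': FAILED to deploy: '</xhtml:span> . $! . <xhtml:span class="hljs-string">'" &gt;&gt; '</xhtml:span> . $logFile);
    } <xhtml:span class="hljs-keyword">elsif</xhtml:span> ( $? != <xhtml:span class="hljs-number">0</xhtml:span> ) {
        <xhtml:span class="hljs-comment"># The deploy script ran, but exited with a non-zero status</xhtml:span>
        <xhtml:span class="hljs-keyword">system</xhtml:span>(<xhtml:span class="hljs-string">'echo "'</xhtml:span> . <xhtml:span class="hljs-keyword">time</xhtml:span>() . <xhtml:span class="hljs-string">': FAILED to deploy: exit code '</xhtml:span> . ($? &gt;&gt; <xhtml:span class="hljs-number">8</xhtml:span>) . <xhtml:span class="hljs-string">'" &gt;&gt; '</xhtml:span> . $logFile);
    } <xhtml:span class="hljs-keyword">else</xhtml:span> {
        <xhtml:span class="hljs-keyword">system</xhtml:span>(<xhtml:span class="hljs-string">'echo "'</xhtml:span> . <xhtml:span class="hljs-keyword">time</xhtml:span>() . <xhtml:span class="hljs-string">': Successfully DEPLOYED" &gt;&gt; '</xhtml:span> . $logFile);
    }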
}
</xhtml:code></xhtml:pre>
<xhtml:p>The <xhtml:code>system()</xhtml:code> calls for logging could have been done
in pure Perl, but I didn't want to deal with additional error
handling and file handles; simply proxying to the shell seemed
reasonable and expedient.</xhtml:p>
<xhtml:p>When all was ready, I started the above listener, which
automatically daemonizes itself. I then installed the
<xhtml:code>post-update</xhtml:code> hook into my bare repository, and tested
it out. And it works! When I push to master, my site is
automatically deployed, typically within 15-20 seconds of the push
completing.</xhtml:p>
<xhtml:h4>Caveats</xhtml:h4>
<xhtml:p>This solution, of course, relies on a daemonized process. If
that process were to terminate, I'd have no idea until I discovered
my site didn't refresh after the most recent push. Clearly, some
sort of monitor checking for the status of the daemon should be in
place.</xhtml:p>
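<xhtml:p>A minimal sketch of such a monitor, assuming the daemon were changed to write its PID (Perl's <xhtml:code>$$</xhtml:code>) to a file after daemonizing; the script above does not do that, and the function name <xhtml:code>deploy_daemon_alive</xhtml:code> is hypothetical. It relies on <xhtml:code>kill -0</xhtml:code>, which sends no signal and only reports whether the process exists.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Sketch of a watchdog check. Assumes (hypothetically) that the deploy daemon
# writes its PID to a file; the Perl script in this post does not.
deploy_daemon_alive() {
    local pid_file="$1" pid
    # Without a readable PID file we cannot tell, so report "not running"
    [[ -r "$pid_file" ]] || return 1
    pid=$(cat "$pid_file")
    [[ -n "$pid" ]] || return 1
    # kill -0 delivers no signal; it only tests that the process exists.
    # Run this as the daemon's user, or kill -0 may fail with a permission error.
    kill -0 "$pid" 2>/dev/null
}
</xhtml:code></xhtml:pre>
<xhtml:p>Run from cron every few minutes, a wrapper could append a line to <xhtml:code>/var/local/mwop.net-deploy.log</xhtml:code> or send mail whenever this returns non-zero. PIDs can be reused after a process dies, so this is a cheap check rather than a guarantee.</xhtml:p>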
<xhtml:p>Also, note that this only deploys on changes to the master
branch; you may need to adapt it for your own needs, depending on
your branching strategy.</xhtml:p>
<xhtml:p>Finally, this approach does not address issues that might
require a roll-back. Ideally, the script should log which revision
was current prior to the deployment, allowing a roll-back to the
previous state. Alternatively, the deployment script could create a
new clone of the site for each deploy and swap a symlink to point at
it, allowing quick roll-back when required.</xhtml:p>
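<xhtml:p>The clone-and-symlink idea could be sketched as follows. The layout (a <xhtml:code>releases/</xhtml:code> directory holding one clone per deploy, a <xhtml:code>current</xhtml:code> symlink the web server points at, and a <xhtml:code>previous</xhtml:code> file recording the old target) and all function names are assumptions for illustration, not part of the setup described here. <xhtml:code>mv -T</xhtml:code> is GNU coreutils; renaming the new link over the old one means <xhtml:code>current</xhtml:code> never points at a missing target mid-switch.</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Hypothetical layout:
#   $base/releases/ID/   one fresh clone per deploy
#   $base/current        symlink the web server serves from
#   $base/previous       target of "current" before the last switch

# Point "current" at a release, remembering the old target for roll-back
activate_release() {
    local base="$1" release="$2"
    if [[ -L "$base/current" ]]; then
        readlink "$base/current" > "$base/previous"
    fi
    # Build the new link beside the old one, then rename it into place
    # (mv -T is GNU coreutils; the rename replaces the old link in one step)
    ln -sfn "releases/$release" "$base/current.tmp"
    mv -T "$base/current.tmp" "$base/current"
}

# Clone the repository into a new release directory and activate it
deploy_release() {
    local base="$1" repo="$2" id
    id=$(date +%Y%m%d%H%M%S)
    git clone --quiet --branch master "$repo" "$base/releases/$id" || return 1
    activate_release "$base" "$id"
}

# Switch "current" back to whatever it pointed at before the last deploy
rollback_release() {
    local base="$1"
    [[ -s "$base/previous" ]] || return 1
    ln -sfn "$(cat "$base/previous")" "$base/current.tmp"
    mv -T "$base/current.tmp" "$base/current"
}
</xhtml:code></xhtml:pre>
<xhtml:p>A deploy script built this way might call <xhtml:code>deploy_release /srv/mwop.net /path/to/mwop.net.git</xhtml:code> (both paths are placeholders), and a roll-back becomes <xhtml:code>rollback_release /srv/mwop.net</xhtml:code>. Old release directories accumulate, so a real version would also need to prune them.</xhtml:p>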
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/2012-06-24-git-deploy.html">Automatic
deployment with git and gitolite</xhtml:a> was originally published
<xhtml:time class="dt-published" datetime="2012-06-24T21:50:00-05:00">24
June 2012</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a>
by <xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew
Weier O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
  <entry xmlns:xhtml="http://www.w3.org/1999/xhtml">
    <title type="html"><![CDATA[git-svn Tip: don't use core.autocrlf]]></title>
    <published>2008-09-24T12:16:27-05:00</published>
    <updated>2008-09-24T12:16:27-05:00</updated>
    <link rel="alternate" type="text/html" href="https://mwop.net/blog/191-git-svn-Tip-dont-use-core.autocrlf.html"/>
    <id>https://mwop.net/blog/191-git-svn-Tip-dont-use-core.autocrlf.html</id>
    <author>
      <name>Matthew Weier O'Phinney</name>
      <email>contact@mwop.net</email>
      <uri>https://mwop.net</uri>
    </author>
    <content xmlns:xhtml="http://www.w3.org/1999/xhtml" type="xhtml">
      <xhtml:div xmlns:xhtml="http://www.w3.org/1999/xhtml"><xhtml:p>I've been playing around with <xhtml:a href="http://git.or.cz/">Git</xhtml:a> over the past couple of months, and have
been really enjoying it. Paired with Subversion, it gives me the best of
both worlds: distributed source control when I want it (working on
new features or trying out performance tuning), and centralized
source control for my public commits.</xhtml:p>
<xhtml:p><xhtml:a href="http://github.com/guides/dealing-with-newlines-in-git">GitHub</xhtml:a>
suggests that when working with remote repositories, you turn on
the <xhtml:code>core.autocrlf</xhtml:code> option, which converts CRLF line
endings to LF when you commit (and back to CRLF on checkout), so that
line-ending differences between platforms do not show up as changes
when pushing to and pulling from the remote repo. However, when working with
<xhtml:code>git-svn</xhtml:code>, this actually causes issues. After turning
this option on, I started getting the error "Delta source ended
unexpectedly" from <xhtml:code>git-svn</xhtml:code>. After a bunch of aimless
tinkering, I finally asked myself two questions: "When did this
start happening?" and "Have I changed anything with Git lately?"
Once I'd backed out the config change, everything started working
again.</xhtml:p>
<xhtml:p>In summary: don't use <xhtml:code>git config --global core.autocrlf
true</xhtml:code> when using <xhtml:code>git-svn</xhtml:code>.</xhtml:p>
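<xhtml:p>The global change itself can be backed out with <xhtml:code>git config --global --unset core.autocrlf</xhtml:code>. If you want to keep the global setting for other repositories, one option (a suggestion, not something verified against the error above) is to override it in the <xhtml:code>git-svn</xhtml:code> working copy only, since a repository-level value takes precedence over the global one. A minimal sketch, with a hypothetical helper name:</xhtml:p>
<xhtml:pre><xhtml:code class="language-bash hljs bash" data-lang="bash">#!/bin/bash
# Hypothetical helper: turn off autocrlf for a single repository, such as
# a git-svn clone. A repository-level value overrides the --global one.
disable_autocrlf_in() {
    local repo="$1"
    git -C "$repo" config core.autocrlf false
}
</xhtml:code></xhtml:pre>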
<xhtml:div class="h-entry"><xhtml:img class="u-photo photo" width="50" src="https://avatars0.githubusercontent.com/u/25943?v=3&amp;u=79dd2ea1d4d8855944715d09ee4c86215027fa80&amp;s=140" alt="matthew"/> <xhtml:a class="u-url u-uid p-name" href="https://mwop.net/blog/191-git-svn-Tip-dont-use-core.autocrlf.html">
git-svn Tip: don't use core.autocrlf</xhtml:a> was originally published
<xhtml:time class="dt-published" datetime="2008-09-24T12:16:27-05:00">24
September 2008</xhtml:time> on <xhtml:a href="https://mwop.net">https://mwop.net</xhtml:a> by <xhtml:a rel="author" class="p-author" href="https://mwop.net">Matthew Weier
O'Phinney</xhtml:a>.</xhtml:div>
</xhtml:div>
    </content>
  </entry>
</feed>
