<html lang="en" itemscope itemtype="http://schema.org/WebPage" itemid="https://ryanfb.github.io/etc/2015/07/29/git_strategies_for_docker.html">
<meta itemprop="datePublished" content="2015-07-29">
<title>Ryan Baumann - /etc - Git strategies for Docker</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="https://ryanfb.github.io/etc/feed.xml">
<meta name="description" content="Ryan Baumann">
<meta name="author" content="Ryan Baumann">
<!-- Enables twitter cards on posts -->
<meta name="twitter:card" content="summary" />
<meta name="twitter:site" content="@ryanfb" />
<meta name="twitter:creator" content="@ryanfb" />
<meta name="twitter:title" content="Git strategies for Docker" />
<meta name="twitter:description" content="There are a few different strategies for getting your Git source code into a Docker build. Many of these have different ways of interacting with Docker’s caching mechanisms, and may be more or less appropriately suited to your project and how you intend to use Docker. Perhaps surprisingly, I haven’t been able to locate an overview of these strategies collected in one place, and it’s not covered in the Dockerfile best practices guide." />
<meta name="twitter:url" content="https://ryanfb.github.io/etc/2015/07/29/git_strategies_for_docker.html" />
<meta name="twitter:image" content="http://www.gravatar.com/avatar/5c60848658ff9b47c42196635fe0449b.jpg" />
<link rel='stylesheet' type='text/css' href='https://fonts.googleapis.com/css?family=EB+Garamond'>
<link rel="stylesheet" href="//maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">
<link rel="pingback" href="https://webmention.io/ryanfb.github.io/xmlrpc" />
<link rel="webmention" href="https://webmention.io/ryanfb.github.io/webmention" />
<link rel="icon" href="/favicon.ico" type="image/x-icon">
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-32369790-1']);
_gaq.push(['_trackPageview']);
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
function setAccessedDate() {
if (document.getElementById('accessed-on')) {
var now = new Date();
var months = ['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'];
var formattedDate = now.getDate()+" "+months[now.getMonth()]+" "+now.getFullYear();
document.getElementById('accessed-on').textContent = " (accessed " + formattedDate + ")";
}
}
</script>
<body onload="setAccessedDate();">
<!-- Enable COinS -->
<span class="Z3988" title="ctx_ver=Z39.88-2004&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Adc&rft.title=Git+strategies+for+Docker&rft.aulast=Baumann&rft.aufirst=Ryan&rft.source=Ryan+Baumann+-+%2Fetc&rft.date=2015-07-29T00:00:00+00:00&rft.type=blogPost&rft.format=text&rft.identifier=https%3A%2F%2Fryanfb.github.io%2Fetc%2F2015%2F07%2F29%2Fgit_strategies_for_docker.html&rft.language=English"></span>
<h1><a href="/etc/">/etc</a><span style="float:right"><a href="https://ryanfb.github.io/" rel="author">ryanfb.github.io</a></span></h1>
<div class="title"><h2>↳ <a href="/etc/2015/07/29/git_strategies_for_docker.html" rel="bookmark">Git strategies for Docker</a></h2></div>
<div class="tags"><span>tags: <a href="/etc/tags/docker.html">docker</a></span></div>
<p>There are a few different strategies for getting your Git source code into a <a href="https://www.docker.com/">Docker</a> build. They interact with Docker’s caching mechanisms in different ways, and some will suit your project and how you intend to use Docker better than others. Perhaps surprisingly, I haven’t been able to locate an overview of these strategies collected in one place, and the topic isn’t covered in the <a href="https://docs.docker.com/articles/dockerfile_best-practices/">Dockerfile best practices guide</a>.</p>
<p>Here are the strategies I’ve come across so far:</p>
<ul>
<li><a href="#run-git-clone"><code class="highlighter-rouge">RUN git clone</code></a></li>
<li><a href="#run-curl-or-add-a-tagcommit-tarball-url"><code class="highlighter-rouge">RUN curl</code> or <code class="highlighter-rouge">ADD</code> a tag/commit tarball URL</a></li>
<li><a href="#git-submodules-inside-dockerfile-repository">Git submodules inside <code class="highlighter-rouge">Dockerfile</code> repository</a></li>
<li><a href="#dockerfile-inside-git-repository"><code class="highlighter-rouge">Dockerfile</code> inside git repository</a></li>
<li><a href="#volume-mapping">Volume mapping</a></li>
</ul>
<h2 id="run-git-clone"><code class="highlighter-rouge">RUN git clone</code></h2>
<p>If you’re like me, this is the approach that first springs to mind when you see the commands available to you in a <code class="highlighter-rouge">Dockerfile</code>. The trouble is that it can interact with Docker’s build cache in several unintuitive ways. For example, if you update your git repository and then re-run a <code class="highlighter-rouge">docker build</code> whose <code class="highlighter-rouge">Dockerfile</code> has a <code class="highlighter-rouge">RUN git clone</code> command, you may or may not get the new commit(s). Docker caches a <code class="highlighter-rouge">RUN</code> step based on its command text, so the clone only runs again if a preceding <code class="highlighter-rouge">Dockerfile</code> command has invalidated the cache.</p>
<p>One way to get around this is to use <code class="highlighter-rouge">docker build --no-cache</code>, but then if there are any time-intensive commands preceding the <code class="highlighter-rouge">clone</code> they’ll have to run again too.</p>
<p>Another issue is that you (or someone you’ve distributed your <code class="highlighter-rouge">Dockerfile</code> to) may unexpectedly come back to a broken build later on when the upstream git repository updates.</p>
<p>A two-birds-one-stone approach to this while still using <code class="highlighter-rouge">RUN git clone</code> is to put it on one line<sup id="fnref:oneline"><a href="#fn:oneline" class="footnote">1</a></sup> with a specific revision checkout, e.g.:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>RUN git clone https://github.com/example/example.git && cd example && git checkout 0123abcdef
</code></pre></div>
<p>Then updating the revision to check out in the <code class="highlighter-rouge">Dockerfile</code> will invalidate the cache at that line and cause the <code class="highlighter-rouge">clone</code>/<code class="highlighter-rouge">checkout</code> to run.</p>
<p>One possible drawback to this approach in general is that you have to have <code class="highlighter-rouge">git</code> installed in your container.</p>
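<p>As a rough sketch of what this looks like end to end, the fragment below installs <code class="highlighter-rouge">git</code> and then does the pinned clone in a single <code class="highlighter-rouge">RUN</code> step. The <code class="highlighter-rouge">debian</code> base image, the <code class="highlighter-rouge">/usr/src/example</code> path, and the repository URL and revision are placeholders assumed for illustration, not something the approach requires:</p>

<div class="highlighter-rouge"><pre class="highlight"><code># Assumed base image; any image with a package manager would do
FROM debian
# git (plus CA certificates for HTTPS clones) has to exist inside the image
RUN apt-get update && apt-get install -y git ca-certificates
# Clone and checkout share one RUN line, so editing the revision
# invalidates the cache for both steps together
RUN git clone https://github.com/example/example.git /usr/src/example \
 && cd /usr/src/example \
 && git checkout 0123abcdef
</code></pre></div>

<p>Everything above the last <code class="highlighter-rouge">RUN</code> stays cached when only the revision changes, so the package installation does not have to run again.</p>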
<h2 id="run-curl-or-add-a-tagcommit-tarball-url"><code class="highlighter-rouge">RUN curl</code> or <code class="highlighter-rouge">ADD</code> a tag/commit tarball URL</h2>
<p>This avoids having to have <code class="highlighter-rouge">git</code> installed in your container environment, and it makes cache invalidation explicit: if the tag/revision is part of the URL, changing that URL will bust the cache. Note that if you use <a href="https://docs.docker.com/reference/builder/#add">the <code class="highlighter-rouge">Dockerfile</code> <code class="highlighter-rouge">ADD</code> command</a> to copy from a remote URL, the file will be downloaded every time you run the build, and the HTTP <code class="highlighter-rouge">Last-Modified</code> header will also be used to invalidate the cache.</p>
<p>You can see this approach used in <a href="https://github.com/docker-library/golang/blob/1a422afd7db928a821e97906ed27ed606e2f072a/1.3/Dockerfile">the golang <code class="highlighter-rouge">Dockerfile</code></a>.</p>
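<p>For a concrete picture, here is a minimal sketch of the <code class="highlighter-rouge">RUN curl</code> variant. It assumes the repository is hosted somewhere that serves per-tag tarballs (GitHub’s <code class="highlighter-rouge">archive</code> URLs are used here); the repository name, the <code class="highlighter-rouge">v1.0.0</code> tag, the base image, and the destination path are all hypothetical:</p>

<div class="highlighter-rouge"><pre class="highlight"><code># Assumed base image
FROM debian
RUN apt-get update && apt-get install -y curl ca-certificates
# The tag is part of the URL, so changing it changes this line
# and busts the cache exactly here
RUN mkdir -p /usr/src/example \
 && curl -fsSL https://github.com/example/example/archive/v1.0.0.tar.gz \
  | tar -xz -C /usr/src/example --strip-components=1
</code></pre></div>

<p>If you swap the <code class="highlighter-rouge">curl</code> for <code class="highlighter-rouge">ADD</code>, keep in mind that <code class="highlighter-rouge">ADD</code> does not unpack archives fetched from a remote URL, so a separate <code class="highlighter-rouge">RUN tar</code> step would still be needed.</p>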
<h2 id="git-submodules-inside-dockerfile-repository">Git submodules inside <code class="highlighter-rouge">Dockerfile</code> repository</h2>
<p>If you keep your <code class="highlighter-rouge">Dockerfile</code> and Docker build in a separate repository from your source code, or your Docker build requires multiple source repositories, using <a href="https://git-scm.com/book/en/v2/Git-Tools-Submodules">git submodules</a> (or <a href="http://blogs.atlassian.com/2013/05/alternatives-to-git-submodule-git-subtree/">git subtrees</a>) in this repository may be a valid way to get your source repos into your build context. This avoids some concerns with Docker caching and upstream updating, as you lock the upstream revision in your submodule/subtree specification. Updating them will break your Docker cache as it changes the build context.</p>
<p>Note that this only gets the files into your Docker build context; you still need to use <a href="https://docs.docker.com/reference/builder/#add"><code class="highlighter-rouge">ADD</code> commands in your <code class="highlighter-rouge">Dockerfile</code></a> to copy those paths to where you expect them in the container.</p>
<p>I use this approach in my <a href="https://github.com/ryanfb/tesseract_latinocr_docker">Docker for Latin OCR training</a> repository.</p>
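<p>In outline, the submodule setup might look like the sketch below. The host-side commands are shown as comments, and the repository URL, the <code class="highlighter-rouge">example</code> submodule path, the base image, and the destination path are assumptions made for illustration:</p>

<div class="highlighter-rouge"><pre class="highlight"><code># On the host, inside the Dockerfile repository (hypothetical URL and path):
#   git submodule add https://github.com/example/example.git example
#   git submodule update --init
# Assumed base image
FROM debian
# The submodule checkout is now part of the build context, so ADD can copy it;
# moving the submodule to a new revision changes these files and busts the cache here
ADD example /usr/src/example
</code></pre></div>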
<h2 id="dockerfile-inside-git-repository"><code class="highlighter-rouge">Dockerfile</code> inside git repository</h2>
<p>Here, you keep your <code class="highlighter-rouge">Dockerfile</code> in the same git repository as the code you want to build/test/deploy, so the code is automatically sent as part of the build context and you can, e.g., <code class="highlighter-rouge">ADD . /project</code> to copy the context into the container. The advantage is that you can test changes without having to commit/push them to get them into a test <code class="highlighter-rouge">docker build</code>; the disadvantage is that modifying any file in your working directory will invalidate the cache at the <code class="highlighter-rouge">ADD</code> command. Sending the build context for a large source/data directory can also be time-consuming.</p>
<p>So if you use this approach, you may also want to make judicious use of <a href="https://docs.docker.com/reference/builder/#dockerignore-file">the <code class="highlighter-rouge">.dockerignore</code> file</a>, including doing things like ignoring everything in your <code class="highlighter-rouge">.gitignore</code> and possibly the <code class="highlighter-rouge">.git</code> directory itself. You may also want to ignore the <code class="highlighter-rouge">Dockerfile</code> in your <code class="highlighter-rouge">.dockerignore</code>: you are unlikely to use the <code class="highlighter-rouge">Dockerfile</code> inside the container, and otherwise every change to it will invalidate the cache at the <code class="highlighter-rouge">ADD</code> line.</p>
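<p>A <code class="highlighter-rouge">.dockerignore</code> along these lines is one possible starting point. Docker does not read <code class="highlighter-rouge">.gitignore</code> itself, so the last two entries are hypothetical stand-ins for patterns you would copy over by hand:</p>

<div class="highlighter-rouge"><pre class="highlight"><code># Keep git metadata out of the build context
.git
# Not needed inside the container; excluding these means editing them
# will not invalidate the cache at the ADD line
Dockerfile
.dockerignore
# Hypothetical entries copied over from .gitignore
build
*.log
</code></pre></div>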
<h2 id="volume-mapping">Volume mapping</h2>
<p>If you’re using Docker to set up a dev/test environment that you want to share among a wide variety of source repos on your host machine, <a href="https://docs.docker.com/userguide/dockervolumes/#mount-a-host-directory-as-a-data-volume">mounting a host directory as a data volume</a> may be a viable strategy. This gives you the ability to specify which directories you want to include at <code class="highlighter-rouge">docker run</code>-time, and avoids concerns about <code class="highlighter-rouge">docker build</code> caching, but none of this will be shared among other users of your <code class="highlighter-rouge">Dockerfile</code> or container image.</p>
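<p>A minimal sketch of this setup, assuming a <code class="highlighter-rouge">debian</code> base image, a hypothetical <code class="highlighter-rouge">example-dev</code> image name, and <code class="highlighter-rouge">/project</code> as the mount point: the <code class="highlighter-rouge">Dockerfile</code> only installs tooling, and the source is supplied when the container starts.</p>

<div class="highlighter-rouge"><pre class="highlight"><code># Assumed base image and tooling; no source code is added at build time
FROM debian
RUN apt-get update && apt-get install -y build-essential
WORKDIR /project
# Build once, then mount whichever host checkout you want at run time, e.g.:
#   docker build -t example-dev .
#   docker run --rm -it -v "$PWD":/project example-dev
</code></pre></div>

<p>Because nothing from the working directory is copied during the build, edits to the source never touch the build cache; the trade-off is that the resulting image contains none of your code.</p>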
<h3 id="footnotes">Footnotes</h3>
<div class="footnotes">
<ol>
<li id="fn:oneline">
<p>The reason to put this on one line is the same reason <a href="https://docs.docker.com/articles/dockerfile_best-practices/#run-https-docs-docker-com-reference-builder-run">you shouldn’t put <code class="highlighter-rouge">RUN apt-get update</code> on a line by itself</a>. Consider instead the form:</p>
<div class="highlighter-rouge"><pre class="highlight"><code>RUN git clone https://github.com/example/example.git
RUN cd example && git checkout 0123abcdef
</code></pre></div>
<p>Here, if you update the revision, the <code class="highlighter-rouge">clone</code> command will still use the cache while the <code class="highlighter-rouge">checkout</code> won’t, so you may try to check out a revision that isn’t in the cached clone. <a href="#fnref:oneline" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>
<span>Originally published on 2015-07-29 by <a href="https://ryanfb.github.io/">Ryan Baumann</a></span>
<span style="float:right">Feedback? <a href="mailto:ryan.baumann@gmail.com">e-mail</a> / <a href="https://twitter.com/intent/tweet?text=%40ryanfb%20">twitter</a> / <a href="https://github.com/ryanfb/etc/issues">github</a></span>
<span><a href="https://github.com/ryanfb/etc/commits/gh-pages/_posts/2015-07-29-git_strategies_for_docker.md">Revision History</a></span>
Suggested citation:<br/>
<span class="citation"><!-- Suggested citation code -->
Baumann, Ryan. “Git strategies for Docker.” <em>Ryan Baumann - /etc</em> (blog), 29 Jul 2015, https://ryanfb.github.io/etc/2015/07/29/git_strategies_for_docker.html<span id="accessed-on"></span>.
<br/><br/><a rel="license" href="http://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by/4.0/80x15.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.