<h1 id="the-smallest-docker-image-to-serve-static-websites">The smallest Docker image to serve static websites</h1>
<p>Florin Lipan · Published 2021-02-28 · <a href="https://lipanski.com/posts/smallest-docker-image-static-website">https://lipanski.com/posts/smallest-docker-image-static-website</a></p>
<p>Until recently, I used to think that serving static websites from Docker would be a waste of bandwidth and storage. Bundling nginx or various other heavy runtimes inside a Docker image for the sole purpose of serving static files didn’t seem like the best idea - Netlify or GitHub Pages can handle this much better. But my hobby server was sad and cried digital tears.</p>
<p>A recent Hacker News post about <a href="https://justine.lol/redbean/index.html">redbean</a>, a single-binary, super tiny, static file server got me thinking. So begins my journey to find the most time- and storage-efficient Docker image to serve a static website.</p>
<p>After evaluating a few static file servers with similar specs, I initially opted for <a href="https://www.acme.com/software/thttpd/">thttpd</a>, which comes with a similarly small footprint but seems a bit more battle-tested. This got me to a whopping <strong>186KB</strong> image and you can read more about it in the <a href="https://github.com/lipanski/lipanski.github.io/blob/8aa8994299d0314b4d113ea481c60561f97c2940/_posts/2021-02-28-smallest-docker-image-static-website.md">previous version</a> of this post.</p>
<p>A later comment (thanks Sergey Ponomarev) suggested the <a href="https://git.busybox.net/busybox/tree/networking/httpd.c">BusyBox httpd</a> file server, which seemed fairly small and more feature-rich so I gave it a try. Let’s see if it can produce an even smaller image (spoiler alert: it can).</p>
<p><a href="https://www.busybox.net/">BusyBox</a> is much more than just a file server - it’s a set of lightweight replacements for many common UNIX utilities, like <em>shell</em>, <em>gzip</em>, or <em>echo</em>.</p>
<p>Running the BusyBox httpd server goes like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>busybox httpd -f -v -p 3000
</code></pre></div></div>
<p>This will launch the server in the foreground (<code class="language-plaintext highlighter-rouge">-f</code>), listening on port <code class="language-plaintext highlighter-rouge">3000</code> (<code class="language-plaintext highlighter-rouge">-p</code>), serving all files inside the current directory. Access logs will be printed to <code class="language-plaintext highlighter-rouge">STDOUT</code> (<code class="language-plaintext highlighter-rouge">-v</code>). It comes with a few other neat features, like serving gzipped content, custom error pages, basic auth, allow/deny rules, and reverse proxying, which can be enabled by adding a <code class="language-plaintext highlighter-rouge">httpd.conf</code> file. You can read more about it in the <a href="https://git.busybox.net/busybox/tree/networking/httpd.c">source code comments</a>.</p>
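<p>A minimal sketch of how the gzip feature could be put to use: going by the BusyBox configuration help, <em>httpd</em> does not compress responses itself but sends a pre-compressed <code>FILE.gz</code> sibling when one exists and the client accepts gzip. Assuming that behaviour, a small Ruby script could generate those siblings before the image is built. The <code>precompress</code> helper and the list of extensions are hypothetical, not part of BusyBox:</p>

```ruby
require "zlib"

# Hypothetical helper: writes a "FILE.gz" sibling next to every text asset
# under `root`, so a server that looks for pre-compressed files can use them.
# The extension list is an assumption; adjust it to your site.
COMPRESSIBLE = %w[.html .css .js .svg .txt .xml .json].freeze

def precompress(root)
  Dir.glob(File.join(root, "**", "*")).sort.filter_map do |path|
    # Skip directories and files that are unlikely to shrink (images, fonts)
    next unless File.file?(path)
    next unless COMPRESSIBLE.include?(File.extname(path))

    gz_path = "#{path}.gz"
    Zlib::GzipWriter.open(gz_path, Zlib::BEST_COMPRESSION) do |gz|
      gz.write(File.binread(path))
    end
    gz_path
  end
end

# Example call (the directory name is a placeholder):
#   precompress("./public").each { |gz| puts gz }
```

<p>Running something like this against the site directory before <code>docker build</code> would let the <code>COPY . .</code> step pick up the <code>.gz</code> files along with the originals.</p>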
<p>My first attempt used the official <code class="language-plaintext highlighter-rouge">busybox</code> image:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> busybox:1.35</span>
<span class="c"># Create a non-root user to own the files and run our server</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> static
<span class="k">USER</span><span class="s"> static</span>
<span class="k">WORKDIR</span><span class="s"> /home/static</span>
<span class="c"># Copy the static website</span>
<span class="c"># Use the .dockerignore file to control what ends up inside the image!</span>
<span class="k">COPY</span><span class="s"> . .</span>
<span class="c"># Run BusyBox httpd</span>
<span class="k">CMD</span><span class="s"> ["busybox", "httpd", "-f", "-v", "-p", "3000"]</span>
</code></pre></div></div>
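<p>Since <code>COPY . .</code> copies the entire build context, a <code>.dockerignore</code> file keeps unrelated files out of the image. The entries below are only an assumed example of what a typical project might exclude; none of them come from the original setup:</p>

```plaintext
# Hypothetical .dockerignore for a static site; adjust to your project
.git
.dockerignore
Dockerfile
README.md
node_modules
```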
<p>You can build and run the image by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> static:latest <span class="nb">.</span>
docker run <span class="nt">-it</span> <span class="nt">--rm</span> <span class="nt">--init</span> <span class="nt">-p</span> 3000:3000 static:latest
</code></pre></div></div>
<p>…then browse to <code class="language-plaintext highlighter-rouge">http://localhost:3000</code>.</p>
<p>The image builds quickly and, at <strong>1.25MB</strong>, is fairly small:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> docker images | grep static
static latest 854054cff457 1 second ago 1.25MB
</code></pre></div></div>
<p>The image is already <a href="https://github.com/docker-library/busybox/tree/master/stable/musl">built</a> using <a href="https://hub.docker.com/_/scratch"><code class="language-plaintext highlighter-rouge">scratch</code></a>, which is basically a <em>no-op</em> image, light as vacuum. It contains only the statically compiled BusyBox binary and nothing else. There’s not much we can optimize there.</p>
<p>Then again BusyBox comes packaged with much more than just the static file server - it contains all these other UNIX utilities. We can create a custom build of BusyBox limiting it to only <em>httpd</em> and thus reducing its size.</p>
<p>We start by downloading the BusyBox source code:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone git://busybox.net/busybox.git
</code></pre></div></div>
<p>Then create a default <code class="language-plaintext highlighter-rouge">.config</code> file for the build with all features disabled:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make allnoconfig
</code></pre></div></div>
<p>Next we call <code class="language-plaintext highlighter-rouge">make menuconfig</code> and select the <code class="language-plaintext highlighter-rouge">httpd</code> features from within “Networking Utilities”. Since we don’t want to depend on other OS libraries, we also need to check “Build static binary” from within “Settings”. The resulting <code class="language-plaintext highlighter-rouge">.config</code> file looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># ...
CONFIG_STATIC=y
# ...
CONFIG_HTTPD=y
CONFIG_FEATURE_HTTPD_PORT_DEFAULT=80
CONFIG_FEATURE_HTTPD_RANGES=y
CONFIG_FEATURE_HTTPD_SETUID=y
CONFIG_FEATURE_HTTPD_BASIC_AUTH=y
CONFIG_FEATURE_HTTPD_AUTH_MD5=y
CONFIG_FEATURE_HTTPD_CGI=y
CONFIG_FEATURE_HTTPD_CONFIG_WITH_SCRIPT_INTERPR=y
CONFIG_FEATURE_HTTPD_SET_REMOTE_PORT_TO_ENV=y
CONFIG_FEATURE_HTTPD_ENCODE_URL_STR=y
CONFIG_FEATURE_HTTPD_ERROR_PAGES=y
CONFIG_FEATURE_HTTPD_PROXY=y
CONFIG_FEATURE_HTTPD_GZIP=y
CONFIG_FEATURE_HTTPD_ETAG=y
CONFIG_FEATURE_HTTPD_LAST_MODIFIED=y
CONFIG_FEATURE_HTTPD_DATE=y
CONFIG_FEATURE_HTTPD_ACL_IP=y
# ...
</code></pre></div></div>
<p>Since building the <code class="language-plaintext highlighter-rouge">.config</code> is a bit tedious, we’ll save it for later use. You can find a sample on <a href="https://github.com/lipanski/docker-static-website/blob/master/.config">my GitHub</a>.</p>
<p>Finally, we compile the binary:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make
make install
</code></pre></div></div>
<p>…which will be placed at <code class="language-plaintext highlighter-rouge">_install/bin/busybox</code>.</p>
<p>If you build the binary in a glibc environment (e.g. Ubuntu), it takes up around <strong>1.5MB</strong>, which is not that great. Actually, it’s worse than the official image containing all BusyBox utilities.</p>
<p>Let’s try and build it on <strong>musl</strong>, inside an Alpine container:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> alpine:3.13.2</span>
<span class="c"># Install all dependencies required for compiling busybox</span>
<span class="k">RUN </span>apk add gcc musl-dev make perl
<span class="c"># Download busybox sources</span>
<span class="k">RUN </span>wget https://busybox.net/downloads/busybox-1.35.0.tar.bz2 <span class="se">\
</span> <span class="o">&&</span> <span class="nb">tar </span>xf busybox-1.35.0.tar.bz2 <span class="se">\
</span> <span class="o">&&</span> <span class="nb">mv</span> /busybox-1.35.0 /busybox
<span class="k">WORKDIR</span><span class="s"> /busybox</span>
<span class="c"># Copy the busybox build config (limited to httpd)</span>
<span class="k">COPY</span><span class="s"> .config .</span>
<span class="c"># Compile and install busybox</span>
<span class="k">RUN </span>make <span class="o">&&</span> make <span class="nb">install</span>
</code></pre></div></div>
<p>The binary size looks much better now: <strong>177KB</strong>!</p>
<p>We can improve further by dropping some unneeded <em>httpd</em> features from the <code class="language-plaintext highlighter-rouge">.config</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CONFIG_HTTPD=y
CONFIG_FEATURE_HTTPD_PORT_DEFAULT=80
# CONFIG_FEATURE_HTTPD_RANGES is not set
# CONFIG_FEATURE_HTTPD_SETUID is not set
CONFIG_FEATURE_HTTPD_BASIC_AUTH=y
# CONFIG_FEATURE_HTTPD_AUTH_MD5 is not set
# CONFIG_FEATURE_HTTPD_CGI is not set
# CONFIG_FEATURE_HTTPD_CONFIG_WITH_SCRIPT_INTERPR is not set
# CONFIG_FEATURE_HTTPD_SET_REMOTE_PORT_TO_ENV is not set
# CONFIG_FEATURE_HTTPD_ENCODE_URL_STR is not set
CONFIG_FEATURE_HTTPD_ERROR_PAGES=y
CONFIG_FEATURE_HTTPD_PROXY=y
CONFIG_FEATURE_HTTPD_GZIP=y
CONFIG_FEATURE_HTTPD_ETAG=y
CONFIG_FEATURE_HTTPD_LAST_MODIFIED=y
CONFIG_FEATURE_HTTPD_DATE=y
CONFIG_FEATURE_HTTPD_ACL_IP=y
</code></pre></div></div>
<blockquote>
<p>You can disable most of these features but in my experience the biggest impact comes from dropping MD5 support for basic auth and CGI.</p>
</blockquote>
<p>We’ve now reached <strong>149KB</strong>. It’s time to wrap things up and copy the static BusyBox binary to a Docker <a href="https://hub.docker.com/_/scratch"><code class="language-plaintext highlighter-rouge">scratch</code></a> image. Using the <code class="language-plaintext highlighter-rouge">scratch</code> image usually requires a multi-stage approach. We start from <code class="language-plaintext highlighter-rouge">alpine</code>, download and compile <em>BusyBox</em> as a static binary, create a user, then copy these assets over to <code class="language-plaintext highlighter-rouge">scratch</code> and add our static files to the mix:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> alpine:3.13.2 AS builder</span>
<span class="c"># Install all dependencies required for compiling busybox</span>
<span class="k">RUN </span>apk add gcc musl-dev make perl
<span class="c"># Download busybox sources</span>
<span class="k">RUN </span>wget https://busybox.net/downloads/busybox-1.35.0.tar.bz2 <span class="se">\
</span> <span class="o">&&</span> <span class="nb">tar </span>xf busybox-1.35.0.tar.bz2 <span class="se">\
</span> <span class="o">&&</span> <span class="nb">mv</span> /busybox-1.35.0 /busybox
<span class="k">WORKDIR</span><span class="s"> /busybox</span>
<span class="c"># Copy the busybox build config (limited to httpd)</span>
<span class="k">COPY</span><span class="s"> .config .</span>
<span class="c"># Compile and install busybox</span>
<span class="k">RUN </span>make <span class="o">&&</span> make <span class="nb">install</span>
<span class="c"># Create a non-root user to own the files and run our server</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> static
<span class="c"># Switch to the scratch image</span>
<span class="k">FROM</span><span class="s"> scratch</span>
<span class="k">EXPOSE</span><span class="s"> 3000</span>
<span class="c"># Copy over the user</span>
<span class="k">COPY</span><span class="s"> --from=builder /etc/passwd /etc/passwd</span>
<span class="c"># Copy the busybox static binary</span>
<span class="k">COPY</span><span class="s"> --from=builder /busybox/_install/bin/busybox /</span>
<span class="c"># Use our non-root user</span>
<span class="k">USER</span><span class="s"> static</span>
<span class="k">WORKDIR</span><span class="s"> /home/static</span>
<span class="c"># Copy a blank default httpd.conf</span>
<span class="c"># This is only needed in order to set the `-c` argument in this base file</span>
<span class="c"># and save the developer the need to override the CMD line in case they ever</span>
<span class="c"># want to use a httpd.conf</span>
<span class="k">COPY</span><span class="s"> httpd.conf .</span>
<span class="c"># Copy the static website</span>
<span class="c"># Use the .dockerignore file to control what ends up inside the image!</span>
<span class="k">COPY</span><span class="s"> . .</span>
<span class="c"># Run busybox httpd</span>
<span class="k">CMD</span><span class="s"> ["/busybox", "httpd", "-f", "-v", "-p", "3000", "-c", "httpd.conf"]</span>
</code></pre></div></div>
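<p>The blank <code>httpd.conf</code> copied in this Dockerfile can later be filled in. The following is a sketch of a few directives, based on the syntax documented in the <em>httpd</em> source code comments. The paths, address range and credentials are placeholders, and plain-text passwords are assumed because MD5 support was disabled in the build config:</p>

```plaintext
# Serve index.html when a directory is requested
I:index.html
# Custom 404 page (placeholder path)
E404:/home/static/404.html
# Allow one private address range, deny everything else
A:10.0.0.0/8
D:*
# Require basic auth (user "admin", password "secret") for URLs under /private
/private:admin:secret
```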
<p>Let’s have another look at those numbers:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>> docker images | grep static
static latest 9b08b9509c32 1 second ago 154kB
</code></pre></div></div>
<p><img src="/assets/images/excellent.png" alt="Excellent" /></p>
<p>The <strong>154KB</strong> we’re left with corresponds to the size of the <em>BusyBox httpd</em> static binary and the static files that were copied over, which in my case was just one file containing the text <code class="language-plaintext highlighter-rouge">hello world</code>. Note that the <code class="language-plaintext highlighter-rouge">alpine</code> step of the multi-stage build is actually quite large in size (<em>~185MB</em>), but it can be reused across builds and doesn’t get pushed to the registry. In order to skip the <code class="language-plaintext highlighter-rouge">alpine</code> step entirely, I pushed the resulting image to the Docker registry.</p>
<p>You can download it from <a href="https://hub.docker.com/r/lipanski/docker-static-website">Docker Hub</a> and use it to serve your static websites:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> lipanski/docker-static-website:latest</span>
<span class="k">COPY</span><span class="s"> . .</span>
</code></pre></div></div>
<p>This produces a single-layer image of <strong>154KB</strong> + whatever the size of your static website and <em>nothing else</em>. If you need to configure <em>httpd</em> in a different way, you can just override the <code class="language-plaintext highlighter-rouge">CMD</code> line:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> lipanski/docker-static-website:latest</span>
<span class="k">COPY</span><span class="s"> . .</span>
<span class="k">CMD</span><span class="s"> ["/busybox", "httpd", "-f", "-v", "-p", "3000", "-c", "httpd.conf"]</span>
</code></pre></div></div>
<p>The code and an FAQ about configuring <em>httpd</em> are available at <a href="https://github.com/lipanski/docker-static-website">https://github.com/lipanski/docker-static-website</a>.</p>
<p>To conclude, Docker <em>can</em> be used efficiently to package and serve static websites.</p>
<h1 id="some-things-you-should-know-about-eager-loading-in-activerecord">Some things you should know about eager loading in ActiveRecord</h1>
<p>Published 2020-10-06 · <a href="https://lipanski.com/posts/activerecord-eager-loading">https://lipanski.com/posts/activerecord-eager-loading</a></p>
<p>Tracking down <em>all</em> the associations that need to be eager loaded in order to prevent N+1 queries can be tedious. Your code has to be <em>instrumented properly</em> and most of the time you need to reason about every single query, <em>one by one</em>. On top of that, eager loading can be fussy: calling <code class="language-plaintext highlighter-rouge">where</code>, <code class="language-plaintext highlighter-rouge">order</code> or <code class="language-plaintext highlighter-rouge">limit</code> on your associations might invalidate your eager loading efforts in some <em>unexpected</em> ways.</p>
<p>This article will present an <a href="#automatic-eager-loading"><strong>automated way of dealing with N+1</strong></a> queries and it will explain <a href="#things-that-break-eager-loading-where-order-limit"><strong>how to work around some of the limitations of eager loading</strong></a> in ActiveRecord. Furthermore, it will show you <a href="#use-the-cache-luke"><strong>how to use the query cache to your benefit</strong></a> and <a href="#preventing-n1-regressions-with-tests"><strong>how to write tests</strong></a> to prevent those sneaky N+1 queries from coming back.</p>
<h2 id="automatic-eager-loading">Automatic eager loading</h2>
<p><a href="https://github.com/salsify/goldiloader">Goldiloader</a> is a gem that eager loads your associations <em>automatically</em> and <em>only when needed</em>.</p>
<p>Just add it to your <code class="language-plaintext highlighter-rouge">Gemfile</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gem</span> <span class="s2">"goldiloader"</span>
</code></pre></div></div>
<p>…and watch as your associations are <em>magically</em> eager loaded:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)</span>
</code></pre></div></div>
<p>Notice how there was <strong>no need to explicitly call <code class="language-plaintext highlighter-rouge">includes(:posts)</code></strong> when querying users. Without Goldiloader, that second call would have triggered <em>five</em> queries instead of one. With Goldiloader, you basically don’t have to think about calling <code class="language-plaintext highlighter-rouge">includes</code> any more.</p>
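<p>To make the difference in query counts concrete without a database, here is a plain-Ruby sketch that counts “queries” against an in-memory table. <code>FakeDb</code> and its methods are hypothetical stand-ins. This is not how ActiveRecord or Goldiloader are implemented, only an illustration of per-record versus batched loading:</p>

```ruby
# In-memory stand-in for a posts table that counts how often it is "queried".
class FakeDb
  attr_reader :query_count

  def initialize(posts)
    @posts = posts
    @query_count = 0
  end

  # Mimics: SELECT * FROM posts WHERE user_id = ?
  def posts_for(user_id)
    @query_count += 1
    @posts.select { |post| post[:user_id] == user_id }
  end

  # Mimics: SELECT * FROM posts WHERE user_id IN (...)
  # Returns a hash of user_id to posts, built from a single "query".
  def posts_for_all(user_ids)
    @query_count += 1
    @posts.select { |post| user_ids.include?(post[:user_id]) }
          .group_by { |post| post[:user_id] }
  end
end

user_ids = [1, 2, 3, 4, 5]
posts = user_ids.map { |id| { user_id: id, title: "Post by user #{id}" } }

# One lookup per user: the N+1 pattern
lazy_db = FakeDb.new(posts)
user_ids.each { |id| lazy_db.posts_for(id) }
puts "per-user lookups: #{lazy_db.query_count} queries"

# One batched lookup for all users: what eager loading achieves
eager_db = FakeDb.new(posts)
posts_by_user = eager_db.posts_for_all(user_ids)
user_ids.each { |id| posts_by_user.fetch(id, []) }
puts "batched lookup: #{eager_db.query_count} query"
```

<p>The batched variant pays for one lookup regardless of how many users are iterated, which is the effect <code>includes</code> or Goldiloader aims for.</p>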
<p><strong>Goldiloader pairs very nicely with GraphQL APIs</strong>. The moment your API allows for querying associations or associations of associations, you have a little <em>N+1 nightmare</em> on your hands. GraphQL APIs with many different clients are hard to optimize because they might be used in so many different ways. Integrating something like <a href="https://github.com/shopify/graphql-batch">graphql-batch</a> could address the problem, but you have to apply it to every individual case and it’s a more intrusive solution.</p>
<p>On top of that, if you’re working on a large code base or if you’re inexperienced in dealing with N+1 queries, Goldiloader might give you a nice performance boost at a low cost, albeit with some limitations which we will discuss in the next part.</p>
<h2 id="things-that-break-eager-loading-where-order-limit">Things that break eager loading: <code class="language-plaintext highlighter-rouge">where</code>, <code class="language-plaintext highlighter-rouge">order</code>, <code class="language-plaintext highlighter-rouge">limit</code></h2>
<p>Applying <code class="language-plaintext highlighter-rouge">where</code>, <code class="language-plaintext highlighter-rouge">order</code> or <code class="language-plaintext highlighter-rouge">limit</code> clauses on your ActiveRecord associations will break eager loading, whether you’re using Goldiloader or not.</p>
<blockquote>
<p>In order to make the following examples more generic, I’ll be using <code class="language-plaintext highlighter-rouge">includes</code> calls instead of Goldiloader. If you’re using Goldiloader, simply remove them or consider them redundant.</p>
</blockquote>
<p>Let’s see what happens when we try to <code class="language-plaintext highlighter-rouge">order</code> our posts:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">order</span><span class="p">(</span><span class="ss">:created_at</span><span class="p">).</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1 ORDER BY "posts"."created_at" ASC</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 2 ORDER BY "posts"."created_at" ASC</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 3 ORDER BY "posts"."created_at" ASC</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 4 ORDER BY "posts"."created_at" ASC</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 5 ORDER BY "posts"."created_at" ASC</span>
</code></pre></div></div>
<p>Even though calling <code class="language-plaintext highlighter-rouge">includes(:posts)</code> produced an <code class="language-plaintext highlighter-rouge">IN</code> query which seems to cover all our posts, applying the <code class="language-plaintext highlighter-rouge">order</code> clause on our association ignored this and triggered a bunch of N+1 queries. In order for eager loading to work, <strong>the eager loaded query should match the query required to fetch your association</strong>.</p>
<p>One way to avoid this is by moving the <code class="language-plaintext highlighter-rouge">order</code> inside a <strong>default scope</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Post</span>
<span class="n">default_scope</span> <span class="p">{</span> <span class="n">order</span><span class="p">(</span><span class="ss">:created_at</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># No queries here, they've been eager loaded already</span>
</code></pre></div></div>
<p>…another way is by moving the <code class="language-plaintext highlighter-rouge">order</code> inside the <strong>parent association</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
<span class="n">has_many</span> <span class="ss">:posts</span><span class="p">,</span> <span class="o">-></span> <span class="p">{</span> <span class="n">order</span><span class="p">(</span><span class="ss">:created_at</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># No queries here, they've been eager loaded already</span>
</code></pre></div></div>
<p>…and yet another way is by moving the <code class="language-plaintext highlighter-rouge">order</code> inside a <strong>scoped parent association</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
<span class="n">has_many</span> <span class="ss">:ordered_posts</span><span class="p">,</span> <span class="o">-></span> <span class="p">{</span> <span class="n">order</span><span class="p">(</span><span class="ss">:created_at</span><span class="p">)</span> <span class="p">},</span> <span class="ss">class_name: </span><span class="s2">"Post"</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:ordered_posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5) ORDER BY "posts"."created_at" ASC</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">ordered_posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># No queries here, they've been eager loaded already</span>
</code></pre></div></div>
<p>Let’s see what happens when we apply a <code class="language-plaintext highlighter-rouge">where</code> condition to our posts:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "users".* FROM "users" LIMIT 5</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" IN (1,2,3,4,5)</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span><span class="p">).</span><span class="nf">to_a</span> <span class="p">}</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 1 AND (created_at < '2020-09-25 08:55:05.919824')</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 2 AND (created_at < '2020-09-25 08:55:05.919824')</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 3 AND (created_at < '2020-09-25 08:55:05.919824')</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 4 AND (created_at < '2020-09-25 08:55:05.919824')</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE "posts"."user_id" = 5 AND (created_at < '2020-09-25 08:55:05.919824')</span>
</code></pre></div></div>
<p>Ok, that didn’t work. But we can fix it exactly the same way.</p>
<p>You can move the <code class="language-plaintext highlighter-rouge">where</code> condition inside a <strong>default scope</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Post</span>
<span class="n">default_scope</span> <span class="p">{</span> <span class="n">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
</code></pre></div></div>
<p>…or inside the <strong>parent association</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
<span class="n">has_many</span> <span class="ss">:posts</span><span class="p">,</span> <span class="o">-></span> <span class="p">{</span> <span class="n">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
</code></pre></div></div>
<p>…or inside a <strong>scoped parent association</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
<span class="n">has_many</span> <span class="ss">:posts_from_one_week_ago</span><span class="p">,</span> <span class="o">-></span> <span class="p">{</span> <span class="n">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span><span class="p">)</span> <span class="p">}</span>
<span class="k">end</span>
<span class="o">></span> <span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts_from_one_week_ago</span><span class="p">).</span><span class="nf">limit</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">to_a</span>
<span class="o">></span> <span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts_from_one_week_ago</span><span class="p">.</span><span class="nf">to_a</span> <span class="p">}</span>
</code></pre></div></div>
<p>Mastering eager loading basically means <strong>relying more on default scopes</strong> and <strong>applying scopes inside your associations</strong> rather than using those scopes directly.</p>
<p>What about <code class="language-plaintext highlighter-rouge">limit</code> and <code class="language-plaintext highlighter-rouge">offset</code>? Depending on your use case, you could apply the same techniques. But there’s one big use case that just doesn’t fit here: <strong>pagination</strong>. How would you supply the page number? And while we’re at it, how would you deal with <code class="language-plaintext highlighter-rouge">where</code> conditions or <strong>scopes that contain a variable</strong>?</p>
<h2 id="turn-your-queries-around">Turn your queries around</h2>
<p>Unlike scopes, <strong>default scopes and scoped associations don’t take arguments</strong>. If we’d like to provide our associations with an outside parameter while avoiding N+1 queries, we’ll have to think of something else.</p>
<p>Consider the following code which produces N+1 queries:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># An outside parameter</span>
<span class="n">time</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span>
<span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:posts</span><span class="p">).</span><span class="nf">to_a</span>
<span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="n">time</span><span class="p">).</span><span class="nf">to_a</span> <span class="p">}</span> <span class="c1"># N+1!</span>
</code></pre></div></div>
<p>In such cases, you can turn your queries around and <strong>let the association become the main subject of that query</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># An outside parameter</span>
<span class="n">time</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="nf">week</span><span class="p">.</span><span class="nf">ago</span>
<span class="n">posts</span> <span class="o">=</span> <span class="no">Post</span><span class="p">.</span><span class="nf">includes</span><span class="p">(</span><span class="ss">:user</span><span class="p">).</span><span class="nf">where</span><span class="p">(</span><span class="s2">"created_at < ?"</span><span class="p">,</span> <span class="n">time</span><span class="p">).</span><span class="nf">to_a</span>
<span class="c1"># SELECT "posts".* FROM "posts" WHERE (created_at < '2020-09-27 10:03:23.478773')</span>
<span class="c1"># SELECT "users".* FROM "users" WHERE "users"."id" IN (1,2,3,4,5)</span>
<span class="n">users</span> <span class="o">=</span> <span class="n">posts</span><span class="p">.</span><span class="nf">map</span><span class="p">(</span><span class="o">&</span><span class="ss">:user</span><span class="p">).</span><span class="nf">uniq</span>
<span class="c1"># No queries here, they've been eager loaded already</span>
</code></pre></div></div>
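<p>If you still need the posts grouped per user after turning the query around, plain Ruby can do the regrouping in memory. The sketch below uses <code class="language-plaintext highlighter-rouge">Struct</code> stand-ins for the <code class="language-plaintext highlighter-rouge">User</code> and <code class="language-plaintext highlighter-rouge">Post</code> models (an assumption, so that the snippet runs without a database). With real records loaded through <code class="language-plaintext highlighter-rouge">Post.includes(:user)</code>, the same <code class="language-plaintext highlighter-rouge">group_by</code> call should not need any additional queries:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Stand-ins for ActiveRecord models, so the example runs without a database
User = Struct.new(:id, :name)
Post = Struct.new(:id, :user, :title)

alice = User.new(1, "Alice")
bob = User.new(2, "Bob")

# Pretend this is the result of Post.includes(:user).where(...).to_a
posts = [
  Post.new(1, alice, "First"),
  Post.new(2, bob, "Second"),
  Post.new(3, alice, "Third")
]

# Every post already carries its user, so the grouping happens in memory
posts_by_user = posts.group_by(&:user)

# map(&:user) returns one entry per post, so deduplicate to get the users
users = posts.map(&:user).uniq

users.each do |user|
  titles = posts_by_user.fetch(user, []).map(&:title)
  puts "#{user.name}: #{titles.join(', ')}"
end
</code></pre></div></div>
<p>Keep in mind that users without any matching posts won’t show up in the result at all, which may or may not be what you want.</p>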
<h2 id="use-the-cache-luke">Use the cache, Luke!</h2>
<p>Let’s say you’d like to query the <strong>total number of posts for every user</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">all</span>
<span class="c1"># SELECT "users".* FROM "users"</span>
<span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span> <span class="n">user</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">count</span> <span class="p">}</span>
<span class="c1"># SELECT COUNT(*) FROM "posts" WHERE user_id = 1</span>
<span class="c1"># SELECT COUNT(*) FROM "posts" WHERE user_id = 2</span>
<span class="c1"># SELECT COUNT(*) FROM "posts" WHERE user_id = 3</span>
<span class="c1"># SELECT COUNT(*) FROM "posts" WHERE user_id = 4</span>
<span class="c1"># SELECT COUNT(*) FROM "posts" WHERE user_id = 5</span>
</code></pre></div></div>
<p>As you might have guessed, this triggers N+1 queries. This time, though, there’s no way to eager load these aggregate values by calling <code class="language-plaintext highlighter-rouge">includes</code>.</p>
<p>Instead you can make use of the <strong>ActiveRecord query cache</strong>. By default, ActiveRecord caches results for every individual SQL query, ensuring that <strong>subsequent calls placed within the same web request or background job will not hit the database</strong>.</p>
<p>Our <code class="language-plaintext highlighter-rouge">COUNT</code> queries differ though: every distinct <code class="language-plaintext highlighter-rouge">user_id</code> produces a different SQL string and misses the cache. Then again, nothing stops us from <strong>rewriting our queries to produce the same SQL every time</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">users</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">all</span>
<span class="n">users</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">user</span><span class="o">|</span>
<span class="n">posts_count_per_user</span> <span class="o">=</span> <span class="no">Post</span><span class="p">.</span><span class="nf">group</span><span class="p">(</span><span class="ss">:user_id</span><span class="p">).</span><span class="nf">count</span> <span class="c1"># Returns a Hash</span>
<span class="n">posts_count_per_user</span><span class="p">[</span><span class="n">user</span><span class="p">.</span><span class="nf">id</span><span class="p">]</span> <span class="o">||</span> <span class="mi">0</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Running this code from within a web request will produce the following log output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># (1.4ms) SELECT "users".* FROM "users"
# (2.0ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
# CACHE (0.1ms) SELECT COUNT(*) AS count_all, "posts"."user_id" AS posts_user_id FROM "posts" GROUP BY "posts"."user_id"
</code></pre></div></div>
<p>The first iteration triggered a <code class="language-plaintext highlighter-rouge">COUNT</code> query, but <strong>all subsequent calls were cached</strong>, which means they didn’t hit the database and the N+1 situation was avoided.</p>
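<p>To build some intuition for why identical SQL matters, here is a toy model of a cache keyed by the SQL string. It’s a simplified illustration and not ActiveRecord’s actual implementation: the <code class="language-plaintext highlighter-rouge">ToyQueryCache</code> class and its fake database block are made up for this example.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># A toy cache keyed by the exact SQL string (not ActiveRecord's real implementation)
class ToyQueryCache
  attr_reader :database_hits

  def initialize(&database)
    @database = database
    @results = {}
    @database_hits = 0
  end

  def select_all(sql)
    # Only reach for the database when this exact SQL string hasn't been seen yet
    @results.fetch(sql) do
      @database_hits += 1
      @results[sql] = @database.call(sql)
    end
  end
end

# The fake database just echoes the query back
cache = ToyQueryCache.new { |sql| "rows for: #{sql}" }

# Five different SQL strings: every call reaches the database
(1..5).each { |id| cache.select_all("SELECT COUNT(*) FROM posts WHERE user_id = #{id}") }
per_user_hits = cache.database_hits

# The same SQL string five times: only the first call reaches the database
5.times { cache.select_all("SELECT COUNT(*) FROM posts GROUP BY user_id") }
grouped_hits = cache.database_hits - per_user_hits

puts "per-user queries: #{per_user_hits} database hits"
puts "grouped query: #{grouped_hits} database hit"
</code></pre></div></div>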
<p>Keep in mind that relying on the query cache too much might have a <strong>potential impact on the number of allocations</strong> and consequently on the memory usage of your app, especially when triggering queries that have to initialize many ActiveRecord models. For this reason, prefer using <code class="language-plaintext highlighter-rouge">includes</code> when possible or write your <em>aggregate</em> queries in such a way that they resolve to simple structures (hashes or arrays of primitive values).</p>
<p>If you’d like to enable the query cache outside of web requests or background jobs or if you’d like to try it out in your <code class="language-plaintext highlighter-rouge">rails console</code>, you can call:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection</span><span class="p">.</span><span class="nf">enable_query_cache!</span>
<span class="c1"># Any queries triggered here might be cached</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">connection</span><span class="p">.</span><span class="nf">disable_query_cache!</span>
</code></pre></div></div>
<h2 id="preventing-n1-regressions-with-tests">Preventing N+1 regressions with tests</h2>
<p>Given a large code base, introducing more N+1 queries is fairly easy – as easy as writing <code class="language-plaintext highlighter-rouge">user.posts</code> inside a loop, somewhere deep inside a template, without remembering to also eager load the association in the controller. But you can write tests to prevent that…</p>
<p>First, let’s settle on the behaviour we’d like to test: <strong>any request should trigger at most one SQL query per table</strong>. In order to track and count all these queries, we could use ActiveSupport instrumentation to hook up to the <code class="language-plaintext highlighter-rouge">sql.active_record</code> event, the same way ActiveRecord is <a href="https://github.com/rails/rails/blob/6-0-stable/activerecord/test/cases/test_case.rb">tested</a>.</p>
<p>I already packaged this in a tiny gem called <a href="https://github.com/lipanski/sql-spy">sql_spy</a>.</p>
<p>Just add it to your <code class="language-plaintext highlighter-rouge">Gemfile</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">gem</span> <span class="s2">"sql_spy"</span>
</code></pre></div></div>
<p>…and let’s write our first <strong>controller test</strong>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"sql_spy"</span>
<span class="k">class</span> <span class="nc">PostsControllerTest</span> <span class="o"><</span> <span class="no">ActionDispatch</span><span class="o">::</span><span class="no">IntegrationTest</span>
<span class="nb">test</span> <span class="s2">"GET /posts should not trigger N+1 queries"</span> <span class="k">do</span>
<span class="c1"># Add some realistic test data here</span>
<span class="n">queries</span> <span class="o">=</span> <span class="no">SqlSpy</span><span class="p">.</span><span class="nf">track</span> <span class="k">do</span>
<span class="n">get</span> <span class="s2">"/posts"</span>
<span class="k">end</span>
<span class="c1"># We only want the SELECT queries and we'd like them grouped by model</span>
<span class="n">select_queries_by_model</span> <span class="o">=</span> <span class="n">queries</span><span class="p">.</span><span class="nf">select</span><span class="p">(</span><span class="o">&</span><span class="ss">:select?</span><span class="p">).</span><span class="nf">group_by</span><span class="p">(</span><span class="o">&</span><span class="ss">:model_name</span><span class="p">)</span>
<span class="c1"># Our tolerance rate is 1 query per table</span>
<span class="c1"># You can increase this value depending on your business logic</span>
<span class="n">assert</span> <span class="n">select_queries_by_model</span><span class="p">.</span><span class="nf">all?</span> <span class="p">{</span> <span class="o">|</span><span class="n">_</span><span class="p">,</span> <span class="n">queries</span><span class="o">|</span> <span class="n">queries</span><span class="p">.</span><span class="nf">count</span> <span class="o"><=</span> <span class="mi">1</span> <span class="p">}</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Some things are worth mentioning here. <strong>Your test is only as correct as the test data you provide it with</strong>. Ideally try to keep a realistic set of fixtures around (3-4 examples of each core model) or create a realistic environment before the test run.</p>
<p>Some requests might genuinely be triggering <strong>more than one query per table</strong>. For example, paginated requests usually produce two queries: one to fetch the records, another one for the page count. In such cases, you can <strong>increase the tested tolerance rate, while also increasing the number of records per table</strong> in your test setup, to make sure you’re still catching those N+1 queries.</p>
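<p>The tolerance check can also be pulled into a small helper that reports which tables went over the limit. The sketch below is plain Ruby: <code class="language-plaintext highlighter-rouge">TrackedQuery</code> is a hypothetical stand-in for whatever your query tracker returns and <code class="language-plaintext highlighter-rouge">tables_over_tolerance</code> is a made-up helper name.</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical stand-in for a tracked query; a real tracker may expose different attributes
TrackedQuery = Struct.new(:table, :sql)

# Returns a Hash of table name => query count for the tables exceeding the tolerance
def tables_over_tolerance(queries, tolerance: 1)
  queries
    .group_by(&:table)
    .transform_values(&:count)
    .select { |_table, count| count > tolerance }
end

queries = [
  TrackedQuery.new("posts", "SELECT * FROM posts LIMIT 10"),
  TrackedQuery.new("posts", "SELECT COUNT(*) FROM posts"),
  TrackedQuery.new("users", "SELECT * FROM users WHERE id = 1"),
  TrackedQuery.new("users", "SELECT * FROM users WHERE id = 2"),
  TrackedQuery.new("users", "SELECT * FROM users WHERE id = 3")
]

# A paginated endpoint: allow two queries per table (records + page count)
offenders = tables_over_tolerance(queries, tolerance: 2)

puts offenders.inspect
</code></pre></div></div>
<p>Inside a test, asserting that the returned hash is empty has the nice side effect that a failure can name the offending table.</p>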
<p>You can introduce these N+1 regression tests to <strong>every critical or data-intensive controller action</strong>. Their setup is fairly cheap, while N+1 queries can come at a very high cost for your database and your response times.</p>
<h2 id="conclusion">Conclusion</h2>
<p>The best results will come from applying a combination of the solutions above. Goldiloader will get rid of some of your N+1 queries, but you’ll also need to start writing your associations with eager loading in mind. Testing for N+1 regressions and proper instrumentation will keep your hard-won performance improvements intact.</p>
<p>What do we say to the God of N+1 Queries? <em>Not today.</em></p>
<h2 id="links">Links</h2>
<ul>
<li><a href="https://guides.rubyonrails.org/active_record_querying.html#eager-loading-associations">Rails Guides: Eager Loading Associations</a></li>
<li><a href="https://github.com/salsify/goldiloader">Goldiloader</a>: Automatic eager loading for ActiveRecord</li>
<li><a href="https://github.com/lipanski/sql-spy">SqlSpy</a>: Track SQL queries performed inside a block of code</li>
<li><a href="https://sequel.jeremyevans.net/rdoc-plugins/classes/Sequel/Plugins/TacticalEagerLoading.html">Automatic eager loading for the Sequel ORM</a></li>
<li><a href="https://github.com/flyerhzm/bullet">Bullet</a>: N+1 monitoring on steroids</li>
<li><a href="https://github.com/nepalez/rspec-sqlimit">rspec-sqlimit</a>: An RSpec matcher to check the total number of performed SQL queries</li>
<li><a href="https://stackoverflow.com/a/46421504">https://stackoverflow.com/a/46421504</a></li>
<li><a href="https://github.com/rails/rails/blob/6-0-stable/activerecord/test/cases/test_case.rb">https://github.com/rails/rails/blob/6-0-stable/activerecord/test/cases/test_case.rb</a></li>
<li><a href="https://stackoverflow.com/a/5492207/801186">https://stackoverflow.com/a/5492207/801186</a></li>
</ul>
<h1 id="serving-activestorage-uploads-through-a-cdn-with-rails-direct-routes">Serving ActiveStorage uploads through a CDN with Rails direct routes</h1>
<p>Published on 2020-06-23 at <a href="https://lipanski.com/posts/activestorage-cdn-rails-direct-route">https://lipanski.com/posts/activestorage-cdn-rails-direct-route</a></p>
<p>ActiveStorage makes it really easy to upload files from Rails to an S3 bucket or an S3-compatible service, like DigitalOcean Spaces. Refer to the <a href="https://edgeguides.rubyonrails.org/active_storage_overview.html">official documentation</a> if you’d like to know more about setting up ActiveStorage.</p>
<p>If your uploads are meant to be public and you were thinking of serving them directly through the CDN sitting in front of your S3 bucket, you’ll soon notice a problem: ActiveStorage URLs are built to always go through your Rails app, mainly through <a href="https://github.com/rails/rails/blob/bc9fb9cf8b5dbe8ecf399ffd5d48d84bdb96a9db/activestorage/app/controllers/active_storage/blobs_controller.rb#L10-L13">ActiveStorage::BlobsController</a>. This controller is responsible for setting the cache headers and redirecting to the bucket URL. Your Rails app will be the first point of contact even if it’s just to retrieve the bucket URL. On top of that, there’s no place to specify a CDN host to replace the bucket host.</p>
<p>Fortunately, there is an easy way to work around this problem. In order to translate stored files into URLs, Rails provides the URL helper <a href="https://edgeguides.rubyonrails.org/active_storage_overview.html#linking-to-files">rails_blob_url</a>, which basically resolves to this <code class="language-plaintext highlighter-rouge">ActiveStorage::BlobsController</code>. We’d like to introduce a new helper that points directly to our CDN host.</p>
<p>Though there are different ways of solving this problem, I found using Rails direct routes an elegant solution. <a href="https://guides.rubyonrails.org/routing.html#direct-routes">Rails direct routes</a> provide a way to create URL helpers directly from your <code class="language-plaintext highlighter-rouge">config/routes.rb</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/routes.rb</span>
<span class="n">direct</span> <span class="ss">:rails_public_blob</span> <span class="k">do</span> <span class="o">|</span><span class="n">blob</span><span class="o">|</span>
<span class="no">File</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="s2">"https://cdn.example.com"</span><span class="p">,</span> <span class="n">blob</span><span class="p">.</span><span class="nf">key</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>You can call this route the same way you’d call the original Rails URL helper:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
<span class="n">has_one_attached</span> <span class="ss">:profile_picture</span>
<span class="k">end</span>
<span class="n">rails_public_blob_url</span><span class="p">(</span><span class="no">User</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">profile_picture</span><span class="p">)</span>
<span class="c1"># => https://cdn.example.com/j8rte71tp8xpq5afr3uqxlcqtkzn</span>
<span class="c1"># You can also use this outside views</span>
<span class="no">Rails</span><span class="p">.</span><span class="nf">application</span><span class="p">.</span><span class="nf">routes</span><span class="p">.</span><span class="nf">url_helpers</span><span class="p">.</span><span class="nf">rails_public_blob_url</span><span class="p">(</span><span class="no">User</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">profile_picture</span><span class="p">)</span>
</code></pre></div></div>
<p>Let’s refactor our route a bit:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/routes.rb</span>
<span class="n">direct</span> <span class="ss">:rails_public_blob</span> <span class="k">do</span> <span class="o">|</span><span class="n">blob</span><span class="o">|</span>
<span class="c1"># Preserve the behaviour of `rails_blob_url` inside these environments</span>
<span class="c1"># where S3 or the CDN might not be configured</span>
<span class="k">if</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">development?</span> <span class="o">||</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">test?</span>
<span class="n">route_for</span><span class="p">(</span><span class="ss">:rails_blob</span><span class="p">,</span> <span class="n">blob</span><span class="p">)</span>
<span class="k">else</span>
<span class="c1"># Use an environment variable instead of hard-coding the CDN host</span>
<span class="c1"># You could also use the Rails.configuration to achieve the same</span>
<span class="no">File</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"CDN_HOST"</span><span class="p">),</span> <span class="n">blob</span><span class="p">.</span><span class="nf">key</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
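<p>Two details of this route are easy to check in plain Ruby: <code class="language-plaintext highlighter-rouge">File.join</code> collapses duplicate slashes at the boundary between its arguments, and <code class="language-plaintext highlighter-rouge">fetch</code> raises a <code class="language-plaintext highlighter-rouge">KeyError</code> when the variable is missing instead of silently building a broken URL. In the sketch below, <code class="language-plaintext highlighter-rouge">cdn_url</code> is a hypothetical helper and the environment is passed in as a Hash, so that the snippet runs on its own:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper mirroring the production branch of the direct route
# `env` defaults to ENV; a plain Hash works too, since both respond to `fetch`
def cdn_url(key, env = ENV)
  File.join(env.fetch("CDN_HOST"), key)
end

with_slash = cdn_url("j8rte71tp8xp", { "CDN_HOST" => "https://cdn.example.com/" })
without_slash = cdn_url("j8rte71tp8xp", { "CDN_HOST" => "https://cdn.example.com" })

# A trailing slash on the host makes no difference
puts with_slash
puts without_slash

begin
  cdn_url("j8rte71tp8xp", {})
rescue KeyError => e
  # A missing CDN_HOST fails loudly
  puts "Missing configuration: #{e.message}"
end
</code></pre></div></div>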
<h2 id="proxy-mode">Proxy mode</h2>
<p>Rails <a href="https://github.com/rails/rails/pull/42305">recently</a> introduced a way to configure a CDN host for your ActiveStorage assets. This requires a <em>proxy-enabled</em> CDN (Cloudflare, CloudFront, nginx etc.) - so using <em>only</em> S3 or DigitalOcean Spaces as public file servers is excluded. On top of that, the CDN will fall back to the Rails backend once per uncached file (and again every time the CDN cache is invalidated). Yes, the backend request is fairly cheap (it’s just a redirect), but it can get delayed by other slower requests to your backend during peak times.</p>
<p>The solution proposed in my article can serve assets directly from S3 or DigitalOcean Spaces, using these services as public static file servers. At the end of the day, it all depends on what kind of CDN you are using, how much you are willing to add to your infrastructure and at which level you’d like to optimize. For your average website I think serving assets directly from S3 or DigitalOcean Spaces is perfectly fine.</p>
<p>You can read more about proxy mode <a href="https://edgeguides.rubyonrails.org/active_storage_overview.html#proxy-mode">here</a>.</p>
<h2 id="variants">Variants</h2>
<p>If you’re using variants, things will look a bit different in your development environment. Running the following code:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">profile_picture</span>
<span class="n">rails_blob_url</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="nf">variant</span><span class="p">(</span><span class="ss">resize_to_limit: </span><span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">100</span><span class="p">]).</span><span class="nf">processed</span><span class="p">)</span>
</code></pre></div></div>
<p>…will produce an error: <code class="language-plaintext highlighter-rouge">NoMethodError (undefined method 'signed_id' for #<ActiveStorage::Variant>)</code>.</p>
<p>According to <a href="https://github.com/rails/rails/issues/32500#issuecomment-380004250">this comment</a>, the recommended way for accessing variants directly is by using the <code class="language-plaintext highlighter-rouge">rails_representation_url</code> helper. The following call should work:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">image</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">first</span><span class="p">.</span><span class="nf">profile_picture</span>
<span class="n">rails_representation_url</span><span class="p">(</span><span class="n">image</span><span class="p">.</span><span class="nf">variant</span><span class="p">(</span><span class="ss">resize_to_limit: </span><span class="p">[</span><span class="mi">100</span><span class="p">,</span> <span class="mi">100</span><span class="p">]).</span><span class="nf">processed</span><span class="p">)</span>
</code></pre></div></div>
<p>Let’s update our direct route to accommodate the logic for variants:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/routes.rb</span>
<span class="n">direct</span> <span class="ss">:rails_public_blob</span> <span class="k">do</span> <span class="o">|</span><span class="n">blob</span><span class="o">|</span>
<span class="c1"># Preserve the behaviour of `rails_blob_url` inside these environments</span>
<span class="c1"># where S3 or the CDN might not be configured</span>
<span class="k">if</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">development?</span> <span class="o">||</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">test?</span>
<span class="n">route</span> <span class="o">=</span>
<span class="c1"># ActiveStorage::VariantWithRecord was introduced in Rails 6.1</span>
<span class="c1"># Remove the second check if you're using an older version</span>
<span class="k">if</span> <span class="n">blob</span><span class="p">.</span><span class="nf">is_a?</span><span class="p">(</span><span class="no">ActiveStorage</span><span class="o">::</span><span class="no">Variant</span><span class="p">)</span> <span class="o">||</span> <span class="n">blob</span><span class="p">.</span><span class="nf">is_a?</span><span class="p">(</span><span class="no">ActiveStorage</span><span class="o">::</span><span class="no">VariantWithRecord</span><span class="p">)</span>
<span class="ss">:rails_representation</span>
<span class="k">else</span>
<span class="ss">:rails_blob</span>
<span class="k">end</span>
<span class="n">route_for</span><span class="p">(</span><span class="n">route</span><span class="p">,</span> <span class="n">blob</span><span class="p">)</span>
<span class="k">else</span>
<span class="c1"># Use an environment variable instead of hard-coding the CDN host</span>
<span class="no">File</span><span class="p">.</span><span class="nf">join</span><span class="p">(</span><span class="no">ENV</span><span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"CDN_HOST"</span><span class="p">),</span> <span class="n">blob</span><span class="p">.</span><span class="nf">key</span><span class="p">)</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Note that the <em>production</em> version using the CDN works the same for both the original attachment and its variants.</p>
<h2 id="conclusion">Conclusion</h2>
<p>You can use this new URL helper whenever your ActiveStorage files should be served directly through a CDN without having to deploy this setup to your development environment.</p>
<p>Rails 6.1 allows defining multiple storage services for the same environment, which means you can use both public and private buckets from your code. This makes using public buckets and CDNs an even more viable option than before. See this <a href="https://github.com/rails/rails/pull/34935">PR</a> for more details.</p>
<p>Thanks to Eduardo Álvarez for raising the variants issue in the comments.</p>
<h1 id="speed-up-your-docker-builds-with-cache-from">Speed up your Docker builds with --cache-from</h1>
<p>Published on 2020-04-24 at <a href="https://lipanski.com/posts/speed-up-your-docker-builds-with-cache-from">https://lipanski.com/posts/speed-up-your-docker-builds-with-cache-from</a></p>
<p>Using the Docker cache efficiently can result in significantly faster build times. In some environments though, like CI/CD systems, individual builds happen independently of each other and the build cache is never preserved. Every build starts from zero, which can be slow and wasteful. This article will try to provide some solutions for these cases.</p>
<p>As long as you’re pushing images to a remote registry, you can always use a previously built image as a cache layer for a new build. You can achieve this by setting the <code class="language-plaintext highlighter-rouge">--cache-from</code> option on the <code class="language-plaintext highlighter-rouge">docker build</code> call. For versions of Docker that don’t include BuildKit, you’ll have to pull the image yourself before running <code class="language-plaintext highlighter-rouge">docker build</code>. Assuming the latter, here’s what things would look like:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># This is the full image name, including the registry</span>
<span class="nv">IMAGE</span><span class="o">=</span><span class="s2">"my-docker-registry.example.com/my-docker-image"</span>
<span class="c"># Pull an older, existing version from the registry</span>
docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:0.1.0
<span class="c"># Build a new version by using the older version as a cache</span>
docker build <span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:0.1.0 <span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:0.2.0 <span class="nb">.</span>
<span class="c"># Push the new version to the registry so that we can use it as a cache for future builds</span>
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:0.2.0
</code></pre></div></div>
<h2 id="maximize-your-chances-of-hitting-the-cache">Maximize your chances of hitting the cache</h2>
<p>You can pass the <code class="language-plaintext highlighter-rouge">--cache-from</code> option <strong>several times</strong>, to provide different images to use as a cache. Let’s assume your remote registry contains version builds (<code class="language-plaintext highlighter-rouge">1.0.0</code>), which you build once a month, and branch builds, which are built whenever you push code to a branch. Ideally you’d use the branch build images, because those are fresher, but if no branch was built yet you’d like to fall back to a version build image. You can pass <code class="language-plaintext highlighter-rouge">--cache-from</code> once per candidate and pull only the most suitable image beforehand:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">IMAGE</span><span class="o">=</span><span class="s2">"my-docker-registry.example.com/my-docker-image"</span>
<span class="c"># Between `current-branch`, `master` and a tagged version `1.0.0`, we prefer current-branch</span>
<span class="c"># If there was no previously built `current-branch` image, we fetch `master`</span>
<span class="c"># If there's no `master` image, we fall back to the tagged version</span>
<span class="c"># We don't need all 3 images, just the most suitable one, hence the `||`</span>
docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:current-branch <span class="o">||</span> <span class="se">\</span>
docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:master <span class="o">||</span> <span class="se">\</span>
docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:1.0.0 <span class="o">||</span> <span class="se">\</span>
<span class="nb">true</span>
<span class="c"># Build a new version while mentioning all possible cache sources</span>
docker build <span class="se">\</span>
<span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:current-branch <span class="se">\</span>
<span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:master <span class="se">\</span>
<span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:1.0.0 <span class="se">\</span>
<span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:current-branch <span class="nb">.</span>
<span class="c"># Push the new version to the registry so that we can use it as a cache for future builds</span>
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:current-branch
</code></pre></div></div>
<p>At this point you’ll have to do the math: depending on your build infrastructure, if the time to fetch the remote images and build with <code class="language-plaintext highlighter-rouge">--cache-from</code> is less than the time it takes to build without using the cache, then this was worth it. If your build is fast anyway or downloading the images comes at a high cost, then it might not be something for you.</p>
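<p>The pull-with-fallback step and the repeated <code class="language-plaintext highlighter-rouge">--cache-from</code> flags can also be generated from a single list of candidate tags. What follows is only a sketch: the helper names <code class="language-plaintext highlighter-rouge">cache_from_flags</code> and <code class="language-plaintext highlighter-rouge">pull_first_available</code>, as well as the <code class="language-plaintext highlighter-rouge">DOCKER</code> override used for dry runs, are made up for this example and are not part of Docker.</p>

```sh
IMAGE="my-docker-registry.example.com/my-docker-image"
# Hypothetical override: set DOCKER="echo docker" to print commands instead of running them
DOCKER="${DOCKER:-docker}"

# Print one "--cache-from IMAGE:tag" pair per candidate tag
cache_from_flags() {
  for tag in "$@"; do
    printf '%s %s:%s ' '--cache-from' "$IMAGE" "$tag"
  done
}

# Pull the first candidate that exists in the registry and skip the rest.
# Always succeeds, so that a cold registry doesn't fail the build script.
pull_first_available() {
  for tag in "$@"; do
    $DOCKER pull "${IMAGE}:${tag}" && return 0
  done
  return 0
}

# Usage, with the tags listed from most to least preferred:
#   pull_first_available current-branch master 1.0.0
#   docker build $(cache_from_flags current-branch master 1.0.0) -t "${IMAGE}:current-branch" .
```

<p>Keeping the candidates in one place makes it harder for the pulled tags and the <code class="language-plaintext highlighter-rouge">--cache-from</code> arguments to drift apart.</p>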
<h2 id="multi-stage-builds">Multi-stage builds</h2>
<p>With multi-stage builds things get a little more complicated. The intermediate build stages are never pushed to the remote registry so you can’t use them as cache.</p>
<p>Consider the following <em>Dockerfile</em>:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># The builder stage</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine AS builder</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> libxml2-dev
<span class="k">COPY</span><span class="s"> Gemfile ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># The final stage</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">COPY</span><span class="s"> --from=builder /usr/local/bundle/ /usr/local/bundle/</span>
<span class="k">COPY</span><span class="s"> app ./</span>
<span class="k">CMD</span><span class="s"> ["rackup"] </span>
</code></pre></div></div>
<p>Any change to the <code class="language-plaintext highlighter-rouge">Gemfile</code> will require a full build, including the line that installs <code class="language-plaintext highlighter-rouge">libxml2-dev</code>. Only a change restricted to the <code class="language-plaintext highlighter-rouge">app/</code> directory will be able to use the cache.</p>
<p>One possible solution is <strong>storing intermediate build stages in the registry</strong>. Your new build process could look something like this:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">IMAGE</span><span class="o">=</span>my-image
<span class="c"># Build the builder image</span>
docker build <span class="nt">--target</span> builder <span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder <span class="nb">.</span>
<span class="c"># Build the final image</span>
docker build <span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final <span class="nb">.</span>
<span class="c"># Push both builder and final images to the remote registry</span>
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final
</code></pre></div></div>
<p>The build itself will be just as fast as if you made only one call, because the second <code class="language-plaintext highlighter-rouge">docker build</code> can use the local build cache for the <code class="language-plaintext highlighter-rouge">builder</code> stage. The potential bandwidth/speed penalty comes from having to push the additional image to the registry.</p>
<p>The full multi-stage build including the <code class="language-plaintext highlighter-rouge">--cache-from</code> usage would end up looking something like this:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">IMAGE</span><span class="o">=</span><span class="s2">"my-docker-registry.example.com/my-docker-image"</span>
<span class="c"># Pull older versions of the builder and final images from the registry (if any)</span>
docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder <span class="o">||</span> <span class="nb">true
</span>docker pull <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final <span class="o">||</span> <span class="nb">true</span>
<span class="c"># Build the builder image by using the older builder image as a cache</span>
docker build <span class="nt">--target</span> builder <span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder <span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder <span class="nb">.</span>
<span class="c"># Build the final image by using the older final image as a cache</span>
<span class="c"># ...but also the local cache from the previous builder build</span>
docker build <span class="nt">--cache-from</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final <span class="nt">-t</span> <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final <span class="nb">.</span>
<span class="c"># Push both images so that we can use them as a cache for future builds</span>
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:builder
docker push <span class="k">${</span><span class="nv">IMAGE</span><span class="k">}</span>:final
</code></pre></div></div>
<p>On top of that, as explained in the previous section you can call <code class="language-plaintext highlighter-rouge">--cache-from</code> several times, in order to identify the best image pair (builder/final) to use as cache.</p>
<p>This is where you have to do the math again: is the bandwidth/speed penalty incurred by pushing and pulling these intermediate images worth it? Should you optimize for build time or for deployment/scaling time? Would you give up multi-stage builds in exchange for a simpler build process?</p>
<h2 id="same-but-different-docker-loadsave">Same but different: docker load/save</h2>
<p>If your build environment has access to some shared storage (e.g. S3, EBS or just a shared directory), you can use the <code class="language-plaintext highlighter-rouge">docker save</code> and <code class="language-plaintext highlighter-rouge">docker load</code> commands to store and retrieve images. You can later reuse these images in order to enhance your local build cache. The <code class="language-plaintext highlighter-rouge">docker save</code> command saves one or more images as a tar file, which can be placed inside your shared storage. Before your next build, you can retrieve this file and unpack the images back into the local registry by calling <code class="language-plaintext highlighter-rouge">docker load</code>. During the build, point the <code class="language-plaintext highlighter-rouge">--cache-from</code> option to the loaded image. Here’s how it goes:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Before running the build, unpack and load images from my-image.tar into the local registry</span>
docker load <span class="nt">-i</span> /some/shared/directory/my-image.tar <span class="o">||</span> <span class="nb">true</span>
<span class="c"># Run the build with the --cache-from option pointing to the saved image</span>
docker build <span class="nt">--cache-from</span> my-image:latest <span class="nt">-t</span> my-image:latest <span class="nb">.</span>
<span class="c"># Pack and save the freshly built image inside the shared directory </span>
docker save <span class="nt">-o</span> /some/shared/directory/my-image.tar my-image:latest
</code></pre></div></div>
<p>This approach can be a bit more flexible in environments where it’s hard to access the remote registry.</p>
<h2 id="buildkit">BuildKit</h2>
<p>If your Docker version has access to <a href="https://docs.docker.com/develop/develop-images/build_enhancements/">BuildKit</a>, check out the improvements around <code class="language-plaintext highlighter-rouge">BUILDKIT_INLINE_CACHE</code>, which can save you an expensive <code class="language-plaintext highlighter-rouge">docker pull</code> operation.</p>
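<p>As a rough illustration, a BuildKit build with inline cache could look like the sketch below. The <code class="language-plaintext highlighter-rouge">BUILDKIT_INLINE_CACHE=1</code> build argument asks BuildKit to embed cache metadata into the image, so a later build can reference the pushed image through <code class="language-plaintext highlighter-rouge">--cache-from</code> without pulling it first. The image name, the function name and the <code class="language-plaintext highlighter-rouge">DOCKER</code> override (for dry runs) are placeholders chosen for this example.</p>

```sh
IMAGE="my-docker-registry.example.com/my-docker-image"
# Hypothetical override: set DOCKER="echo docker" to print commands instead of running them
DOCKER="${DOCKER:-docker}"

# Build with BuildKit, reuse the previously pushed image as cache and push the result
build_with_inline_cache() {
  tag="$1"
  # BUILDKIT_INLINE_CACHE=1 stores the cache metadata inside the image itself
  DOCKER_BUILDKIT=1 $DOCKER build \
    --build-arg BUILDKIT_INLINE_CACHE=1 \
    --cache-from "${IMAGE}:${tag}" \
    -t "${IMAGE}:${tag}" . &&
    $DOCKER push "${IMAGE}:${tag}"
}

# Usage:
#   build_with_inline_cache master
```

<p>Whether this beats the pull-based approach depends on your Docker version and registry, so it’s worth measuring both on your own builds.</p>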
<h2 id="further-reading">Further reading</h2>
<p>Check out my other article on <a href="/posts/dockerfile-ruby-best-practices">Best practices when writing a Dockerfile</a>.</p>Speed up your Docker builds with --cache-fromPersistence in (AWS ElastiCache) Redis2020-04-01T00:00:00+00:002020-04-01T00:00:00+00:00https://lipanski.com/posts/persistence-in-elasticache-redis<h1 id="persistence-in-aws-elasticache-redis">Persistence in (AWS ElastiCache) Redis</h1>
<p>If you restart your Redis server and expect your data to still be there when the server comes back, you might be in for a surprise.</p>
<p>This article will give you a brief overview of how to ensure data persistence across restarts in Redis. Then it will focus on how to achieve the same in AWS ElastiCache Redis clusters, while discussing some of the limitations.</p>
<p>Before we start, it’s worth stating that Redis <em>can</em> be used as a persistent data store and it <em>can</em> provide you with strong persistence guarantees, if you choose to enable them.</p>
<p>If you’re using Redis solely as a cache and can afford losing the data between restarts or crashes, then this article might not be for you.</p>
<p>On the other hand, if you’re using Sidekiq or other similar tools backed by Redis, this article might help protect your data in the long run.</p>
<h2 id="persistence-in-redis">Persistence in Redis</h2>
<p>There are two ways to ensure data persistence in Redis across restarts: through regular database snapshots (RDB) or by enabling the <em>append-only file</em> (AOF), a log capturing all the performed operations. Whichever method is enabled, Redis loads the persisted data automatically when the server starts.</p>
<p><strong>Regular snapshots (RDB)</strong> can be enabled by configuring save points inside your <code class="language-plaintext highlighter-rouge">redis.conf</code> file:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># after 900 sec (15 min) if at least 1 key changed
save 900 1
# after 300 sec (5 min) if at least 10 keys changed
save 300 10
# after 60 sec if at least 10000 keys changed
save 60 10000
</code></pre></div></div>
<p>You will have to restart the server in order to apply these changes. To verify your configuration, call <code class="language-plaintext highlighter-rouge">info persistence</code> inside Redis and look for the lines starting with <code class="language-plaintext highlighter-rouge">rdb</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rdb_changes_since_last_save:2
rdb_bgsave_in_progress:0
rdb_last_save_time:1582116473
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
</code></pre></div></div>
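<p>If you’d like to check these values from a script, for example in a monitoring job, the output can be parsed with standard tools. This is a minimal sketch: the helper names <code class="language-plaintext highlighter-rouge">info_field</code> and <code class="language-plaintext highlighter-rouge">rdb_healthy</code> are invented for this example, and the usage line assumes that <code class="language-plaintext highlighter-rouge">redis-cli</code> can reach your server.</p>

```sh
# Print the value of a single field from `info` output read on stdin
info_field() {
  # Redis separates the lines with \r\n, so strip the carriage returns first
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# Succeed only if the last background save finished without errors
rdb_healthy() {
  [ "$(info_field rdb_last_bgsave_status)" = "ok" ]
}

# Usage against a live server:
#   redis-cli info persistence | rdb_healthy || echo "last RDB save failed"
```

<p>The same <code class="language-plaintext highlighter-rouge">info_field</code> helper works for any other field of the <code class="language-plaintext highlighter-rouge">info</code> output.</p>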
<p>Depending on the frequency of your configured save points, you might have to account for some data loss if your Redis server crashes before it gets to save your most recent data to disk.</p>
<p><strong>Append-only file (AOF)</strong> persistence can be enabled from within your <code class="language-plaintext highlighter-rouge">redis.conf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>appendonly yes
# one of: always, everysec, no
appendfsync everysec
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">appendfsync</code> option tells Redis how often it should flush the log file to disk. If you set it to <code class="language-plaintext highlighter-rouge">no</code>, it will leave this up to your operating system. If you set it to <code class="language-plaintext highlighter-rouge">always</code>, every new entry (basically every performed operation) will be flushed to disk immediately. This is definitely the safest option but it almost invalidates Redis as an in-memory data store, as it has to write to disk on every call.</p>
<p>A good middle ground is using the <code class="language-plaintext highlighter-rouge">everysec</code> option, which flushes the log entries to disk every second. This means that, during a crash, you can lose at most one second of data.</p>
<p>You will have to restart the server in order to apply this configuration and you can validate it by calling <code class="language-plaintext highlighter-rouge">info persistence</code> and looking for the lines starting with <code class="language-plaintext highlighter-rouge">aof</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_last_cow_size:0
aof_current_size:88
aof_base_size:88
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
</code></pre></div></div>
<p>You can <strong>combine the RDB and AOF methods</strong>. The <a href="https://redis.io/topics/persistence">Redis documentation</a> makes the following recommendations:</p>
<blockquote>
<p>The general indication is that you should use both persistence methods if you want a degree of data safety comparable to what PostgreSQL can provide you.</p>
</blockquote>
<blockquote>
<p>If you care a lot about your data, but still can live with a few minutes of data loss in case of disasters, you can simply use RDB alone.</p>
</blockquote>
<h2 id="what-happens-when-redis-runs-out-of-memory">What happens when Redis runs out of memory?</h2>
<p>When discussing persistence and data integrity, it’s good to understand what happens when you’re trying to push a new value to Redis and your server is out of memory.</p>
<p>The answer to this question resides in the value of the <code class="language-plaintext highlighter-rouge">maxmemory-policy</code> setting. When set to <code class="language-plaintext highlighter-rouge">noeviction</code>, Redis will raise an error when the memory limit is reached. This is <strong>the safest choice</strong> if you care about the data inside your Redis cluster.</p>
<p>Other options include <code class="language-plaintext highlighter-rouge">allkeys-lru</code>, <code class="language-plaintext highlighter-rouge">volatile-lru</code>, <code class="language-plaintext highlighter-rouge">allkeys-random</code>, <code class="language-plaintext highlighter-rouge">volatile-random</code> or <code class="language-plaintext highlighter-rouge">volatile-ttl</code> and they will all make room for the new key by evicting older, existing ones and thus causing data loss.</p>
<p>You can check your current configuration by calling <code class="language-plaintext highlighter-rouge">info memory</code> inside Redis. You can update the directive from within your <code class="language-plaintext highlighter-rouge">redis.conf</code>.</p>
<p>Last but not least, you can control the amount of memory Redis has access to by making use of the <code class="language-plaintext highlighter-rouge">maxmemory</code> configuration directive.</p>
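<p>Both directives can also be changed on a running server with <code class="language-plaintext highlighter-rouge">CONFIG SET</code>, which is handy for trying them out. Treat the following as a sketch: the function name and the <code class="language-plaintext highlighter-rouge">REDIS_CLI</code> override are made up for this example, and the 256 MiB limit is an arbitrary value. A change made this way does not end up in your <code class="language-plaintext highlighter-rouge">redis.conf</code> by itself, so you’d still have to add it there (or use <code class="language-plaintext highlighter-rouge">CONFIG REWRITE</code>) for it to survive a restart.</p>

```sh
# Hypothetical override: set REDIS_CLI="echo" to print the commands instead of running them
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Apply a memory limit (in bytes) and reject writes instead of evicting keys
apply_memory_settings() {
  $REDIS_CLI CONFIG SET maxmemory "$1" &&
    $REDIS_CLI CONFIG SET maxmemory-policy noeviction
}

# Usage: cap Redis at 256 MiB (256 * 1024 * 1024 bytes)
#   apply_memory_settings 268435456
```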
<h2 id="persistence-in-aws-elasticache-redis-1">Persistence in AWS ElastiCache Redis</h2>
<p>Persistence in AWS ElastiCache Redis clusters is a more complicated story. They really live by that <em>Cache</em> in <em>ElastiCache</em>. For the most basic, single node deployment using the default parameter group, persistence is not guaranteed: after a restart or a crash, your data is gone.</p>
<p>Then again, the hint in the <a href="https://aws.amazon.com/elasticache/redis/faqs/">AWS ElastiCache FAQ</a> on achieving persistence is not very helpful:</p>
<blockquote>
<p>Does Amazon ElastiCache for Redis support Redis persistence? Yes, you can achieve persistence by snapshotting your Redis data using the Backup and Restore feature.</p>
</blockquote>
<p>Enabling <strong>daily backups</strong> is a good start, especially since you get one daily snapshot per cluster free of charge. The general price is pretty low and you can go up to 35 days retention. The problem with these backups is that <strong>you cannot replay them automatically after a restart or a crash</strong>. Furthermore you can’t replay a backup manually on top of an <em>existing</em> cluster either. You may only restore a backup manually and to a new cluster.</p>
<p>On top of daily backups, you can enable <strong>append-only file (AOF)</strong> persistence but with some severe limitations:</p>
<ul>
<li>AOF is not supported on Redis 2.8.22 and later</li>
<li>AOF is not supported for <em>cache.t1</em>, <em>cache.t2</em> or <em>cache.t3</em> instances</li>
<li>AOF is not supported on Multi-AZ replication groups</li>
</ul>
<p>…and depending on your use case, these restrictions might speak strongly against using AOF.</p>
<p>The last option is probably also the most attractive one: <strong>Multi-AZ with auto-failover</strong>. This is basically a primary/replica system across availability zones with automatic failover (replica promotion). When the primary crashes or requires an upgrade, the replica is promoted in its place. You can also decide to promote a replica manually. Multi-AZ with auto-failover has its own limitations:</p>
<ul>
<li>Not supported for Redis 2.8.5 or earlier</li>
<li>Not supported for <em>cache.t1</em> instances</li>
<li>The failover comes with a little downtime – I assume due to DNS propagation</li>
<li>When failing over, a small amount of data might be lost due to replication lag</li>
</ul>
<p>Then again, these limitations are less of a burden than the AOF restrictions and having your Redis cluster running across different availability zones is definitely a plus.</p>
<p>The price for the <strong>cheapest instance that can enable AOF</strong> is $74.16 per month for a previous generation <em>cache.m3.medium</em> instance or $133.92 per month for a current generation <em>cache.m5.large</em> instance.</p>
<p>The price for the <strong>cheapest instance that you can hook up into a Multi-AZ auto-failover system</strong> is $13.68 per month for a <em>cache.t3.micro</em> instance, which makes the cheapest Multi-AZ auto-failover cluster cost $27.36 per month.</p>
<p>If you’re looking for cheap/small but persistent, then Multi-AZ is the way to go.</p>
<p>Last but not least, the default parameter groups in ElastiCache are tailored for <strong>volatile caches</strong>, so <code class="language-plaintext highlighter-rouge">maxmemory-policy</code> is set to <code class="language-plaintext highlighter-rouge">volatile-lru</code>. You might want to change this to <code class="language-plaintext highlighter-rouge">noeviction</code> if you’d like to control when your data gets deleted.</p>
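<p>With the AWS CLI, the change could be applied roughly as shown below. This is a sketch under a few assumptions: default parameter groups cannot be modified, so the group passed in has to be a custom one that is attached to your cluster, and the names <code class="language-plaintext highlighter-rouge">set_noeviction</code>, <code class="language-plaintext highlighter-rouge">my-redis-params</code> and the <code class="language-plaintext highlighter-rouge">AWS</code> override (for dry runs) are placeholders.</p>

```sh
# Hypothetical override: set AWS="echo aws" to print the command instead of running it
AWS="${AWS:-aws}"

# Switch the eviction policy of a custom ElastiCache parameter group to noeviction
set_noeviction() {
  $AWS elasticache modify-cache-parameter-group \
    --cache-parameter-group-name "$1" \
    --parameter-name-values "ParameterName=maxmemory-policy,ParameterValue=noeviction"
}

# Usage:
#   set_noeviction my-redis-params
```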
<p>In <strong>conclusion</strong>, the solution I’d recommend in order to keep your AWS ElastiCache Redis data as persistent as possible would be:</p>
<ul>
<li>Opting for a Multi-AZ auto-failover system with at least 2 nodes.</li>
<li>Enabling daily backups with as much retention as needed.</li>
<li>Setting the <code class="language-plaintext highlighter-rouge">maxmemory-policy</code> directive to <code class="language-plaintext highlighter-rouge">noeviction</code> inside your ElastiCache parameter group.</li>
</ul>
<p><em>The information described above was collected around February 2020. You should consult the official AWS documentation for any changes beyond this date. All prices correspond to the Frankfurt region.</em></p>
<h2 id="reference">Reference</h2>
<ul>
<li><a href="https://redis.io/topics/persistence">https://redis.io/topics/persistence</a></li>
<li><a href="https://redis.io/topics/lru-cache">https://redis.io/topics/lru-cache</a></li>
<li><a href="https://aws.amazon.com/elasticache/redis/faqs/">https://aws.amazon.com/elasticache/redis/faqs/</a></li>
<li><a href="https://github.com/awsdocs/amazon-elasticache-docs/blob/master/doc_source/redis/AutoFailover.md">https://github.com/awsdocs/amazon-elasticache-docs/blob/master/doc_source/redis/AutoFailover.md</a></li>
<li><a href="https://aws.amazon.com/premiumsupport/knowledge-center/fault-tolerance-elasticache/">https://aws.amazon.com/premiumsupport/knowledge-center/fault-tolerance-elasticache/</a></li>
<li><a href="https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/RedisAOF.html">https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/RedisAOF.html</a></li>
</ul>Persistence in (AWS ElastiCache) RedisBest practices when writing a Dockerfile for a Ruby application2019-09-20T00:00:00+00:002019-09-20T00:00:00+00:00https://lipanski.com/posts/dockerfile-ruby-best-practices<h1 class="no_toc" id="best-practices-when-writing-a-dockerfile-for-a-ruby-application">Best practices when writing a Dockerfile for a Ruby application</h1>
<p>The simplicity of the <em>Dockerfile</em> format is one of the reasons why Docker managed to become so popular in the first place. Getting something working is fairly easy. Producing a clean, small, secure image that will not leak secrets might not be as straightforward though.</p>
<p>This post will try to share some best practices when writing a <em>Dockerfile</em> for a Ruby app, though most of these points should apply to any other runtime as well. Towards the end, I will provide full examples for three different use cases.</p>
<p>Here’s a summary of what’s coming:</p>
<ul id="markdown-toc">
<li><a href="#1-pin-your-base-image-version" id="markdown-toc-1-pin-your-base-image-version">1. Pin your base image version</a></li>
<li><a href="#2-use-only-trusted-or-official-base-images" id="markdown-toc-2-use-only-trusted-or-official-base-images">2. Use only trusted or official base images</a></li>
<li><a href="#3-pin-your-application-dependencies" id="markdown-toc-3-pin-your-application-dependencies">3. Pin your application dependencies</a></li>
<li><a href="#4-add-a-dockerignore-file-to-your-repository" id="markdown-toc-4-add-a-dockerignore-file-to-your-repository">4. Add a .dockerignore file to your repository</a></li>
<li><a href="#5-group-commands-by-how-likely-they-are-to-change-individually" id="markdown-toc-5-group-commands-by-how-likely-they-are-to-change-individually">5. Group commands by how likely they are to change individually</a></li>
<li><a href="#6-place-the-least-likely-to-change-commands-at-the-top" id="markdown-toc-6-place-the-least-likely-to-change-commands-at-the-top">6. Place the least likely to change commands at the top</a></li>
<li><a href="#7-avoid-running-your-application-as-root" id="markdown-toc-7-avoid-running-your-application-as-root">7. Avoid running your application as root</a></li>
<li><a href="#8-when-running-copy-or-add-as-a-different-user-use-chown" id="markdown-toc-8-when-running-copy-or-add-as-a-different-user-use-chown">8. When running COPY or ADD (as a different user) use --chown</a></li>
<li><a href="#9-avoid-leaking-secrets-inside-your-image" id="markdown-toc-9-avoid-leaking-secrets-inside-your-image">9. Avoid leaking secrets inside your image</a></li>
<li><a href="#10-always-clean-up-injected-secrets-within-the-same-build-step" id="markdown-toc-10-always-clean-up-injected-secrets-within-the-same-build-step">10. Always clean up injected secrets within the same build step</a></li>
<li><a href="#11-fetching-private-dependencies-via-a-github-token-injected-through-the-gitconfig" id="markdown-toc-11-fetching-private-dependencies-via-a-github-token-injected-through-the-gitconfig">11. Fetching private dependencies via a Github token injected through the gitconfig</a></li>
<li><a href="#12-minimize-image-size-by-opting-for-small-base-images-when-possible" id="markdown-toc-12-minimize-image-size-by-opting-for-small-base-images-when-possible">12. Minimize image size by opting for small base images when possible</a></li>
<li><a href="#13-use-multi-stage-builds-to-reduce-the-size-of-your-image" id="markdown-toc-13-use-multi-stage-builds-to-reduce-the-size-of-your-image">13. Use multi-stage builds to reduce the size of your image</a></li>
<li><a href="#14-use-multi-stage-builds-to-avoid-leaking-secrets-inside-your-docker-history" id="markdown-toc-14-use-multi-stage-builds-to-avoid-leaking-secrets-inside-your-docker-history">14. Use multi-stage builds to avoid leaking secrets inside your docker history</a></li>
<li><a href="#15-when-setting-the-cmd-instruction-prefer-the-exec-format-over-the-shell-format" id="markdown-toc-15-when-setting-the-cmd-instruction-prefer-the-exec-format-over-the-shell-format">15. When setting the CMD instruction, prefer the exec format over the shell format</a></li>
<li><a href="#16-avoid-installing-development-or-test-dependencies-in-your-production-builds" id="markdown-toc-16-avoid-installing-development-or-test-dependencies-in-your-production-builds">16. Avoid installing development or test dependencies in your production builds</a></li>
<li><a href="#17-optional-combine-production-test-and-development-build-processes-into-a-single-dockerfile-by-using-multi-stage-builds" id="markdown-toc-17-optional-combine-production-test-and-development-build-processes-into-a-single-dockerfile-by-using-multi-stage-builds">17. Optional: Combine production, test and development build processes into a single Dockerfile by using multi-stage builds</a></li>
<li><a href="#18-bonus-running-migrations" id="markdown-toc-18-bonus-running-migrations">18. Bonus: Running migrations</a></li>
<li><a href="#putting-it-all-together" id="markdown-toc-putting-it-all-together">Putting it all together…</a> <ul>
<li><a href="#dockerfile-for-a-plain-ruby-app-or-a-rails-app-without-assets" id="markdown-toc-dockerfile-for-a-plain-ruby-app-or-a-rails-app-without-assets">Dockerfile for a plain Ruby app or a Rails app without assets</a></li>
<li><a href="#dockerfile-for-a-rails-app-with-assets" id="markdown-toc-dockerfile-for-a-rails-app-with-assets">Dockerfile for a Rails app with assets</a></li>
<li><a href="#dockerfile-for-a-rails-app-with-assets-and-private-dependencies" id="markdown-toc-dockerfile-for-a-rails-app-with-assets-and-private-dependencies">Dockerfile for a Rails app with assets and private dependencies</a></li>
</ul>
</li>
</ul>
<p>The code presented here can also be found at <a href="https://github.com/lipanski/ruby-dockerfile-example">https://github.com/lipanski/ruby-dockerfile-example</a>.</p>
<p>Let’s begin:</p>
<h2 id="1-pin-your-base-image-version">1. Pin your base image version</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>If you want <strong>reproducible builds</strong> (and trust me, you want them), make sure to pin down the version of your base image. Try to be as accurate as possible by specifying every digit, including the patch version.</p>
<p>If you want to update your base image, do it in a <strong>controlled, explicit</strong> manner which can also be reverted easily (e.g. via a pull request). It will save you a lot of debugging pain in the future.</p>
<h2 id="2-use-only-trusted-or-official-base-images">2. Use only trusted or official base images</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> random-dude-on-the-internet/ruby:2.5.5</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
</code></pre></div></div>
<p>Also good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Assuming that is your own image registry, which you control entirely</span>
<span class="k">FROM</span><span class="s"> your-own-registry.com/ruby:2.5.5</span>
</code></pre></div></div>
<p>When using images from <a href="https://hub.docker.com">https://hub.docker.com</a>, prefer <strong>official images</strong> and/or try to checksum their contents. All <em>official</em> images are marked with the phrase <em>Docker Official Images</em> next to their title - like the <a href="https://hub.docker.com/_/ruby">official Ruby image</a>.</p>
<p>For anything that’s not available as an official image, build and host the base image yourself - ideally by starting from a trusted/official one.</p>
<p>Keep in mind that Docker Hub does not prevent modifying images or tags by their authors over time, which is why you probably shouldn’t trust everything in there.</p>
<h2 id="3-pin-your-application-dependencies">3. Pin your application dependencies</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span>gem <span class="nb">install </span>sinatra
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span>gem <span class="nb">install </span>sinatra <span class="nt">-v</span> 2.0.5
</code></pre></div></div>
<p>Similarly to the base image version, if you don’t pin your application dependencies you might be in for surprises the next time you build your image. This doesn’t mean you should never update your dependencies but that you should do it in a controlled manner.</p>
<p>Most package managers have a way to pin dependencies, be it <code class="language-plaintext highlighter-rouge">Gemfile.lock</code>, <code class="language-plaintext highlighter-rouge">package-lock.json</code>, <code class="language-plaintext highlighter-rouge">yarn.lock</code> or <code class="language-plaintext highlighter-rouge">Pipfile.lock</code> - use it to guarantee <strong>reproducible builds</strong>.</p>
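<p>As a rough sketch of how a lock file fits into the build, assuming a Bundler-based project with a <code class="language-plaintext highlighter-rouge">Gemfile</code> and a <code class="language-plaintext highlighter-rouge">Gemfile.lock</code> next to the <em>Dockerfile</em> (the <code class="language-plaintext highlighter-rouge">/srv</code> directory is just an example), you could copy both files and let Bundler refuse to run when they are out of sync:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">WORKDIR</span><span class="s"> /srv</span>
<span class="c"># Fail the build if the Gemfile was changed without updating Gemfile.lock</span>
<span class="k">RUN </span>bundle config <span class="nt">--global</span> frozen 1
<span class="c"># Copy the lock file alongside the Gemfile so the locked versions get installed</span>
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
</code></pre></div></div>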
<h2 id="4-add-a-dockerignore-file-to-your-repository">4. Add a .dockerignore file to your repository</h2>
<p>The Docker <code class="language-plaintext highlighter-rouge">COPY</code> instruction doesn’t honour the <code class="language-plaintext highlighter-rouge">.gitignore</code> file. This means that whenever you call <code class="language-plaintext highlighter-rouge">COPY</code> with a directory or wildcard argument (e.g. <code class="language-plaintext highlighter-rouge">COPY . /srv/</code>), you could be leaking unwanted files inside your Docker image.</p>
<p>Fortunately you can add a <code class="language-plaintext highlighter-rouge">.dockerignore</code> file to your code base, which works pretty much the same way the <code class="language-plaintext highlighter-rouge">.gitignore</code> file does. In addition to copying over the contents of your <code class="language-plaintext highlighter-rouge">.gitignore</code> file, you might want to add the <code class="language-plaintext highlighter-rouge">.git/</code> directory to the list of files ignored by Docker.</p>
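<p>To illustrate, here is a minimal sketch. The ignore entries listed in the comment (<code class="language-plaintext highlighter-rouge">.git/</code>, <code class="language-plaintext highlighter-rouge">.env</code>, <code class="language-plaintext highlighter-rouge">log/</code>, <code class="language-plaintext highlighter-rouge">tmp/</code>) and the <code class="language-plaintext highlighter-rouge">/srv</code> directory are only assumptions about a typical project, so adjust them to your own repository.</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">WORKDIR</span><span class="s"> /srv</span>
<span class="c"># Assuming a .dockerignore file next to this Dockerfile with the following lines:</span>
<span class="c">#   .git/</span>
<span class="c">#   .env</span>
<span class="c">#   log/</span>
<span class="c">#   tmp/</span>
<span class="c"># the instruction below copies the build context without those paths</span>
<span class="k">COPY</span><span class="s"> . ./</span>
</code></pre></div></div>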
<h2 id="5-group-commands-by-how-likely-they-are-to-change-individually">5. Group commands by how likely they are to change individually</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span>apt update
<span class="k">RUN </span>apt <span class="nb">install</span> <span class="nt">-y</span> mysql-client
<span class="k">RUN </span>apt <span class="nb">install</span> <span class="nt">-y</span> postgresql-client
<span class="k">RUN </span>apt <span class="nb">install</span> <span class="nt">-y</span> nginx
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="c"># We usually only need to run this once</span>
<span class="k">RUN </span>apt update <span class="o">&&</span> <span class="se">\
</span> apt <span class="nb">install</span> <span class="nt">-y</span> mysql-client postgresql-client nginx
<span class="c"># We usually run this every time we add a new dependency</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>Fewer build steps mean fewer intermediate images that Docker needs to keep in storage. On the other hand, you need to be careful not to group together tasks that don’t change at the same time - otherwise you might be re-running all of them when only one has changed, which results in a slower build process.</p>
<h2 id="6-place-the-least-likely-to-change-commands-at-the-top">6. Place the least likely to change commands at the top</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="c"># Source code</span>
<span class="k">COPY</span><span class="s"> my-code/ /srv/</span>
<span class="c"># Application dependencies</span>
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># Operating system dependencies</span>
<span class="k">RUN </span>apt update <span class="o">&&</span> <span class="se">\
</span> apt <span class="nb">install</span> <span class="nt">-y</span> mysql-client postgresql-client nginx
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="c"># Operating system dependencies</span>
<span class="k">RUN </span>apt update <span class="o">&&</span> <span class="se">\
</span> apt <span class="nb">install</span> <span class="nt">-y</span> mysql-client postgresql-client nginx
<span class="c"># Application dependencies</span>
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># Source code</span>
<span class="k">COPY</span><span class="s"> my-code/ /srv/</span>
</code></pre></div></div>
<p>Docker will rebuild all steps top to bottom, starting from the first one where a <em>change</em> was detected. A <em>change</em> usually means that a line inside your <em>Dockerfile</em> was modified. The exception is the <code class="language-plaintext highlighter-rouge">COPY</code> command, which also checks whether the files provided as arguments were modified.</p>
<p>Placing the least likely to change commands at the top ensures an efficient usage of the Docker cache and results in shorter build times.</p>
<h2 id="7-avoid-running-your-application-as-root">7. Avoid running your application as root</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">RUN </span>gem <span class="nb">install </span>sinatra <span class="nt">-v</span> 2.0.5
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "sinatra"; run Sinatra::Application.run!'</span> <span class="o">></span> config.ru
<span class="c"># By default this is run as root</span>
<span class="k">CMD</span><span class="s"> rackup</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">RUN </span>gem <span class="nb">install </span>sinatra <span class="nt">-v</span> 2.0.5
<span class="c"># Create a dedicated user for running the application</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> my-sinatra-user
<span class="c"># Set the user for RUN, CMD or ENTRYPOINT calls from now on</span>
<span class="c"># Note that this doesn't apply to COPY or ADD, which use a --chown argument instead</span>
<span class="k">USER</span><span class="s"> my-sinatra-user</span>
<span class="c"># Set the base directory that will be used from now on</span>
<span class="k">WORKDIR</span><span class="s"> /home/my-sinatra-user</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "sinatra"; run Sinatra::Application.run!'</span> <span class="o">></span> config.ru
<span class="c"># This is run under the my-sinatra-user user</span>
<span class="k">CMD</span><span class="s"> rackup</span>
</code></pre></div></div>
<p>Running your application as root introduces an <strong>additional attack vector</strong>. If an attacker gains remote code execution through an application vulnerability, there are ways to escape the container environment, and the root user can inflict much more damage than an unprivileged user.</p>
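<p>The <code class="language-plaintext highlighter-rouge">adduser -D</code> call in the example above uses the BusyBox syntax found on Alpine. On Debian-based images such as <code class="language-plaintext highlighter-rouge">ruby:2.5.5</code>, a comparable sketch could rely on <code class="language-plaintext highlighter-rouge">useradd</code> instead. The user name is simply the one from the example above, and the <code class="language-plaintext highlighter-rouge">CMD</code> is only there to let you check which user the process runs as.</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="c"># Create an unprivileged user together with its home directory</span>
<span class="k">RUN </span>useradd <span class="nt">--create-home</span> my-sinatra-user
<span class="k">USER</span><span class="s"> my-sinatra-user</span>
<span class="k">WORKDIR</span><span class="s"> /home/my-sinatra-user</span>
<span class="c"># Prints the numeric ID of the user the process runs as (0 would mean root)</span>
<span class="k">CMD</span><span class="s"> ["ruby", "-e", "puts Process.uid"]</span>
</code></pre></div></div>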
<h2 id="8-when-running-copy-or-add-as-a-different-user-use-chown">8. When running COPY or ADD (as a different user) use --chown</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> my-sinatra-user
<span class="k">USER</span><span class="s"> my-sinatra-user</span>
<span class="k">WORKDIR</span><span class="s"> /home/my-sinatra-user</span>
<span class="c"># The files will be owned by the root user!</span>
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">CMD</span><span class="s"> rackup</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> my-sinatra-user
<span class="k">USER</span><span class="s"> my-sinatra-user</span>
<span class="k">WORKDIR</span><span class="s"> /home/my-sinatra-user</span>
<span class="c"># The files will be owned by my-sinatra-user</span>
<span class="k">COPY</span><span class="s"> --chown=my-sinatra-user Gemfile Gemfile.lock ./</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">CMD</span><span class="s"> rackup</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">USER</code> directive described in the previous step only applies to <code class="language-plaintext highlighter-rouge">RUN</code>, <code class="language-plaintext highlighter-rouge">CMD</code> or <code class="language-plaintext highlighter-rouge">ENTRYPOINT</code>. For <code class="language-plaintext highlighter-rouge">COPY</code> and <code class="language-plaintext highlighter-rouge">ADD</code> you have to use the <code class="language-plaintext highlighter-rouge">--chown</code> argument.</p>
<h2 id="9-avoid-leaking-secrets-inside-your-image">9. Avoid leaking secrets inside your image</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">ENV</span><span class="s"> DB_PASSWORD "secret stuff"</span>
</code></pre></div></div>
<p>Secrets should never appear inside your <em>Dockerfile</em> in plain text. Instead, they should be injected via:</p>
<ul>
<li>Build-time arguments: the <code class="language-plaintext highlighter-rouge">ARG</code> command and the <code class="language-plaintext highlighter-rouge">--build-arg</code> argument of <code class="language-plaintext highlighter-rouge">docker build</code>.</li>
<li>Environment variables: the <code class="language-plaintext highlighter-rouge">-e</code> or <code class="language-plaintext highlighter-rouge">--env-file</code> arguments of <code class="language-plaintext highlighter-rouge">docker run</code>.</li>
<li>Kubernetes secrets or similar methods.</li>
</ul>
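<p>As an illustration of the environment variable option, the following sketch keeps the value out of the <em>Dockerfile</em> and expects it when the container starts. The image name <code class="language-plaintext highlighter-rouge">my-image</code> and the file <code class="language-plaintext highlighter-rouge">production.env</code> in the comments are hypothetical, and the Ruby one-liner is only a stand-in for a real application reading its configuration.</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="c"># No ENV DB_PASSWORD here: the value is supplied when the container starts, e.g.</span>
<span class="c">#   docker run -e DB_PASSWORD=... my-image</span>
<span class="c">#   docker run --env-file ./production.env my-image</span>
<span class="c"># The command only reports whether the variable is present, it never prints the value</span>
<span class="k">CMD</span><span class="s"> ["ruby", "-e", "abort('DB_PASSWORD is not set') unless ENV.key?('DB_PASSWORD'); puts 'DB_PASSWORD is set'"]</span>
</code></pre></div></div>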
<p><strong>Note:</strong> Whenever you pass a secret via <code class="language-plaintext highlighter-rouge">--build-arg</code>, its value will show up in clear text in your <code class="language-plaintext highlighter-rouge">docker history</code>, next to the <code class="language-plaintext highlighter-rouge">RUN</code> steps that follow the <code class="language-plaintext highlighter-rouge">ARG</code> instruction. Similarly, values passed to <code class="language-plaintext highlighter-rouge">docker run</code> via <code class="language-plaintext highlighter-rouge">-e</code> can be read through <code class="language-plaintext highlighter-rouge">docker inspect</code>. Depending on the environment where the build happens, you might want to avoid this. A solution to the build-time problem is detailed in step 14.</p>
<h2 id="10-always-clean-up-injected-secrets-within-the-same-build-step">10. Always clean up injected secrets within the same build step</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">ARG</span><span class="s"> PRIVATE_SSH_KEY</span>
<span class="c"># This build step will retain the private SSH key</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">PRIVATE_SSH_KEY</span><span class="k">}</span><span class="s2">"</span> <span class="o">></span> /root/.ssh/id_rsa
<span class="c"># This build step will retain the private SSH key</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">RUN </span><span class="nb">rm</span> /root/.ssh/id_rsa
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">ARG</span><span class="s"> PRIVATE_SSH_KEY</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /root/.ssh/ <span class="o">&&</span> <span class="se">\
</span> <span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">PRIVATE_SSH_KEY</span><span class="k">}</span><span class="s2">"</span> <span class="o">></span> /root/.ssh/id_rsa <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> /root/.ssh/id_rsa
</code></pre></div></div>
<p>The first example produces two build steps that retain the injected secret. If anyone has access to your build history, they will be able to retrieve your secret. The suggested solution groups together the actions that inject and require the secret with the one that cleans it up. This produces a clean build history.</p>
<h2 id="11-fetching-private-dependencies-via-a-github-token-injected-through-the-gitconfig">11. Fetching private dependencies via a Github token injected through the gitconfig</h2>
<p>Quite often your application will require private dependencies, usually hosted inside private repositories, be it Ruby gems or NPM packages.</p>
<p>There are various ways to fetch them during the build process but as long as you are using Github, I recommend the following:</p>
<ol>
<li>Set up a machine user on Github.</li>
<li>Allow your machine user read-only access to your private dependencies.</li>
<li>Generate <a href="https://github.com/settings/tokens">a personal Github access token</a> for this user.</li>
<li>Use the Github token to pull dependencies during <code class="language-plaintext highlighter-rouge">bundle install</code> or <code class="language-plaintext highlighter-rouge">npm install</code>.</li>
<li>Clean up the Github token from the build.</li>
</ol>
<p>By default, your <code class="language-plaintext highlighter-rouge">Gemfile</code> or <code class="language-plaintext highlighter-rouge">package.json</code> files probably use the SSH protocol because it’s the most convenient one for development mode. If your private dependencies are referenced via <code class="language-plaintext highlighter-rouge">git@github.com</code> then this is the case.</p>
<p>Once you have produced a working Github token, you can use a <code class="language-plaintext highlighter-rouge">.gitconfig</code> URL rewrite to tell Git to authenticate over HTTPS with your Github token instead of using the default SSH protocol (which we still want in development). This is accomplished via the <code class="language-plaintext highlighter-rouge">insteadOf</code> Git option, which rewrites the beginning of the repository URL to inject the token.</p>
<p>After we’ve successfully installed dependencies, it’s important to remove the <code class="language-plaintext highlighter-rouge">.gitconfig</code> file <em>within the same step</em>, to avoid leaking the Github token inside the built image.</p>
<p>Last but not least, we’ll be injecting the Github token into the build process via a build argument.</p>
<p>Let’s put it all together:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">ARG</span><span class="s"> GITHUB_TOKEN</span>
<span class="c"># This is a private gem that GITHUB_TOKEN has access to</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "some-private-gem", git: "git@github.com:some-user/some-private-gem"'</span> <span class="o">></span> Gemfile
<span class="c"># First rewrite is for Gemfile, second is for package.json</span>
<span class="c"># This should cover most other package managers as well</span>
<span class="c"># Note the usage of --add (which avoids overwriting the option)</span>
<span class="c"># Also note that we're cleaning up the .gitconfig file within the same build step</span>
<span class="k">RUN </span>git config <span class="nt">--global</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf git@github.com:some-user <span class="o">&&</span> <span class="se">\
</span> git config <span class="nt">--global</span> <span class="nt">--add</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf ssh://git@github.com/some-user <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> ~/.gitconfig
</code></pre></div></div>
<p>You can build this image by calling: <code class="language-plaintext highlighter-rouge">docker build --build-arg GITHUB_TOKEN=xxx .</code></p>
<blockquote>
<p>Another way of achieving the same is by using SSH keys. It saves you the hassle of rewriting the <code class="language-plaintext highlighter-rouge">Gemfile</code> URLs but you still need to clean up the private key at the end.</p>
</blockquote>
<h2 id="12-minimize-image-size-by-opting-for-small-base-images-when-possible">12. Minimize image size by opting for small base images when possible</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>What’s the difference?</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> docker images <span class="nt">-a</span> | <span class="nb">grep </span>base-image
normal-base-image 869MB
small-base-image 45.3MB
</code></pre></div></div>
<p>Some base images are larger than others. A small base image produces a small final image which ultimately speeds up your deployments and saves you the additional storage costs.</p>
<p><a href="https://alpinelinux.org/">Alpine Linux</a> is usually a safe bet when looking for small-footprint operating systems.</p>
<p>When opting for a compact operating system, pay special attention to:</p>
<ul>
<li>The <strong>package manager</strong> and the available packages. Building things from source is tedious, so use the package manager to your benefit.</li>
<li>The <strong>choice of shell</strong> - some base images might not even provide a shell!</li>
<li>The <strong>security</strong> implications and the <strong>stability</strong> of the underlying operating systems: avoid environments that are experimental or not battle-tested.</li>
</ul>
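<p>As a small illustration of the first two points, assuming the Alpine variant of the Ruby image: packages are installed with <code class="language-plaintext highlighter-rouge">apk</code> instead of <code class="language-plaintext highlighter-rouge">apt</code>, and the available shell is the BusyBox <code class="language-plaintext highlighter-rouge">/bin/sh</code> rather than <em>bash</em>. The <code class="language-plaintext highlighter-rouge">tzdata</code> package is only an example of something you might need to install.</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="c"># Alpine uses apk; --no-cache avoids storing the package index inside the image</span>
<span class="k">RUN </span>apk add <span class="nt">--no-cache</span> tzdata
<span class="c"># bash is not installed by default, so scripts should target /bin/sh</span>
<span class="k">CMD</span><span class="s"> ["/bin/sh", "-c", "ruby -e 'puts 1 + 2'"]</span>
</code></pre></div></div>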
<h2 id="13-use-multi-stage-builds-to-reduce-the-size-of-your-image">13. Use multi-stage builds to reduce the size of your image</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="c"># Nokogiri's build dependencies</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> <span class="se">\
</span> build-base <span class="se">\
</span> libxml2-dev <span class="se">\
</span> libxslt-dev
<span class="c"># Nokogiri, yikes</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "nokogiri"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">CMD</span><span class="s"> /bin/sh</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># The "builder" image will build nokogiri</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine AS builder</span>
<span class="c"># Nokogiri's build dependencies</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> <span class="se">\
</span> build-base <span class="se">\
</span> libxml2-dev <span class="se">\
</span> libxslt-dev
<span class="c"># Nokogiri, yikes</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "nokogiri"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># The final image: we start clean</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="c"># We copy over the entire gems directory from our builder image, containing the already built artifact</span>
<span class="k">COPY</span><span class="s"> --from=builder /usr/local/bundle/ /usr/local/bundle/</span>
<span class="k">CMD</span><span class="s"> /bin/sh</span>
</code></pre></div></div>
<p>Multi-stage builds are builds where the final image is composed of parts of different builds, which can potentially be based on completely different base images. You can use multi-stage builds to significantly reduce the size of your final images. Smaller artifacts ultimately mean faster deployments and rollbacks.</p>
<p>In the current example we require the infamous <em>nokogiri</em> gem. In order to install this gem you usually need some relatively heavy OS dependencies (<em>libxml</em> and <em>libxslt</em>), though they are only useful at build time. The gem also needs to be built natively, which might produce additional build time garbage.</p>
<p>So what’s the difference?</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> docker images | <span class="nb">grep </span>nokogiri
nokogiri-simple 251MB
nokogiri-multi 70.1MB
</code></pre></div></div>
<p>As you can see, the difference can be quite significant…</p>
<h2 id="14-use-multi-stage-builds-to-avoid-leaking-secrets-inside-your-docker-history">14. Use multi-stage builds to avoid leaking secrets inside your docker history</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="c"># This is a secret</span>
<span class="k">ARG</span><span class="s"> PRIVATE_SSH_KEY</span>
<span class="c"># Just a basic Gemfile to make bundle install happy</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="c"># We require the secret for installing dependencies</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /root/.ssh/ <span class="o">&&</span> <span class="se">\
</span> <span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">PRIVATE_SSH_KEY</span><span class="k">}</span><span class="s2">"</span> <span class="o">></span> /root/.ssh/id_rsa <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> /root/.ssh/id_rsa
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine AS builder</span>
<span class="c"># This is a secret</span>
<span class="k">ARG</span><span class="s"> PRIVATE_SSH_KEY</span>
<span class="c"># Just a basic Gemfile to make bundle install happy</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="c"># We require the secret for installing dependencies</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> /root/.ssh/ <span class="o">&&</span> <span class="se">\
</span> <span class="nb">echo</span> <span class="s2">"</span><span class="k">${</span><span class="nv">PRIVATE_SSH_KEY</span><span class="k">}</span><span class="s2">"</span> <span class="o">></span> /root/.ssh/id_rsa <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> /root/.ssh/id_rsa
<span class="c"># The final image doesn't need the secret</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">COPY</span><span class="s"> --from=builder /usr/local/bundle/ /usr/local/bundle/</span>
<span class="k">CMD</span><span class="s"> ruby -e "puts 1 + 2"</span>
</code></pre></div></div>
<p>You can build both examples by passing the <code class="language-plaintext highlighter-rouge">PRIVATE_SSH_KEY</code> as a build argument:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> my-fancy-image <span class="nt">--build-arg</span> <span class="nv">PRIVATE_SSH_KEY</span><span class="o">=</span>xxx <span class="nb">.</span>
</code></pre></div></div>
<p>As explained in step 9, you can use <em>Docker build arguments</em> to avoid leaking secrets inside your image. By default, however, this will still leak the secret inside your <em>Docker history</em>. Depending on your build environment, you might want to avoid this.</p>
<p>Let’s see what this means for our bad example:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Lines 3-4 contain your secret PRIVATE_SSH_KEY in clear text</span>
<span class="o">></span> docker <span class="nb">history </span>my-fancy-image
IMAGE CREATED CREATED BY SIZE COMMENT
67e60c0853ab 19 seconds ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) CMD ["/bin/sh" "-c" "ruby… 0B</span>
94dd778c4b5d 20 seconds ago |1 <span class="nv">PRIVATE_SSH_KEY</span><span class="o">=</span>xxx /bin/sh <span class="nt">-c</span> <span class="nb">mkdir</span> <span class="nt">-p</span> /… 30.9MB
32a993af7bfb About a minute ago |1 <span class="nv">PRIVATE_SSH_KEY</span><span class="o">=</span>xxx /bin/sh <span class="nt">-c</span> <span class="nb">echo</span> <span class="s1">'sour… 45B
2be964ad91c7 About a minute ago /bin/sh -c #(nop) ARG PRIVATE_SSH_KEY 0B
44723f3ab2bd 4 months ago /bin/sh -c #(nop) CMD ["irb"] 0B
<missing> 4 months ago /bin/sh -c mkdir -p "$GEM_HOME" && chmod 777… 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV PATH=/usr/local/bundl… 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV BUNDLE_PATH=/usr/loca… 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV GEM_HOME=/usr/local/b… 0B
<missing> 4 months ago /bin/sh -c set -ex && apk add --no-cache -… 45.5MB
<missing> 4 months ago /bin/sh -c #(nop) ENV RUBYGEMS_VERSION=3.0.3 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV RUBY_DOWNLOAD_SHA256=… 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV RUBY_VERSION=2.5.5 0B
<missing> 4 months ago /bin/sh -c #(nop) ENV RUBY_MAJOR=2.5 0B
<missing> 4 months ago /bin/sh -c mkdir -p /usr/local/etc && { e… 45B
<missing> 4 months ago /bin/sh -c apk add --no-cache gmp-dev 3.4MB
<missing> 4 months ago /bin/sh -c #(nop) CMD ["/bin/sh"] 0B
<missing> 4 months ago /bin/sh -c #(nop) ADD file:a86aea1f3a7d68f6a… 5.53MB
</span></code></pre></div></div>
<p>Now let’s see what happens with the multi-stage build:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> docker <span class="nb">history </span>my-fancy-image
IMAGE CREATED CREATED BY SIZE COMMENT
2706a2f47816 8 seconds ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) CMD ["/bin/sh" "-c" "ruby… 0B</span>
86509dba3bd9 9 seconds ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) COPY dir:e110956912ddb292a… 3.16MB</span>
44723f3ab2bd 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) CMD ["irb"] 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="nb">mkdir</span> <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$GEM_HOME</span><span class="s2">"</span> <span class="o">&&</span> <span class="nb">chmod </span>777… 0B
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV PATH=/usr/local/bundl… 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV BUNDLE_PATH=/usr/loca… 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV GEM_HOME=/usr/local/b… 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="nb">set</span> <span class="nt">-ex</span> <span class="o">&&</span> apk add <span class="nt">--no-cache</span> -… 45.5MB
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV RUBYGEMS_VERSION=3.0.3 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV RUBY_DOWNLOAD_SHA256=… 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV RUBY_VERSION=2.5.5 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ENV RUBY_MAJOR=2.5 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="nb">mkdir</span> <span class="nt">-p</span> /usr/local/etc <span class="o">&&</span> <span class="o">{</span> e… 45B
<missing> 4 months ago /bin/sh <span class="nt">-c</span> apk add <span class="nt">--no-cache</span> gmp-dev 3.4MB
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) CMD ["/bin/sh"] 0B</span>
<missing> 4 months ago /bin/sh <span class="nt">-c</span> <span class="c">#(nop) ADD file:a86aea1f3a7d68f6a… 5.53MB</span>
</code></pre></div></div>
<p>The multi-stage build only retains the history of the final image. The builder history is ignored and thus your secret is safe.</p>
<h2 id="15-when-setting-the-cmd-instruction-prefer-the-exec-format-over-the-shell-format">15. When setting the CMD instruction, prefer the exec format over the shell format</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># A simple Sinatra app which prints out HUUUUUP when the process receives the HUP signal.</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "sinatra"; set bind: "0.0.0.0"; Signal.trap("HUP") { puts "HUUUUUP" }; run Sinatra::Application.run!'</span> <span class="o">></span> config.ru
<span class="k">CMD</span><span class="s"> bundle exec rackup</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="c"># A simple Sinatra app which prints out HUUUUUP when the process receives the HUP signal.</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "sinatra"; set bind: "0.0.0.0"; Signal.trap("HUP") { puts "HUUUUUP" }; run Sinatra::Application.run!'</span> <span class="o">></span> config.ru
<span class="k">CMD</span><span class="s"> ["bundle", "exec", "rackup"]</span>
</code></pre></div></div>
<p>There are two ways of using <code class="language-plaintext highlighter-rouge">CMD</code>, as explained <a href="https://docs.docker.com/engine/reference/builder/#cmd">here</a>:</p>
<ul>
<li>The <em>shell format</em>, which invokes the command from within a shell - e.g. <code class="language-plaintext highlighter-rouge">CMD bundle exec rackup</code></li>
<li>The <em>exec format</em>, which doesn’t invoke a command shell and takes the form of a JSON array - e.g. <code class="language-plaintext highlighter-rouge">CMD ["bundle", "exec", "rackup"]</code></li>
</ul>
<p>The recommended way of using <code class="language-plaintext highlighter-rouge">CMD</code> is in <em>exec format</em>. This ensures your process will run as PID 1 which in turn ensures that any received signals will also be handled properly.</p>
<p>Let’s see what the process tree looks like for the container run in <strong>shell format</strong> (the bad example):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> docker <span class="nb">exec</span> <span class="si">$(</span>docker ps <span class="nt">-q</span><span class="si">)</span> ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.7 0.0 2388 756 pts/0 Ss+ 14:36 0:00 /bin/sh <span class="nt">-c</span> bundle <span class="nb">exec </span>rackup
root 6 0.8 0.2 43948 25556 pts/0 Sl+ 14:36 0:00 /usr/local/bundle/bin/rackup
root 13 0.0 0.0 7640 2588 ? Rs 14:37 0:00 ps aux
</code></pre></div></div>
<p>Our <code class="language-plaintext highlighter-rouge">bundle exec rackup</code> command is wrapped inside a <code class="language-plaintext highlighter-rouge">/bin/sh</code> call, so the actual <code class="language-plaintext highlighter-rouge">rackup</code> process is not PID 1. A HUP signal sent to our container <strong>will not be propagated</strong> to the <code class="language-plaintext highlighter-rouge">rackup</code> process and <code class="language-plaintext highlighter-rouge">HUUUUUP</code> will never be printed.</p>
<p>Now let’s see what the process tree looks like for the container run in <strong>exec format</strong> (the good example):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">></span> docker <span class="nb">exec</span> <span class="si">$(</span>docker ps <span class="nt">-q</span><span class="si">)</span> ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 29.0 0.2 43988 25632 pts/0 Ssl+ 14:47 0:00 /usr/local/bundle/bin/rackup
root 8 0.0 0.0 7640 2668 ? Rs 14:47 0:00 ps aux
</code></pre></div></div>
<p>As you can see, the <code class="language-plaintext highlighter-rouge">rackup</code> process is not wrapped inside <code class="language-plaintext highlighter-rouge">/bin/sh</code> and is running as PID 1. Sending a HUP signal to our container will correctly print out <code class="language-plaintext highlighter-rouge">HUUUUUP</code>.</p>
<p>Why is this important? Some applications implement signals in order to exit gracefully or clean up resources. In the case of web servers, this usually means releasing connections from the database connection pool or finishing requests. If you want any of these you should care about signals.</p>
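<p>To see what such a handler does in isolation, here is a minimal sketch in plain Ruby, without Docker or Sinatra. The process signals itself, which only approximates what happens when the container runtime delivers a signal to PID 1. The <code class="language-plaintext highlighter-rouge">received</code> array is a name made up for this illustration.</p>

```ruby
# Record every HUP signal this process receives.
# `received` is a hypothetical name used only for this sketch.
received = []
Signal.trap("HUP") { received << "HUUUUUP" }

# Stand-in for a signal arriving from outside: the process signals itself.
Process.kill("HUP", Process.pid)

# Signal handlers run asynchronously, so wait briefly (at most 2 seconds)
# until the handler has had a chance to run.
deadline = Time.now + 2
sleep 0.01 until received.any? || Time.now > deadline

puts received.first
```

<p>If the process that registered the handler is not the one receiving the signal, as in the shell format example, the block never runs.</p>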
<p>Thanks to <a href="https://twitter.com/_y3ti">Kamil Grabowski</a> for pointing this out on Twitter.</p>
<h2 id="16-avoid-installing-development-or-test-dependencies-in-your-production-builds">16. Avoid installing development or test dependencies in your production builds</h2>
<p>Bad:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle <span class="nb">install</span>
</code></pre></div></div>
<p>Good:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span>bundle config <span class="nb">set </span>without <span class="s1">'development test'</span>
<span class="k">RUN </span>bundle <span class="nb">install</span>
</code></pre></div></div>
<p>By default, calling <code class="language-plaintext highlighter-rouge">bundle install</code> or <code class="language-plaintext highlighter-rouge">yarn install</code> will include development dependencies. Since these are usually not required in a typical production environment, excluding development and test dependencies from your production <em>Dockerfile</em> will speed up your builds and reduce the size of your images.</p>
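<p>The <em>Gemfile</em> in the example above only lists <code class="language-plaintext highlighter-rouge">sinatra</code>, so the effect is not visible there. As a sketch, assume a <em>Gemfile</em> with grouped dependencies like the one below (the gems <code class="language-plaintext highlighter-rouge">pry</code> and <code class="language-plaintext highlighter-rouge">minitest</code> are illustrative choices, not taken from a real project):</p>

```ruby
# Hypothetical Gemfile with grouped dependencies
source "https://rubygems.org"

# Always installed
gem "sinatra"

# Skipped when `without` includes "development"
group :development do
  gem "pry"
end

# Skipped when `without` includes "test"
group :test do
  gem "minitest"
end
```

<p>With <code class="language-plaintext highlighter-rouge">bundle config set without 'development test'</code> in place, <code class="language-plaintext highlighter-rouge">bundle install</code> would only install <code class="language-plaintext highlighter-rouge">sinatra</code> and its own dependencies.</p>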
<p>While we’re at it, when installing gems I recommend setting the <code class="language-plaintext highlighter-rouge">--jobs</code> and <code class="language-plaintext highlighter-rouge">--retry</code> arguments, as it will speed up the process and make it more resilient to network issues:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>bundle <span class="nb">install</span> <span class="nt">--jobs</span><span class="o">=</span>3 <span class="nt">--retry</span><span class="o">=</span>3
</code></pre></div></div>
<h2 id="17-optional-combine-production-test-and-development-build-processes-into-a-single-dockerfile-by-using-multi-stage-builds">17. Optional: Combine production, test and development build processes into a single Dockerfile by using multi-stage builds</h2>
<p>In many cases, your test, development and production build processes might slightly differ from each other. If you’re using Docker across all these environments, a common approach is to introduce one <code class="language-plaintext highlighter-rouge">Dockerfile</code> per environment. Keeping these files in sync can gradually become redundant, tedious or just easy to forget.</p>
<p>An alternative to maintaining multiple Dockerfiles is to use multi-stage builds and map all your environments or build processes within the same Dockerfile. Here’s an example:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine AS builder</span>
<span class="c"># A Gemfile that contains a test dependency (minitest)</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'source "https://rubygems.org"; gem "sinatra"; group(:test) { gem "minitest" }'</span> <span class="o">></span> Gemfile
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "sinatra"; run Sinatra::Application.run!'</span> <span class="o">></span> config.ru
<span class="c"># By default we don't install development or test dependencies</span>
<span class="k">RUN </span>bundle <span class="nb">install</span> <span class="nt">--without</span> development <span class="nb">test</span>
<span class="c"># A separate build stage installs test dependencies and runs your tests</span>
<span class="k">FROM</span><span class="s"> builder AS test</span>
<span class="c"># The test stage installs the test dependencies</span>
<span class="k">RUN </span>bundle <span class="nb">install</span> <span class="nt">--with</span> <span class="nb">test</span>
<span class="c"># Let's introduce a test that passes</span>
<span class="k">RUN </span><span class="nb">echo</span> <span class="s1">'require "minitest/spec"; require "minitest/autorun"; class TestIndex < Minitest::Test; def test_it_passes; assert(true); end; end'</span> <span class="o">></span> test.rb
<span class="c"># The actual test run</span>
<span class="k">RUN </span>bundle <span class="nb">exec </span>ruby test.rb
<span class="c"># The production artifact doesn't contain any test dependencies</span>
<span class="k">FROM</span><span class="s"> ruby:2.5.5-alpine</span>
<span class="k">COPY</span><span class="s"> --from=builder /usr/local/bundle/ /usr/local/bundle/</span>
<span class="k">COPY</span><span class="s"> --from=builder /config.ru ./</span>
<span class="k">CMD</span><span class="s"> ["rackup"]</span>
</code></pre></div></div>
<p>You can build the production artifact (without the test dependencies) by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DOCKER_BUILDKIT</span><span class="o">=</span>1 docker build <span class="nb">.</span>
</code></pre></div></div>
<p>You can run your tests by explicitly asking for the <em>test</em> stage:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DOCKER_BUILDKIT</span><span class="o">=</span>1 docker build <span class="nt">--target</span><span class="o">=</span><span class="nb">test</span> <span class="nb">.</span>
</code></pre></div></div>
<p>Notice the usage of the <a href="https://docs.docker.com/develop/develop-images/build_enhancements/">BuildKit</a> feature flag. Prior to Docker 18.09 or without adding the <code class="language-plaintext highlighter-rouge">DOCKER_BUILDKIT=1</code> flag, a full build would still build all stages, including the test stage. The final artifact would still contain only the production dependencies but the build would take a little bit longer.</p>
<h2 id="18-bonus-running-migrations">18. Bonus: Running migrations</h2>
<p>There are various ways to run migrations in Docker, but the simplest one is by creating a <em>start script</em> for your application:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">#!/bin/sh</span>
<span class="nb">set</span> <span class="nt">-e</span>
bundle <span class="nb">exec </span>rake db:migrate
<span class="nb">exec </span>bundle <span class="nb">exec </span>rackup
</code></pre></div></div>
<p>I usually save this as <code class="language-plaintext highlighter-rouge">bin/start</code> and use it as my <code class="language-plaintext highlighter-rouge">CMD</code>:</p>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CMD</span><span class="s"> ["bin/start"]</span>
</code></pre></div></div>
<p>Note that Rails prevents running migrations in parallel from different processes. Deploying two or more containers in parallel might cause all but one of these deployments to fail. Then again, if you deploy containers in parallel, you’re most likely using an automated solution (Kubernetes / Nomad / Docker Swarm), at which point your containers should get resurrected and should eventually get past the Rails migration lock.</p>
<h2 id="putting-it-all-together">Putting it all together…</h2>
<p>Enough with the theory! Let’s apply these best practices. No app is the same, so I’ll provide you with <em>Dockerfiles</em> for three different use cases, in order of their complexity.</p>
<p>We’ll start with the <code class="language-plaintext highlighter-rouge">.dockerignore</code> file, which is shared by all examples. The easiest way to produce a <code class="language-plaintext highlighter-rouge">.dockerignore</code> file is by mirroring your <code class="language-plaintext highlighter-rouge">.gitignore</code> file, then adding the <code class="language-plaintext highlighter-rouge">.git/</code> directory to the list:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start from your .gitignore file</span>
<span class="nb">cp</span> .gitignore .dockerignore
<span class="c"># Exclude the .git/ directory from being copied over into your images</span>
<span class="nb">echo</span> <span class="s2">".git/"</span> <span class="o">>></span> .dockerignore
</code></pre></div></div>
<h3 id="dockerfile-for-a-plain-ruby-app-or-a-rails-app-without-assets">Dockerfile for a plain Ruby app or a Rails app without assets</h3>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start from a small, trusted base image with the version pinned down</span>
<span class="k">FROM</span><span class="s"> ruby:2.7.1-alpine AS base</span>
<span class="c"># Install system dependencies required both at runtime and build time</span>
<span class="c"># The image uses Postgres but you can swap it with mariadb-dev (for MySQL) or sqlite-dev</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> <span class="se">\
</span> postgresql-dev <span class="se">\
</span> tzdata
<span class="c"># This stage will be responsible for installing gems</span>
<span class="k">FROM</span><span class="s"> base AS dependencies</span>
<span class="c"># Install system dependencies required to build some Ruby gems (pg)</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> build-base
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="c"># Install gems (excluding development/test dependencies)</span>
<span class="k">RUN </span>bundle config <span class="nb">set </span>without <span class="s2">"development test"</span> <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="nt">--jobs</span><span class="o">=</span>3 <span class="nt">--retry</span><span class="o">=</span>3
<span class="c"># We're back at the base stage</span>
<span class="k">FROM</span><span class="s"> base</span>
<span class="c"># Create a non-root user to run the app and own app-specific files</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> app
<span class="c"># Switch to this user</span>
<span class="k">USER</span><span class="s"> app</span>
<span class="c"># We'll install the app in this directory</span>
<span class="k">WORKDIR</span><span class="s"> /home/app</span>
<span class="c"># Copy over gems from the dependencies stage</span>
<span class="k">COPY</span><span class="s"> --from=dependencies /usr/local/bundle/ /usr/local/bundle/</span>
<span class="c"># Finally, copy over the code</span>
<span class="c"># This is where the .dockerignore file comes into play</span>
<span class="c"># Note that we have to use `--chown` here</span>
<span class="k">COPY</span><span class="s"> --chown=app . ./</span>
<span class="c"># Launch the server (or run some other Ruby command)</span>
<span class="k">CMD</span><span class="s"> ["bundle", "exec", "rackup"]</span>
</code></pre></div></div>
<p>You can build this by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> my-rails-app <span class="nb">.</span>
</code></pre></div></div>
<h3 id="dockerfile-for-a-rails-app-with-assets">Dockerfile for a Rails app with assets</h3>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start from a small, trusted base image with the version pinned down</span>
<span class="k">FROM</span><span class="s"> ruby:2.7.1-alpine AS base</span>
<span class="c"># Install system dependencies required both at runtime and build time</span>
<span class="c"># The image uses Postgres but you can swap it with mariadb-dev (for MySQL) or sqlite-dev</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> <span class="se">\
</span> postgresql-dev <span class="se">\
</span> tzdata <span class="se">\
</span> nodejs <span class="se">\
</span> yarn
<span class="c"># This stage will be responsible for installing gems and npm packages</span>
<span class="k">FROM</span><span class="s"> base AS dependencies</span>
<span class="c"># Install system dependencies required to build some Ruby gems (pg)</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> build-base
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="c"># Install gems (excluding development/test dependencies)</span>
<span class="k">RUN </span>bundle config <span class="nb">set </span>without <span class="s2">"development test"</span> <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="nt">--jobs</span><span class="o">=</span>3 <span class="nt">--retry</span><span class="o">=</span>3
<span class="k">COPY</span><span class="s"> package.json yarn.lock ./</span>
<span class="c"># Install npm packages</span>
<span class="k">RUN </span>yarn <span class="nb">install</span> <span class="nt">--frozen-lockfile</span>
<span class="c"># We're back at the base stage</span>
<span class="k">FROM</span><span class="s"> base</span>
<span class="c"># Create a non-root user to run the app and own app-specific files</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> app
<span class="c"># Switch to this user</span>
<span class="k">USER</span><span class="s"> app</span>
<span class="c"># We'll install the app in this directory</span>
<span class="k">WORKDIR</span><span class="s"> /home/app</span>
<span class="c"># Copy over gems from the dependencies stage</span>
<span class="k">COPY</span><span class="s"> --from=dependencies /usr/local/bundle/ /usr/local/bundle/</span>
<span class="c"># Copy over npm packages from the dependencies stage</span>
<span class="c"># Note that we have to use `--chown` here</span>
<span class="k">COPY</span><span class="s"> --chown=app --from=dependencies /node_modules/ node_modules/</span>
<span class="c"># Finally, copy over the code</span>
<span class="c"># This is where the .dockerignore file comes into play</span>
<span class="c"># Note that we have to use `--chown` here</span>
<span class="k">COPY</span><span class="s"> --chown=app . ./</span>
<span class="c"># Install assets</span>
<span class="k">RUN </span><span class="nv">RAILS_ENV</span><span class="o">=</span>production <span class="nv">SECRET_KEY_BASE</span><span class="o">=</span>assets bundle <span class="nb">exec </span>rake assets:precompile
<span class="c"># Launch the server</span>
<span class="k">CMD</span><span class="s"> ["bundle", "exec", "rackup"]</span>
</code></pre></div></div>
<p>You can build this by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> my-rails-app <span class="nb">.</span>
</code></pre></div></div>
<h3 id="dockerfile-for-a-rails-app-with-assets-and-private-dependencies">Dockerfile for a Rails app with assets and private dependencies</h3>
<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Start from a small, trusted base image with the version pinned down</span>
<span class="k">FROM</span><span class="s"> ruby:2.7.1-alpine AS base</span>
<span class="c"># Install system dependencies required both at runtime and build time</span>
<span class="c"># The image uses Postgres but you can swap it with mariadb-dev (for MySQL) or sqlite-dev</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> <span class="se">\
</span> postgresql-dev <span class="se">\
</span> tzdata <span class="se">\
</span> nodejs <span class="se">\
</span> yarn
<span class="c"># This stage will be responsible for installing gems and npm packages</span>
<span class="k">FROM</span><span class="s"> base AS dependencies</span>
<span class="c"># The argument is required later, when installing private gems or npm packages</span>
<span class="k">ARG</span><span class="s"> GITHUB_TOKEN</span>
<span class="c"># Install system dependencies required to build some Ruby gems (pg)</span>
<span class="k">RUN </span>apk add <span class="nt">--update</span> build-base
<span class="k">COPY</span><span class="s"> Gemfile Gemfile.lock ./</span>
<span class="c"># Don't install development or test dependencies</span>
<span class="k">RUN </span>bundle config <span class="nb">set </span>without <span class="s2">"development test"</span>
<span class="c"># Install gems (including private ones)</span>
<span class="c"># This uses the GITHUB_TOKEN argument, which is also cleaned up in the same step</span>
<span class="k">RUN </span>git config <span class="nt">--global</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf git@github.com:some-user <span class="o">&&</span> <span class="se">\
</span> git config <span class="nt">--global</span> <span class="nt">--add</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf ssh://git@github <span class="o">&&</span> <span class="se">\
</span> bundle <span class="nb">install</span> <span class="nt">--jobs</span><span class="o">=</span>3 <span class="nt">--retry</span><span class="o">=</span>3 <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> ~/.gitconfig
<span class="k">COPY</span><span class="s"> package.json yarn.lock ./</span>
<span class="c"># Install npm packages (including private ones)</span>
<span class="c"># This uses the GITHUB_TOKEN argument, which is also cleaned up in the same step</span>
<span class="k">RUN </span>git config <span class="nt">--global</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf git@github.com:some-user <span class="o">&&</span> <span class="se">\
</span> git config <span class="nt">--global</span> <span class="nt">--add</span> url.<span class="s2">"https://</span><span class="k">${</span><span class="nv">GITHUB_TOKEN</span><span class="k">}</span><span class="s2">:x-oauth-basic@github.com/some-user"</span>.insteadOf ssh://git@github <span class="o">&&</span> <span class="se">\
</span> yarn <span class="nb">install</span> <span class="nt">--frozen-lockfile</span> <span class="o">&&</span> <span class="se">\
</span> <span class="nb">rm</span> ~/.gitconfig
<span class="c"># We're back at the base stage</span>
<span class="k">FROM</span><span class="s"> base</span>
<span class="c"># Create a non-root user to run the app and own app-specific files</span>
<span class="k">RUN </span>adduser <span class="nt">-D</span> app
<span class="c"># Switch to this user</span>
<span class="k">USER</span><span class="s"> app</span>
<span class="c"># We'll install the app in this directory</span>
<span class="k">WORKDIR</span><span class="s"> /home/app</span>
<span class="c"># Copy over gems from the dependencies stage</span>
<span class="k">COPY</span><span class="s"> --from=dependencies /usr/local/bundle/ /usr/local/bundle/</span>
<span class="c"># Copy over npm packages from the dependencies stage</span>
<span class="c"># Note that we have to use `--chown` here</span>
<span class="k">COPY</span><span class="s"> --chown=app --from=dependencies /node_modules/ node_modules/</span>
<span class="c"># Finally, copy over the code</span>
<span class="c"># This is where the .dockerignore file comes into play</span>
<span class="c"># Note that we have to use `--chown` here</span>
<span class="k">COPY</span><span class="s"> --chown=app . ./</span>
<span class="c"># Install assets</span>
<span class="k">RUN </span><span class="nv">RAILS_ENV</span><span class="o">=</span>production <span class="nv">SECRET_KEY_BASE</span><span class="o">=</span>assets bundle <span class="nb">exec </span>rake assets:precompile
<span class="c"># Launch the server</span>
<span class="k">CMD</span><span class="s"> ["bundle", "exec", "rackup"]</span>
</code></pre></div></div>
<p>You can build this by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">--build-arg</span> <span class="nv">GITHUB_TOKEN</span><span class="o">=</span>xxx <span class="nt">-t</span> my-rails-app <span class="nb">.</span>
</code></pre></div></div>
<p>The code presented here can also be found at <a href="https://github.com/lipanski/ruby-dockerfile-example">https://github.com/lipanski/ruby-dockerfile-example</a>.</p>
<h2 class="no_toc" id="further-reading">Further reading</h2>
<ul>
<li><a href="https://pythonspeed.com/articles/dockerizing-python-is-hard/">https://pythonspeed.com/articles/dockerizing-python-is-hard/</a></li>
<li><a href="https://blog.docker.com/2019/07/intro-guide-to-dockerfile-best-practices/">https://blog.docker.com/2019/07/intro-guide-to-dockerfile-best-practices/</a></li>
<li><a href="https://vsupalov.com/build-docker-image-clone-private-repo-ssh-key/">https://vsupalov.com/build-docker-image-clone-private-repo-ssh-key/</a></li>
<li><a href="https://evilmartians.com/chronicles/ruby-on-whales-docker-for-ruby-rails-development">https://evilmartians.com/chronicles/ruby-on-whales-docker-for-ruby-rails-development</a></li>
<li><a href="https://pythonspeed.com/articles/docker-caching-model/">https://pythonspeed.com/articles/docker-caching-model/</a></li>
<li><a href="https://gmaslowski.com/docker-shell-vs-exec/">https://gmaslowski.com/docker-shell-vs-exec/</a></li>
<li><a href="https://medium.com/capital-one-tech/multi-stage-builds-and-dockerfile-b5866d9e2f84">https://medium.com/capital-one-tech/multi-stage-builds-and-dockerfile-b5866d9e2f84</a></li>
</ul>
<h1 id="one-ruby-file-to-rule-them-all-inline-gems-and-inline-activerecord-migrations">One Ruby file to rule them all: inline gems and inline ActiveRecord migrations</h1>
<p>2019-02-20, <a href="https://lipanski.com/posts/one-ruby-file-to-rule-them-all">https://lipanski.com/posts/one-ruby-file-to-rule-them-all</a></p>
<p>Ruby is my go-to language for scripting. It’s simple, concise and delivers the expected results without much hassle. Though definitely not a good idea for large applications, having everything in one file can be pretty neat when using Ruby for scripting. It makes sharing, installing and running your Ruby code easier.</p>
<p>This post focuses on ways to build a self-contained single-file Ruby web app that uses a database and performs migrations at run time. This is not really what you’d call <em>scripting</em>, but the techniques are pretty much the same. Some of them should be taken with a grain of salt, but they are at least interesting.</p>
<h2 id="inline-gems">Inline gems</h2>
<p>A while back, Bundler <a href="https://bundler.io/v2.0/guides/bundler_in_a_single_file_ruby_script.html">introduced</a> the possibility to declare gems from within Ruby files. Running such a file would automatically install any missing gems.</p>
<p>Here’s how it works:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env ruby</span>
<span class="nb">require</span> <span class="s2">"bundler/inline"</span>
<span class="n">gemfile</span> <span class="k">do</span>
<span class="n">source</span> <span class="s2">"https://rubygems.org"</span>
<span class="n">gem</span> <span class="s2">"sinatra"</span><span class="p">,</span> <span class="s2">"2.0.5"</span>
<span class="k">end</span>
<span class="nb">puts</span> <span class="s2">"Sinatra was installed!"</span><span class="p">,</span> <span class="no">Sinatra</span><span class="o">::</span><span class="no">VERSION</span>
</code></pre></div></div>
<p>You can either make this script executable or call it via <code class="language-plaintext highlighter-rouge">ruby my-script.rb</code>. In any case, it will make sure to install <strong>and require</strong> the listed gems (in this case <code class="language-plaintext highlighter-rouge">sinatra</code>), before running the rest of your Ruby code.</p>
<p>Note that there is no concept of <code class="language-plaintext highlighter-rouge">Gemfile.lock</code> in the world of inline Bundler, so a good practice is being specific about the gem versions you want installed.</p>
<h2 id="inline-activerecord-migrations">Inline ActiveRecord migrations</h2>
<p>Let’s say your script or web app requires a database. ActiveRecord migrations can also be inlined, though it’s probably not very common. Here’s how:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Define a couple of migrations as part of the same file</span>
<span class="k">class</span> <span class="nc">CreateEventTableMigration</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">5.2</span><span class="p">]</span>
<span class="c1"># Add the magic sauce</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">version</span>
<span class="mi">1</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">change</span>
<span class="n">create_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">AddEventCreatedMigration</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">5.2</span><span class="p">]</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">version</span>
<span class="mi">2</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">change</span>
<span class="n">change_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:created_at</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># Perform migrations</span>
<span class="n">migrations</span> <span class="o">=</span> <span class="p">[</span><span class="no">CreateEventTableMigration</span><span class="p">,</span> <span class="no">AddEventCreatedMigration</span><span class="p">]</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migrator</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:up</span><span class="p">,</span> <span class="n">migrations</span><span class="p">).</span><span class="nf">migrate</span>
<span class="c1"># Define your model(s)</span>
<span class="k">class</span> <span class="nc">Event</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">;</span> <span class="k">end</span>
</code></pre></div></div>
<p>The main difference from writing migrations as separate files is the need to define the <code class="language-plaintext highlighter-rouge">version</code> class method. Normally, ActiveRecord derives the version from the migration’s file name and uses it both to track which migrations have already run and to determine their order. The method should therefore return something unique and sortable, like a number that you increase with every new migration.</p>
<p>In your normal Rails or Sinatra app, you’d perform migrations by running <code class="language-plaintext highlighter-rouge">rake db:migrate</code>. For our self-contained single-file Ruby app, we will perform them automatically. We do this by calling:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># You can maintain this list yourself or use `ActiveRecord::Migration[5.2].subclasses`</span>
<span class="n">migrations</span> <span class="o">=</span> <span class="p">[</span><span class="no">CreateEventTableMigration</span><span class="p">,</span> <span class="no">AddEventCreatedMigration</span><span class="p">]</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migrator</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:up</span><span class="p">,</span> <span class="n">migrations</span><span class="p">).</span><span class="nf">migrate</span>
</code></pre></div></div>
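<p>This operation is idempotent: ActiveRecord records every applied version, so running the script again skips the migrations that have already been performed.</p>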
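<p>To make the role of <code class="language-plaintext highlighter-rouge">version</code> more tangible, here is a minimal plain-Ruby sketch of that bookkeeping. <code class="language-plaintext highlighter-rouge">TinyMigrator</code> and the <code class="language-plaintext highlighter-rouge">Migration</code> struct are hypothetical names made up for this illustration and are not how ActiveRecord is implemented; the sketch only assumes that applied versions are remembered somewhere and that pending migrations run in ascending version order.</p>

```ruby
# Hypothetical stand-in for the version bookkeeping; NOT ActiveRecord's code.
class TinyMigrator
  attr_reader :applied_versions

  def initialize(migrations, applied_versions = [])
    # Sorting by version is what makes the order independent of the list order.
    @migrations = migrations.sort_by(&:version)
    @applied_versions = applied_versions
  end

  # Runs every migration whose version has not been recorded yet
  # and returns the versions that were run in this call.
  def migrate
    pending = @migrations.reject { |m| @applied_versions.include?(m.version) }
    pending.each do |m|
      m.up
      @applied_versions << m.version
    end
    pending.map(&:version)
  end
end

# Hypothetical migration object: a version, a name and an `up` step.
Migration = Struct.new(:version, :name) do
  def up
    puts "running #{name}"
  end
end

migrator = TinyMigrator.new([
  Migration.new(2, "AddEventCreated"),
  Migration.new(1, "CreateEventTable"),
])

first_run = migrator.migrate   # runs version 1, then version 2
second_run = migrator.migrate  # nothing is pending anymore
```

<p>The second call has nothing left to do, which is the behaviour that makes it safe to run the migrations on every start of the script.</p>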
<p>Last but not least, you might have noticed there’s no explicit database connection in our code, but we don’t want to add a <code class="language-plaintext highlighter-rouge">database.yml</code> file, as it goes against our self-imposed single-file mantra. There’s a little ActiveRecord convention that can help us out here: the <code class="language-plaintext highlighter-rouge">DATABASE_URL</code> environment variable. You can use it to specify the database of your choice:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DATABASE_URL</span><span class="o">=</span>postgres://dbuser:dbpass@localhost:5432/dbname ./my-script.rb
</code></pre></div></div>
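<p>If the shape of that URL is unfamiliar, the following sketch takes it apart with Ruby’s standard <code class="language-plaintext highlighter-rouge">uri</code> library. ActiveRecord does its own parsing internally; <code class="language-plaintext highlighter-rouge">describe_database_url</code> is a hypothetical helper that only shows which part of the URL carries which setting.</p>

```ruby
require "uri"

# Hypothetical helper, for illustration only: splits a database URL
# into the pieces you would otherwise write into database.yml.
def describe_database_url(url)
  uri = URI.parse(url)
  {
    scheme: uri.scheme,
    username: uri.user,
    password: uri.password,
    host: uri.host,
    port: uri.port,
    # The path starts with a slash; the rest is the database name.
    database: uri.path.sub(%r{\A/}, ""),
  }
end

config = describe_database_url("postgres://dbuser:dbpass@localhost:5432/dbname")
config.each { |key, value| puts "#{key}: #{value}" }
```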
<h2 id="inline-activerecord-roll-backs">Inline ActiveRecord roll-backs</h2>
<p>Rolling back the migrations can be achieved by changing the direction argument on the <code class="language-plaintext highlighter-rouge">ActiveRecord::Migrator</code> call to <code class="language-plaintext highlighter-rouge">:down</code>:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">migrations_to_roll_back</span> <span class="o">=</span> <span class="p">[</span><span class="no">CreateEventTableMigration</span><span class="p">,</span> <span class="no">AddEventCreatedMigration</span><span class="p">]</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migrator</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:down</span><span class="p">,</span> <span class="n">migrations_to_roll_back</span><span class="p">).</span><span class="nf">migrate</span>
</code></pre></div></div>
<h2 id="alternative-inline-activerecord-schema-loading">Alternative: Inline ActiveRecord schema loading</h2>
<p>As pointed out by Janko Marohnić in the comments, an alternative to the previously described database migration process is to load the schema directly, similar to what ActiveRecord does when you call <code class="language-plaintext highlighter-rouge">rake db:schema:load</code>. The result looks simpler:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Schema</span><span class="p">.</span><span class="nf">define</span> <span class="k">do</span>
<span class="n">create_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span>
<span class="k">end</span>
<span class="n">change_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:created_at</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>…but there’s a catch: <strong>the code is not idempotent</strong> and will fail when run a second time.</p>
<p>Fortunately ActiveRecord does provide us with <a href="https://apidock.com/rails/v4.2.7/ActiveRecord/ConnectionAdapters/SchemaStatements/column_exists%3F">the means</a> to make this idempotent. You’ll only need to be a bit more explicit:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Schema</span><span class="p">.</span><span class="nf">define</span> <span class="k">do</span>
<span class="k">unless</span> <span class="n">table_exists?</span><span class="p">(</span><span class="ss">:events</span><span class="p">)</span>
<span class="n">create_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">unless</span> <span class="n">column_exists?</span><span class="p">(</span><span class="ss">:events</span><span class="p">,</span> <span class="ss">:created_at</span><span class="p">)</span>
<span class="n">change_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:created_at</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
<p>In this case, I’m using the <code class="language-plaintext highlighter-rouge">table_exists?</code> and <code class="language-plaintext highlighter-rouge">column_exists?</code> methods to avoid running my migrations a second time. Note that I’ve also preserved the incremental nature of my migrations - new migrations can be added to the <code class="language-plaintext highlighter-rouge">#define</code> block without interfering with the old ones.</p>
<h2 id="inline-everything">Inline everything!</h2>
<p>Here’s what the final result looks like:</p>
<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">#!/usr/bin/env ruby</span>
<span class="nb">require</span> <span class="s2">"bundler/inline"</span>
<span class="n">gemfile</span> <span class="k">do</span>
<span class="n">source</span> <span class="s2">"https://rubygems.org"</span>
<span class="n">gem</span> <span class="s2">"sinatra"</span><span class="p">,</span> <span class="s2">"2.0.5"</span>
<span class="n">gem</span> <span class="s2">"sinatra-activerecord"</span><span class="p">,</span> <span class="s2">"2.0.13"</span>
<span class="n">gem</span> <span class="s2">"pg"</span><span class="p">,</span> <span class="s2">"1.1.4"</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">CreateEventTableMigration</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">5.2</span><span class="p">]</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">version</span>
<span class="mi">1</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">change</span>
<span class="n">create_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span> <span class="ss">:name</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">class</span> <span class="nc">AddEventCreatedMigration</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migration</span><span class="p">[</span><span class="mf">5.2</span><span class="p">]</span>
<span class="k">def</span> <span class="nc">self</span><span class="o">.</span><span class="nf">version</span>
<span class="mi">2</span>
<span class="k">end</span>
<span class="k">def</span> <span class="nf">change</span>
<span class="n">change_table</span> <span class="ss">:events</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:created_at</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="c1"># Perform migrations</span>
<span class="n">migrations</span> <span class="o">=</span> <span class="p">[</span><span class="no">CreateEventTableMigration</span><span class="p">,</span> <span class="no">AddEventCreatedMigration</span><span class="p">]</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Migrator</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:up</span><span class="p">,</span> <span class="n">migrations</span><span class="p">).</span><span class="nf">migrate</span>
<span class="c1"># Define your model</span>
<span class="k">class</span> <span class="nc">Event</span> <span class="o"><</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">;</span> <span class="k">end</span>
<span class="n">set</span> <span class="ss">:port</span><span class="p">,</span> <span class="mi">3000</span>
<span class="n">get</span> <span class="s2">"/events/last"</span> <span class="k">do</span>
<span class="n">event</span> <span class="o">=</span> <span class="no">Event</span><span class="p">.</span><span class="nf">last</span>
<span class="k">next</span> <span class="s2">"{}"</span> <span class="k">unless</span> <span class="n">event</span>
<span class="n">event</span><span class="p">.</span><span class="nf">to_json</span>
<span class="k">end</span>
<span class="n">post</span> <span class="s2">"/events"</span> <span class="k">do</span>
<span class="n">event</span> <span class="o">=</span> <span class="no">Event</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">name: </span><span class="n">params</span><span class="p">[</span><span class="ss">:name</span><span class="p">]</span> <span class="o">||</span> <span class="s2">"unknown"</span><span class="p">,</span> <span class="ss">created_at: </span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span><span class="p">)</span>
<span class="n">event</span><span class="p">.</span><span class="nf">to_json</span>
<span class="k">end</span>
<span class="no">Sinatra</span><span class="o">::</span><span class="no">Application</span><span class="p">.</span><span class="nf">run!</span>
</code></pre></div></div>
<p>Prerequisites:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Create the Postgres database</span>
createdb single-file-example
<span class="c"># Make the script executable</span>
<span class="nb">chmod</span> +x my-script.rb
</code></pre></div></div>
<p>This is how you’d run the script:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">DATABASE_URL</span><span class="o">=</span>postgres:///single-file-example ./my-script.rb
</code></pre></div></div>
<p>When executed, the script will:</p>
<ul>
<li>Install any missing gems.</li>
<li>Require the listed gems.</li>
<li>Perform database migrations on the database specified via <code class="language-plaintext highlighter-rouge">DATABASE_URL</code>.</li>
<li>Run a Sinatra application on your local port 3000 that responds to <code class="language-plaintext highlighter-rouge">GET /events/last</code> and <code class="language-plaintext highlighter-rouge">POST /events</code>.</li>
</ul>
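<p>Once the script is up, you can poke at the two endpoints from another Ruby process. This is a rough client sketch using only the standard library; it assumes the app is listening on <code class="language-plaintext highlighter-rouge">localhost:3000</code>, and the helper names (<code class="language-plaintext highlighter-rouge">create_event_request</code>, <code class="language-plaintext highlighter-rouge">last_event_request</code>, <code class="language-plaintext highlighter-rouge">perform</code>) are made up for this example.</p>

```ruby
require "net/http"
require "uri"

# Assumption: the single-file app is running locally on port 3000.
BASE_URL = "http://localhost:3000"

# Builds the POST /events request; the name travels as form data,
# which Sinatra exposes as params[:name].
def create_event_request(name)
  request = Net::HTTP::Post.new(URI.join(BASE_URL, "/events"))
  request.set_form_data("name" => name)
  request
end

# Builds the GET /events/last request.
def last_event_request
  Net::HTTP::Get.new(URI.join(BASE_URL, "/events/last"))
end

# Sends a prepared request and returns the response body (a JSON string).
def perform(request)
  uri = request.uri
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).body }
end
```

<p>With the server running, <code class="language-plaintext highlighter-rouge">puts perform(create_event_request("deploy"))</code> creates an event and <code class="language-plaintext highlighter-rouge">puts perform(last_event_request)</code> reads it back.</p>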
<h2 id="references">References</h2>
<ul>
<li><a href="https://bundler.io/v2.0/guides/bundler_in_a_single_file_ruby_script.html">https://bundler.io/v2.0/guides/bundler_in_a_single_file_ruby_script.html</a></li>
<li><a href="https://github.com/bundler/bundler/blob/master/lib/bundler/inline.rb">https://github.com/bundler/bundler/blob/master/lib/bundler/inline.rb</a></li>
<li><a href="https://github.com/rails/rails/blob/master/activerecord/lib/active_record/tasks/database_tasks.rb">https://github.com/rails/rails/blob/master/activerecord/lib/active_record/tasks/database_tasks.rb</a></li>
<li><a href="https://apidock.com/rails/v4.2.7/ActiveRecord/ConnectionAdapters/SchemaStatements/column_exists%3F">https://apidock.com/rails/v4.2.7/ActiveRecord/ConnectionAdapters/SchemaStatements/column_exists%3F</a></li>
</ul>
<h1 id="crystal-raising-exceptions-from-fibers-the-parallel-macro-and-invalid-memory-access">Crystal: Raising exceptions from Fibers, the parallel macro and invalid memory access</h1>
<p>Published 2018-02-17 at <a href="https://lipanski.com/posts/crystal-parallel">https://lipanski.com/posts/crystal-parallel</a></p>
<p><strong>UPDATE 2:</strong> My patch to the original <code class="language-plaintext highlighter-rouge">parallel</code> macro made it into <a href="https://github.com/crystal-lang/crystal/releases/tag/0.25.0">Crystal 0.25.0</a>, which also provides <a href="https://crystal-lang.org/api/0.25.0/toplevel.html#parallel%28%2Ajobs%29-macro">proper documentation</a> for this quite useful language feature.</p>
<p><strong>UPDATE 1:</strong> I opened a <a href="https://github.com/crystal-lang/crystal/pull/5726">PR</a> to propose my changes to Crystal.</p>
<p>Concurrency can be achieved in Crystal by using <em>Fibers</em>. Communication between <em>Fibers</em> is handled via <em>Channels</em>. The <a href="https://crystal-lang.org/docs/guides/concurrency.html">documentation</a> on these topics is quite comprehensive so I won’t go into detail here.</p>
<p>This post will focus on the <code class="language-plaintext highlighter-rouge">parallel</code> macro, present one of its drawbacks when dealing with unhandled exceptions and introduce a solution: the <code class="language-plaintext highlighter-rouge">parallel!</code> macro.</p>
<h2 id="the-parallel-macro">The parallel macro</h2>
<p>One useful tool that didn’t make it into the Crystal Book is the <code class="language-plaintext highlighter-rouge">parallel</code> macro. It allows firing up and waiting for several concurrent jobs in a more succinct manner:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">say</span><span class="p">(</span><span class="n">word</span><span class="p">)</span>
<span class="nb">puts</span> <span class="n">word</span>
<span class="k">end</span>
<span class="n">parallel</span><span class="p">(</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"a"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"b"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"c"</span><span class="p">),</span>
<span class="p">)</span>
</code></pre></div></div>
<p>The real beauty of this macro comes when you’re interested in capturing the return values of your concurrent jobs:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">say</span><span class="p">(</span><span class="n">word</span> <span class="p">:</span> <span class="no">String</span><span class="p">)</span> <span class="p">:</span> <span class="no">String</span>
<span class="n">word</span>
<span class="k">end</span>
<span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span>
<span class="n">parallel</span><span class="p">(</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"a"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"b"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"c"</span><span class="p">),</span>
<span class="p">)</span>
<span class="nb">puts</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span>
</code></pre></div></div>
<h2 id="raising-exceptions-in-fibers-and-usage-of-uninitialized-in-parallel">Raising exceptions in Fibers and usage of <code class="language-plaintext highlighter-rouge">uninitialized</code> in <code class="language-plaintext highlighter-rouge">parallel</code></h2>
<p>Exceptions raised from <em>Fibers</em> don’t propagate to the main thread. Though there are ways to re-raise these exceptions, the <code class="language-plaintext highlighter-rouge">parallel</code> macro doesn’t implement this.</p>
<p>For the same reason, the following code will get you in trouble:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">say</span><span class="p">(</span><span class="n">word</span> <span class="p">:</span> <span class="no">String</span><span class="p">)</span> <span class="p">:</span> <span class="no">String</span>
<span class="k">raise</span> <span class="no">Exception</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"boom"</span><span class="p">)</span>
<span class="n">word</span>
<span class="k">end</span>
<span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span>
<span class="n">parallel</span><span class="p">(</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"a"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"b"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"c"</span><span class="p">),</span>
<span class="p">)</span>
<span class="nb">puts</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span>
</code></pre></div></div>
<p>This program will crash because of an <code class="language-plaintext highlighter-rouge">Invalid memory access (signal 11)</code>.</p>
<p>The problem here lies in the usage of the <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">b</code>, <code class="language-plaintext highlighter-rouge">c</code> variables on the last line, after the <em>Fibers</em> silently swallowed the exception. More specifically, the <code class="language-plaintext highlighter-rouge">parallel</code> macro implementation (which is <a href="https://github.com/crystal-lang/crystal/blob/v0.24.1/src/concurrent.cr#L131">quite easy to read</a>) has to define these variables before actually evaluating them. This is achieved by initially marking them as <code class="language-plaintext highlighter-rouge">uninitialized</code>:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{%</span> <span class="k">for</span> <span class="n">job</span><span class="p">,</span> <span class="n">i</span> <span class="k">in</span> <span class="n">jobs</span> <span class="p">%}</span>
<span class="o">%</span><span class="n">ret</span><span class="p">{</span><span class="n">i</span><span class="p">}</span> <span class="o">=</span> <span class="n">uninitialized</span> <span class="n">typeof</span><span class="p">({{</span><span class="n">job</span><span class="p">}})</span>
<span class="c1"># ...</span>
<span class="p">{%</span> <span class="k">end</span> <span class="p">%}</span>
</code></pre></div></div>
<p>Yes - Crystal <a href="https://crystal-lang.org/docs/syntax_and_semantics/declare_var.html">allows declaring uninitialized variables</a>, and no - it’s probably not the best idea to use this <a href="https://github.com/crystal-lang/crystal/issues/4544#issuecomment-307612363">unless you know what you’re doing</a>. This is <em>unsafe</em> code.</p>
<p>Also note that placing checks before using the variable:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="n">a</span>
<span class="nb">puts</span> <span class="n">a</span>
<span class="k">end</span>
</code></pre></div></div>
<p>…will not solve the issue and <a href="https://github.com/crystal-lang/crystal/issues/4544#issuecomment-307635912">there’s no way</a> to tell if a variable is initialized or not.</p>
<p>As long as your <em>Fibers</em> don’t raise unhandled exceptions, you can safely use the <code class="language-plaintext highlighter-rouge">parallel</code> macro. The moment you start raising unhandled exceptions, you’ll want to replace the <code class="language-plaintext highlighter-rouge">parallel</code> macro with some more explicit code:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">say</span><span class="p">(</span><span class="n">word</span> <span class="p">:</span> <span class="no">String</span><span class="p">)</span> <span class="p">:</span> <span class="no">String</span>
<span class="k">raise</span> <span class="no">Exception</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"boom"</span><span class="p">)</span>
<span class="n">word</span>
<span class="k">end</span>
<span class="n">channel</span> <span class="o">=</span> <span class="no">Channel</span><span class="p">(</span><span class="no">Nil</span><span class="p">).</span><span class="nf">new</span>
<span class="n">a</span> <span class="p">:</span> <span class="no">String</span><span class="p">?</span> <span class="o">=</span> <span class="kp">nil</span>
<span class="n">b</span> <span class="p">:</span> <span class="no">String</span><span class="p">?</span> <span class="o">=</span> <span class="kp">nil</span>
<span class="n">c</span> <span class="p">:</span> <span class="no">String</span><span class="p">?</span> <span class="o">=</span> <span class="kp">nil</span>
<span class="n">spawn</span> <span class="k">do</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">say</span><span class="p">(</span><span class="s2">"a"</span><span class="p">)</span>
<span class="k">ensure</span>
<span class="n">channel</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="kp">nil</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">spawn</span> <span class="k">do</span>
<span class="n">b</span> <span class="o">=</span> <span class="n">say</span><span class="p">(</span><span class="s2">"b"</span><span class="p">)</span>
<span class="k">ensure</span>
<span class="n">channel</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="kp">nil</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">spawn</span> <span class="k">do</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">say</span><span class="p">(</span><span class="s2">"c"</span><span class="p">)</span>
<span class="k">ensure</span>
<span class="n">channel</span><span class="p">.</span><span class="nf">send</span><span class="p">(</span><span class="kp">nil</span><span class="p">)</span>
<span class="k">end</span>
<span class="mi">3</span><span class="p">.</span><span class="nf">times</span> <span class="p">{</span> <span class="n">channel</span><span class="p">.</span><span class="nf">receive</span> <span class="p">}</span>
<span class="c1"># In this case a nil-check is not needed, but for other method calls you might need it.</span>
<span class="nb">puts</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span>
</code></pre></div></div>
<p>Notice how we had to compromise on the type of our variables here. Also, this solution is pretty verbose.</p>
<h2 id="re-raising-exceptions-and-the-parallel-macro">Re-raising exceptions and the <code class="language-plaintext highlighter-rouge">parallel!</code> macro</h2>
<p>I’m not a big fan of verbosity (which is why I really enjoy Crystal and Ruby), so it was time for a macro to hide away all this code. Because I didn’t want to compromise on the variable type, I decided to re-raise the exceptions to the main thread.</p>
<p>Make way for the <code class="language-plaintext highlighter-rouge">parallel!</code> macro:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">macro</span> <span class="nf">parallel!</span><span class="p">(</span><span class="o">*</span><span class="n">jobs</span><span class="p">)</span>
<span class="o">%</span><span class="n">channel</span> <span class="o">=</span> <span class="no">Channel</span><span class="p">(</span><span class="no">Exception</span> <span class="o">|</span> <span class="no">Nil</span><span class="p">).</span><span class="nf">new</span>
<span class="p">{%</span> <span class="k">for</span> <span class="n">job</span><span class="p">,</span> <span class="n">i</span> <span class="k">in</span> <span class="n">jobs</span> <span class="p">%}</span>
<span class="o">%</span><span class="n">ret</span><span class="p">{</span><span class="n">i</span><span class="p">}</span> <span class="o">=</span> <span class="n">uninitialized</span> <span class="n">typeof</span><span class="p">({{</span><span class="n">job</span><span class="p">}})</span>
<span class="n">spawn</span> <span class="k">do</span>
<span class="k">begin</span>
<span class="o">%</span><span class="n">ret</span><span class="p">{</span><span class="n">i</span><span class="p">}</span> <span class="o">=</span> <span class="p">{{</span><span class="n">job</span><span class="p">}}</span>
<span class="k">rescue</span> <span class="n">e</span> <span class="p">:</span> <span class="no">Exception</span>
<span class="o">%</span><span class="n">channel</span><span class="p">.</span><span class="nf">send</span> <span class="n">e</span>
<span class="k">else</span>
<span class="o">%</span><span class="n">channel</span><span class="p">.</span><span class="nf">send</span> <span class="kp">nil</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="p">{%</span> <span class="k">end</span> <span class="p">%}</span>
<span class="p">{{</span> <span class="n">jobs</span><span class="p">.</span><span class="nf">size</span> <span class="p">}}.</span><span class="nf">times</span> <span class="k">do</span>
<span class="o">%</span><span class="n">value</span> <span class="o">=</span> <span class="o">%</span><span class="n">channel</span><span class="p">.</span><span class="nf">receive</span>
<span class="k">if</span> <span class="o">%</span><span class="n">value</span><span class="p">.</span><span class="nf">is_a?</span><span class="p">(</span><span class="no">Exception</span><span class="p">)</span>
<span class="k">raise</span> <span class="o">%</span><span class="n">value</span>
<span class="k">end</span>
<span class="k">end</span>
<span class="p">{</span>
<span class="p">{%</span> <span class="k">for</span> <span class="n">job</span><span class="p">,</span> <span class="n">i</span> <span class="k">in</span> <span class="n">jobs</span> <span class="p">%}</span>
<span class="o">%</span><span class="n">ret</span><span class="p">{</span><span class="n">i</span><span class="p">},</span>
<span class="p">{%</span> <span class="k">end</span> <span class="p">%}</span>
<span class="p">}</span>
<span class="k">end</span>
</code></pre></div></div>
<blockquote>
<p>Note that prepending variable names with <code class="language-plaintext highlighter-rouge">%</code> inside macros will generate unique names for them, so that they won’t collide with your other variable names outside the macro implementation.</p>
</blockquote>
<p>The implementation is mostly based on the original <code class="language-plaintext highlighter-rouge">parallel</code> implementation, the difference being that exceptions from <em>Fibers</em> will be re-raised in the main thread.</p>
<p>The following code will raise <code class="language-plaintext highlighter-rouge">Exception.new("boom")</code> even before the last line, instead of that nasty <code class="language-plaintext highlighter-rouge">Invalid memory access</code>:</p>
<div class="language-crystal highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">say</span><span class="p">(</span><span class="n">word</span> <span class="p">:</span> <span class="no">String</span><span class="p">)</span> <span class="p">:</span> <span class="no">String</span>
<span class="k">raise</span> <span class="no">Exception</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"boom"</span><span class="p">)</span>
<span class="n">word</span>
<span class="k">end</span>
<span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span>
<span class="n">parallel!</span><span class="p">(</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"a"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"b"</span><span class="p">),</span>
<span class="n">say</span><span class="p">(</span><span class="s2">"c"</span><span class="p">),</span>
<span class="p">)</span>
<span class="nb">puts</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span>
</code></pre></div></div>
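<p>The pattern itself is not tied to Crystal. As a rough Ruby analogue, assuming threads in place of <em>Fibers</em> and a <code class="language-plaintext highlighter-rouge">Queue</code> in place of the <em>Channel</em>, the same idea can be sketched as follows. The <code class="language-plaintext highlighter-rouge">parallel!</code> method below is a hypothetical helper written for this comparison; it is not part of Ruby and it takes lambdas because Ruby has no macros.</p>

```ruby
# Hypothetical Ruby analogue of the parallel! macro (not part of Ruby itself).
# Each job runs in its own thread and reports either nil or its exception.
def parallel!(*jobs)
  results = Array.new(jobs.size)
  outcomes = Queue.new

  jobs.each_with_index do |job, i|
    Thread.new do
      begin
        results[i] = job.call
        outcomes << nil
      rescue StandardError => e
        # Hand the exception over instead of letting the thread swallow it.
        outcomes << e
      end
    end
  end

  # Wait for one outcome per job and re-raise the first failure in the caller.
  jobs.size.times do
    outcome = outcomes.pop
    raise outcome if outcome.is_a?(Exception)
  end

  results
end

a, b, c = parallel!(-> { "a" }, -> { "b" }, -> { "c" })
puts a, b, c
```

<p>Replacing one of the lambdas with <code class="language-plaintext highlighter-rouge">-> { raise "boom" }</code> makes the call itself raise, so the results are never read in a half-initialized state.</p>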
<h2 id="links">Links</h2>
<ul>
<li><a href="https://crystal-lang.org/docs/guides/concurrency.html">Concurrency in Crystal</a></li>
<li><a href="https://github.com/crystal-lang/crystal/blob/v0.24.1/src/concurrent.cr#L131">The parallel macro implementation</a></li>
<li><a href="https://crystal-lang.org/docs/syntax_and_semantics/declare_var.html">Uninitialized variables in Crystal</a></li>
<li><a href="https://github.com/crystal-lang/crystal/issues/4544">Issue explaining invalid memory access</a></li>
<li><a href="https://github.com/crystal-lang/crystal/releases/tag/0.7.0">Crystal 0.7.0 release notes, which introduced the %var macro notation</a></li>
</ul>Crystal: Raising exceptions from Fibers, the parallel macro and invalid memory accessBlocking malicious requests with nginx + ModSecurity2018-01-26T00:00:00+00:002018-01-26T00:00:00+00:00https://lipanski.com/posts/modsecurity<h1 id="blocking-malicious-requests-with-nginx--modsecurity">Blocking malicious requests with nginx + ModSecurity</h1>
<p><a href="https://modsecurity.org/">ModSecurity</a> is a <strong>web application firewall</strong> integrated with Apache and nginx. It can match request information at various stages and throttle or allow/deny requests based on the rules you define. ModSecurity comes with the <a href="https://github.com/SpiderLabs/owasp-modsecurity-crs">OWASP core rule set</a> but a <a href="http://modsecurity.org/rules.html">paid set of rules</a> is also available. Integrating your own rules is quite easy.</p>
<p>This post will try to give you an overview of how to <strong>install the ModSecurity nginx module</strong>, how to <strong>configure the module</strong> and, finally, how to <strong>create a rule for blocking a list of malicious IPs</strong>.</p>
<h2 id="installing-modsecurity-nginx-bash">Installing ModSecurity-nginx (Bash)</h2>
<p>For the equivalent Ansible playbook, skip to the next chapter.</p>
<p>In order to install the <a href="https://github.com/SpiderLabs/ModSecurity-nginx">ModSecurity-nginx module</a> you’ll need to:</p>
<ul>
<li>install the <em>libmodsecurity</em> dependencies</li>
<li>build and install <em>libmodsecurity</em></li>
<li>pull the nginx source code for the nginx version that you’re currently running</li>
<li>build ModSecurity-nginx as a dynamic module by using the nginx source code</li>
<li>include the built module in your <code class="language-plaintext highlighter-rouge">nginx.conf</code></li>
</ul>
<p>The following commands were run on Ubuntu 14.04. Mileage may vary.</p>
<p>First, you’ll want to install the dependencies required in order to build <em>libmodsecurity</em>:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt <span class="nb">install</span> <span class="se">\</span>
git <span class="se">\</span>
g++ <span class="se">\</span>
flex <span class="se">\</span>
bison <span class="se">\</span>
curl <span class="se">\</span>
doxygen <span class="se">\</span>
libyajl-dev <span class="se">\</span>
libgeoip-dev <span class="se">\</span>
libtool <span class="se">\</span>
dh-autoreconf <span class="se">\</span>
libcurl4-gnutls-dev <span class="se">\</span>
libxml2 <span class="se">\</span>
libpcre++-dev <span class="se">\</span>
libxml2-dev
</code></pre></div></div>
<p>Next, you’ll want to pull the <em>libmodsecurity</em> code, build and install it:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt
git clone <span class="nt">--branch</span> v3.0.0 <span class="nt">--depth</span> 1 https://github.com/SpiderLabs/ModSecurity.git
<span class="nb">cd </span>ModSecurity/
./build.sh
git submodule init
git submodule update
./configure
make
make <span class="nb">install</span>
</code></pre></div></div>
<p>As explained <a href="/posts/nginx-dynamic-modules">in my previous post</a>, in order to build an nginx module you’ll need to pull in the source code of the nginx version that you’re currently running:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Identify your current nginx version</span>
nginx <span class="nt">-v</span>
<span class="c"># Pull the source code</span>
<span class="nb">cd</span> /opt
wget http://nginx.org/download/nginx-[INSERT NGINX VERSION HERE].tar.gz
<span class="nb">tar</span> <span class="nt">-xzvf</span> nginx-[INSERT NGINX VERSION HERE].tar.gz
</code></pre></div></div>
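<p>If you’d rather not copy the version by hand, the lookup can be scripted. The following is only a sketch: the helper name <code class="language-plaintext highlighter-rouge">parse_nginx_version</code> is made up for this example, and it assumes that <code class="language-plaintext highlighter-rouge">nginx -v</code> prints something like <code class="language-plaintext highlighter-rouge">nginx version: nginx/1.12.2</code>, possibly followed by a distribution suffix.</p>

```shell
# Extract the bare version number from the text printed by `nginx -v`,
# e.g. "nginx version: nginx/1.12.2" -> "1.12.2".
# (parse_nginx_version is a hypothetical helper, not part of nginx.)
parse_nginx_version() {
  version="${1##*/}"        # drop everything up to the last slash
  version="${version%% *}"  # drop a trailing suffix such as " (Ubuntu)", if any
  printf '%s\n' "$version"
}

# Only query nginx if it is actually installed on this machine.
if command -v nginx >/dev/null 2>&1; then
  # nginx writes the version to stderr, hence the 2>&1
  NGINX_VERSION="$(parse_nginx_version "$(nginx -v 2>&1)")"
  echo "http://nginx.org/download/nginx-${NGINX_VERSION}.tar.gz"
fi
```

<p>The printed URL can then be passed to <code class="language-plaintext highlighter-rouge">wget</code> in place of the placeholder used above.</p>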
<p>Afterwards, download the <a href="https://github.com/SpiderLabs/ModSecurity-nginx">ModSecurity-nginx module</a>:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt
git clone <span class="nt">--branch</span> v1.0.0 <span class="nt">--depth</span> 1 https://github.com/SpiderLabs/ModSecurity-nginx.git
</code></pre></div></div>
<p>Enter the directory where you downloaded the nginx source code, build ModSecurity-nginx as a dynamic module and copy it to <code class="language-plaintext highlighter-rouge">/etc/nginx/modules</code>:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd</span> /opt/nginx-[INSERT NGINX VERSION HERE]
./configure <span class="nt">--with-compat</span> <span class="nt">--add-dynamic-module</span><span class="o">=</span>/opt/ModSecurity-nginx <span class="nt">--with-cc-opt</span><span class="o">=</span><span class="nt">-Wno-error</span>
make modules
<span class="nb">cp </span>objs/ngx_http_modsecurity_module.so /etc/nginx/modules
</code></pre></div></div>
<p>Finally, include the module somewhere towards the beginning of your nginx configuration:</p>
<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># /etc/nginx/nginx.conf</span>
<span class="k">load_module</span> <span class="nc">modules/ngx</span><span class="s">_http_modsecurity_module.so</span><span class="p">;</span>
</code></pre></div></div>
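<p>If you want to script this last step as well, something along these lines could work. It’s a minimal sketch: <code class="language-plaintext highlighter-rouge">ensure_load_module</code> is a hypothetical helper that puts the <code class="language-plaintext highlighter-rouge">load_module</code> line at the top of a configuration file, unless the exact line is already present.</p>

```shell
# Prepend the load_module directive to an nginx configuration file,
# but only if the exact line is not there yet (safe to run repeatedly).
# (ensure_load_module is a hypothetical helper written for this example.)
ensure_load_module() {
  conf_file="$1"
  line='load_module modules/ngx_http_modsecurity_module.so;'
  if ! grep -qxF "$line" "$conf_file"; then
    tmp_file="$(mktemp)"
    { printf '%s\n' "$line"; cat "$conf_file"; } > "$tmp_file"
    # Write back through cat so the original file keeps its owner and mode
    cat "$tmp_file" > "$conf_file"
    rm -f "$tmp_file"
  fi
}

# Typical usage (as root), followed by a syntax check:
#   ensure_load_module /etc/nginx/nginx.conf && nginx -t
```

<p>Running <code class="language-plaintext highlighter-rouge">nginx -t</code> afterwards is a cheap way to confirm that nginx can actually find and load the module.</p>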
<h2 id="installing-modsecurity-nginx-ansible">Installing ModSecurity-nginx (Ansible)</h2>
<p>This is the equivalent of the previous Bash commands in Ansible. It assumes nginx is already installed. It’s been tested with Ubuntu 14.04.</p>
<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="pi">-</span> <span class="na">hosts</span><span class="pi">:</span> <span class="s">all</span>
  <span class="na">become</span><span class="pi">:</span> <span class="no">true</span>
  <span class="na">vars</span><span class="pi">:</span>
    <span class="na">nginx_modsecurity_branch</span><span class="pi">:</span> <span class="s">v3.0.0</span>
    <span class="na">nginx_modsecurity_nginx_branch</span><span class="pi">:</span> <span class="s">v1.0.0</span>
  <span class="na">tasks</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">install modsecurity dependencies</span>
      <span class="na">apt</span><span class="pi">:</span> <span class="s">name="{{ item }}"</span>
      <span class="na">with_items</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">git</span>
        <span class="pi">-</span> <span class="s">g++</span>
        <span class="pi">-</span> <span class="s">flex</span>
        <span class="pi">-</span> <span class="s">bison</span>
        <span class="pi">-</span> <span class="s">curl</span>
        <span class="pi">-</span> <span class="s">doxygen</span>
        <span class="pi">-</span> <span class="s">libyajl-dev</span>
        <span class="pi">-</span> <span class="s">libgeoip-dev</span>
        <span class="pi">-</span> <span class="s">libtool</span>
        <span class="pi">-</span> <span class="s">dh-autoreconf</span>
        <span class="pi">-</span> <span class="s">libcurl4-gnutls-dev</span>
        <span class="pi">-</span> <span class="s">libxml2</span>
        <span class="pi">-</span> <span class="s">libpcre++-dev</span>
        <span class="pi">-</span> <span class="s">libxml2-dev</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">clone the modsecurity repository</span>
      <span class="na">git</span><span class="pi">:</span> <span class="s">repo="https://github.com/SpiderLabs/ModSecurity.git" version="{{ nginx_modsecurity_branch }}" accept_hostkey=yes depth=1 force=yes dest=/opt/ModSecurity</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">build and install modsecurity</span>
      <span class="na">shell</span><span class="pi">:</span> <span class="s2">"</span><span class="s">{{</span><span class="nv"> </span><span class="s">item</span><span class="nv"> </span><span class="s">}}"</span>
      <span class="na">args</span><span class="pi">:</span>
        <span class="na">chdir</span><span class="pi">:</span> <span class="s">/opt/ModSecurity</span>
      <span class="na">with_items</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="s">./build.sh</span>
        <span class="pi">-</span> <span class="s">git submodule init</span>
        <span class="pi">-</span> <span class="s">git submodule update</span>
        <span class="pi">-</span> <span class="s">./configure</span>
        <span class="pi">-</span> <span class="s">make</span>
        <span class="pi">-</span> <span class="s">make install</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">clone the modsecurity-nginx repository</span>
      <span class="na">git</span><span class="pi">:</span> <span class="s">repo="https://github.com/SpiderLabs/ModSecurity-nginx.git" version="{{ nginx_modsecurity_nginx_branch }}" accept_hostkey=yes depth=1 force=yes dest=/opt/ModSecurity-nginx</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">read the nginx version</span>
      <span class="na">command</span><span class="pi">:</span> <span class="s">nginx -v</span>
      <span class="na">register</span><span class="pi">:</span> <span class="s">nginx_version_output</span>
    <span class="c1"># nginx writes the version to stderr</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">parse the installed nginx version</span>
      <span class="na">set_fact</span><span class="pi">:</span>
        <span class="na">installed_nginx_version</span><span class="pi">:</span> <span class="s2">"</span><span class="s">{{</span><span class="nv"> </span><span class="s">nginx_version_output.stderr.split('/')[1]</span><span class="nv"> </span><span class="s">}}"</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">download and extract the nginx sources for building the module</span>
      <span class="na">unarchive</span><span class="pi">:</span> <span class="s">src="http://nginx.org/download/nginx-{{ installed_nginx_version }}.tar.gz" remote_src=yes dest=/opt creates="/opt/nginx-{{ installed_nginx_version }}"</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">configure the modsecurity-nginx module</span>
      <span class="na">shell</span><span class="pi">:</span> <span class="s">./configure --with-compat --add-dynamic-module=/opt/ModSecurity-nginx --with-cc-opt=-Wno-error</span>
      <span class="na">args</span><span class="pi">:</span>
        <span class="na">chdir</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/opt/nginx-{{</span><span class="nv"> </span><span class="s">installed_nginx_version</span><span class="nv"> </span><span class="s">}}"</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">build the modsecurity-nginx module</span>
      <span class="na">shell</span><span class="pi">:</span> <span class="s">make modules</span>
      <span class="na">args</span><span class="pi">:</span>
        <span class="na">chdir</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/opt/nginx-{{</span><span class="nv"> </span><span class="s">installed_nginx_version</span><span class="nv"> </span><span class="s">}}"</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">copy the module to /etc/nginx/modules</span>
      <span class="na">shell</span><span class="pi">:</span> <span class="s">cp /opt/nginx-{{ installed_nginx_version }}/objs/ngx_http_modsecurity_module.so /etc/nginx/modules</span>
      <span class="na">args</span><span class="pi">:</span>
        <span class="na">creates</span><span class="pi">:</span> <span class="s">/etc/nginx/modules/ngx_http_modsecurity_module.so</span>
    <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">load modsecurity inside nginx.conf</span>
      <span class="na">lineinfile</span><span class="pi">:</span>
        <span class="na">path</span><span class="pi">:</span> <span class="s">/etc/nginx/nginx.conf</span>
        <span class="na">insertbefore</span><span class="pi">:</span> <span class="s">BOF</span>
        <span class="na">line</span><span class="pi">:</span> <span class="s2">"</span><span class="s">load_module</span><span class="nv"> </span><span class="s">modules/ngx_http_modsecurity_module.so;"</span>
</code></pre></div></div>
<h2 id="configuring-modsecurity-nginx">Configuring ModSecurity-nginx</h2>
<p>The <a href="https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual">ModSecurity Reference Manual</a> provides a good overview of all the options and rules that ship with ModSecurity. Likewise, the <a href="https://github.com/SpiderLabs/ModSecurity-nginx">ModSecurity-nginx README</a> provides information about using the nginx module.</p>
<p>This part discusses the basic configuration required in order to start writing your own rules.</p>
<p>Inside your nginx site configuration, enable ModSecurity:</p>
<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">server</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="kn">modsecurity</span> <span class="no">on</span><span class="p">;</span>
<span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Next, inside every <code class="language-plaintext highlighter-rouge">location</code> block that should apply the ModSecurity rules, enable the rule engine and some handy options, like logging:</p>
<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">server</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="kn">modsecurity</span> <span class="no">on</span><span class="p">;</span>
<span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
<span class="kn">modsecurity_rules</span> <span class="s">'</span>
<span class="s">SecRuleEngine</span> <span class="s">On</span>
<span class="s">SecDebugLog</span> <span class="n">/var/log/nginx/modsecurity-debug.log</span>
<span class="s">SecDebugLogLevel</span> <span class="mi">3</span>
<span class="s">SecAuditEngine</span> <span class="s">On</span>
<span class="s">SecAuditLog</span> <span class="n">/var/log/nginx/modsecurity-audit.log</span>
<span class="s">SecAuditLogParts</span> <span class="s">ABKZ</span>
<span class="s">'</span><span class="p">;</span>
<span class="c1"># ...</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <strong>audit log</strong> will record the requests that matched your rules, while the <strong>debug log</strong> will contain ModSecurity related debug information, like a misconfigured setup.</p>
<p>If you’d like the ModSecurity <strong>audit logs to use JSON</strong>, add <code class="language-plaintext highlighter-rouge">SecAuditLogFormat JSON</code> to the mix.</p>
<p>Note that ModSecurity <strong>rule sets can also be loaded from a file</strong>:</p>
<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">server</span> <span class="p">{</span>
<span class="c1"># ...</span>
<span class="kn">modsecurity</span> <span class="no">on</span><span class="p">;</span>
<span class="kn">location</span> <span class="n">/</span> <span class="p">{</span>
<span class="kn">modsecurity_rules_file</span> <span class="n">/etc/modsecurity/my_rules.conf</span><span class="p">;</span>
<span class="c1"># ...</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…where <code class="language-plaintext highlighter-rouge">/etc/modsecurity/my_rules.conf</code> would look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SecRuleEngine On
SecDebugLog /var/log/nginx/modsecurity-debug.log
SecDebugLogLevel 3
SecAuditEngine On
SecAuditLogParts ABKZ
SecAuditLog /var/log/nginx/modsecurity-audit.log
</code></pre></div></div>
<h2 id="a-rule-to-block-ips-based-on-a-list">A rule to block IPs based on a list</h2>
<p>It’s time to write your first custom rule. Let’s introduce a blacklist - a file containing a list of IPs (masked or not) that should be denied any requests:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># /etc/modsecurity/blacklist.txt
1.2.3.4
5.6.7.8/24
9.10.11.12/16
</code></pre></div></div>
<p>How to maintain this file is left up to the reader. There are <a href="https://zeltser.com/malicious-ip-blocklists/">plenty</a> of <a href="http://iplists.firehol.org/">lists</a> out there; pick the one that suits your needs and make sure you strip any additional information or markup, aside from the IPs and their masks.</p>
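<p>As a rough illustration of the clean-up step, the filter below keeps only plain IPv4 addresses (with an optional mask) and drops everything else. It’s a sketch under a few assumptions: <code class="language-plaintext highlighter-rouge">clean_ip_list</code> is a name I made up, comments in the downloaded list are assumed to start with <code class="language-plaintext highlighter-rouge">#</code> or <code class="language-plaintext highlighter-rouge">;</code>, and octet ranges are not validated.</p>

```shell
# Read a raw IP list on stdin and print a cleaned, de-duplicated list on stdout.
# (clean_ip_list is a hypothetical helper written for this example.)
clean_ip_list() {
  # 1. strip trailing comments, 2. trim leading and trailing whitespace
  sed -e 's/[#;].*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}(/[0-9]{1,2})?$' \
    | sort -u
  # grep keeps only lines shaped like a.b.c.d or a.b.c.d/nn,
  # sort -u removes duplicate entries
}

# Typical usage:
#   clean_ip_list < downloaded-list.txt > /etc/modsecurity/blacklist.txt
```

<p>IPv6 entries are discarded by this pattern, so extend the expression if your list contains any.</p>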
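<p>Once you have your list, add the following rule either to your nginx site configuration or to your dedicated ModSecurity configuration file. The quotes around the message are escaped because the rule is meant to sit inside a single-quoted <code class="language-plaintext highlighter-rouge">modsecurity_rules</code> block; inside a dedicated file, drop the backslashes:</p>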
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SecRule REMOTE_ADDR "@ipMatchFromFile /etc/modsecurity/blacklist.txt" id:1,phase:1,deny,status:403,msg:\'blacklist\'
</code></pre></div></div>
<p>A couple of things are worth mentioning here:</p>
<ul>
<li>
<p>The <code class="language-plaintext highlighter-rouge">@ipMatchFromFile</code> call is one of the <a href="https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#Operators">many operators</a> that you can use to match ModSecurity variables.</p>
</li>
<li>
<p>Likewise, <code class="language-plaintext highlighter-rouge">REMOTE_ADDR</code> is one of the <a href="https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#Variables">many variables</a> that you can use to match request details, like the request IP in this case.</p>
</li>
<li>
<p>Every rule needs to have a unique <code class="language-plaintext highlighter-rouge">id</code>.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">phase</code> refers to the <a href="https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual#Processing_Phases">event of the request lifecycle</a> when the rule can be applied: after parsing request headers, after parsing the request body, after parsing the response headers, after parsing the response body, after logging etc. In this case, <code class="language-plaintext highlighter-rouge">phase:1</code> means we apply our rule when we already have the request headers.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">deny</code> keyword is the <strong>action</strong> that a matched request will trigger. The opposite would be <code class="language-plaintext highlighter-rouge">allow</code> (if you were building a whitelist). Another interesting action is <code class="language-plaintext highlighter-rouge">drop</code>, if you’d like to <code class="language-plaintext highlighter-rouge">drop</code> the TCP connection instantly (like in the case of DDoS attacks), but this action didn’t really work in my tests.</p>
</li>
<li>
<p>You can set the <code class="language-plaintext highlighter-rouge">status</code> code that a matched request will receive. The response body will be the nginx template that corresponds to this status code.</p>
</li>
<li>
<p>The <code class="language-plaintext highlighter-rouge">msg</code> option can be used as a human-readable identifier which will appear in the audit log.</p>
</li>
</ul>
<p>Once you’ve added the rule, make sure to reload nginx:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>service nginx reload
</code></pre></div></div>
<p>That’s pretty much it. Requests by IPs on your list will now be blocked and clients will see a <em>403</em> error.</p>
<p>Note that any changes to your list of IPs or to your ModSecurity rules will require <em>reloading nginx</em> in order for ModSecurity to pick up the changes.</p>
<h2 id="running-nginx-behind-a-reverse-proxy">Running nginx behind a reverse proxy</h2>
<p>If you’re running nginx behind a reverse proxy (e.g. a load balancer), which hides the client IP but sets the <code class="language-plaintext highlighter-rouge">X-Forwarded-For</code> header correctly, I recommend setting the <code class="language-plaintext highlighter-rouge">real_ip_header</code> option in your nginx configuration:</p>
<div class="language-nginx highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># /etc/nginx/nginx.conf</span>
<span class="k">http</span> <span class="p">{</span>
<span class="kn">real_ip_header</span> <span class="s">X-Forwarded-For</span><span class="p">;</span>
<span class="c1"># The proxy address that you trust to set the X-Forwarded-For header correctly</span>
<span class="kn">set_real_ip_from</span> <span class="mi">10</span><span class="s">.0.0.0/8</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>…which will ensure that the <code class="language-plaintext highlighter-rouge">REMOTE_ADDR</code> variable in ModSecurity points to the client IP and not the reverse proxy.</p>
<p>Sometimes, you’ll also want to enable the <code class="language-plaintext highlighter-rouge">real_ip_recursive</code> option - see the <a href="http://nginx.org/en/docs/http/ngx_http_realip_module.html#real_ip_recursive">documentation</a> for more details.</p>
<h2 id="alternatives">Alternatives</h2>
<p>One alternative that works with nginx is <a href="https://github.com/p0pr0ck5/lua-resty-waf">lua-resty-waf</a>. It requires <a href="https://openresty.org/en/">OpenResty</a> though or recompiling nginx with OpenResty/Lua.</p>
<p>Another one would be the cloud-based <a href="https://aws.amazon.com/waf/">AWS WAF</a>, which comes with some annoying restrictions: IP lists are limited to 1000 entries and you can only use <code class="language-plaintext highlighter-rouge">/8</code>, <code class="language-plaintext highlighter-rouge">/16</code>, <code class="language-plaintext highlighter-rouge">/24</code> or <code class="language-plaintext highlighter-rouge">/32</code> CIDR masks. The suggested workaround is to create multiple lists of 1000 entries and convert other masks to the available ones. Another limitation of the AWS WAF is that some AWS regions don’t come with all its features (e.g. the load balancer integration), so make sure to check availability in your region beforehand.</p>
<p>If you only want to allow/deny a list of IPs, there’s the <a href="https://github.com/Vasfed/nginx_ipset_blacklist">nginx-ipset-blacklist module</a>, but it looks quite outdated and won’t work with newer nginx versions. On the other hand, you could use plain <code class="language-plaintext highlighter-rouge">iptables</code> or <code class="language-plaintext highlighter-rouge">iptables</code> integrated with <a href="http://ipset.netfilter.org/">IpSet</a> - a fast lookup store for IP addresses.</p>
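<p>To give an idea of the <code class="language-plaintext highlighter-rouge">iptables</code> + IpSet route, here’s a small sketch that turns the same blacklist file into the commands you’d run. Nothing is executed; the commands are only printed. The helper name <code class="language-plaintext highlighter-rouge">print_ipset_commands</code> and the set name <code class="language-plaintext highlighter-rouge">blacklist</code> are assumptions made for this example, and I haven’t used this setup in production.</p>

```shell
# Print (but do not run) the ipset/iptables commands for a list of IPs.
# (print_ipset_commands and the default set name are hypothetical.)
print_ipset_commands() {
  list_file="$1"
  set_name="${2:-blacklist}"
  # hash:net stores single addresses as well as CIDR ranges;
  # -exist makes create/add succeed if the set or entry already exists
  echo "ipset create $set_name hash:net -exist"
  while IFS= read -r entry || [ -n "$entry" ]; do
    case "$entry" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    echo "ipset add $set_name $entry -exist"
  done < "$list_file"
  # Drop every packet whose source address is in the set
  echo "iptables -I INPUT -m set --match-set $set_name src -j DROP"
}

# Typical usage, after reviewing the output:
#   print_ipset_commands /etc/modsecurity/blacklist.txt | sudo sh
```

<p>Unlike the ModSecurity rule, this blocks traffic before it ever reaches nginx, so clients won’t get a <em>403</em> response; their connections will simply time out.</p>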
<h2 id="links">Links</h2>
<ul>
<li><a href="/posts/nginx-dynamic-modules">Compiling and using dynamic nginx modules</a></li>
<li><a href="https://github.com/SpiderLabs/ModSecurity/wiki/Compilation-recipes#ubuntu-1504">ModSecurity Compilation Recipe for Ubuntu 15.04</a></li>
<li><a href="https://modsecurity.org/">https://modsecurity.org/</a></li>
<li><a href="https://github.com/SpiderLabs/ModSecurity">https://github.com/SpiderLabs/ModSecurity</a></li>
<li><a href="https://github.com/SpiderLabs/ModSecurity-nginx">https://github.com/SpiderLabs/ModSecurity-nginx</a></li>
<li><a href="https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual">https://github.com/SpiderLabs/ModSecurity/wiki/Reference-Manual</a></li>
<li><a href="https://danieljamesscott.org/all-articles/9-articles/34-whitelisting-ips-for-mod-security-when-behind-a-load-balancer.html">https://danieljamesscott.org/all-articles/9-articles/34-whitelisting-ips-for-mod-security-when-behind-a-load-balancer.html</a></li>
</ul>Blocking malicious requests with nginx + ModSecurityLinux: How to make your own keyboard layout2017-12-27T00:00:00+00:002017-12-27T00:00:00+00:00https://lipanski.com/posts/custom-keyboard-layout<h1 id="linux-how-to-make-your-own-keyboard-layout">Linux: How to make your own keyboard layout</h1>
<h2 id="history-time">History time</h2>
<p>Apart from the Latin alphabet, the Romanian language uses five special letters: <em>ă</em>, <em>â</em>, <em>î</em>, <em>ș</em>, <em>ț</em>.</p>
<p>Due to the lack of Romanian hardware keyboards back in the day (which hasn’t changed much), one of the most popular Romanian keyboard mappings is fully compatible with US keyboards. The special letters are produced by pressing <code class="language-plaintext highlighter-rouge">AltGr</code> and the corresponding Latin letter, with the exception of “â”, which is attached to <code class="language-plaintext highlighter-rouge">AltGr + Q</code>. This is called the <em>Romanian (Programmers)</em> layout and is the de facto Romanian layout in the Linux world. Pairing the extra glyphs to the Latin keys is quite convenient, especially for programmers and people generally accustomed to the US layout.</p>
<p>On the other hand, the German layouts expect a German hardware keyboard. When using a US keyboard, the default layout maps <code class="language-plaintext highlighter-rouge">'</code>, <code class="language-plaintext highlighter-rouge">;</code>, <code class="language-plaintext highlighter-rouge">[</code> to produce <em>ä</em>, <em>ö</em>, <em>ü</em>. This is definitely not something that plays well with US keyboards or with my brain.</p>
<p>In my ideal world, I would trigger the German <em>umlauts</em> with the <code class="language-plaintext highlighter-rouge">AltGr</code> key on top of the US layout and have the Romanian characters work side by side, the same way. There’s definitely no layout for that. Fortunately, creating your own custom layout is quite easy in Ubuntu.</p>
<h2 id="how-to-make-your-own-custom-layout">How to make your own custom layout</h2>
<p>Layouts are contained within the <code class="language-plaintext highlighter-rouge">/usr/share/X11/xkb/symbols</code> directory.</p>
<p>Let’s start by copying the Romanian layout (feel free to choose any other layout as your starting point):</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>su
<span class="nb">cd</span> /usr/share/X11/xkb/symbols
<span class="nb">cp </span>ro fl
</code></pre></div></div>
<p>I’m calling my layout <code class="language-plaintext highlighter-rouge">fl</code>. If you open this file, you’ll see lots of lines that look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>key <AD01> { [ q, Q, acircumflex, Acircumflex ] };
</code></pre></div></div>
<p>These lines are the key bindings - they connect a particular key on your keyboard to the value that will be printed out.</p>
<p>The first part <code class="language-plaintext highlighter-rouge">key <AD01></code> represents <strong>the key</strong> that we want to map. The code <code class="language-plaintext highlighter-rouge"><AD01></code> has the following meaning:</p>
<ul>
<li>The first letter, <code class="language-plaintext highlighter-rouge">A</code>, points to the <em>alphanumeric</em> key block. Other options include <code class="language-plaintext highlighter-rouge">KP</code> for the keypad and <code class="language-plaintext highlighter-rouge">FN</code> for the function keys.</li>
<li>The second letter, <code class="language-plaintext highlighter-rouge">D</code>, points to the row. The count starts with the row containing the space bar, which would be row <code class="language-plaintext highlighter-rouge">A</code>.</li>
<li>The last two digits represent the column of the key, starting with <code class="language-plaintext highlighter-rouge">01</code>, going from left to right and ignoring special keys (like <code class="language-plaintext highlighter-rouge">TAB</code> or <code class="language-plaintext highlighter-rouge">CapsLock</code>).</li>
</ul>
<p>So <code class="language-plaintext highlighter-rouge">key <AD01></code> would point to the <em>first key</em> (ignoring the <code class="language-plaintext highlighter-rouge">TAB</code>) of the <em>fourth row</em> of the <em>alphanumeric block</em>, which on most keyboards is the letter <code class="language-plaintext highlighter-rouge">Q</code>.</p>
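<p>To make the naming scheme more tangible, here’s a toy decoder for the alphanumeric key codes. It’s only an illustration: <code class="language-plaintext highlighter-rouge">describe_key</code> is a made-up helper, it handles rows <code class="language-plaintext highlighter-rouge">A</code> to <code class="language-plaintext highlighter-rouge">E</code> of the alphanumeric block only, and it takes the code without the angle brackets.</p>

```shell
# Translate an alphanumeric XKB key code such as AD01 into words.
# (describe_key is a hypothetical helper written for this example.)
describe_key() {
  code="$1"
  case "$code" in
    A[A-E][0-9][0-9]) ;;   # block A, row letter, two-digit column
    *) echo "unsupported key code: $code" >&2; return 1 ;;
  esac
  row_letter="$(printf '%s' "$code" | cut -c2)"
  column="$(printf '%s' "$code" | cut -c3-4)"
  # Rows are counted from the bottom: A is the row with the space bar
  case "$row_letter" in
    A) row=1 ;; B) row=2 ;; C) row=3 ;; D) row=4 ;; E) row=5 ;;
  esac
  column="${column#0}"     # 01 -> 1, 10 stays 10
  echo "alphanumeric block, row $row, column $column"
}

describe_key AD01   # the Q key on a US keyboard
describe_key AB01   # the Z key on a US keyboard
```

<p>The same logic applies in reverse when you need the code for a key: count the rows from the space bar upwards and the columns from left to right.</p>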
<p>The second part, <code class="language-plaintext highlighter-rouge">[ q, Q, acircumflex, Acircumflex ]</code>, points to the character that will be printed out, for the various combinations of <code class="language-plaintext highlighter-rouge">Shift</code> and <code class="language-plaintext highlighter-rouge">AltGr</code>:</p>
<ol>
<li>No combination: q</li>
<li><code class="language-plaintext highlighter-rouge">Shift</code>: Q</li>
<li><code class="language-plaintext highlighter-rouge">AltGr</code>: â (or <em>a circumflex</em>)</li>
<li><code class="language-plaintext highlighter-rouge">Shift + AltGr</code>: Â (or <em>A circumflex</em>)</li>
</ol>
<p>If you don’t intend to use the <code class="language-plaintext highlighter-rouge">Shift</code> and/or <code class="language-plaintext highlighter-rouge">AltGr</code> combinations, you can simply leave those positions out.</p>
<p>Note that for representing the printed value you can either use the symbol name - like <code class="language-plaintext highlighter-rouge">q</code> for the letter <em>q</em> and <code class="language-plaintext highlighter-rouge">acircumflex</code> for the letter <em>â</em> - or the <a href="https://en.wikipedia.org/wiki/List_of_Unicode_characters">Unicode code point</a> - like <code class="language-plaintext highlighter-rouge">U00E5</code> for the letter <em>å</em>.</p>
<p>Based on these conventions, it’s quite easy to either add new bindings or modify existing ones.</p>
<p>In my case, I’ve added the following lines to the default block, in order to extend the Romanian layout with the German special letters:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Print ö when pressing o + AltGr
key <AD09> { [ o, O, odiaeresis, Odiaeresis ] };
// Print ü when pressing u + AltGr
key <AD07> { [ u, U, udiaeresis, Udiaeresis ] };
// Print ß when pressing z + AltGr
// Because there's no capital ß in the German language, I've left out the Shift + AltGr combination
key <AB01> { [ z, Z, ssharp ] };
</code></pre></div></div>
<p>I’ve also edited the existing binding for the letter <em>w</em>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Print ä when pressing w + AltGr
key <AD02> { [ w, W, adiaeresis, Adiaeresis ] };
</code></pre></div></div>
<p>One more thing before we’re done: Ubuntu keeps track of the installed keyboards in a file called <code class="language-plaintext highlighter-rouge">/usr/share/X11/xkb/rules/evdev.xml</code>. Open it with your favourite editor and, after the last <code class="language-plaintext highlighter-rouge"></layout></code> tag, add your new layout:</p>
<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nt"><layout></span>
<span class="nt"><configItem></span>
<span class="nt"><name></span>fl<span class="nt"></name></span>
<span class="nt"><shortDescription></span>fl<span class="nt"></shortDescription></span>
<span class="nt"><description></span>My very own keyboard layout<span class="nt"></description></span>
<span class="nt"><languageList></span>
<span class="nt"><iso639Id></span>rum<span class="nt"></iso639Id></span>
<span class="nt"><iso639Id></span>ger<span class="nt"></iso639Id></span>
<span class="nt"><iso639Id></span>eng<span class="nt"></iso639Id></span>
<span class="nt"></languageList></span>
<span class="nt"></configItem></span>
<span class="nt"></layout></span>
</code></pre></div></div>
<p>That’s it - you can switch to your new layout by calling:</p>
<div class="language-sh highlighter-rouge"><div class="highlight"><pre class="highlight"><code>setxkbmap fl
</code></pre></div></div>
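<p>If <code class="language-plaintext highlighter-rouge">setxkbmap</code> complains, a quick sanity check of the two files touched above might help. This is a minimal sketch: <code class="language-plaintext highlighter-rouge">layout_registered</code> is a hypothetical helper that only looks for a matching <code class="language-plaintext highlighter-rouge">name</code> element, so it can’t tell a layout from a variant with the same name.</p>

```shell
# Return success if the given layout name appears in an XKB rules file.
# (layout_registered is a hypothetical helper written for this example.)
layout_registered() {
  layout_name="$1"
  rules_file="${2:-/usr/share/X11/xkb/rules/evdev.xml}"
  grep -qF "<name>${layout_name}</name>" "$rules_file"
}

# Check both the symbols file and the registration for the "fl" layout.
if [ -f /usr/share/X11/xkb/symbols/fl ] && layout_registered fl 2>/dev/null; then
  echo "fl layout looks installed"
fi
```

<p>Once the layout is active, <code class="language-plaintext highlighter-rouge">setxkbmap -query</code> should report it as the current layout.</p>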
<h2 id="links">Links</h2>
<ul>
<li><a href="https://github.com/lipanski/dotfiles/blob/master/usr/share/X11/xkb/symbols/fl">The keyboard layout file from my dotfiles</a></li>
<li><a href="https://askubuntu.com/questions/510024/what-are-the-steps-needed-to-create-new-keyboard-layout-on-ubuntu">https://askubuntu.com/questions/510024/what-are-the-steps-needed-to-create-new-keyboard-layout-on-ubuntu</a></li>
</ul>Linux: How to make your own keyboard layout