A collection of ramblings on computing, and other memorabilia
Having procrastinated long enough, I’ve started reading the SICP as part of my December Adventure. The following ramblings represent my day-to-day progress – however much that may be – sprinkled with other thoughts.
Decided I’d kick this process off with (unsuspecting) passengers, and brought my laptop to the library – I would not dare ruin my physical copy in the trip to and fro – only to become massively distracted, as I’m wont to be, by the promise of doing something else. Still, I read most of Chapter 1, which is largely a discussion on Scheme and programming basics.
I left a few hours later, dejected but determined to do better the next day. First thing: no laptop, I’ll use the physical book or a copy on my e-book reader. Second thing, I’ll try to do the exercises on my phone – a BlackBerry Q20 – and to that end spent a few hours later that night compiling Chicken Scheme 5.4.0 for QNX. It works!
Arrived at the library in better spirits, and bereft of anything that can reasonably connect to the Internet. Finished reading through Chapter 1, and part way through Chapter 2.
Section 1.1.7 discusses Newton’s Method for finding square roots, and provides a definition of the mathematical function like so:
√x = the 'y' such that 'y ≥ 0' and 'y² = x'
As noted further down, the definition does not describe a procedure, but simply the properties of the thing in question; in other words, imperative knowledge vs. declarative knowledge.
However, programming languages in general don’t support the concept of equality-as-identity, which the above seems to denote, only equality-as-(left-associative)-assignment (unless you look at specialized constraint-solving systems such as miniKanren). In a system such as that, the definition above might be enough to calculate square roots for any x!
It’s getting interesting now…
Not much reading done today – hosted a friend from Totnes – but managed to implement PDF thumbnail previews for slidge-whatsapp instead; had the work knocking about in my head for a while. Yay for distractions!
Lots of volunteering obligations these days, and not much space for focus. Working through some exercises from Chapter 1 on the go, however!
Whoa, that’s a lot of time. Spent the week seeing a friend in Devon, and as such didn’t have much computer-time; worked through exercises in Chapter 1 again, but losing track of all the parentheses. Worked on a bit of writing for the site, though.
Continued working through exercises for Chapters 1 and 2 – at this rate, I’ll still be on Chapter 2 by the end of the month. That said, I did get some new sections on writings and movie collections that are semi-related to the overarching project of being productive. Maybe the SICP is a slow burner. ☺️
Ooof, I’m not very good at this thing, am I? Most of the second half of December flew fast, with me only doing small bits of reading; I did, however, manage to spruce up content on the wiki, implemented a “Now Playing” widget on the home-page, and an OpenRing-esque webring on posts (like this one – scroll to the bottom!). I’ll make separate posts for all these in time.
This isn’t the end, of course, and my adventures with SICP will continue into 2025. See you next year!
1.12.2024 00:00 · December Adventure 2024
Some time around 2019¹, Cloudflare introduced its first version of API Tokens, capable of being scoped to granular levels of access for specific actions (e.g. listing, reading, deleting) against specific resources (e.g. accounts, zones, DNS records).
The underlying authorization system providing these granular access capabilities was mostly built out by yours truly over the years, from early 2018 and until my move away from the Cloudflare IAM team in 2022; credit for the initial design, however, goes to Thomas Hill, who established the data model of subject/action/scope, described in more detail below.
Though the system has evolved in both its user-facing aspects and its scale, the basic principles and overarching design of the authorization system (as well as most of the core code) are still much the same as they were back in 2019; a testament to the robustness and flexibility of the design.
Since then, a number of other designs have also emerged, chief of which is Google’s much-vaunted Zanzibar; these systems are complex enough to merit their own separate posts, but suffice it to say that they approach their design quite differently to ours.
At the core of any authorization system lies a question, for example:
Can Alice update DNS Record X in Zone Y?
Each part of this sentence has some significance toward the (yes/no) answer we might receive, and each represents some part of our final data-model, though different systems find different ways of answering questions like this.
Let’s start from Alice, our actor, or subject (we’ll prefer using “subject” further on): any request for authorization is assumed to pertain to an action being taken against some protected system, and any action is assumed to originate somewhere, whether that’s a (human) user, or an (automated) system, or anything in between. Moreover, the subject is assumed to provably be who they say they are, that is, be authenticated, before any authorization request is even made, lest we allow subjects to impersonate one another.
Actions, such as “update” above, are typically performed against resources, such as “DNS record X”, and the two pertain to one another in some way; it makes no sense to try to “update a door”, as much as it doesn’t make sense to “open a DNS record”. Resources, in turn, are identified uniquely (the “X” above) and can furthermore be qualified by context, or “scope” – the “zone Y” above.
A more generalized form of the question above would, then, be:
Can Subject perform Action against Resource with ID, under a specific Scope with ID?
The concepts of subject, action, resource, and scope form the majority of our data-model, with only a sprinkle of organizing parts in between. Before we look at each of these in turn, let’s examine a common thread that exists between them, the idea of a “resource taxonomy”.
How do we determine what makes for a valid authorization request, and how do we ensure that authorization policies are consistent with the sorts of (valid) questions we expect to receive?
Subjects, actions, resources, and their scopes exist in a universe of (expanding) possibilities relevant to their use, and relate to one another intimately.
Different systems solve these issues in different ways, but a centralized system does benefit from solid definitions of what can and cannot be given access to; at Cloudflare, this culminated in a resource taxonomy, an organized hierarchy of possible “things” present in the system, driven by a reverse DNS naming convention, for instance:
com.cloudflare.api.account.zone.dns-record
Which represents an (abstract) DNS record placed under a zone.
Though names appear to contain their full hierarchies (and in some cases do, as DNS records belong to zones, which themselves belong to accounts, with predictable resource naming patterns), they have no real semantics beyond needing to be unique – we could’ve just as well used zone-dns-record
as a unique name, though the reverse DNS convention does come in handy when expressing actions and resource identities.
In our case, the (partial) resource hierarchy of relevance looks like so:
com.cloudflare.api.account
com.cloudflare.api.account.zone
com.cloudflare.api.account.zone.dns-record
com.cloudflare.api.user
com.cloudflare.api.token
If resources have stable references, then actions against those resources also need some sort of stable reference – in our case, we just extend the existing naming convention, adding a verb suffix to resource names, for example:
com.cloudflare.api.account.zone.dns-record.update
Our naming convention is only, thus far, capable of referring to resources in the abstract, and requires a way of specifying which specific resource we’re attempting to give access to; this is accomplished, once again, by adding a suffix to the resource name, this time, an (opaque) unique identifier, e.g.:
com.cloudflare.api.account.zone.dns-record.5d32efec
The 5d32efec
suffix refers to an identifier known by the source-of-truth² for DNS records, and is otherwise opaque to the authorization system – any unique sequence of characters would do.
Putting this all together, you might be able to restate our perennial question above like so (omitting the com.cloudflare.api
prefix for brevity):
Can user.3cf2e98a do account.zone.dns-record.update against account.zone.dns-record.845cf6a7, under scope account.zone.5ab65c35?
Phew, that’s a mouthful.
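Expressed as a structured request – a purely illustrative shape, not the actual API surface – the same check might look something like this:
subject: com.cloudflare.api.user.3cf2e98a
action: com.cloudflare.api.account.zone.dns-record.update
resource: com.cloudflare.api.account.zone.dns-record.845cf6a7
scopes:
  - key: com.cloudflare.api.account.zone.5ab65c35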
The point of establishing a common understanding of what kinds of things can be given access to is that the authorization system is designed to be fundamentally data-agnostic and completely independent from the other systems asking questions of it. Two rules play into this determination:
Authorization policies are the sole property of the authorization system; no other service has any access to them. All other systems can do is ask whether or not access is allowed for a given resource/action, with a yes/no answer.
The authorization system cannot and does not ensure the correctness of authorization policies, beyond its own semantics.
There is no guarantee that zone X belongs to account Y for a corresponding resource/scope relationship, nor is there any guarantee that DNS record 845cf6a7
is a valid ID for that resource; only the systems-of-record can ensure these invariants.
The assumption, then, is as follows: systems asking for access against a specific resource (e.g. a DNS record) are likely in a good place to ensure the IDs they provide are valid; furthermore, they’re also likely in a good place to know the hierarchy of resources, specifically their immediate parents (e.g. the zone and account) to provide as scopes.
It is this decision to decouple resource identity from taxonomy (which remains part of the authorization system, mainly for validation purposes) that has made the system as flexible and long-lasting as it has been.
This business about resources and taxonomies doesn’t actually bring us much closer to answering questions about access in our authorization system; how do we do that?
Firstly, we need to look at the originators of actions, our so-called subjects. As alluded to in the example above, subjects in our system are identified by their unique, fully-qualified resource identifier, e.g.:
com.cloudflare.api.user.3cf2e98a
Which represents a (presumably human) user with ID 3cf2e98a
. There’s nothing else our authorization system needs to know about the subject – remember, subjects are assumed to have already authenticated at a level prior to asking about access, typically by a different system, which would then produce a signed token of some kind (e.g. a JWT), which would then be provided as context to requests made against the authorization system.
Access for subjects is expressed as a collection of “policies”, themselves collections of action and resource identifiers. An example pseudo-policy that would fulfill access for previous examples might look like this:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.845cf6a7
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
Resolving access then becomes a simple matter of traversing assigned policies, and applying the following criteria for each:
Check if actions
list contains the action requested.
Check if resources
list contains the resource requested. If a matching resource contains a list of scopes, check that the request contains matching scope names, ignoring any additional scopes given in the request.
If any policy in the list matches all criteria, then access is allowed.
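As a rough sketch of this evaluation – hypothetical Go types and names, not the production code, and covering only the simple allow-only case described so far – the traversal might look something like this:
package authz

// Check is a single authorization question: can Subject perform Action
// against Resource, under the given Scopes? All fields are fully-qualified
// names, as described above.
type Check struct {
	Subject  string
	Action   string
	Resource string
	Scopes   []string
}

// Resource is a policy resource reference with optional required scopes.
type Resource struct {
	Key    string
	Scopes []string
}

// Policy is a simplified, allow-only policy attached to a subject.
type Policy struct {
	Actions   []string
	Resources []Resource
}

// Allowed returns true if any policy matches the requested action and
// resource, and all scopes attached to the matching resource appear in the
// request; extra scopes given in the request are ignored.
func Allowed(policies []Policy, c Check) bool {
	for _, p := range policies {
		if !contains(p.Actions, c.Action) {
			continue
		}
		for _, r := range p.Resources {
			if r.Key == c.Resource && containsAll(c.Scopes, r.Scopes) {
				return true
			}
		}
	}
	return false
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

func containsAll(have, want []string) bool {
	for _, w := range want {
		if !contains(have, w) {
			return false
		}
	}
	return true
}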
Allowing access to specific resources by ID is all fine and well, until you have to provide access to all DNS records across all zones in an account, including any future DNS records. Rather than putting the burden of updating policies onto humans (or worse, some automated system somewhere), we need a way to allow access to classes of things, all at once.
Turns out our reverse DNS naming convention fits this use-case well; rather than using a concrete identifier as a resource suffix, we can simply use an asterisk to denote a partial wildcard, for instance:
com.cloudflare.api.account.zone.dns-record.*
Use of wildcards here does not intend to denote any kind of lexical matching of IDs – that is to say, you couldn’t really use dns-record.5c*
to match resources with IDs starting with 5c
, as identifiers are opaque as far as the authorization system is concerned.
Rather, the use of an asterisk denotes access to all resources of that kind, but also introduces additional constraints on our policies, namely the mandatory use of a scope, lest we want to provide access to all DNS records anywhere.
A modified policy giving access to all DNS records for our example zone would then look like so:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.*
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
Similar to partial wildcards, we can also specify a standalone asterisk (i.e. a *
without a resource name) as a “catch-all wildcard”, denoting access to all resources under a scope.
As an extra rule, we might define that catch-all wildcard access also covers its top-most scope as if it were the resource itself, whether that resource is requested in fully-qualified or partial wildcard form. That is, the following policy:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- actions:
- key: com.cloudflare.api.account.zone.read
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: *
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
Will allow access for a check against action zone.read
and resource zone.5ab65c35
. Nevertheless, wildcard matching is only available at the resource level; scopes provided in policies are always assumed to match directly, with no wildcard semantics.
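Sticking with the earlier Go sketch (and leaving the catch-all-includes-its-scope special case aside), resource matching with wildcards might be expressed as a small helper that also reports how specific the match was; that specificity becomes useful when resolving conflicts further down:
package authz

import "strings"

// MatchKind describes how a policy resource key matched a requested
// resource, ordered from least to most specific.
type MatchKind int

const (
	MatchNone     MatchKind = iota
	MatchCatchAll // a lone "*"
	MatchPartial  // e.g. "com.cloudflare.api.account.zone.dns-record.*"
	MatchExact    // a fully-qualified resource name
)

// matchResource compares a policy resource key against the fully-qualified
// resource name given in a check. Partial wildcards match only on the
// abstract resource name, never on portions of the (opaque) identifier.
func matchResource(policyKey, requested string) MatchKind {
	switch {
	case policyKey == requested:
		return MatchExact
	case policyKey == "*":
		return MatchCatchAll
	case strings.HasSuffix(policyKey, ".*") &&
		strings.HasPrefix(requested, strings.TrimSuffix(policyKey, "*")):
		return MatchPartial
	default:
		return MatchNone
	}
}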
So far, our policies have been exclusively aimed at giving unequivocal access to resources. There are times, however, where we might want to throw a fence around things, lest we have the hoi polloi trample on our metaphorical petunias.
Doing so with our system as described is deceptively simple, though some complexity lurks underneath the surface. Let’s first see how we might, for example, give access to all DNS records in a zone, except for a single specific record:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- access: allow
actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.*
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
- access: deny
actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.65caf35c
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
In case you missed it, the addition of an access
field with allow
or deny
values denotes which stance a policy takes, and whether a full match will mean access is allowed or denied.
Herein the problems begin: requesting access for dns-record.65caf35c
under zone.5ab65c35
will have both policies match, one to allow since we’re given access to all DNS records for the zone, and one to deny, since we’re denied access to the specific DNS record.
How do we resolve this conflict?
We could determine which policy “wins” by just taking the last decision made (i.e. the last policy in the list) as final; that, however, would put the onus of ensuring that policies are ordered the right way on users, with potentially catastrophic consequences if they are not.
We must therefore assume the intentions of our users – why would anyone take away access, only to give it back immediately? Clearly the opposite must always be true (especially since no access at all is the default): deny policies always trump allow policies, if the two overlap.
We can, and will, however, further elaborate on this rule, as the specificity of matching matters as well; direct matches against specific resources trump partial wildcard matches, which trump catch-all wildcard matches. It should, then, be possible to say the following:
Allow access to all resources under account X, but deny access to all resources under zone Y (including the zone itself), except for DNS records, but not including DNS record Z.
Which we might translate into the following policy representation:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- access: allow
actions:
- key: com.cloudflare.api.account.zone.read
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: *
scopes:
- key: com.cloudflare.api.account.9cfe45ac
- access: deny
actions:
- key: com.cloudflare.api.account.zone.read
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: *
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
- key: com.cloudflare.api.account.9cfe45ac
- access: allow
actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.*
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
- key: com.cloudflare.api.account.9cfe45ac
- access: deny
actions:
- key: com.cloudflare.api.account.zone.dns-record.update
resources:
- key: com.cloudflare.api.account.zone.dns-record.65caf35c
scopes:
- key: com.cloudflare.api.account.zone.5ab65c35
- key: com.cloudflare.api.account.9cfe45ac
Of course, policies this complex are fairly rare, but they do exist, and catering to these requirements is important in a system that purports to be as flexible as possible.
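Tying the conflict-resolution rules together with the earlier sketches, one (hypothetical) way of modelling the final decision is to track the most specific match seen so far, letting deny win over allow on ties; again, this is illustrative, not the production implementation:
package authz

// Effect is the stance a policy takes when all of its criteria match.
type Effect int

const (
	Allow Effect = iota
	Deny
)

// ScopedPolicy extends the earlier Policy sketch with an access stance.
type ScopedPolicy struct {
	Access    Effect
	Actions   []string
	Resources []Resource
}

// Resolve walks all policies and returns the final decision for a check:
// more specific resource matches win over less specific ones, and on equal
// specificity a deny wins over an allow. No match at all means no access.
func Resolve(policies []ScopedPolicy, c Check) bool {
	var bestKind MatchKind
	var bestEffect Effect
	matched := false

	for _, p := range policies {
		if !contains(p.Actions, c.Action) {
			continue
		}
		for _, r := range p.Resources {
			kind := matchResource(r.Key, c.Resource)
			if kind == MatchNone || !containsAll(c.Scopes, r.Scopes) {
				continue
			}
			switch {
			case !matched, kind > bestKind:
				matched, bestKind, bestEffect = true, kind, p.Access
			case kind == bestKind && p.Access == Deny:
				bestEffect = Deny
			}
		}
	}
	return matched && bestEffect == Allow
}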
So far, we’ve been providing lists of actions and resources as direct references, but doing so in real life would be incredibly onerous (especially given the large number of options available to us). The solution to this is – you guessed it – normalization, or in other words, grouping things under a unique name we can refer to.
For actions, we can form action groups, or as they’re sometimes (and perhaps confusingly) called, “roles”. These would represent collections of possible actions available to a subject in the abstract, not tied to any specific resource or scope. For instance, an example “DNS Administrator” action group might look like this:
id: 9aff84ac
name: DNS Administrator
actions:
- key: com.cloudflare.api.account.zone.dns-record.read
- key: com.cloudflare.api.account.zone.dns-record.create
- key: com.cloudflare.api.account.zone.dns-record.update
- key: com.cloudflare.api.account.zone.dns-record.delete
Similarly, resource definitions can benefit from being named and referenced separately, for instance:
id: fd25a5dd
name: Production Zones
resources:
- key: com.cloudflare.api.account.zone.2acf325f
scopes:
- key: com.cloudflare.api.account.6afe524a
- key: com.cloudflare.api.account.zone.33cfade6
scopes:
- key: com.cloudflare.api.account.6afe524a
One might then assign these action groups and resource groups to a policy under separate action_groups and resource_groups fields respectively, to the exclusion of the actions and resources fields for the policy, e.g.:
subject: com.cloudflare.api.user.3cf2e98a
policies:
- access: allow
action_groups:
- id: 9aff84ac # DNS Administrator
resource_groups:
- id: fd25a5dd # Production Zones
It is assumed that these action and resource groups are managed separately, and can be re-used as needed by the owning user (which also implies that access to these resources is also governed by the authorization system, which needs to control access from itself, a fun exercise in recursion).
This post is not so much a guide to the current (or really, past) state of the production system at Cloudflare; a number of omissions and simplifications have been made to keep this from becoming a rambling epic.
Rather, it is a high-level overview of the design thinking behind a production system that is part of every single request made to the public Cloudflare API (and a few more still), and which hopefully serves as a map for others looking to build similar systems, the core idea being: authorization is about ensuring the basic questions being asked can be answered as quickly and unambiguously as possible.
Typically, this means a lot of work goes into policy semantics, and for us, this meant building out a resource taxonomy in order to balance the data-agnostic nature of the system with the need to ensure that policies are always correctly formulated.
There is still a lot of ground left to cover, however, first of all being ABAC. I’ll leave that for a future post, then.
Specifically, August 2019, though the beta was opened a couple of months earlier. ↩︎
In other words, the service or component responsible for handling DNS records. ↩︎
Truly robust systems are primarily achieved through a combination of code reuse and consistent API design; these inform, and are informed by, the broader developer experience (e.g. team structures, workflows, etc.)
The degree to which code reuse can be achieved depends intrinsically on the cohesiveness of developer practice. Teams that use different architectures, programming languages, and development approaches will find it all the harder to reuse code except at the edges of their practice.
Code reuse and consistent API design, and therefore the broader developer experience, are fundamentally economies of scale. Production systems rarely exist in a vacuum, and thus the true robustness of a system is a function of all its dependencies; similarly, bugs in shared code are (potentially) experienced together, but are also solved together and for all consumers of a system.
Code reuse exists mainly in the space between integration and use, or in other terms, boilerplate and business logic. Boilerplate is assumed to be difficult to make cohesive across systems, and business logic is assumed to be inherently what makes a system unique to its own needs.
These conceptions place artificial limits on the degree of code reuse even within individual systems, and solving this conundrum requires rethinking the definitions of boilerplate code, library code, and business logic, as well as the divisions between them.
All of the above principles exist both at the macro level, i.e. in the way disparate teams might participate in a service-oriented architecture, and at the micro level, i.e. in the way any given service is architected within a team, whether or not any external team or code dependencies are implied.
Serving Go modules off a custom domain (such as go.deuill.org
) requires only that you’re able to
serve static HTML for the import paths involved; as simple as this sounds, finding salient
information on the mechanics involved can be quite hard.
This post serves as a gentle introduction to the mechanics of how go get
resolves modules over
HTTP, and walks through setting up Hugo (a static site generator) for serving Go modules off a
custom domain.
Why would anyone want to do this? Aside from indulging one’s vanities, custom Go module paths can potentially be more memorable, and control over these means you can move between code hosts without any user-facing disruption.
At a basic level, go get
resolves module paths to their underlying code by making HTTP calls
against the module path involved; responses are expected to be in HTML format, and must contain a
go-import
meta tag pointing to the code repository for a supported version control system, e.g.:
<meta name="go-import" content="go.example.com/example-module git https://github.com/deuill/example-module.git">
In this case, the go-import
tag specifies a module with name (or, technically, the “import
prefix”, as we’ll find out in later sections), go.example.com/example-module
, which can be
retrieved from https://github.com/deuill/example-module.git
via Git.
Go supports a number of version control systems, including Git, Mercurial,
Subversion, Fossil, and Bazaar; assuming the meta tag is well-formed, go get
will then pull code from the repository pointed to, using the VCS specified.
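If you want to inspect exactly what the toolchain sees, you can make the same request yourself; go get appends a ?go-get=1 query parameter to the URL, which a static host such as ours is free to ignore:
$ curl -s 'https://go.example.com/example-module?go-get=1' | grep go-import
<meta name="go-import" content="go.example.com/example-module git https://github.com/deuill/example-module.git">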
Thus, with this content served over https://go.example.com/example-module
, we can then make our
go get
call and see code pulled from Github:
$ go get -v go.example.com/example-module
go: downloading go.example.com/example-module v0.0.0-20230325162624-6da6d8c20f04
…
The repository pulled must be a valid Go module – that is, a repository containing a valid go.mod
file – for it to resolve correctly; this file must also contain a module
directive that has the
same name as the one being pulled.
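For completeness, a minimal go.mod for the example module might look like the following (the Go version here is arbitrary):
// go.mod
module go.example.com/example-module

go 1.21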
…that’s pretty much it!
It is, of course, quite plausible that we can just stuff static HTML files in our web root and be done with it, but where’s the fun in that? Furthermore, Hugo allows us to create complex page hierarchies using simple directives, which is definitely of use to us here.
In order to get a workable Go module host set up in Hugo, we need two things: content items, each representing a Go module, and a set of templates to render these as needed.
First, we’ll need to set up a new site with Hugo:
$ hugo new site go.deuill.org
This will set up a fairly comprehensive skeleton for our Go module host, but will need some love
before it’s anywhere near useful. First, add some basic, site-wide configuration – open hugo.toml
and set your baseURL
and title
to whatever values you want, e.g.:
# hugo.toml
baseURL = 'https://go.deuill.org'
languageCode = 'en-us'
title = 'Go Modules on go.deuill.org'
Next, let’s add a basic module skeleton as a piece of content – a Markdown file in the content
directory named example-module.md
. This file doesn’t actually need to contain any content per se:
rather, we’ll be looking to use Hugo front matter, i.e. page metadata, to
describe our modules (though of course the extent to which we take rendering content is entirely up
to us):
# content/example-module.md
---
title: Example Module
description: An example Go module with a custom import path
---
Since Hugo doesn’t create any default templates for rendering content pages, we’ll need to create some basic ones ourselves; at a minimum, we’ll need a page for the content itself, but ideally we’d also want to be able to see all modules available on the host.
Let’s tackle both in turn – for content-specific rendering, we’ll need to add a template file in
layouts/_default/single.html
. The content can be quite minimal for the moment:
<!-- layouts/_default/single.html -->
<!doctype html>
<html lang="{{.Site.LanguageCode}}">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>{{.Title}} - {{.Site.Title}}</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<main>
<h1>{{.Title}}</h1>
<aside>{{.Description}}</aside>
</main>
</body>
</html>
Similarly, our home-page template is located in layouts/index.html
, and looks like this:
<!-- layouts/index.html -->
<!doctype html>
<html lang="{{.Site.LanguageCode}}">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Go Modules - {{.Site.Title}}</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
</head>
<body>
<main>
<h1>{{.Site.Title}}</h1>
{{range .Site.RegularPages}}
<section>
<h2><a href="{{.Permalink}}">{{.Title}}</a></h2>
<aside>{{.Description}}</aside>
</section>
{{end}}
</main>
</body>
</html>
So far, so good – if you’ve been following along here, you should, by this point, have a fairly
brutalist listing of one solitary Go module, itself not quite ready for consumption by go get
as-of-right-now.
Serving a Go module is, as shown above, a simple matter of rendering a valid go-import
tag for the
same URL pointed to by the import path itself. Rendering a go-import
tag requires three pieces of
data, at a minimum:
The import prefix (e.g. go.example.com/example-module).
The VCS type, e.g. git or hg.
The repository URL, e.g. https://github.com/deuill/example-module.
.Hugo supports adding arbitrary key-values in content front matter, which can then be accessed in
templates via the {{.Params}}
mapping. Simply enough, we can extend our content file
example-module.md
with the following values:
# content/example-module.md
---
title: Example Module
description: An example Go module with a custom import path
+module:
+ path: go.example.com/example-module
+repository:
+ type: git
+ url: https://git.deuill.org/deuill/example-module.git
---
Producing a valid go-import
tag is, then, just a matter of referring to these values in the
content-specific layout, layouts/_default/single.html
; we can also render them out on-page to make
things slightly more intuitive for human visitors, e.g.:
<!-- layouts/_default/single.html -->
<!doctype html>
<html lang="{{.Site.LanguageCode}}">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>{{.Title}} - {{.Site.Title}}</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
+ <meta name="go-import" content="{{.Params.module.path}} {{.Params.repository.type}} {{.Params.repository.url}}">
</head>
<body>
<main>
<h1>{{.Title}}</h1>
<aside>{{.Description}}</aside>
+ <dl>
+ <dt>Install</dt>
+ <dd><pre>go install {{.Params.module.path}}@latest</pre></dd>
+ <dt>Documentation</dt>
+ <dd><a href="https://pkg.go.dev/{{.Params.module.path}}">https://pkg.go.dev/{{.Params.module.path}}</a></dd>
+ </dl>
</main>
</body>
</html>
Ship it! This setup is sufficient for serving multiple Go modules, each in its own directory, and with a functional, albeit somewhat retro, human-facing interface.
Most modules would be well-served by this setup, but it does assume that the Go module is placed
at the repository root; based on what we know about go get
, the import path needs to resolve to an
HTML file containing a valid go-import
tag, and that import path needs to match the resulting
go.mod
file.
The module path in the go-import tag might, therefore, be assumed to always be equal to the import path; closer reading of the official Go documentation reveals that this
import path; closer reading of the official Go documentation reveals that this
is instead the import prefix, relating to the repository root and not necessarily to any of the Go
modules placed within.
Furthermore, the full path hierarchy must present the same go-import
tag in order to resolve
correctly with go get
. Clearly there are some headaches to be had.
To better illustrate the issue, let’s assume the following repository structure, containing a number of files in the repository root, as well as a Go module in one of the sub-folders, e.g.:
├── .git
│ └── ...
├── LICENSE
├── README.md
└── thing
├── go.mod
├── go.sum
├── main.go
└── README.md
The full import path for this module would be go.example.com/example-module/thing
, which is also
what the module
directive would be set to in the go.mod
file; however, the import prefix
presented in the go-import
tag needs to be set to go.example.com/example-module
.
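To make this concrete, the go.mod inside the thing directory from the layout above would carry the full module path – a sketch, again with an arbitrary Go version:
// thing/go.mod
module go.example.com/example-module/thing

go 1.21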
Given this conundrum, it is not enough to simply set a different module.path
in the content file
for example-module.md
, or even create a separate example-module/thing.md
content file – we need
to ensure that the full hierarchy resolves to a valid HTML file containing a valid go-import
tag,
that, crucially, always points to the import prefix of go.example.com/example-module
.
Turns out that Hugo has yet a few tricks up its sleeve, and can assist us in setting up this complex content hierarchy using a single content file, the trick being content aliases.
Aliases are commonly intended for redirecting alternative URLs to some canonical URL via
client-side redirects (using the http-equiv="refresh"
meta tag); for our use-case, we’ll need to
slightly extend the underlying templates and render a valid go-import
tag alongside the
http-equiv
tag.
First, let’s add the sub-path corresponding to the Go module as an alias in our existing
example-module.md
content file:
# content/example-module.md
---
title: Example Module
description: An example Go module with a custom import path
module:
path: go.example.com/example-module
repository:
type: git
url: https://git.deuill.org/deuill/example-module.git
+aliases:
+ - /example-module/thing
---
Navigating to go.example.com/example-module/thing
will, then, render a default page containing a
minimal amount of HTML, as well as the aforementioned http-equiv
meta tag. We can extend that for
our own purposes by adding a layout file in layouts/alias.html
, e.g.:
<!-- layouts/alias.html -->
<!DOCTYPE html>
<html lang="en-us">
<head>
<title>{{.Page.Title}}</title>
<link rel="canonical" href="{{.Permalink}}">
<meta name="robots" content="noindex">
<meta charset="utf-8">
<meta http-equiv="refresh" content="5; url={{.Permalink}}">
<meta name="go-import" content="{{.Page.Params.module.path}} {{.Page.Params.repository.type}} {{.Page.Params.repository.url}}">
</head>
<body>
<main>This is a sub-package for the <code>{{.Page.Params.module.path}}</code> module, redirecting in 5 seconds.</main>
</body>
</html>
Crucially, this alias has full access to our custom front matter parameters, making integration of additional sub-modules a simple matter of adding an alias in the base content file.
Since our import paths still expect the fully formed path (and not just the prefix), our
human-readable code examples rendered for specific modules will be incorrect in the face of
sub-packages; solving for this is an exercise left to the reader (hint hint: add a sub
field in
front matter, render it in the go get
example via {{.Params.module.sub}}
).
If you’re not a fan of the brutalist aesthetic, and would like to waste your human visitors' precious bandwidth with such frivolities as colors, you could spice things up with a bit of CSS. A little goes a long way:
/* static/main.css */
html {
background: #fefefe;
color: #333;
}
body {
margin: 0;
padding: 0;
}
main {
margin: 0 auto;
max-width: 50rem;
}
a {
color: #000;
border-bottom: 0.2rem solid #c82829;
padding: 0.25rem 0.1rem;
text-decoration: none;
}
a:hover {
color: #c82829;
}
dl dt {
font-size: 1.2rem;
font-weight: bold;
}
dl dd pre,
dl dd a {
display: inline-block;
}
pre {
background: #f0f0f0;
color: #333;
padding: 0.4rem 0.5rem;
overflow: auto;
margin-bottom: 1rem;
white-space: nowrap;
}
Add this to static/main.css
and link to it from within layouts/_default/single.html
and
layouts/index.html
.
The setup described here is directly inspired by and used for hosting my own Go modules, under go.deuill.org. The source-code for the site is available here.
12.8.2023 21:00 · Serving Go Modules with Hugo
I recently re-read through Michael Arntzenius’ excellent list of Aphorisms on Programming Language Design, and a specific point caught my eye:
21. “Declarative” means you can use it without knowing what it’s doing.
All too often, it means you can’t tell what it’s doing, either.
It is, in hindsight, quite obvious – any knowledge of how a declaration is processed implies knowledge of the context that declaration will be used in, be it a configuration file, a SQL query, a programming language REPL, and so on. That is to say, context matters, but more importantly, knowledge of the context matters even more.
It seems to me that this maxim can be applied beyond what we think of as strictly “declarative programming languages”.
Though programming languages differ in their modes of expression, the act of writing in a programming language is, itself, always declarative. This, too, seems obvious in hindsight; regardless of whether you’re writing in C or SQL, you’re not telling the computer what to do, but rather how it should be done, or what the outcome should be. Furthermore, both of these exercises require foreknowledge of the context: in the former case, knowledge of the C language and libraries, in the latter, knowledge of SQL and, for some use-cases, how the query planner works.
Of course, things are made easier by the fact that the behaviour of both C and SQL is covered by specifications, and both operate in fairly standard ways regardless of which compiler or database engine is used. On the other hand, though one might readily understand the formatting rules of any INI, JSON, or YAML file, the same cannot be said about how these are used once parsed – it heavily depends on the context. When it comes to DSLs, all bets are off.
How are we, then, to build systems that minimize this sort of “external” knowledge, or that otherwise make the least amount of assumptions in defining what is explicit and what is implicit?
A few things come to mind:
Use prior art or knowledge. One can leap-frog years of experience-building by utilizing pre-existing patterns, especially where these don’t form core competencies or differentiators.
Don’t override the meaning of existing conventions, especially in isolation. Different approaches to design should look (and feel) different.
Make things that look similar also behave in similar ways; a symbol used to mean one thing in a specific context should ideally not be re-used for different semantics in a different context.
This only seems to confirm another of Michael’s Aphorisms:
19. Syntax is a pain in the ass.
13.4.2023 17:30 · External Knowledge in Declarative Systems
Fedora CoreOS has saved me. Believe me, it’s true – I was but a poor, lost soul, flitting through the hallways of self-hosting redemption that have seen so many a young folk driven to madness – all of this changed approximately a year ago, when I moved my home-server over to CoreOS, never (yet) to look back.
The journey that led me here was long and perilous, and filled with false twists and turns. The years from 2012 to 2017 were a simpler time, a mix of Ubuntu and services running directly on bare metal; so recent this was, that it might be atavism. Alas, this simplicity belied operational complexities and led to an unrelenting accumulation of cruft, so the years from 2017 to 2021 had me see the True Light of Kubernetes, albeit under a more minimal single-node setup with Minikube.
My early days with Kubernetes were carefree and filled with starry-eyed promises of a truly declarative future, so much so that I in turn declared my commitment to the world. It wasn’t long after until the rot set in, spurred by a number of issues, for example: Minikube will apparently set up local TLS certificates with a year’s expiration, after which kubectl
will refuse to manage resources on the cluster, and which might cause the cluster to go belly up in the case of a reboot. And even with Kubernetes managing workloads, one still needs to have a way of setting up the host and cluster, for which there’s a myriad of self-proclaimed panaceas out there.
Clearly, the answer to complexity is even more complexity: simply sprinkle some Ansible on top and you’ve got yourself a stew. And to think there was a time where I entertained such harebrained notions.
At first, it was a twinkle, a passing glance. Fedora CoreOS doesn’t feature as large in the minds of those of us practicing the Dark Arts of Self-Hosting (though I’m hoping this changes as of this post), and is relegated to being marketed as experimental, nascent. Nothing could be further from the truth.
The pillars on which the CoreOS temple is built are three, each playing a complementary role in what makes the present gushing appropriate reading material:
Butane/Ignition, in which our host can be set up in a declarative manner and in a way which allows mere mortals such as myself to comprehend. The spec is short, read it.
Podman, in which containerized workloads are run. For many, Podman is simply a new, drop-in replacement for Docker, but it can be much more than that.
systemd, which needs little introduction, and in which all of our disparate orchestration needs are covered; service dependencies, container builds, one-time tasks, recurring tasks, all handled in all of their glorious complexities.
Tying the proverbial knot on top of these aspects is how much the system endeavours to stay out of your way, handling automatic updates and shipping with a rather hefty set of SELinux policies, ensuring that focus remains on the containers themselves.
Before we head into the weeds, let’s try to address why you might even care about working with CoreOS; if anything, a bare-metal host will do well for most simple workloads, and Kubernetes isn’t all that unapproachable for a more complex single-node setup. How does CoreOS differentiate itself from other systems?
I can only really answer this from my own experience, but the main points that make CoreOS a worthwhile investment are:
The system is stable and robust, and is intended to be as hands-off as possible. This generally means you won’t have to worry about the base system itself across its entire life-cycle. One might argue that this is no different to any bare-metal system set up with auto-updates, though I’d personally never have these extend to system upgrades (and perhaps nothing beyond security updates).
The system is reasonably secure, and tries to make user interactions and workloads reasonably secure as well. This sometimes leads to inflexibility, as is the case with SELinux (which, if you’re not familiar with, is hell to try to understand), but the system has its way of keeping the user honest, which is a boon long-term.
The system has a good end-to-end deployment story, and is accompanied by excellent documentation. This generally means that you can rely on well-integrated workflows in testing, deploying, and updating your CoreOS-based system, and not have to resort to strange contortions or third-party/custom solutions in doing so.
Contrast these points with your typical bare-metal or Kubernetes-based setup (which is really just a layer above a bare-metal setup that you need to maintain separately):
Ubuntu and other similar efforts can be stable long-term (I probably stayed on the same LTS version of Ubuntu for 3 years), but this can lead to update stagnation and issues when time comes to move to the next major version of the OS. Most bare-metal OSs are designed to be managed as deeply as is necessary, which can lead to issues if there’s no discipline in change control.
In addition to this, keeping a Kubernetes cluster updated can be a full-time job for many, and even a single-node Minikube/K3s setup is not zero-maintenance by any means, and comes with its own set of perils.
As mentioned above, typical bare-metal setups tend to approach security in a less-than-holistic way, and give users the brunt of choice in deciding how to secure user workloads from the system, and vice-versa. Given how complicated security is, and how one shouldn’t connect a toaster, let alone an email server, to the internet for fear of having their house burn down, leaving these choices to the user may not work out for the long run.
Having user workloads run under Kubernetes improves the situation somewhat, as one is given a multitude of controls designed to separate and secure these from one another (e.g. network policies, CPU and memory limits); however, Kubernetes is also supremely complex, and is itself subject to esoteric security concerns.
Deployment, documentation, and upgrade concerns are typically rather disparate in other systems, and the quality of documentation varies wildly between communities. Kubernetes itself is well-documented, but remains complex and occupies a large surface area not typically needed for a simple home-server setup.
People tend to pick and choose solutions based on what their goals are, and the extent to which they’re comfortable learning about and maintaining these solutions long-term. If you’re looking for a system that is minimal, uses common components, and largely stays out of your way after deployment, CoreOS is a perfect middle-of-the-road solution.
There’s a few things to keep in mind, going into a CoreOS-based setup:
The system is immutable, and you’re expected to use the system as-is, out-of-the-box, and without needing to rely on anything not installed by default. Don’t even think about reaching for rpm-ostree
. In fact, don’t even think of storing anything outside of /var
, and maybe /etc
.
SELinux policies are pre-configured to be fairly restrictive, which means there’s quite a lot of functionality unavailable outside of interactive use – this includes things like using gpg
in systemd services.
Although not in any way unstable, the Podman ecosystem is still moving fast and may not be as feature-complete as one might expect coming from Kubernetes, or even Docker.
CoreOS will auto-update even between major versions, and unless configured otherwise, will reboot as needed when new versions become available. Allowing for system reboots is good hygiene; embrace the chaos.
CoreOS comes with a sizeable amount of documentation of excellent quality which will be useful once you’re ready to get your hands dirty, but the rest of this spiel will instead focus on setting up a system based on the CoreOS Home-Server setup I depend on myself. Clone this locally and play along if you wish, though I’ll cut through the abstractions where possible, and explain the base concepts themselves.
With all that disclaimed and out the way, let’s kickstart this hunk of awesome.
Butane is a specification and related tooling for describing the final state of a new CoreOS-based system, using YAML as the base format; this is then compiled into a JSON-based format and used by a related system, called Ignition. Both systems follow similar semantics, but Butane is what you’ll use as you develop for your host.
Let’s imagine we’re looking to provision a bare-metal server with a unique hostname, set up for SSH access via public keys. Our Butane file might look like this:
variant: fcos
version: 1.4.0
passwd:
users:
- name: core
ssh_authorized_keys:
- ecdsa-sha2-nistp521 AAAAE2VjZHNhL...
storage:
files:
- path: /etc/hostname
mode: 0644
contents:
inline: awesome-host
The default non-root user for CoreOS is aptly named core
, so we add our SSH key there for convenience; Butane allows for creating an arbitrary amount of additional users, each with pre-set SSH keys, passwords, group memberships, etc.
In addition, we set our hostname not by any specific mechanism, but simply by creating the appropriate file with specific content – we could, alternatively, provide the path to a local or even a remote file (over HTTP or HTTPS). Simplicity is one of Butane’s strengths, and you might find yourself using the same basic set of directives for the vast majority of your requirements.
Place this under host/example/spec.bu
if you’re using the coreos-home-server
setup linked to above, or simply example.bu
if not. Either way, these definitions are sufficient for Butane to produce an Ignition file, which we can then use to provision our imaginary CoreOS-based system. First, we need to run the butane
compiler:
$ butane --strict -o example.ign example.bu
Then, we need to boot CoreOS and find a way of getting the example.ign
file there. For bare-metal hosts, booting from physical media might be your first choice – either way, you’ll be dropped into a shell, waiting to install CoreOS based on a given Ignition file.
If you’re developing your Butane configuration on a machine that’s on the same local network as your home-server, you can use good ol’ GNU nc
to serve the file:
# Assuming the local IP address is 192.168.1.5.
$ printf 'HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n%s\n' "$(wc -c < example.ign)" "$(cat example.ign)" | nc -vv -r -l -s 192.168.1.5
This mess of a shell command should print out a message confirming the listening address and random port assignment for our cobbled-together HTTP server. If you’re using coreos-home-server
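If the nc incantation feels too fragile, any static file server pointed at the directory containing example.ign will do; for instance, assuming Python 3 is available on your workstation:
$ python3 -m http.server 8080 --bind 192.168.1.5
The Ignition URL passed to the installer would then be http://192.168.1.5:8080/example.ign.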
, all of this is handled by the deploy
target, i.e.:
$ make deploy HOST=example VERBOSE=true
You’re then ready to refer to the HTTP URL for the host Ignition file based on the local IP address and random port assignment over at your live system:
$ sudo coreos-installer install --insecure-ignition --ignition-url http://192.168.1.5:31453 /dev/sda
Assuming all information is correct, you should now be well on your way towards installing CoreOS on the /dev/sda
disk.
So, what do you do once you’re here? In short, nothing – the system, as shown in the above example, has configuration enough to give us SSH access into an otherwise bare system. CoreOS doesn’t come with much functionality other than the minimum needed to support its operations, and when I said the system is immutable, I meant it: you’re not supposed to re-apply Ignition configuration beyond first boot¹.
Instead, the blessed way of expanding the functionality of a CoreOS-based server is re-deploying it from scratch; we’ll bend this rule slightly, but it’s important to understand that we’re not intended to tinker too much with the installed system itself, as this would contradict the notion of repeatability built into CoreOS as a whole.
A simpler way of testing our changes is available to us by using virt-install
, as described in this tutorial, or by using the deploy-virtual
target:
$ make deploy-virtual HOST=example VERBOSE=true
This, again, is a major strength of CoreOS – alternative systems require the arrangement of a more complex and disparate set of components, in this case (most likely) something like Vagrant (in addition to, say, Ansible). Virtual hosts don’t only help in developing new integrations, but also allow us to experiment and test against the same versions of the OS that will end up running on the server itself.
Since the base system (deliberately) allows for little flexibility and customization, we have to explore alternative ways of extending functionality; in CoreOS, the blessed way of doing so is via Podman, a container orchestration system similar to (but not based on) Docker.
Typically, containers are presented as either methods of isolating sensitive services from the broader system alongside more traditional methods of software deployment, or as forming their own ecosystem of orchestration “above the metal”, as it were. Indeed, most distributions expect most software to be deployed via their own packaging system, and, at the other side of the spectrum, most Kubernetes cluster deployments don’t care what the underlying distribution is, assuming it fulfils some base requirements.
Fedora CoreOS stands somewhere in the middle, where Podman containers are indeed the sole reasonable method of software deployment, while not entirely divorcing this from the base system.
I had little knowledge of Podman coming into CoreOS; what I knew was that it’s essentially a drop-in replacement for Docker in many respects (including the container definition/Dockerfile
format, technically part of Buildah), but integrates more tightly with Linux-specific features, and does not require a running daemon. This all remains true, and though the Podman ecosystem is still playing catch-up with Docker in a few ways (e.g. container build secrets), it has surpassed Docker in other ways (e.g. the podman generate
and podman play
suite of commands).
Podman and CoreOS will happily work with container images built and pushed to public registries, such as the Docker Hub, but we can also build these images ourselves with podman build
; let’s start from the end here and set up a systemd service for Redis, running in its own container, under a file named redis.service
:
[Unit]
Description=Redis Key-Value Store
[Service]
ExecStart=/bin/podman run --pull=never --replace --name redis localhost/redis:latest
ExecStop=/bin/podman stop --ignore --time 10 redis
ExecStopPost=/bin/podman rm --ignore --force redis
Though far from being a full example conforming to best practices, the above will suffice in showing how systemd and Podman mesh together; a production-ready service would have us use podman generate systemd
or Quadlet (whenever this is integrated into CoreOS).
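For reference, a rough sketch of what the equivalent Quadlet unit might look like – a redis.container file placed under /etc/containers/systemd/, which Quadlet expands into a full service at boot, assuming a Podman version recent enough to ship it:
# /etc/containers/systemd/redis.container (assumes Quadlet support)
[Unit]
Description=Redis Key-Value Store

[Container]
Image=localhost/redis:latest
ContainerName=redis

[Install]
WantedBy=multi-user.target
For the rest of this walkthrough, though, we’ll stick with the plain service file shown above.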
The service above refers to a localhost/redis
image at the latest
version, and also specifies --pull=never
– this means that the service has no chance of running successfully, as the image referred to does not pre-exist; what we need is a way of having the required container image built before the service runs. What better way to do so than with systemd itself?
With the help of some naming and file-path conventions, we can have a templated systemd service handle container builds for us, and thus ensure container dependencies are maintained via systemd requirement and ordering directives. We’ll need an additional file called container-build@.service
:
[Unit]
Description=Container Build for %I
Wants=network-online.target
After=network-online.target
ConditionPathExists=/etc/coreos-home-server/%i/Containerfile
[Service]
Type=oneshot
ExecStart=/bin/podman build --file /etc/coreos-home-server/%i/Containerfile --tag localhost/%i:latest /etc/coreos-home-server/%i
That final @
in the service file name denotes a templated systemd service, and allows us to use the service against a user-defined suffix, e.g. container-build@redis.service
. We can then use this suffix via the %i
and %I
placeholders, as above (one has special characters escaped, the other is verbatim).
The Containerfile
used by podman build
is, for the most part, your well-trodden Docker container format, though the two systems might not always be at feature parity. In this case, we can cheat and just base on the official Docker image:
FROM docker.io/redis:6.2
The only thing left to do is extend our original redis.service
file with dependencies on the container-build@redis
service:
[Unit]
Description=Redis Key-Value Store
Wants=container-build@redis.service
After=container-build@redis.service
Getting the files deployed to the host is simply a matter of extending our Butane configuration, i.e.:
variant: fcos
version: 1.4.0
systemd:
units:
- name: container-build@.service
contents: |
[Unit]
...
- name: redis.service
enabled: true
contents: |
[Unit]
...
storage:
files:
- path: /etc/coreos-home-server/redis/Containerfile
mode: 0644
contents:
inline: "FROM docker.io/redis:6.2"
All of this is, of course, also provided in the coreos-home-server
setup itself, albeit in a rather more modular way: service files are themselves placed in dedicated service directories and copied over alongside the container definitions, and can be conditionally enabled by merging the service-specific Butane configuration (e.g. in service/redis/spec.bu
).
You might be wondering why you’d want to go through all this trouble of building images locally, when they’re all nicely available on some third-party platform; typically, the appeal here is the additional control and visibility this confers, but having our container definitions local to the server allows for some additional cool tricks, such as rebuilding container images automatically when the definition changes, via systemd path units:
[Unit]
Description=Container Build Watch for %I
[Path]
PathModified=/etc/coreos-home-server/%i/Containerfile
Unit=container-build@%i.service
Install this as container-build@.path
and you’ll have the corresponding container image rebuilt whenever the container definition changes. You can even tie this up by having the service file depend on the path file, which will ensure the path file is active whenever the service file is used (even transitively via another service):
[Unit]
Description=Container Build for %I
Wants=network-online.target container-build@%i.path
After=network-online.target container-build@%i.path
The new container image will be used next time the systemd service is restarted – Podman has built-in mechanisms for restarting systemd units when new container image versions appear, though this requires that the containers are annotated with a corresponding label. See the documentation for podman auto-update
for more information.
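As a sketch, the ExecStart line from the earlier redis.service could gain the relevant label like so; the local policy compares the running container against locally-built images, which suits our build-on-host approach:
ExecStart=/bin/podman run --pull=never --replace --name redis \
    --label io.containers.autoupdate=local localhost/redis:latest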
If there is one final trick up our sleeve, it’s drop-in units, which allow for partial modification of a concrete systemd unit.
Let’s imagine we wanted our Containerfile
to access some protected resource that uses SSH key authentication for access (e.g. a private Github repository); to do so, we have to provide additional parameters to our podman build
command, as used in our container-build@.service
file.
Since there’s no real way of providing options beyond the templated unit suffix, and using an EnvironmentFile
(or similar) directive means applying the same options to every instance of the templated unit, it might seem that we’d have to do away with our generic container-build
service and use an ExecStartPre
directive in the service unit itself.
Enter drop-in units: if, for the examples above, we create a partial systemd unit file under the container-build@redis.service.d
directory with only the changes we want applied, we’ll get just that, and just for the specific instance of the templated unit (though it’s also possible to apply a drop-in for all instances). Given the following drop-in:
[Unit]
After=sshd-keygen@rsa.service
[Service]
PrivateTmp=true
ExecStartPre=/bin/install -m 0700 -d /tmp/.ssh
ExecStartPre=/bin/install -m 0600 /etc/ssh/ssh_host_rsa_key /tmp/.ssh/id_rsa
ExecStart=
ExecStart=/bin/podman build --volume /tmp/.ssh:/root/.ssh:z --file /etc/coreos-home-server/%i/Containerfile --tag localhost/%i:latest /etc/coreos-home-server/%i
The end result here would be the same as if we had copied the PrivateTmp
and ExecStartPre
directives into the original container-build@.service
file, and extended the existing ExecStart
directive with the --volume
option. What’s more, we can further simplify our drop-in by implementing the original ExecStart
directive with expansion in mind:
[Service]
Type=oneshot
ExecStart=/bin/podman build $PODMAN_BUILD_OPTIONS --file /etc/coreos-home-server/%i/Containerfile --tag localhost/%i:latest /etc/coreos-home-server/%i
Which would then have us remove ExecStart
directives from the drop-in, and add an Environment
directive:
[Service]
Environment=PODMAN_BUILD_OPTIONS="--volume /tmp/.ssh:/root/.ssh:z"
If not specified, the $PODMAN_BUILD_OPTIONS
variable will simply expand to an empty string in the original service file, but will expand to our given options for the specific instance covered by the drop-in.
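As a rough deployment sketch (the ssh-key.conf file name is an arbitrary choice; only the container-build@redis.service.d directory and the .conf suffix matter), the drop-in could be placed by hand for testing, outside of Butane:
# Place the drop-in alongside the templated unit, for the redis instance only.
install -D -m 0644 ssh-key.conf /etc/systemd/system/container-build@redis.service.d/ssh-key.conf
systemctl daemon-reload

# Show the unit with all drop-ins applied, to confirm the overrides took effect.
systemctl cat container-build@redis.service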
Since CoreOS hosts have stable identities, generated once at boot, we can add the public key in /etc/ssh/ssh_host_rsa_key.pub
to our list of allowed keys in the remote system and have container image builds work reliably throughout the host’s lifetime.
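Retrieving that key is then just a matter of reading it off the host, e.g.:
# Print the host's public key; add this as a (read-only) deploy key or
# authorized key on the remote system hosting the protected resource.
cat /etc/ssh/ssh_host_rsa_key.pub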
There’s a lot more to cover, and it’s clearly not all unicorns and rainbows – though the base system itself is simple enough, integration between containers makes for interesting challenges.
Future instalments will describe some of these complexities, as well as some of the more esoteric issues encountered. And, assuming it takes me another year to follow up on my promises, you’re welcome to follow along with progress on the repository itself.
This is different to systems such as Ansible, Chef, or Salt, where deployed hosts/minions are generally kept up-to-date in relation to upstream configuration ↩︎
Countless bytes have been wasted on tutorials and Internet debates (i.e. flame wars) on which Git workflow is the best, when to merge or rebase, how many lines of code per commit make for the best review experience, and so on.
What I’ll attempt to do in this post, in the least amount of bytes possible, is describe a simple, orthogonal Git workflow, designed for projects with a semi-regular release cadence, and built around a pre-release feature freeze.
This is not intended to be the end-all guide to workflow nirvana, but rather a collection of idioms that have been applied successfully throughout the lifecycles of various projects.
Additionally, the final two chapters contain advice and pointers on merging strategies and commit standards. If all that sounds interesting, please read on.
Using Gitflow as a starting point, we simplify certain concepts and entirely discard others. Thus, we specify two main, permanent branches:
The master Branch
The master
branch represents code released to production (for releases with final release tags) and staging (for releases tagged -rc
). We’ll talk more about release tags in a second, but it’s important to understand that the tip of master
should always be pointed to by a tag, -rc
or otherwise.
Figure 1.0 - The master
branch tagged at various points.
Commits are never made against master
directly, but are rather made as part of other branches, and merged into master
when we wish to deploy a new tag. Merge strategies are described below, and apply towards all code moved between branches.
The develop Branch
The develop
branch is the integration point for all new features which will eventually make their way into master
.
Figure 1.1 - The develop
branch, merging back into master
.
As with master
, no commits should be made against develop
directly, but should rather be part of ephemeral feature branches. The use of such branches is described below.
While the above two branches are permanent (i.e. should never be removed), they only serve as integration points for code built in ephemeral, or temporary branches. Of these we have two, each serving a distinctly different purpose, and having different semantics.
Feature Branches (feature/XXX_yyy)
Feature branches are where most of the work in a project happens, and are always opened against, and merged back into, the develop
branch. What constitutes a feature is fairly broad, but essentially covers any code that is not a bugfix for an issue that exists in current master
.
Figure 2.0 - Branching and merging feature branches.
Feature branch names follow a naming convention of feature/XXX_yyy
where XXX
refers to the ticket number opened against the work (if any), while yyy
is a short, all-lowercase, dash-separated description of the work done. A (perhaps contrived) example would be:
git checkout develop # This determines the base branch for our feature.
git pull origin develop # Always a good idea to branch off the latest changes.
git checkout -b feature/45_implement-flux-capacitor
The rules behind merging of feature branches back into develop
are project-specific, but most teams would have the code go through peer review and possibly a CI pass before merging. However, it is intended that projects implement a semi-regular (or at least predictable) release schedule, in which case features that are intended to appear in the upcoming release will have to be merged into develop
before the feature freeze starts.
Once the feature freeze starts and develop
is merged back into master
and tagged as an -rc
, the team is free to merge feature branches into develop
again.
While most rules are meant to be broken, the ones described above (as loosely defined as they are) fit into the versioning strategies employed, and as such benefit from being followed as closely as possible.
Certain fixes, however, cannot wait for the next release, or are designed to fix breaking issues present in the master
branch. For those, we have the following.
Bugfix Branches (bugfix/XXX_yyy)
Bugfix branches are intended to contain the bare minimum amount of code required for fixing an issue present on the master
branch, and as such are always opened against, and merged back into master
.
Figure 2.1 - Merging bugfix branches between master
and develop
.
Naming conventions and code acceptance rules are identical to those for feature branches, apart from the bugfix/
prefix applied. Bugfixes are not subject to feature freezes or release schedules.
For bugs that appear on both master
and develop
, the bugfix branch may, optionally, be merged into develop
as well, which has the additional benefit of reducing divergence between the two branches. Why does this matter? Read below.
So now you have a bunch of code on develop
waiting to be released. How do we go about doing that? Imagine the following, two-week (i.e. ten working day) release schedule.
Cycle starts, with feature development commencing immediately. Features are opened against develop
, peer-reviewed, tested, and eventually merged back into develop
according to the release manager/team lead/maintainer’s directions. Large features ready for merging towards the end of the window may be left unmerged in order to better test and/or avoid any latent issues.
Merge window closes, with any features left unmerged making their way into next cycle’s release. This is also called a “feature freeze”.
The develop
branch is merged into master
, and a -rc
tag corresponding to the next feature version is created. So, for instance, if the last version tagged against master
was v.2.9.3
, this tag is to be v.2.10.0-rc1
. This tag is then pushed to a staging server and tested by all means available.
Any bugs we inevitably find are fixed in bugfix branches opened against master
, and merged as soon as the fixes have been verified on the branches themselves. A subsequent -rc
(i.e. v.2.10.0-rc2
) release is tagged whenever we wish to push a new, fixed version to staging.
Release day! Hopefully we’ve had enough time to thoroughly test the new version, and as such are ready to tag and push a final version of master
, v.2.10.0
, to production. We do another round of testing in production and get ready for the next cycle (or release drinks).
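In terms of plain Git commands, the tagging steps above might look roughly like this (the remote name and tag messages are illustrative):
# Feature freeze: merge develop into master and tag the first release candidate.
git checkout master
git merge develop
git tag -a v.2.10.0-rc1 -m "Release candidate 1 for v.2.10.0"
git push origin master v.2.10.0-rc1

# Release day: tag the final version from the (by now bug-fixed) tip of master.
git tag -a v.2.10.0 -m "Release v.2.10.0"
git push origin v.2.10.0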
We will eventually find bugs in production that weren’t uncovered by our testing on either feature branches or master
. The strategy we follow differs slightly depending on which phase of the next cycle we’re on.
As master
is still in a pristine state, merging bugfixes back into master
is a simple matter of opening a bugfix branch, merging that in, and tagging a new bugfix release version (e.g. v.2.10.1
for the above example) as soon as we’re ready to push to production.
Figure 3.0 - Tagging a bugfix before the feature freeze.
The situation is slightly complicated by the fact that master
now contains code that we’re not ready to release to production, and as such cannot be tagged directly. However, the workflow for opening a bugfix branch remains the same, as the issue will most likely exist in master
, even with the additions from develop
.
The most elegant way of solving this issue is opening a new “release” branch against the last stable tag, which will serve as the integration point for all relevant bugfix branches. The naming convention we’ll use for this branch is the major version for the release we’re branching off, i.e. v.2.10
.
Once the branch has been created, we’re free to merge in all relevant bugfix branches, test locally, and tag the new version against this branch.
Figure 3.1 - Tagging a bugfix after feature freeze.
This is the only case where a branch other than master
is tagged, and as such constitutes an extraordinary measure.
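A rough sketch of the commands involved, using the version numbers from the example above (the bugfix branch name is hypothetical):
# Branch off the last stable tag and integrate the relevant bugfixes there.
git checkout -b v.2.10 v.2.10.0
git merge bugfix/67_fix-image-cache   # hypothetical bugfix branch
git tag -a v.2.10.1 -m "Release v.2.10.1"
git push origin v.2.10 v.2.10.1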
Both situations above require us to tag new release versions. Normally, we’d tag an initial -rc
version, after which we’d push to staging and test. Whether this is necessary or not for bugfixes is debatable, and is left to be decided on a case-by-case basis. However, the convention of -rc
to staging, final version to production, remains constant in all situations.
Countless debates exist on rebasing vs. merging, and whether to squash commits or not. Since, in most cases, personal preference plays the largest role in choosing a strategy, the following sections may appear debatable, so please take them with a grain of salt and apply them as needed. However, we’ll try to provide as much rationale as possible, while exploring alternatives in order to better understand the reasons behind our choices.
It is also important to understand that the following sections only apply to public code, i.e. anything that has been pushed to a remote. Nobody but you knows whether you squashed your 15 commits into 1 just before you pushed your code to a public repository somewhere. Regardless of the above, it often pays off to use the same strategies both offline and online, for reasons explained below.
General rules that apply to all strategies: merge with fast-forward, avoid squashing, avoid rebasing.
Merging code between develop
and various feature branches, as well as between develop
and master
, is one of the most common day-to-day operations, so let’s cover each case individually.
Merging feature to develop
Once a feature has been peer reviewed and tested as a unit, and provided the feature freeze window is still open, a feature may be merged back into develop
.
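In practice, and reusing the feature branch from the earlier example, the merge might look like this:
# Merge the reviewed feature into develop, then prune the ephemeral branch.
git checkout develop
git merge feature/45_implement-flux-capacitor
git branch -d feature/45_implement-flux-capacitor
git push origin develop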
Choosing to merge instead of rebasing is based on the following rationale: the state of the work in the feature branch is a direct result of the point in develop
it was branched off. For long-lived feature branches, this meta-information is an important aspect of understanding the design choices behind the feature work.
Additionally, rebasing disrupts the chronological nature of history; that is to say, commits may appear behind ones that were made further in the past, but which were rebased onto develop
afterwards. This makes reasoning about the history harder (for instance, when wanting to bisect based on the knowledge that develop
was in a “good” state on some specific date).
Rebasing may also lead to the loss of information concerning how a feature evolved in time, especially when a feature had to be refactored in response to changes made in develop
(more on how these changes are brought into the feature branch in the following chapter).
In most cases, we’re only really concerned with the latest version of a feature. A commit introducing some code that is superseded by a following commit in the same feature branch may appear to be irrelevant since it never really touches upon the state of develop
at the time of merging.
However, such information is important in a historical sense. It may be that the code was refactored in response to an outside event, such as a different feature being merged, or product decisions being made behind the scenes. Such information may be relevant in the future, even if the code itself was never strictly part of any release.
The general idea is that anything done with intent (that is to say, manually) should be preserved in the state in which it was made.
Merging develop into a feature
In several cases, you may need to synchronize your feature branch with develop
, for instance, when fixing merge conflicts.
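A minimal sketch of such a sync, again reusing the feature branch from the earlier example:
# Bring the latest develop into the feature branch; conflicts are resolved in a
# merge commit rather than by rewriting history.
git checkout feature/45_implement-flux-capacitor
git fetch origin
git merge origin/develop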
Again, choosing to merge instead of rebasing is based on the general idea that actions with intent should be preserved. This is especially true when working on a public branch, but holds for private branches as well.
Imagine the following scenario: You’re working on a feature branch for implementing image-uploading functionality in a CMS product. The functionality is close to complete when a refactor of the underlying Image
class is merged into develop
.
You, of course, can no longer merge your code as-is, and will have to change it in response to the refactor. You’re given two choices: either merge develop
into the feature branch, fix any conflicts, and continue to add commits for refactoring any remaining functionality, or rebase the feature branch on top of develop
and make it appear as if you were working with the refactored code from the beginning.
There are several reasons why merging provides benefit, especially in the long-term. One is, of course, that your refactor may be relevant to someone (including yourself) in the future. It may be as a pointer for refactoring other, similar features, or may help when attempting to debug issues that did not exist prior to the refactor.
Another, perhaps more esoteric reason is: fixing merge conflicts can go wrong. You may accidentally choose the wrong part of a conflict, or not merge the changes correctly, or add a typo somewhere that would not exist otherwise. When rebasing, it will appear as if these errors were part of the original design. When merging, these errors will appear as part of the merge commit, and as such can be traced back to with greater ease.
The proliferation of merge commits is the most common reason for choosing to rebase rather than merge, but cases like this demonstrate the value of preserving merge commits, both for their content and as meta-information: this was the point where you needed to refactor your feature; this is the point you merged your feature into develop
.
Merging develop to master
The same general rules for merging between feature branches and develop
apply here as well: merge with fast-forward, do not squash.
It may be that, due to bugfix branches being applied to master
alone and not develop
, the two will diverge. This, however, should not complicate matters much, as in most cases develop
is a strict super-set of master
. Merging bugfix branches on both master
and develop
can help alleviate any future problems, and is the preferred strategy.
Merging bugfix to master
Again, the same rules as with merging between feature branches and develop
apply. As stated above, we should also merge all bugfixes into develop
as well, even when the bug no longer applies, in order to eliminate divergence.
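A rough sketch, with a hypothetical bugfix branch name:
# Merge the bugfix into master first, then into develop to avoid divergence.
git checkout master
git merge bugfix/67_fix-image-cache
git checkout develop
git merge bugfix/67_fix-image-cache
git branch -d bugfix/67_fix-image-cache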
When merging features or bugfixes, we choose to fast-forward our branch relative to the base branch, for various reasons, most notably the fact that feature and bugfix branches are intended to be ephemeral, and can (and should) be pruned regularly. The reason we treat these branches as such is related to how we treat commits, and is explained further below.
Choosing not to fast-forward makes bisecting and reasoning about the history harder, while providing dubious benefits, especially since pull requests (on, for example, GitHub or Bitbucket) continue to exist even if the underlying branch has been deleted.
It may appear, from the above, that the most important aspects of our workflow lie within our branching and merging strategies. However, this is not entirely true.
The smallest monad in any Git repository is the commit, which also makes it the most important aspect of our workflow. Maintaining a clean history depends largely on the quality of each individual commit pushed, and keeping the quality consistent is hard and requires buy-in from every individual team member.
Squashing commits is the antithesis of maintaining consistent quality – why would you squash commits that have been prepared with such diligence? Several other reasons apply, as explained in the following sections.
The following chapters outline several rules on creating good commits.
Perhaps the easiest rule to implement, and the one providing the most benefits for the least amount of effort, is standardizing on naming conventions for commit messages. The advice below echoes conventions followed by quite a few large repositories, including the Git repository for the Linux kernel itself, but is nevertheless worth repeating:
Commit titles should be prepended with the file name or subsystem they affect, be written in the imperative starting with a verb, and be up to 60 characters in length.
So, applying the aforementioned rules, we have two examples:
Bad example:
The Get method of the Image class now fetches files asynchronously
Good example:
Image: Refactor method “Get” for asynchronous operation
The reasons are many-fold: prepending the name of the subsystem helps in understanding, at a glance, where the work is happening. Using the imperative and starting with a verb makes titles easier to parse, since each can be read as completing the sentence “applying this commit will…”. Lastly, the choice of limiting the title to 60 characters may appear archaic, but it encourages terseness.
All commits should be accompanied by a commit message (separated from the title by two consecutive newlines), ideally containing the rationale behind the changes made within the commit, but at minimum a reference to the ticket this work is attached to, which will most certainly be useful to you at some point in the future. For example:
Image: Refactor method “Get” for asynchronous operation
Fetching images from the remote image repository is now asynchronous, in order to allow for multiple images to download concurrently. This change does not affect the user-facing API or functionality in any way.
Related-To: #123
Using a standard syntax for relating commits to ticket numbers helps with finding them using git log --grep
.
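For example:
# Find all commits that reference ticket #123.
git log --grep='Related-To: #123'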
We don’t always have the ability or knowledge to foresee the final, completed state of work needed in order to implement a feature or fix a bug. As such, most work is driven by whatever idea we have about the code at the moment, and can therefore change rapidly.
The standard rule for choosing what to include in a commit is this: every commit should represent a single, individually reversible change to the codebase. That is to say, related work, work that builds on top of itself in the same branch, should be part of the same commit.
As an example, in the course of implementing the asynchronous image operations described above, you find a bug in the same file, but in a different, unrelated method.
This bugfix and the feature work should appear in two separate commits, for the simple reason that we should be able to revert a buggy feature without sacrificing unrelated bugfixes made in the course of building it.
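As a sketch (the bugfix commit message is hypothetical; the feature one reuses the example above):
# Stage and commit the unrelated bugfix and the feature work separately...
git add -p   # pick only the hunks belonging to the bugfix
git commit -m 'Image: Fix cache invalidation in method "Put"'
git add -p   # then stage the remaining, feature-related hunks
git commit -m 'Image: Refactor method "Get" for asynchronous operation'

# ...so that the feature can later be reverted without losing the bugfix.
git revert <commit-hash-of-the-feature>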
The tools we use will largely affect what our commits look like: GitHub now allows for better control over reviewing specific commits. Gerrit allows commits to be grouped into patch-sets, which can be reviewed and reworked as separate entities (which would usually either require a rebase or a new pull request). Other tools only allow reviewing the latest version of a branch as a whole.
Pushing for clear boundaries between commits, especially in the face of ever-changing requirements, and the fact that in most cases you’d only ever revert an entire pull request/branch and not the individual commits themselves, may appear to be a losing battle.
The easiest way to deal with these issues is at the time of review: if the commits are too big (over a couple hundred lines of code) and do not appear cohesive, reviewing the code is that much harder, and will eventually lead to inferior code quality and/or bugs falling through the cracks.
Various concepts have been presented, some harder to implement than others. If there is one take-away, please allow it to be this: it’s better to be consistent than to be correct, and it’s better to be simple than comprehensive.
Rules that are not orthogonal to one another are harder to implement and follow consistently, so keep that in mind when choosing which battles to fight.
The graphics in this post have been generated using Grawkit, an AWK script which generates git
graphs from command-line descriptions.