Thursday, April 6, 2017

A Repeatable Build Process [1]

In one of my earliest posts, I said that...

... I think that even a bare-bones build-process should:
  • Run automated tests and stop if any tests fail;
  • Generate notifications if the tests fail;
  • Package the build(s) in some fashion so that it's ready to be deployed;
  • Generate notifications if a build fails for reasons other than test-failures; and
  • Deploy to an environment where the current build can be executed.
Now that there's more than one file in the codebase that I'm building, I have the beginnings of a repeatable test-process thought out, and I'm generating the snapshots that appear in the right sidebar from time to time, I think it's about time for me to start digging into a repeatable build-process that can, at a minimum, run tests (and generate test-results reports), and package up those snapshots. Eventually, once I have time to give some thought to the various packaging-solution options, and pick one or more out, this build-process will have to be revisited. I won't get any further down that list than the first three items for now — and only part way through the third, if I'm honest about it — but that will get me far enough for the time being. And, at least for now, I'll be building the codebase out with GNU Make.

Why make? Why not {insert-tool-name}?

As I noted in that same earlier post, it seems to me that there are a truly ridiculous number of build-processes available these days. So why am I using something as old and primitive as make? Why not use grunt, or ant, or puppet or salt or any of the other fifty- or sixty-odd possibilities that can be found quickly and easily?

A good part of that decision is based around various needs that I know I'll eventually have to cater to:

  • Multiple build-output formats (more or less in order of priority for me):
    • A snapshot ZIP-file for inclusion here, whose build-process allows a date-specification to be provided during the build — I keep separate snapshot files for each snapshot I post here in order to keep them relevant to the post they were created for;
    • A gzipped tarball with installation scripts (for now, at least);
    • (Eventually) A Python egg file;
    • (Eventually) A deb package-file for Debian-based Linux distros; and
    • (Eventually) An rpm package-file for RPM-based Linux distros;
    (There is always the possibility of a new output-format becoming available as well)
  • The ability to run arbitrary command-line programs:
    • Execution of module unit-tests (using the unit-testing structure I just spent time on);
    • Generation of final API documentation;
    • Deployment routines, such as rsync over ssh;
  • Recursive and/or dependent-project sub-builds;
There may be other tasks or goals that I haven't thought of yet that would be added to this list...

I already know that make can handle most of these, even the recursive calls that would be required to have one project require another. The tasks that I'm not sure about are the ones that I've noted as eventual goals in the list above, but I'd be very surprised if they weren't feasible with a Makefile.
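To give a sense of why those two points don't worry me, here's a minimal sketch (my own illustration, not this project's actual Makefile) of one target that runs an arbitrary command-line program and another that recurses into a dependent project's own Makefile. The directory, script and target names are placeholders, and the indented lines would be tab-indented in the real file:

# Placeholder location for a dependent project:
DEPENDENT_PROJECT_DIR=../some-dependent-project

docs:
    # Any command-line program can be a build-step
    # (generate_api_docs.py is a made-up script-name):
    python generate_api_docs.py

subprojects:
    # Recursive make: run a target in the dependent project's own Makefile
    # (the target-name here is just a placeholder):
    $(MAKE) -C $(DEPENDENT_PROJECT_DIR) build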

Additionally, I already get how to work with a Makefile, at least for the most part, and what I don't understand I'm confident I can find references or even sample code for.

To be fair, I'd also be surprised if most of the other available tools couldn't handle all of my requirements, but I don't know that any of them could. Since I'm more interested in writing code and discussing it than in learning some new-to-me build-process tool-chain, I'll stick to what I know, at least for the time being.

What I Want To Accomplish

For now, the two build-results that I'm concerned with are:

  • Creating a snapshot ZIP-file; and
  • Creating a locally-installable package (which need not be anything more than a tarball and an installation script).
If, as I'm looking at packaging structures and processes, I happen to come across a quick and easy recipe for deb, rpm or egg files, I may flesh those out as well, but I'm not going to make it a priority.

So, the things that need to be done to build a snapshot are (in order):

  1. Make a copy of all of the project's files;
  2. Clean out any pyc files. Leaving them in the ZIP-file wouldn't cause any harm, but since they're compiled from my local source, they'll have file-path information relevant to my local system, which is annoying (in my opinion) if those paths appear in an error traceback, and could be a (minor?) security risk in a real-world dev-shop scenario;
  3. Remove any irrelevant directories or files from the copy. It may actually be a lot easier to copy only the relevant directories and files than to remove those that aren't relevant — only including items that are specifically identified for inclusion is also a better security-policy, at least in my opinion;
  4. Ask for input for a snapshot name (I've been using a date-string of the format YYYY-MMM-DD, so 2017-Apr-06 for today), and create a folder in some predetermined location using that name;
  5. Create a {project_name}.zip archive-file in the named folder;

A tarball-and-install-script result isn't much more complex (a rough, command-level sketch of the shared steps appears after this list):

  1. Make a copy of all of the project's files;
  2. Clean out any pyc files. Leaving them in the tarball wouldn't cause any harm, but since they're compiled from my local source, they'll have file-path information relevant to my local system, which is annoying (in my opinion) if those paths appear in an error traceback, and could be a (minor?) security risk in a real-world dev-shop scenario;
  3. Remove any irrelevant directories or files from the copy. It may actually be a lot easier to copy only the relevant directories and files than to remove those that aren't relevant — only including items that are specifically identified for inclusion is also a better security-policy, at least in my opinion;
  4. Create a {project_name}.tgz archive-file in a common current-builds directory;
  5. Create the installation-script and any needed support-files in the same common current-builds directory;
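As promised, here's the rough, command-level sketch of the steps those two lists share (copying, pyc clean-up, and archiving). The paths are the example values I'll discuss later in this post, and these are only the kinds of commands the eventual Makefile targets would wrap, not the final build-code:

# Copy the project's files into a temporary build-location:
mkdir -p /tmp/build-process
cp -R ~/IDreamInCode/idic /tmp/build-process/idic

# Clean out any compiled Python files from the copy:
find /tmp/build-process/idic -name "*.pyc" -delete

# Create the archive in a common current-builds directory (a gzipped
# tarball here; zip -r would be the equivalent for a snapshot ZIP-file):
mkdir -p ~/Current_Builds
tar -czf ~/Current_Builds/idic.tgz -C /tmp/build-process idic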

The major chunks of functionality in a Makefile are probably its targets. A target wraps a series of commands that need to be executed in order to achieve a desired outcome, and has a very specific structure:

# An example target - target_name:
target_name: # target variable-settings and/or dependent targets
    # Target process-steps
Each target process-step line is indented with a tab (though I'm showing them here with spaces, since tabs render quite a bit wider than the four spaces the code I've been showing has used, and the code-formatting script I have in place doesn't convert them to something more spacing-friendly on the site). Each line in each Makefile target will be executed in sequence. By default, unless something is done to store data from any given line, no one line has any awareness of anything done by any other line.
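A minimal illustration of that last point (my own toy example, not project code): each process-step line is run in its own shell, so a cd on one line has no effect on the next one, unless the commands are chained together on a single line:

show_shells:
    cd /tmp
    pwd               # prints the directory make was started from, not /tmp
    cd /tmp && pwd    # prints /tmp -- both commands share one line, and one shell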

Each target can have any number of variables associated with it (one per line, as far as I'm aware). Each target can also have a list of other targets that need to be executed before that target is executed:

# An example target - target_name:
target_name: BUILD_VAR_1=value1
target_name: BUILD_VAR_2=value2
target_name: BUILD_VAR_3=value3
target_name: dependent_target_1 dependent_target_2 dependent_target_3
    # Target process-step 1 
    #    (uses $(BUILD_VAR_1), maybe...)
    # Target process-step 2 (uses $(BUILD_VAR_2) 
    #    and $(BUILD_VAR_3), maybe...)
    # Target process-step 3
If this target were executed, the various BUILD_VAR_* variables would be set first, then the dependent_target_* targets would be executed, with access to the values set in the BUILD_VAR_* variables, then the target_name target would execute.
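Here's a small, concrete (and runnable) version of that pattern, with echo standing in for real process-steps; the target- and variable-names are made up purely for illustration. One detail worth calling out: in GNU Make, target-specific variables like these are also in effect while the target's dependent targets are being built, which is what makes the pattern useful:

example: GREETING=Hello
example: AUDIENCE=world
example: prepare
    @echo "$(GREETING), $(AUDIENCE)!"

prepare:
    @echo "prepare runs first, and can see GREETING=$(GREETING)"

Running make example prints the prepare line first and then "Hello, world!"; running make prepare on its own prints an empty GREETING, since the variable is only set for the example target and its dependent targets.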

This sort of make-based build may be very brute-force in its approach — it may well be that there are more... elegant... ways of structuring a Makefile than what I'm showing, too. Nevertheless, as I noted earlier, I know that this structure, as brute-force and inelegant as it might be, will work for what I want to accomplish.

The Starting-Point for my Makefile Template

Without any functionality or variables set up, the basic structure of what will eventually become my template Makefile is going to start as:

# Makefile template

#############################
# PROJECT SETTINGS          #
#############################
# Eventually, I'll want to set up some "global" variables for 
# keeping track of values that are common across the *entire* 
# build-process --- more on that later

#############################
# BUILD SETTINGS            #
#############################
# Settings that are specific to the project will go here

#############################
# MAIN BUILD TARGETS        #
#############################
# These are the "main" targets -- The ones that will actually 
# be used to build a project for a given environment or end.

snapshot:
    # Gathers all of the project files, ZIPs them, and copies 
    # them to a directory where I can easily share the current 
    # snapshot through my Dropbox account -- which is how I'm 
    # sharing them now...

local_all:
    # Gathers ALL of the project's files needed for a 
    # deployable copy of the project (plus files from dependent 
    # projects!), performs any necessary clean-up (removal of 
    # pyc files, files that are tagged for *other* deployment 
    # destinations, etc.), renames any files that will reside 
    # in standard executable-file locations (/usr/local/bin) 
    # when deployed, etc., etc.

#############################
# COMMON BUILD-TASK TARGETS #
#############################
# These are targets that perform tasks that are common to 
# two or more of the main build targets -- Anything that 
# *can* be done in one place *should* be done in one place 
# if only so there's only one copy of the process-code...

There's a pretty substantial amount of organizational and structural thought that I need to go through before actually creating the Makefile template, let alone putting one in play, even just for the current project, with the very few files it contains so far. I'm going to deep-dive into those details in the balance of this post, and do the corresponding deep-dive into the guts of the Makefile next time.

Planning the Build Targets

Although the snapshot target is, I think, going to be simpler, I'm going to start with the local_all target because it sets up the pattern that will be in play for all of the other environment-specific targets, at least for the most part. I expect that it will also provide several common dependency-targets that I'll also want/need for the snapshot target. Maybe for several others, too — time will tell.

The local_all Target

The tasks that any environment-centric build needs to perform include:

  • Gathering all of the project's deployable files into a single archive-file (a tgz file in this case). As I see it, that involves:
    • Copying all of the deployable directory-structure into a temporary build-location, which, in turn, requires:
      • Creating the temporary build-location;
    • Performing any clean-up of files that do not need to be part of the build in that build location (pyc files, maybe others);
    • Renaming any files in standard executable-file locations (/usr/local/bin/*/*.*, for example) to remove their file-extensions (/usr/local/bin/*/*), and making sure they are executable;
    • Reconciling any environment-specific file-names so that (in this case), all files that have been copied into the build-directory whose names start with LOCAL. are renamed without LOCAL. in them, and any files whose names start with any other environments are removed. I'm currently expecting five variants (at most), including LOCAL. files:
      • LOCAL.
      • DEV.
      • TEST.
      • STAGE.
      • LIVE.
      Given that I'm not going to have access to DEV-, TEST- or STAGE-environments in the foreseeable future, and I may not have a LIVE environment for a while, I'm not going to worry too much about planning for those, but I will set things up so that each environment-specific build can follow the same pattern. I'm also planning on putting as much of the environment-specific information into variables in the Makefile as makes sense (a rough sketch of this environment-specific renaming appears a little further below).
  • Generating the final tgz archive-file;
  • Copying any install-scripts into the build-directory, next to the archive-file, and modifying it/them as needed to make them viable for the target environment the deployment is for, which involves:
    • Finding the applicable scripts and copying them;
    • Replacing any environment-specific variables/items in them according to the environment the build is for;
    • Making sure they are executable;
  • Copying all of the files generated to a common, local current builds directory, in order to have a copy of the most recent build for any environment available locally, just in case.
A deployable package for a local build should be installable by running the install-script from its final location, so arguably there's no need to worry about generating logic in the Makefile to copy the final package-file and install-script anywhere else. When I start looking at builds for targets that aren't local to the machine I'm building on, I'll start thinking about ways to copy the package off-site, but that's still some ways away now, given where I am with the codebase today.
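As a concrete (if rough) sketch of the environment-specific renaming step mentioned above, something along these lines ought to work for a LOCAL build. The PROJECT_COPY variable is a placeholder for the project-copy location discussed further down, and this is not the final Makefile code:

fix_environment:
    # Strip the LOCAL. prefix from files intended for this environment...
    find $(PROJECT_COPY) -name "LOCAL.*" | while read f; do \
        mv "$$f" "$$(dirname $$f)/$$(basename $$f | sed 's/^LOCAL\.//')"; \
    done
    # ...and remove any files tagged for the other environments:
    find $(PROJECT_COPY) \( -name "DEV.*" -o -name "TEST.*" -o -name "STAGE.*" -o -name "LIVE.*" \) -delete

(The doubled $$ is how a literal $ gets passed through to the shell in a Makefile recipe.)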

Also further off, but something I can plan for now, is the idea of having builds for a given project be aware of dependencies on other projects, and generating build-packages for those projects as well. My current gut feeling is that builds for dependent projects should include all the files that would be built for each of those dependent projects in the same package. That's without any consideration for real packages (egg, deb and/or rpm files) down the line, though, so that may well change, and change drastically once I have time to do more research and thinking about those.

There are a couple of other targets that should probably be included in any build for a non-local target that I didn't list, but that I think I'll probably create now anyway: targets for unit-testing the project and for generation of API-level documentation (at a minimum). Though only one is explicitly noted in my personal coding standards, they are both important in a real-world development scenario. And, realistically, having a unit-testing target that can be called from the command-line isn't a bad thing at a local-environment level.
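As a minimal sketch of that unit-testing hook (assuming that runUnitTests.py exits with a nonzero status-code when any test fails, which is how I plan to set it up), the target itself can be very short:

test:
    python runUnitTests.py

Because make stops as soon as any process-step returns a nonzero exit-code, any target that lists test as a dependent target simply won't run its own steps if the tests fail, which covers the run-tests-and-stop item from my original list.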

Finally, and also for future consideration, so not something I'll plan on implementing now: A lot of my future projects are web-applications, and I may well want to be able to deploy files and code that are only relevant to the website portion of an application, without deploying any of the application-logic. A similar argument could also be made for a logic-only build. Eventually, then, I'm expecting to have, for example, local_all, local_app and local_site builds, and, logically, similar build-targets for other environments (so, live_all, live_app and live_site builds for the eventual live projects).

Each of the bullet-points in the list above should, I think, be represented in the Makefile by its own target — that way, as other targets are needed, the existing sub-targets can be used to accomplish the same tasks. That capability, particularly for items shared across builds for different deployment environments, will save time down the line, and keep the build-process as consistent across environments as I can manage. Those sub-targets, then, are (in order by their name):

base_all:
Copies the current deployable project-directories to the temporary build-directory (into {build_directory}/{project_copy}).
base_app:
Copies the current deployable project-directories for application-logic only to the temporary build-directory (into {build_directory}/{project_copy}).
base_site:
Copies the current deployable project-directories for website items only to the temporary build-directory (into {build_directory}/{project_copy}).
build_directory:
Creates the temporary build-directory that the project-directory and deployment-scripts will be copied to.
clean_build_directory:
Removes any files that should not be part of a deployment (.pyc files, etc.)
clean_executables:
Renames any files in executable paths ({build_directory}/{project_copy}/usr/local/bin, etc.) to remove their file-extensions, makes sure that all such files are executable.
copy_to_current_builds:
Copies the package and install-script(s) ({build_directory}/*.*) to the {current_builds} directory.
create_documentation:
Creates project-documentation (structure TBD) at {build_directory}/{project_copy}/usr/share/doc.
create_install_scripts:
Makes copies of standard install-scripts in {build_directory}, changes all environment-references within them to match the target environment the build is for, and renames them according to the build's target environment.
create_tarball:
Creates a tgz file of the {build_directory}/{project_copy} at {build_directory}/{project_name}.tgz
current_builds:
Creates the final common current-build directory ({current_builds}) that all results will end up in.
fix_environment:
Removes any files in the build-directory that start with an environment name other than the one that the build is for (e.g., removes any LIVE.* files anywhere under {build_directory} for a LOCAL build).
test:
Runs the unit-test code (at {project_directory}/runUnitTests.py) and dies if the tests fail.
Looking ahead at the idea of having *_app and *_site targets, I'm going to try to set things up so that the base_app and base_site targets described above can use as much of the same structure as the base_all target.

There are several items that show up in those targets that would be worth thinking about setting up as process-variables or global variables within the Makefile:

build_directory:
The temporary build-directory (e.g., /tmp/build-process);
current_builds:
The local user-directory where final build-results will be dropped (e.g., ~/Current_Builds);
project_copy:
The copy of the original project_directory (below), created inside the build_directory, where deployable files will be gathered in order to be archived into the final package-tarball (e.g., /tmp/build-process/{project_name});
project_directory:
The original project-directory (e.g., ~/IDreamInCode/{project_name});
project_name:
The name of the project (e.g., idic);
These feel, to me, like they are all viable global values within the context of the Makefile. That is, I believe that they all should be items that could be set once in the header of the Makefile and passed along as needed to the individual targets where they are relevant. I'll contemplate that before the next post, and see how well that idea works out once I actually start writing the Makefile code.
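Just to make that idea concrete, the header of the Makefile might start out something like the sketch below. The values are the examples from the list above, and the two targets are only there to show how the variables would get referenced; none of this is the final template code:

# "Global" values, set once at the top of the Makefile:
PROJECT_NAME=idic
PROJECT_DIRECTORY=~/IDreamInCode/$(PROJECT_NAME)
BUILD_DIRECTORY=/tmp/build-process
PROJECT_COPY=$(BUILD_DIRECTORY)/$(PROJECT_NAME)
CURRENT_BUILDS=~/Current_Builds

# ...which individual targets can then use as needed:
build_directory:
    mkdir -p $(BUILD_DIRECTORY)

create_tarball:
    tar -czf $(BUILD_DIRECTORY)/$(PROJECT_NAME).tgz -C $(BUILD_DIRECTORY) $(PROJECT_NAME)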

The snapshot Target

The snapshot target is likely going to be the simplest build-target I'll create. Since all I'm intending to do with it is make a copy of the current project-directory — keeping all of the files that are part of the project proper — and creating a ZIP file archive of them in a predetermined location, there's not a whole lot involved with it, at least in comparison to the targets that are intended to generate deployable builds... Whether it will (eventually) generate snapshots of project-dependencies is something I'll have to contemplate later, but since I'm focusing on one project right now — the idic core codebase — I don't have to make that decision for a while yet...

The main difference between a snapshot build and a build that's intended to generate an installable package is in the files and directories that the final result contains. Looking back at what I expected a project-structure to look like almost two months ago, there are a few new items that need to be added, I think:

File-system Path
[Project Root Directory]
  etc
    apache2
      sites-available
  test_project_name
  usr
    local
      bin
        [project-name]
      etc
        idic
          datastores
      lib
        idic
          [project-name]
    share
      doc
        [project-name]
      icons
        [project-name]
      [project-name]
  var
    cache
      [project-name]
    www
      [project-name]
        media
        scripts
        styles
  Makefile
  runUnitTests.py
  [install-script]
where:
[Project Root Directory]/test_project_name
is a directory containing the unit-test modules (that I now know how I want to set up);
[Project Root Directory]/Makefile
is the Makefile for the project;
[Project Root Directory]/runUnitTests.py
is the script that the Makefile will call in its test target, specifically set up to return a standard system error-code if any test fails; and
[Project Root Directory]/[install-script]
is the base install-script that the create_install_scripts target in the Makefile will use as its starting-point for generating installation-scripts for builds for specific environments.

Procedurally, the snapshot target has the following tasks it needs to perform:

  • Gathering up all the files needed in the snapshot, which might be able to leverage a target already defined:
    • Gathering all of the project's deployable files into a single archive-file (a zip file in this case). This is already available through the base_all target.
    • Adding the snapshot-specific items noted above into the project-copy directory.
    Like the local_all equivalent, that requires:
    • Creation of the temporary build-location (build_directory);
    Since the snapshot is intended to provide source-code files, it should probably also:
    • Clean out files that do not need to be part of the build in that build location (pyc files, maybe others) (clean_build_directory, if the types of files to be removed are the same across snapshot/non-snapshot builds);
  • Generating the final zip archive-file; and
  • Moving the final zip archive-file to a predetermined location.
The predetermined location that the final zip-file gets moved to can be the current-builds directory that an environment-targeted build ends with. That's not quite optimal for my purposes — ideally it'd drop the final file into a directory-structure in my Dropbox directory — but writing code in the Makefile that asks for a final destination isn't really necessary just yet. On top of that, I'd have to work out how to check the submitted path-value to ensure that it jibes with the existing structure I have already established in Dropbox. That, I think, would add a degree of complexity that I don't need to worry about for now — and it's not like creating a directory and moving a file is all that time-consuming...

The snapshot build-process doesn't use any potential variables that weren't already noted in the environment-specific build-process earlier, so there's nothing new to consider there, for whatever that might be worth. It does add a couple of new targets, though (a rough sketch of how the snapshot target might tie them together follows the list):

add_snapshot_files:
Adds the snapshot-specific files and directories noted above (from {project_directory}) to the project-copy directory ({build_directory}/{project_copy})
create_snapshot_zip:
Creates a zip file of the {build_directory}/{project_copy} at {build_directory}/{project_name}.zip
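Put together (and reusing the placeholder variables from the earlier sketch), the snapshot target might chain its sub-targets along these lines. The real ordering and details will get worked out next time, and this assumes a plain, non-parallel make run, where dependent targets execute in the order they're listed:

snapshot: build_directory base_all clean_build_directory add_snapshot_files create_snapshot_zip
    mkdir -p $(CURRENT_BUILDS)
    mv $(BUILD_DIRECTORY)/$(PROJECT_NAME).zip $(CURRENT_BUILDS)/

create_snapshot_zip:
    cd $(BUILD_DIRECTORY) && zip -r $(PROJECT_NAME).zip $(PROJECT_NAME)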

I'm pretty sure that covers all of the variations I need to actually write in the Makefile template, and from a post-length standpoint, today's doesn't feel like it's too long, so I'll break for now, and spend the next post actually writing the Makefile template, and implementing it for the idic project — at least for purposes of generating snapshots.
