A large part of a normal build is spent installing dependencies, be it
Ubuntu packages or simply running bundle install or npm install. On an average
Rails project, this alone can unfortunately take a few minutes.
This is dreadful to watch, and it takes precious time that would be better spent
running the build. We’ve seen some great initiatives from customers to cache
bundles, like bundle_cache
(inspired by our friends at Kisko Labs) and
WAD from the fine folks at
Fingertips.
But in the end, it became clear that we needed to offer something that’s built-in,
automatically caching our customers’ bundles without requiring major changes to
their build configuration.
Today, we’re officially announcing built-in caching for dependencies.
While it integrates nicely with Bundler already, you can also use it to cache
Node packages, Composer directories, or Python PIP installations. Heck, you can
even use it to preserve the asset pipeline’s temporary test cache, speeding up
builds even more.
Plus, caching an entire bundle of dependencies has the benefit of reducing the
impact of outages of dependency mirrors.
To give you an idea of the impact of this simple yet efficient way of speeding
up builds, we’ve had customers report saving a total of half an hour or more
on their builds with bigger build matrices.
On our billing app, it shaved off four minutes, cutting the build time down to
one minute.
How can you enable it for your private projects?
If you’d like us to cache your Bundler directory, simply add the following to
your .travis.yml:
cache: bundler
If you want to add the asset pipeline’s compilation cache, you can specify the
directories to cache as well:
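(A sketch, assuming the compiled assets live in the Rails default location of
tmp/cache/assets and that the bundler shortcut can be combined with a list of
extra directories.)
cache:
  bundler: true          # keep the built-in Bundler caching enabled
  directories:
  - tmp/cache/assets     # assumed Rails asset pipeline cache path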
For a Node.js project, simply specify the node_modules directory:
cache:
  directories:
  - node_modules
The specified paths are relative to the build directory.
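The same goes for the other dependency directories mentioned above. Here is a
sketch for a project that also wants to keep Composer packages and a local pip
cache around; the vendor and .pip-cache paths are only illustrative, not
prescribed defaults:
cache:
  directories:
  - vendor               # e.g. Composer packages installed into ./vendor
  - .pip-cache           # e.g. a pip cache kept inside the build directory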
Cache Rules Everything Around Me
We’re working on getting more language-specific caching methods included, and
on getting common dependency mirrors closer to our infrastructure,
as we already did with a network-local APT
cache.
Bundler 1.5 has some great features coming up, including Gem Source
Mirrors, which
we’re looking into utilizing to make installing RubyGems more reliable.
There are many ways to stay up-to-date with your projects’ build
results; you can visit the web site, or
write your own client by using our API.
We have recently added another.
You can now subscribe to the Atom feed with your favorite
news reader!
How to subscribe
To subscribe to the feed, point your
favorite Atom Reader to the Atom feed URL:
Replace travis-ci and travis-core with the owner and name of the repository
you would like to subscribe to.
Alternatively, you can send the HTTP Accept: application/atom+xml header to
the above URL, with or without the .atom extension. (Without the extension and
without this header, you will get the JSON-formatted data instead.)
Just in time, in the weeks leading up to these announcements, we finished some
long overdue upgrades to our own infrastructure, bringing Travis CI up to date
with the latest version of PostgreSQL.
For the longest time, Travis CI ran off a single PostgreSQL instance, which made
it a challenge for us to scale both up and out.
Due to unfortunate timing, we’d been running on a 32-bit server with 1.7 GB of
memory.
This limited our upgrade options, as we couldn’t just bring up a new follower in
64-bit mode based on the archived data of a 32-bit machine. We had to do a full
export and import as part of the migration. This was one of the reasons why we
held off on this upgrade for so long. Initial estimates pointed to a downtime of
almost an entire day.
Amazingly, this single box held up quite nicely, but the load on it, mostly due
to several hundred writes per second, bogged down the user experience
significantly, making for slow API calls and sluggish responses in the user
interface. We’d like to apologize for this bad experience; these upgrades were
long overdue.
First, the good news: the upgrades brought a significant speed boost. We moved
to a bigger server, a 7.5 GB instance, and upgraded to the latest PostgreSQL
version, 9.3.1.
Here’s a little preview (and an unexpected cameo appearance) of the results:
This graph shows the accumulated response time of all our important API
endpoints at the 95th percentile.
Just in time for our planned upgrades, Heroku shipped their new Heroku Postgres
2.0, with
some new features relevant to our interests.
We had one major problem to solve before we could approach the upgrade. Most of
the data we carry around are build logs. For open source projects, as much as
136 GB of the 160 GB total database size was attributed to build logs.
Due to their nature, build logs are only interesting for a limited amount of
time. We implemented features a while ago to continuously archive build logs
and upload them to S3.
But before the upgrade, we had to make doubly sure that everything was uploaded
and purged properly, as we’d abandon the database afterwards, starting with a
clean slate for build logs.
Once this was done, we migrated the logs database first. It only consists of
two tables, and with all logs purged, only a little data remained to be
exported and imported.
All migrations took the better part of four hours each, an unfortunate but
urgently needed service disruption. We kept the maintenance windows to the
weekends as much as we could to reduce the overall impact.
The Results
We were pretty surprised by the results, to say the least.
Let’s look at a graph showing API response times during the week when we
migrated two databases for travis-ci.org.
The first step happened on Wednesday, November 13. We moved build logs out of
the main database and into a set of Tengu instances with high availability.
You can see the downtime and the improvements in the graph above. Overall time
spent waiting for the database in our API went down significantly after we
upgraded to a bigger instance, most notably after the second set of red lines, which
marks the migration of our main data to a bigger setup. A great result.
Here’s a graph of the most popular queries in our systems:
What you’ll also notice is that after the first migration, the averages are
smoother than before, less spiky. Overall times initially didn’t change much,
but this first step removed a big load from our main database.
After the second migration, the times for the most popular queries went down
dramatically, almost dropping to the floor.
Here’s the breakdown of what was previously the most expensive query in our
system:
Average call time went pretty much flat after the upgrade, down from a very
spiky call time with lots of variation, which suggested much more expensive
queries in the higher percentiles.
We can attribute that to much better cache utilization. The cache hit rate for
indexes went up from 0.85 to 0.999, and the same goes for table hits, which are
now at 0.999 as well.
Thanks to much more detailed statistics and
Datascope, we now get much better insight
into what our database is up to, so we can tune more queries for more speedups.
Unexpected Benefits
PostgreSQL 9.3 brought a significant feature: the ability to fetch data
directly from indexes, also known as index-only scans.
We saw the impact of this immediately. The scheduling code that searches for
and schedules runnable jobs has been a problem child as of late, mostly in
terms of speed.
During peak hours, running a full schedule took dozens of seconds, sometimes
even minutes. We analyzed the queries involved, and a lot of time was spent
fetching data with what PostgreSQL calls a heap scan.
This turned out to be very expensive, and the lack of caching memory added the
potential for lots of random disk seeks.
With PostgreSQL 9.3, a single scheduling run, which used to take up to 600
seconds, now takes less than one, even during busy hours.
This great combination of useful new features and lots more available cache
memory gave us some unexpected legroom for build scheduling. We still need to
tackle it, but with less urgency now.
Unexpected Problems
After the migration of travis-ci.com, we noticed
something curious: build logs for new jobs wouldn’t load at all.
We quickly noticed a missing preparation step in the migration: checking for
any joins that could be affected by having two separate databases.
The query behaviour on travis-ci.com is slightly
different compared to the open source version. We explicitly check access
permissions to protect private repositories from unauthorized access.
This check broke for logs, as the permissions check is currently joined into
the query. As we’re working on doing more explicit permissions checks rather
than joining in the permissions, it worked out well to fix this minor issue on
the spot and remind ourselves that we need to tackle the overarching issue soon.
Travis CI is now running on a total of eight database servers: four to store
build logs, two for each platform, and two Premium
Ika
instances serving the main data for the API, job scheduling, and all the other
components.
While eight may sound like a lot, four of those are failover followers.
We hope you enjoy a faster Travis CI experience. We still have lots more tuning
to do, but even for bigger projects like Mozilla’s Gaia, with currently more
than 21,000 builds, the build history loads almost instantly.
If you haven’t already, check out Heroku’s new
Postgres. We’ve been happy users, and we’re very
lucky to have such great people on their team to help us out.