litany against fear
RailsConf 2009 A Morning with GitHub
This is part of my series of notes on RailsConf 2009. Check them all out here.
Introducing the guys behind GitHub, and then going through some QA’s.
1) Does github use any special techniques for storing so many repos?
Storing repos uses alternates, which shares objects between different repos. They use Grit for a pure Ruby interface to Git. They use RedHat GFS as a shared filesystem. Lots of caching. No distributed hash table for objects…yet. Architecture is pretty standard so far.
2) How do you monitor the whole system?
Twitter. No, seriously: NewRelic. There’s also a job queue that they’ve written custom stuff to look into what’s going on. Engine Yard monitors the overall health of the system. They use Tom’s god gem for lots of monitoring. They have to make sure that some processes don’t kill the servers, such as syntax highlighting or upload-pack running wild.
3) Do you shard git repos or do you use one big GFS?
They split up repositories because having thousands of repos in one directory is just…bad. They use a pretty simple hashing algorithm to split it up.
4) How long does it take the test suite to run?
A few minutes? No one seems to know. They use integrity to let them know when the build breaks.
5) Are they going to open source github?
Cost of running the open source project is really high, they’re in this to make money. Parts of it are open source: Jekyll, Grit, the services.
6) How many servers do you run GH on?
Hosted at EngineYard. 4 front-end slices, 1 job slice, 1 machine for solr, 1 for gem building. Memcache runs on the front-end slices too.
7) Is the GitHub source on GH?
Yes.
8) Is GitHub profitable?
Yes.
9) Why an Octocat?
Tom was looking for a mascot when GitHub was starting, and he was looking for octopus style images since Git has an octopus merge. He found some artist that made him
10) How do you manage authentication of so many SSH public keys on one user account?
Someone from EY patched sshd for them. Patching authorized_keys doesn’t work well with thousands of keys. So now, it looks up the keys via MySQL, which scales pretty well.
11) When will you offer GH tshirts featuring the Octocat?
Soon, hopefully.
12) Naming gems username-gemname sort of works. How can it be made better?
They’re not very happy with the setup. They had to deal with some people creating accounts like ‘net’ and having an ‘ssh’ repo. It’s been an incredible amount of work and a huge hassle so far for them.
13) What can we expect in terms of new features on the upcoming months?
Make sure the site scales. GFS isn’t going to scale for much longer. They need to make Git scale, not Rails scale. Git wasn’t really built to do what they need, and it’s a big problem that they need to tackle. Better documentation for the API along with better integration for third party apps.
14) Why not use passenger?
EY manages it. No big reasons to change it yet.