Site Reliability Engineering is a book based on a practice at Google that has gradually become a profession. It discusses what it means to run software at scale and to give your users a good experience on your platform. (Google Books)
I came across it through a blog post on my go list, and it offers a very interesting perspective.