A peek into startup infrastructure: what goes on back there anyway?!

Today I moderated a discussion with Socialize executive team members Jason Polites, Isaac Mosquera and Sean Shadmand about how Socialize has created an infrastructure that supports over 7 million API requests per day, 2.5 million social actions (now creating over 1 million new actions per month), 100,000 new users per day, over 7,000 SDK downloads, and thousands of live iOS and Android apps running Socialize, all doubling monthly.

I have been getting requests from companies of all sizes asking how Socialize has built and scaled a large infrastructure with just an 8-person engineering team. Other startups interested in offering APIs and SDKs have wanted to know what we've learned and what pitfalls to avoid, and larger, progressive Fortune 1000 companies have been interested in re-working their infrastructure to be more scalable and efficient.

A partner at Accenture recently told me that I should share some of the secret sauce for other companies to learn from.  The Socialize team believes in being as open as possible with non-proprietary knowledge sharing, to help others benefit from what we've learned, and in turn to be able to learn from others as well.

Socialize has created a drop-in social platform that greatly boosts mobile app installs and engagement. If you don't yet know what Socialize does, you may want to learn more about that here before watching this infrastructure talk.

Here's the video, with a summary of the discussion points below:


In this 45 minute discussion, the Socialize executive team discusses:

  • How Socialize approaches infrastructure from the perspectives of process and business logic
  • How teams of one to two employees each for API, iOS SDK, Android SDK, Web and QA work together to create a modularized API infrastructure
  • How Socialize teams use APIs to communicate internally within the company to create accountability, efficiency and speed
  • Best practices in segmenting internal teams around APIs, including integration testing, unit testing, minimizing distraction and documentation
  • How to then break APIs up into products which can be exposed beyond the company internally, and how each API has its own Pivotal Tracker, GitHub issue tracking, Sphinx auto-generated documentation and change-request systems
  • How Socialize treats internal teams as customers of each other, and uses that approach to define and refine the scope of the product, and limits feature creep
  • How the SDK team becomes the 'head of the product,' scoping and defining functionality back into the company as one of the teams
  • The value of hardlining, staying away from perfecting an unreleased API, and getting good at creating a plan to deal with growth
  • Using a specific approach to scrum that Socialize calls "short-cycle scrum," along with tools like Basecamp, Google Docs and Pivotal Tracker, to manage situations that often come up, such as capturing and implementing ideas from employees and customers in a structured, productive way, staying away from premature design discussions, and innovating asynchronously
  • How short-cycle scrum allows each team member to be a product lead within their team
  • Taking full advantage of Amazon services for databases, redundancy, queueing, hosting, scalability, load balancing, all done programmatically, which allows for spooling up of infrastructure literally in minutes
  • The benefits of agile development methodologies, and specifically scrum, for those who haven't already used them for development, and how to trust scrum even though it is counterintuitive to remove dates from projects
  • The importance and value of prioritization of focus, and how we use these methodologies to do that, and make the less important things fall away
  • How Socialize is breaking up the Socialize action bar into modular components
  • How on a micro level, within the code base, the Socialize SDK communicates across the stack using the same type of modular infrastructure Socialize employs as a company at a macro level
  • Best practices around continuous integration with TeamCity, code coverage and testing, using GetSatisfaction for support, and how this approach makes support easy
  • How Socialize has implemented Splunk not just for log monitoring, but also at the application level for business intelligence, reporting and alerting, and even exposing Splunk data to Socialize clients