Machine Learning Build vs Buy: The In-House Temptation


By now most startup CEOs understand that they urgently need to figure out what role machine learning will play in their business. Large established companies are actively engaged in this process too. This is not an easy task in and of itself: machine learning isn’t a panacea for whatever is wrong with your business, and you cannot just sprinkle it on top of your existing business processes and strategy. Instead, you will likely have to jettison many assumptions about “how things are done” in your industry.

Once you have identified a genuine opportunity to apply machine learning, the next obvious challenge is how to pursue it. Should you build something on your own or should you buy from a vendor?

Here I believe leaders on the product and business side are not always getting great advice from their engineering departments. Why? Because building machine learning systems from the ground up in-house is what every engineer wants to do. What could be more exciting for an engineer today than getting to build a machine learning system using TensorFlow? And yet in many instances that will be the wrong thing to do compared to using a specialized service provider.

Why? Because while it is easy to get going and achieve moderate accuracy, it is quite difficult to build something that improves in accuracy over time and delivers high availability and low latency at scale. So you should approach the decision of what to build and what to buy with the same clarity as you would in other areas. For instance, does it make sense for you to run your own database instances and patch, upgrade, and scale them, or should you use a managed database service? Not so long ago everyone ran their own databases, but today even large enterprises are shifting to managed services.

The same is true for machine learning. Running, maintaining and scaling your own TensorFlow installation takes time and effort. Building and improving your own model is similarly difficult. There are questions of how you snapshot your model and roll it back if you mistakenly train it with the wrong data. There is also the question of how you push your model out to mobile devices if it needs to be close to end users, and then how you keep it updated there or, better yet, incorporate feedback from end users into the model.
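To make the snapshot and rollback question concrete, here is a minimal sketch in Python. It assumes the model's parameters fit in a plain dictionary held in memory. `ModelRegistry`, `snapshot`, and `rollback` are hypothetical names chosen for illustration, not part of TensorFlow or any vendor's API.

```python
import copy


class ModelRegistry:
    """Hypothetical in-memory store of parameter snapshots (illustration only)."""

    def __init__(self):
        # List of (label, params) tuples, oldest first.
        self._snapshots = []

    def snapshot(self, label, params):
        # Deep-copy so later in-place training cannot mutate the stored version.
        self._snapshots.append((label, copy.deepcopy(params)))
        # The index in the list doubles as the version number.
        return len(self._snapshots) - 1

    def rollback(self, version):
        # Hand back a fresh copy so the stored snapshot stays untouched.
        _label, params = self._snapshots[version]
        return copy.deepcopy(params)


registry = ModelRegistry()
params = {"weights": [0.1, 0.2], "bias": 0.0}
good_version = registry.snapshot("before-retrain", params)

# Simulate training on the wrong data, which corrupts the parameters in place.
params["weights"][0] = 99.0

# Restore the last known-good parameters.
params = registry.rollback(good_version)
```

Even this toy version leaves out most of what makes the problem hard in practice: durable storage, tracking which data produced which version, and getting the restored model back out to every device that is running the bad one.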

Here are some key questions you should consider as you figure out whether to build or buy when it comes to machine learning:

1. Do you have the scale and is your operations team staffed to run this yourself? Can you attract top machine learning talent to your company?

2. Do you have more data than almost any other player in your industry for this particular problem, or could you benefit from a vendor’s ability to train on lots of data across multiple customers?

3. Is this machine learning application unique to your business, or is it something all your competitors need to do as well (e.g., fighting fraud, moderating forums)? Will it allow you to differentiate yourself competitively?

And here are some questions you may want to ask of vendors you would consider:

1. Is this their core business and do they provide high service levels (uptime, latency, security, etc.)? How strong is their customer support?

2. Do they have a track record of improving model accuracy over time?

3. Are they trying to price based on the value they deliver to you, or is their pricing predictable and reasonable compared to their cost, especially at scale?

You will want to take a hard look at these questions and use your engineering team’s views as just one input. For the time being, machine learning is an area in which the personal interests of your engineering team have a high likelihood of diverging from what might be best for your company.