By Yiftach Schoolman, Redis Labs Co-founder and CTO
One of the most critical steps in any operational machine learning (ML) pipeline is artificial intelligence (AI) serving, a task usually performed by an AI serving engine. AI serving engines evaluate and interpret data in the knowledge base, handle model deployment, and monitor performance. They open up a new class of applications that can use AI to improve operational efficiency and solve significant business problems.
AI Serving Engine for Real Time: Best Practices
I have been working with Redis Labs customers to better understand their challenges in taking AI to production and how they need to architect their AI serving engines. To help, we’ve developed a list of best practices:
Fast end-to-end serving
If you are supporting real-time apps, you should ensure that adding AI functionality to your stack has little to no effect on application performance.
Because every transaction potentially includes some AI processing, you need to maintain a consistent SLA, preferably at least five-nines (99.999%) availability for mission-critical applications, using proven mechanisms such as replication, data persistence, multi-availability-zone/rack deployment, Active-Active geo-distribution, periodic backups, and automatic cluster recovery.
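To make the five-nines target concrete, the short Python sketch below converts an availability figure into an annual downtime budget. It assumes a 365-day year (leap years ignored); at 99.999% the budget works out to roughly 5.26 minutes of downtime per year, which is why the mechanisms listed above need to be automatic rather than manual.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a 365-day year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum minutes of downtime per year allowed by `availability` (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


# Compare three common SLA tiers.
for sla in (0.999, 0.9999, 0.99999):
    print(f"{sla:.3%}: {downtime_minutes_per_year(sla):.2f} min/year")
```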
Because load is driven by user behaviour, many applications must be built to handle peak events, from Black Friday to the big game. You need the flexibility to scale the AI serving engine out or in based on your expected and current load.
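One way to turn expected and current load into a scaling decision is sketched below. The function name `replicas_needed`, the 25% headroom, and the per-replica capacity figure are hypothetical placeholders you would replace with measurements from your own serving engine; the idea is simply that the replica count follows the larger of forecast and observed load, clamped to a safe range.

```python
import math


def replicas_needed(expected_rps: float, current_rps: float,
                    capacity_per_replica_rps: float, headroom: float = 0.25,
                    min_replicas: int = 2, max_replicas: int = 100) -> int:
    """Hypothetical helper: choose a replica count for the AI serving tier.

    Sizes for whichever is larger, the forecast load (e.g. Black Friday)
    or the load observed right now, plus a safety headroom.
    """
    if capacity_per_replica_rps <= 0:
        raise ValueError("capacity_per_replica_rps must be positive")
    load = max(expected_rps, current_rps)
    target = math.ceil(load * (1.0 + headroom) / capacity_per_replica_rps)
    # Never drop below a redundant minimum, never exceed the budgeted maximum.
    return max(min_replicas, min(max_replicas, target))


# Peak event: forecast 12,000 req/s, each replica handles about 1,000 req/s.
print(replicas_needed(expected_rps=12_000, current_rps=4_000,
                      capacity_per_replica_rps=1_000))  # prints 15
# Quiet period: scale in, but keep two replicas for redundancy.
print(replicas_needed(expected_rps=500, current_rps=300,
                      capacity_per_replica_rps=1_000))  # prints 2
```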
Support for multiple platforms
Your AI serving engine should be able to serve deep-learning models trained on state-of-the-art platforms like TensorFlow or PyTorch. In addition, classic machine-learning models such as random forests and linear regression still provide good predictive accuracy for many use cases and should also be supported by your AI serving engine.
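A serving engine can stay agnostic to how a model was trained by calling every model through one narrow interface. The sketch below assumes a hypothetical `Model` protocol with a single `predict` method; the linear-regression and stump-ensemble classes are toy stand-ins, and a TensorFlow or PyTorch model would be wrapped in a similar adapter class (not shown, because such wrappers depend on how the model was exported).

```python
from dataclasses import dataclass
from typing import Protocol, Sequence, Tuple


class Model(Protocol):
    """Hypothetical interface the serving layer relies on."""

    def predict(self, features: Sequence[float]) -> float: ...


@dataclass
class LinearRegressionModel:
    """Toy linear regression: bias + dot(weights, features)."""
    weights: Sequence[float]
    bias: float = 0.0

    def predict(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))


@dataclass
class StumpEnsemble:
    """Toy stand-in for a random forest: averages one-split decision stumps.

    Each stump is (feature_index, threshold, value_if_le, value_if_gt).
    """
    stumps: Sequence[Tuple[int, float, float, float]]

    def predict(self, features: Sequence[float]) -> float:
        votes = [le if features[i] <= t else gt for i, t, le, gt in self.stumps]
        return sum(votes) / len(votes)


def serve(model: Model, features: Sequence[float]) -> float:
    """The serving path is identical regardless of which framework built the model."""
    return model.predict(features)


models = {
    "linear": LinearRegressionModel(weights=[2.0, 0.5], bias=1.0),
    "forest": StumpEnsemble(stumps=[(0, 1.0, 0.0, 1.0), (1, 5.0, 0.0, 1.0)]),
}
for name, model in models.items():
    print(name, serve(model, [3.0, 4.0]))
```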
Easy to deploy new models
Most companies want the option to update their models frequently according to market trends or to exploit new opportunities. Updating a model should be as transparent as possible and should not affect application performance.
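A common way to make model updates transparent is to load and validate the new version off to the side, then swap it in with a single reference assignment while keeping the previous version for rollback. The `ModelSlot` class below is a hypothetical, in-process Python sketch of that idea, not a description of any particular product's deployment mechanism; models are plain callables for brevity.

```python
import threading
from typing import Any, Callable, Optional, Tuple

Predictor = Callable[[Any], Any]


class ModelSlot:
    """Hypothetical holder for the live model version, with one-step rollback."""

    def __init__(self, version: str, model: Predictor) -> None:
        self._lock = threading.Lock()  # serializes deploy/rollback calls
        self._current: Tuple[str, Predictor] = (version, model)
        self._previous: Optional[Tuple[str, Predictor]] = None

    def deploy(self, version: str, model: Predictor) -> None:
        # Load and validate `model` before calling this; the swap itself is
        # a single reference assignment, so readers see either the old or
        # the new (version, model) pair, never a mix of the two.
        with self._lock:
            self._previous = self._current
            self._current = (version, model)

    def rollback(self) -> None:
        with self._lock:
            if self._previous is None:
                raise RuntimeError("no previous model to roll back to")
            self._current, self._previous = self._previous, None

    def predict(self, features: Any) -> Tuple[str, Any]:
        version, model = self._current  # take one consistent snapshot
        return version, model(features)


slot = ModelSlot("v1", lambda x: x * 2)
print(slot.predict(3))  # ('v1', 6)
slot.deploy("v2", lambda x: x * 3)
print(slot.predict(3))  # ('v2', 9)
slot.rollback()
print(slot.predict(3))  # ('v1', 6)
```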
Performance monitoring and retraining
Everyone wants to know how well the model they trained is performing and to be able to tune it according to how it behaves in the real world. Require that your AI serving engine support A/B testing so you can compare a new model against a default model. The system should also provide tools to measure and rank the performance of the AI models your applications execute.
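A/B testing needs each request routed consistently to either the default model or the candidate. The sketch below assumes you have a stable user or session ID and hashes it with SHA-256 into one of 10,000 buckets, so the same user always sees the same variant; the function name, experiment label, and bucket count are illustrative choices rather than part of any specific serving engine.

```python
import hashlib

BUCKETS = 10_000  # resolution of the traffic split (0.01% steps)


def assign_variant(user_id: str, candidate_share: float,
                   experiment: str = "model-ab-test") -> str:
    """Deterministically route a user to the 'default' or 'candidate' model."""
    if not 0.0 <= candidate_share <= 1.0:
        raise ValueError("candidate_share must be between 0 and 1")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % BUCKETS  # 0 .. BUCKETS-1
    return "candidate" if bucket < candidate_share * BUCKETS else "default"


# Send roughly 10% of users to the candidate model.
counts = {"default": 0, "candidate": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", 0.10)] += 1
print(counts)
```

Because assignment depends only on the ID and the experiment label, any serving node can compute it without shared state; logging the variant alongside each prediction's outcome then lets you compare the two models.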
In most cases it’s best to build and train in the cloud and be able to serve wherever you need to, for example, in a vendor’s cloud, across multiple clouds, on-premises, in hybrid clouds, or at the edge. The AI serving engine should be platform agnostic, based on open source technology, and have a well-known deployment model that can run on CPUs, state-of-the-art GPUs, high- engines, and even Raspberry Pi devices.