Spotlight Interview: Maurice Kroon, CEO and Founder of Vox AI


In the rapidly evolving landscape of quick-service restaurants (QSRs), the drive-thru experience is undergoing a revolutionary transformation, thanks to the advent of advanced artificial intelligence (AI) technologies. At the forefront of this innovation is Vox AI, a groundbreaking AI solution designed to enhance the efficiency and profitability of drive-thru operations. Founded by Maurice Kroon, CEO and visionary behind Vox AI, the company has set a new standard in creating conversational, human-like AI that speaks 35 languages, aiming to alleviate the pressure on overworked employees and significantly boost return on investment for drive-thru businesses.

With its ability to replicate the speed, tone, and rhythms of human speech, Vox AI promises to streamline drive-thru queues, elevate customer satisfaction, and open up new avenues for upselling. Launched in October 2023, Vox AI has quickly distinguished itself from competitors by offering a solution that is not only multilingual and highly accurate but also capable of delivering a genuinely human-like interaction. In this interview, Kroon discusses the inspiration behind creating a conversational AI tailored for drive-thru restaurants and the challenges encountered in developing technology that stands out in the market. He delves into how Vox AI maintains the human touch in customer interactions, its contribution to operational efficiency, and the technology’s potential to revolutionize the QSR industry beyond drive-thrus.

First off, congrats on the launch of Vox AI, which made its official debut at CES 2024. Can you briefly describe your journey to founding the company and what inspired the creation of “a conversational, human-like AI” for drive-thru restaurants?

I came across some posts about existing solutions on social media, and when reviewing them, I was surprised that these solutions were not very human-like. Part of it was due to high latency in most of the solutions. Given my background in AI and genuine interest in AI solutions, I thought, “This really feels like talking to a machine instead of a human.” Soon, it became clear that I was able to come up with a solution that could truly replicate human-like interaction.

Can you share some of the challenges you faced while developing Vox AI and tell us a bit about the technology behind it and how it’s different from other solutions on the market? 

The biggest challenge was improving the AI’s response time. In a typical human conversation, the pause between one person finishing and the other starting to talk is around 500 milliseconds. In a heated debate, it’s around 350 milliseconds. If a solution takes seconds or longer to respond, people notice it as a barrier, and the conversation immediately feels robotic.

The challenge was to bring the AI’s response time down to under 500 milliseconds, so you don’t really notice you’re talking to a computer, at least not in terms of speed. In real life, at the drive-thru, taking an order typically takes around 60 seconds in total, and the conversation goes back and forth between the customer and the agent several times. Competing products can take seconds to respond, which translates into waiting time during which the entire drive-thru queue is stalled. Given this, it’s absolutely critical that the AI is as quick as a human agent, as anything slower frustrates the consumer and costs the drive-thru money. Personally, I believe this is the main reason we have not yet seen a large-scale rollout of voice AI in drive-thrus.
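To make the latency argument concrete, here is a rough back-of-the-envelope sketch. The 60-second order and the 500-millisecond target come from the interview; the number of turns per order and the 2.5-second "slow" response time are illustrative assumptions, not figures from Vox AI or any competitor.

```python
def dead_air_seconds(turns: int, response_latency_s: float) -> float:
    """Total time per order spent waiting for the system to start replying."""
    return turns * response_latency_s


ORDER_SECONDS = 60      # typical order duration cited in the interview
TURNS = 6               # ASSUMPTION: number of system replies per order
TARGET_LATENCY = 0.5    # the 500 ms target cited in the interview
SLOW_LATENCY = 2.5      # ASSUMPTION: a system that "takes seconds" to respond

fast_wait = dead_air_seconds(TURNS, TARGET_LATENCY)
slow_wait = dead_air_seconds(TURNS, SLOW_LATENCY)
extra_wait = slow_wait - fast_wait

print(f"Waiting at 500 ms per reply: {fast_wait:.1f} s")
print(f"Waiting at 2.5 s per reply:  {slow_wait:.1f} s")
print(f"Extra delay per order:       {extra_wait:.1f} s "
      f"({extra_wait / ORDER_SECONDS:.0%} of a 60 s order)")
```

Under these assumed numbers, the slower system adds 12 seconds to every order, and every car behind it in the queue absorbs that delay too.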

Another challenge we’re currently working on is languages. Unlike competing products, we already support 35+ languages. However, we’re now working on mixes of languages. In Miami, for example, people speak a bit of English and a bit of Spanish, which blends into Spanglish. Getting that nuance right is our current challenge. It will make the voice AI sound much less formal and more human.

How is the technology able to replicate the speed, tone, and rhythms of a human voice? How does it maintain the human touch in customer interactions?

Replicating the speed in a human-like interaction is part of our tech stack — in combination with some secret ingredients. Speed alone, however, doesn’t provide customers with that truly human-like experience. To achieve a genuinely human experience, we incorporate different layers such as tone of voice, emotions, and an understanding of the world.

How does Vox AI contribute to the efficiency of drive-thru restaurants? To what extent does it reduce the workload of employees?

We look at it from a more holistic point of view in drive-thru operations and not only as the Voice AI solution on its own. What touchpoints can be made more efficient? Is the traditional drive-thru layout still required?

Scanning a QR code at the speaker post is no longer required, given many drive-thrus have payment integration via their loyalty app for loyalty points savings and direct payments. Vox AI also saves time on the traditional window payment. This alone shaves off seconds on the total drive time per order. And why should there still be a first window in the drive-thru layout?

Vox will greet each customer when they drive up to the speaker post—always within a second, 24 hours per day. Vox AI is never off sick, requires no (sales) training, and doesn’t quit its job at the worst possible time. It’s always there, happy to serve and ready to surprise you with a joke.

And no, Vox AI will not replace a human. Currently, most drive-thru staff have five tasks to do. Vox AI takes one of those tasks, order taking, off their plate and enables them to focus on the core operations within the kitchen, resulting in a faster, stress-free time to serve.

How does Vox AI improve order accuracy and contribute to the overall customer experience in a drive-thru restaurant? How does the technology adapt to changing customer preferences?

Order accuracy is a core KPI we continuously measure before launching the system. In other words, the order accuracy must be above a certain threshold agreed upon with the customer before the system goes live. With this approach, we ensure that throughout Vox AI’s active lifecycle, staff involvement remains consistent from the start of implementation to ongoing operations.

The result is that customers may drive up one day and be served by a human, as always. On another day, they may be served by Vox AI, experiencing the same, or even better, order accuracy.

Can you elaborate on the potential revenue increase by implementing Vox AI? How does the technology create new upselling opportunities in drive-thru restaurants?

Upselling is one of the biggest contributors to the average order value, but it is also one of the most challenging strategies to execute successfully. Typically, humans find it difficult to try to upsell in a persuasive way consistently. Vox AI can upsell on every order, all day long, with the best combinations possible based on success rates and combinations set by the customer. Moreover, Vox AI knows when not to upsell and instead focuses on speed, especially during rush hours, for example.
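As a purely illustrative sketch of the kind of rule described above (suggest the add-on with the best historical success rate among the combinations the restaurant allows, and skip the upsell during rush hours), something like the following could work. The function name, data shapes, menu items, and success rates are all hypothetical and are not taken from Vox AI's actual system.

```python
from typing import Dict, List, Optional, Tuple


def choose_upsell(
    order: List[str],
    allowed_combos: Dict[str, List[str]],
    success_rates: Dict[Tuple[str, str], float],
    rush_hour: bool,
) -> Optional[str]:
    """Return the add-on to suggest, or None when no upsell should be made."""
    if rush_hour:
        # During rush hours, prioritize speed over the upsell.
        return None
    best_addon, best_rate = None, 0.0
    for item in order:
        for addon in allowed_combos.get(item, []):
            if addon in order:
                continue  # the customer already ordered it
            rate = success_rates.get((item, addon), 0.0)
            if rate > best_rate:
                best_addon, best_rate = addon, rate
    return best_addon


# Hypothetical configuration set by the restaurant.
combos = {"burger": ["fries", "shake"], "coffee": ["muffin"]}
rates = {
    ("burger", "fries"): 0.32,
    ("burger", "shake"): 0.18,
    ("coffee", "muffin"): 0.25,
}

print(choose_upsell(["burger"], combos, rates, rush_hour=False))
print(choose_upsell(["burger"], combos, rates, rush_hour=True))
```

With this made-up data, a burger order outside rush hour gets a fries suggestion, while the same order during rush hour gets no suggestion at all.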

When loyalty integration is enabled, the upsell becomes personalized, leading to a higher success rate. Calculations have shown that Vox AI’s return on investment (ROI) is 14x based solely on upsell performance.

What are your future plans for Vox AI? Are there potential applications of Vox AI beyond the restaurant industry?

We have some exciting new features planned for the coming months that I cannot disclose just yet, but Restaurant Technology News will be among the first to know once we are ready to announce them.

A human-like Voice AI solution is groundbreaking in many other verticals, and we anticipate Vox AI being active in them in the coming years. However, many of these are not as challenging or diverse as the drive-thru business; after all, everyone frequents a drive-thru now and then. The variations in languages, dialects, and acronyms encountered in the drive-thru business will serve as a solid foundation, preparing Vox AI for new verticals over time.

What is the learning curve for restaurants to implement Vox AI? What is involved in getting the technology up and running, and what are the costs?

For restaurant operators, Vox AI doesn’t require any training. Our solution is entirely AI-based, meaning no human intervention is needed. This holds true during the system’s training, as well as for menu adjustments, price changes, or updates to allergy and nutritional information. The only requirement from an installation and setup perspective is to install a piece of hardware positioned between the communication system’s base station and the store’s network. Typically, this installation is handled by customer service agencies.

How do you see AI technology shaping the future of the restaurant industry, particularly QSRs?

AI will drive and reshape many aspects of the restaurant industry, benefiting customers, restaurant operators, and employees. AI will handle information on hardware and systems for first-line support fixes. Inventory predictions aim to minimize food waste, considering projected numbers based on various external factors. Voice AI is becoming a standard rather than an exception.

The QSR business is projected to continue growing over the coming years. Global staff shortages require restaurants to invest in smarter solutions and processes driven by AI. Given that the QSR market has always strived to optimize its operational processes, we are convinced that technology adoption will thrive in the coming years.