Virtual assistants

In a virtual assistant use case, the goal is to enable real-time interaction with users for tasks like setting reminders, answering questions, scheduling meetings, or controlling smart devices. Here, small language models (SLMs) and large language models (LLMs) offer different trade-offs among efficiency, speed, and task complexity.


Use Case: Personal Virtual Assistant for Calendar Management


Scenario

A business implements a virtual assistant that schedules meetings, sends reminders, and manages the user’s calendar. The assistant runs directly on a smartphone or embedded device with limited hardware resources.


Key Metrics for Comparison

  • Latency: Time taken to process and respond to a user’s query.

  • Resource Utilization: Memory and CPU consumption.

  • Response Accuracy: Correctness in responding to scheduling or reminder tasks.

  • Energy Efficiency: Power usage, particularly for mobile or battery-powered devices.
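Before comparing models, it helps to pin down how such numbers might be gathered. The sketch below is a minimal benchmarking harness, assuming a `respond(query)` callable as the model interface (a hypothetical stand-in, not a specific library's API); it measures wall-clock latency with `time.perf_counter` and resident memory via the third-party `psutil` package.

```python
import time
import statistics
import psutil  # third-party: pip install psutil


def benchmark(respond, queries):
    """Measure average latency and resident memory for a model exposed
    as a respond(query) callable, over a list of test queries."""
    process = psutil.Process()
    latencies = []
    for query in queries:
        start = time.perf_counter()
        respond(query)
        latencies.append(time.perf_counter() - start)
    return {
        "avg_latency_s": statistics.mean(latencies),
        "rss_mb": process.memory_info().rss / 1_000_000,
    }


# Example with a stand-in that simulates a 0.04 s on-device SLM call:
queries = ["Remind me at 3pm", "What's on my calendar tomorrow?"]
print(benchmark(lambda q: time.sleep(0.04), queries))
```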


| Metric                | Small Language Model (SLM) | Large Language Model (LLM) |
| --------------------- | -------------------------- | -------------------------- |
| Model Size            | 50M parameters             | 1.2B parameters            |
| Latency (average)     | 0.04 seconds/interaction   | 1.2 seconds/interaction    |
| Memory Usage (RAM)    | 200 MB                     | 6.5 GB                     |
| CPU/GPU Requirements  | CPU only                   | GPU/high-end CPU           |
| Battery Consumption   | 2% per hour (smartphone)   | 8% per hour (smartphone)   |
| Response Accuracy     | 89%                        | 95%                        |
| Task Complexity Limit | Moderate                   | High                       |
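As a rough sanity check on the memory row above, a model's weight footprint is approximately parameter count times bytes per weight. The sketch below assumes 32-bit (4-byte) weights; the note about LLM overhead (activations, caches) is an assumption about where the remaining headroom goes, not a measured breakdown.

```python
def weight_footprint_mb(params: float, bytes_per_weight: int = 4) -> float:
    """Approximate RAM needed just to hold model weights (fp32 assumed)."""
    return params * bytes_per_weight / 1_000_000


print(weight_footprint_mb(50e6))   # SLM:  ~200 MB, matching the table
print(weight_footprint_mb(1.2e9))  # LLM: ~4800 MB of weights alone;
                                   # activations, caches, and runtime
                                   # overhead push the total toward 6.5 GB
```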


Technical Insights

  1. Latency: SLMs respond to user interactions nearly instantaneously, with an average latency of 0.04 seconds per interaction, 30 times faster than LLMs. LLMs, with their larger size and more complex computation, take around 1.2 seconds per interaction. The difference is especially noticeable when users expect immediate responses for simple calendar tasks or reminders.

  2. Memory and Compute Efficiency: SLMs require just 200 MB of RAM and can operate using a standard CPU, making them ideal for mobile devices with limited memory and no access to high-end GPUs. LLMs, on the other hand, consume significantly more memory (6.5 GB RAM) and typically require dedicated GPUs or high-performance CPUs, which are impractical for on-device applications.

  3. Energy Efficiency: For mobile devices or wearables where battery life is a priority, SLMs hold a clear advantage, draining only 2% of battery per hour compared with about 8% for LLMs. This difference is crucial for virtual assistants running continuously in the background: the SLM lets the device run far longer between charges.

  4. Task Complexity: While LLMs offer higher accuracy (95%) and can handle more nuanced or complex queries (e.g., understanding indirect or ambiguous language), SLMs are sufficiently accurate (89%) for the majority of common virtual assistant tasks like setting reminders or checking calendar events. For more complex tasks an LLM might be required, but for typical assistant use cases the SLM performs adequately; a simple hybrid setup that combines both is sketched after this list.
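A common way to capture the best of both columns of the table is a hybrid deployment: route routine requests to the on-device SLM and escalate only nuanced ones to an LLM. The sketch below is illustrative only; the keyword heuristic and the `slm`/`llm` callables are hypothetical placeholders, not a production-grade router.

```python
SIMPLE_INTENTS = ("remind", "schedule", "cancel", "what's on", "move")


def looks_simple(query: str) -> bool:
    """Crude heuristic: short, keyword-matching queries go to the SLM."""
    q = query.lower()
    return len(q.split()) <= 12 and any(k in q for k in SIMPLE_INTENTS)


def route(query: str, slm, llm) -> str:
    """Send routine calendar tasks to the on-device SLM (fast, cheap);
    fall back to the LLM for nuanced or ambiguous requests."""
    return slm(query) if looks_simple(query) else llm(query)


# Example with stand-in models:
slm = lambda q: f"[SLM] handled: {q}"
llm = lambda q: f"[LLM] handled: {q}"
print(route("Remind me to call Dana at 4pm", slm, llm))
print(route("Find a slot next month that avoids my travel and Dana's", slm, llm))
```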


Business Insights

  1. Cost Efficiency: Using SLMs reduces the overall cost of deploying virtual assistants. SLMs can run efficiently on low-power, low-cost hardware without the need for expensive cloud infrastructure or GPUs. For a business providing mobile or on-premises virtual assistants, this translates into lower infrastructure and operational costs while maintaining reasonable functionality.

  2. Faster User Interactions: The speed at which an SLM responds to user queries (0.04 seconds) ensures that the virtual assistant feels immediate and responsive, providing a seamless user experience. In contrast, LLMs introduce noticeable delays, which can frustrate users, especially in real-time interactions like setting calendar events during a meeting.

  3. Battery Life and Device Longevity: For businesses developing mobile apps or wearables with virtual assistants, battery life is a critical factor in the user experience. The lower power consumption of SLMs (2% battery per hour) means that users can rely on their devices for longer without frequent recharging. This is a key selling point for mobile and wearable devices where energy efficiency is crucial.

  4. Scalability: Businesses looking to scale their virtual assistant offerings across a wide range of devices can benefit from the versatility of SLMs. They can be deployed on budget smartphones, smartwatches, and other IoT devices without requiring cloud access. This makes virtual assistant deployment more scalable and cost-effective, especially when reaching users in markets where high-end hardware is uncommon.


Benchmarking Example

Let’s say a business needs to deploy virtual assistants on 10,000 devices for calendar management, each performing around 100 interactions daily.


  • SLM Total Latency: 0.04 seconds/interaction → 4 seconds per device per day.

  • LLM Total Latency: 1.2 seconds/interaction → 2 minutes per device per day.


With the SLM, each user spends nearly 2 minutes (116 seconds) less per day waiting on the assistant, which improves user satisfaction and makes the virtual assistant feel more responsive.
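The totals above fall directly out of the per-interaction latencies; here is the same arithmetic as a small script (the device count and interaction volume are the scenario's stated assumptions):

```python
DEVICES = 10_000
INTERACTIONS_PER_DAY = 100


def daily_wait_s(latency_s: float) -> float:
    """Total time one device spends waiting on responses each day."""
    return latency_s * INTERACTIONS_PER_DAY


slm, llm = daily_wait_s(0.04), daily_wait_s(1.2)
print(f"SLM: {slm:.0f} s/device/day, LLM: {llm:.0f} s/device/day")
print(f"Saved per device: {llm - slm:.0f} s/day")
print(f"Fleet-wide: {(llm - slm) * DEVICES / 3600:.0f} device-hours/day")
```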


Conclusion

For virtual assistant use cases, especially on mobile or embedded devices, small language models (SLMs) offer substantial advantages over large language models (LLMs) in efficiency, speed, and resource utilization. While LLMs provide marginally higher accuracy and better handling of complex queries, the low latency, small memory footprint, and energy efficiency of SLMs make them ideal for everyday assistant tasks like calendar management, reminders, and simple question answering. These benefits translate into lower costs, better user experiences, and longer battery life, making SLMs the preferred choice for businesses aiming to deliver scalable, cost-effective virtual assistant solutions.

