
Improving AI Visual Interpreters
How do we improve blind users’ confidence in Google’s AI vision model (e.g., Gemini Pro Vision)?
Duration
June - Nov 2025
Client
Google DeepMind
Services

Challenge
Improving Google’s Vision AI
Google DeepMind creates AI models that analyze visual data from a live camera feed. Aira is:
Testing DeepMind’s AI as a "visual interpreter" for blind and low-vision users
Evaluating the safety implications of that use
Interaction
Managing User Expectations
I designed onboarding pages to establish:
The AI’s capabilities, to support user engagement and retention
Proper use, for user safety: advising users not to rely on the AI for medications or appliances
Background
Visual Descriptions for the Blind
Aira allows users to video call with a trained, sighted professional and ask questions like:
What temperature does the thermostat say?
How can I edit this inaccessible PDF?
Research
Testing Accuracy and Trust Across Applications
We evaluated the model with blind and low-vision participants across four everyday tasks:
Navigation - e.g., finding your way to the front door after getting out of an Uber.
Using Appliances - e.g., setting a washer’s water temperature to hot, or learning a microwave’s buttons.
Reading Packages - e.g., finding expiration dates or heating instructions.
Picking Clothes - e.g., checking whether a shirt is appropriate for an interview or whether socks match.
Results
Inaccurate with Appliances
The model struggled most consistently to assist with:
Knobs
Keypads
Sliders
This suggests a need for more training on non-linear symbolic associations and skeuomorphic controls.
Impact
Google Is Implementing the Findings in Its New Model
The Google-Aira partnership is continuing into 2026
The onboarding UX is being implemented in Aira’s app in Q1 2026
Local model research informed Aira’s roadmap
Learnings
Easier to Guide Expectations than Repair Mistrust
I learned the importance of clearly conveying product capabilities to users through proper onboarding.
If users don’t know how to interact with a great product effectively, it’s useless.
Future improvement: experiment with prompt engineering to see how it impacts product performance (a rough sketch of such a test follows).
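As a loose illustration of that future step, a prompt-comparison harness could look something like the sketch below. It assumes the public google-generativeai Python SDK; the prompts, the image file, and the overall setup are hypothetical placeholders, not the harness Aira or DeepMind actually used.

```python
# A minimal sketch of a prompt-comparison test, assuming the public
# google-generativeai Python SDK. The prompts, image file, and model
# choice are hypothetical placeholders, not the study's actual setup.
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-pro-vision")

# Two candidate prompts aimed at the weakest area we observed:
# knobs, keypads, and sliders on appliance control panels.
prompts = [
    "Describe this appliance control panel.",
    (
        "You are assisting a blind user. Working left to right, name each "
        "knob, keypad, and slider on this control panel and state its "
        "current setting. Say you are unsure rather than guessing."
    ),
]

panel_photo = Image.open("washer_panel.jpg")  # hypothetical test image

for prompt in prompts:
    response = model.generate_content([prompt, panel_photo])
    print(f"Prompt: {prompt[:60]}...")
    print(response.text)
    print()
```

Scoring both outputs against ground truth for the same set of panel photos would show whether prompt changes alone move accuracy on the weakest tasks, before any model retraining.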


