Google's AI "opens its eyes": Gemini transforms your smartphone into a visual assistant

Google has just unveiled a major advancement for its Gemini artificial intelligence, which now goes beyond simple conversation or text search. Thanks to a new feature, also implemented on Android systems, this AI can now see the world through your smartphone’s camera, analyzing your environment in real time to assist you like never before.

Initially presented under the code name “Project Astra” during a technology demonstration in 2024, this innovation promises to turn your phone into an omnisensory companion, capable of deciphering your surroundings with impressive precision.

Currently available only in the United States through a paid subscription, this technology could redefine our relationship with digital tools and open the way to a decidedly futuristic era.

Gemini with camera access: How does it work? An AI that sees and understands

Behind this feat lies a sophisticated combination of computer vision and advanced machine learning, integrated into Gemini. The AI uses your smartphone’s camera as a pair of eyes, processing captured images on the fly using algorithms capable of recognizing objects, text, shapes, and contexts.

This capability is based on years of Google research in areas like image recognition (think Google Lens) and conversational artificial intelligence, now merged into a single powerful tool. Concretely, Gemini can not only “see,” but also interpret what it observes and respond intelligently, drawing on its vast knowledge base.

Two game-changing features

Google has equipped Gemini with two distinct visual capabilities, each designed to simplify and enrich your daily life:

1: Smart screen sharing: an assistant in your activities

With this feature, Gemini becomes an active partner in your daily tasks. When you enable screen sharing, the AI looks through your camera and analyzes everything around you in real time. Imagine grocery shopping at the supermarket: you point your phone at a shelf, and Gemini identifies products, compares prices, or reminds you if an item is on your list.

Need a new outfit? During a shopping session, the AI can suggest clothing combinations based on what it sees in the store. It’s like having an expert friend by your side, but in digital and tireless form.

2: Real-time visual analysis: a scanner of the real world

The second feature transforms your smartphone into an exploration tool. When you point the camera at your environment, Gemini can understand and explain what it sees. For example, scan a bookshelf, and the AI will list book titles, give you summaries, or check their online availability. Show it a toolbox, and it will identify a drill before explaining how to drill into a wall without causing damage.

You could even film a plant in a park and ask: “Is this edible?” This ability to interpret the physical world makes it an educational and practical assistant, ideal for the curious as well as DIY enthusiasts.

Availability and cost: a limited but ambitious launch

For now, this visual super-assistant is only accessible in the United States, reserved for Gemini Advanced subscribers, a premium service priced at $20 per month (approximately €20). This fee includes other advanced Gemini features, but the visual component is clearly the star of this update.

Google has not yet specified when, or if, this technology will arrive in Europe or elsewhere, suggesting a testing phase on the American market. This strategic choice could allow the company to refine the tool, adjust its performance, and address potential regulatory concerns (particularly regarding privacy with the European AI Act) before a global rollout.

Smart glasses on the horizon: Google Glass 2.0?

Google’s ambition doesn’t stop at smartphones. The company also plans to integrate this visual technology into smart glasses, reviving the spirit of Google Glass, abandoned ten years ago but never forgotten. Like Meta’s Ray-Ban glasses, these could analyze your environment continuously, without you having to take out your phone.

Imagine walking through an unfamiliar city: the glasses identify monuments and display historical information as an overlay. Or while driving: the AI detects a dangerous intersection and alerts you discreetly. This project, still in development, could transform how we perceive the world, by overlaying a layer of digital intelligence onto our reality.

The implications: between promises and questions

This innovation offers exciting prospects. For students, Gemini could become a visual tutor, explaining a diagram or equation filmed on a board. For travelers, an instant guide that translates signs or describes places. For visually impaired people, an assistance tool describing their surroundings with precision. But it also raises questions.

Privacy: what happens to confidentiality if your camera constantly records what you see?
Dependence: do we risk delegating too many tasks to this AI, to the point of losing our own skills?
Ethics: who controls the captured data, and how will Google use it?

These questions remain unanswered, but they will inevitably accompany the evolution of this technology.

A step toward an omnisensory future?

With this update, Gemini is no longer limited to text or voice: it becomes a multisensory AI, capable of seeing, interpreting, and acting on the physical world. This brings Google closer to its dream of an omnipresent artificial intelligence, integrated into every aspect of our lives.

If the United States serves as a testing ground, it’s likely that this technology, once mature, will expand to other markets, perhaps with adjustments to comply with local laws (like GDPR in Europe).

A fascinating but double-edged advancement

This evolution of Gemini is a technical feat that illustrates the potential of large language models when combined with computer vision. It could simplify mundane tasks, like checking an ingredient in a recipe by filming your kitchen, while opening up creative or educational uses.

But it also brings to mind an episode of Black Mirror: a world where everything is constantly analyzed can be convenient, but also intrusive. Google will need to find a balance between innovation and respect for users if this visual AI is to be adopted without mistrust.

Toward a futuristic horizon

Gemini and its digital eyes mark a turning point. From smartphones to smart glasses, Google is sketching a future where AI no longer just answers our questions, but anticipates our needs by looking at the world with us. For now reserved for a handful of American subscribers, this visual assistant is just a preview of what awaits us.

One day, perhaps, our devices won’t just see: they’ll smell, touch, and understand the world like us, or better than us. In the meantime, Gemini reminds us that science fiction is no longer so far from reality.

Glen
