
Google on Tuesday announced its latest AI model, Gemini 1.5 Pro, which the company says is better at understanding complex queries, using multimodality and long context to give more precise answers than the previous generation. Gemini 1.5 Pro will power Google’s consumer-facing apps such as Gmail, Search, and Photos to offer voice- and text-based search capabilities. The highlighted tool is Ask Photos, the Google Photos assistant, which will help you find a particular photo by scanning the entire library of photos and videos saved to your account.
In a demo, Alphabet Chief Executive Officer Sundar Pichai used Ask Photos to look up his car’s license plate number by voice. Instead of showing every photo that may contain a license plate, Ask Photos uses Gemini to find the exact plate belonging to the owner. Another example showed the new tool’s ability to point to the exact photo for a query like “When did my daughter learn swimming?” Photos can also assemble a collection of photos and videos into a chronology of events, for queries such as “Show me how my daughter has progressed in swimming.”
Much like Photos, a bevy of other apps will use Gemini 1.5 Pro to help users do more in less time. Google is also incorporating Gemini deep into Workspace, meaning the AI model will be more tightly tied into Google’s app ecosystem to offer answers and solutions through the sidebar. The Gemini app on Android smartphones is getting new features as well, including a new Live mode that works much like a video call: users point the camera, and Gemini analyses the video feed to offer suggestions. Google is also using Gemini to curb spam and fraud calls more effectively.
“Ask Photos can also help you search your memories in a deeper way,” Pichai said during his keynote at I/O while announcing Ask Photos with Gemini. Rolling out to users this summer, the new tool will scan all the photos and videos saved to a user’s account to answer personal questions. It could be something like “When was the last time I went to Himachal Pradesh?” or a question about how you looked two years ago. Ask Photos will use Gemini to provide exact answers. Pichai also said that Photos, which launched nine years ago, now receives over 6 billion uploads (photos and videos) daily.
Earlier this year, Google introduced the Gemini app for Android and iOS devices. On Android, the Gemini app, powered by the Gemini AI model, co-exists with Google Assistant and offers generative AI-based answers. The app is now getting two new features: Live and Gems. Live lets users start what is effectively a video call with Gemini and ask it for suggestions based on what the chatbot can see and analyse in the video feed. Gems are custom AI chatbots you can create in the Gemini app. Google is also making Gemini better at understanding what is on screen and offering relevant suggestions. It is now “more aware” of the context on the screen, the company said at the event. The Gemini AI assistant is also getting a new voice, which users can hear when talking to it through Live.
Gemini 1.5 Pro is coming to Google Workspace to make looking up information and creating content easier and faster. The sidebar in Google Docs, Slides, and Sheets, which works as a virtual assistant similar to Microsoft’s Copilot, will draw on everything you have saved and answer questions based on that information. For instance, you could ask the Gemini sidebar for your total savings in a month based on a sheet, and it will make a chart for you, categorising all your expenses in a preferred order, and even offer suggestions based on the result. It can draft an email based on data in a presentation or set a reminder. It can also scan PDF files to answer questions, and read emails to find receipts and save them in a new folder in Drive.
Google also demonstrated its upcoming voice assistant, which it refers to as the universal assistant. Currently in testing at Google DeepMind, Project Astra will be based on multimodality, something Google Assistant, Siri, and Alexa cannot yet offer. However, OpenAI’s GPT-4o-based ChatGPT is also set to get similar functionality later this year. Project Astra will offer real-time answers to queries by looking at real-world objects through the camera. In a demo, a researcher asked Astra about the locality she was in, and the assistant gave a precise answer along with additional information about the location. It even recalled what it had seen in the video feed to tell her where she had left her glasses. Project Astra has a humanlike voice, making it easy to hold a conversation with it. However, there is no information on when Project Astra will be commercially available.
Last year, Google announced three tiers of its AI model, and it added one more at I/O. Gemini 1.5 Flash is a lighter version of Gemini 1.5 Pro that, according to Google, offers the same level of accuracy with faster responses. It will give developers more choice in the models they can use to build applications. However, it is not available in the same way as the regular Gemini chatbot. Instead, it is currently accessible through Google AI Studio and Vertex AI.
Google Search will receive several new Gemini-powered updates this year. They include the rebranding of Search Generative Experience (SGE) to AI Overviews and better handling of information to generate answers. The AI-generated summary will appear more often for different kinds of search queries, with options to simplify and break down the AI-generated content for easier understanding. Search will also help you plan a trip more conveniently based on your input. Google Lens will gain Gemini capabilities to analyse a video and answer related questions using information from the Web. The Circle to Search functionality can now solve math problems on Android phones and tablets. Google Chrome is also getting a lightweight version of Gemini AI called Gemini Nano to help you do things like writing a caption for a social media photo post.
Author Name | Shubham Verma