September 9, 2024

Krazee Geek

Unlocking the future: AI news, daily.

How Google Gemini works: Everything you need to know

7 min read

With Gemini, Google is planning to make waves with its new generative AI platform. While Gemini appears promising in some aspects, it falls short in others. So, what exactly is Gemini? In what ways can you use it? What does it have to offer compared to its competitors?

As new Gemini models and features are released, we’ll update this handy guide to keep you up-to-date on all the latest developments.

Gemini: what is it?

Gemini is Google’s long-promised next-generation generative AI model family, developed by DeepMind and Google Research. You can choose from three flavours:

  • Gemini Ultra, the flagship model of the Gemini line
  • Gemini Pro, a “lightweight” version of Gemini
  • The Gemini Nano, which runs on mobile devices like the Google Pixel 8 Pro, is a smaller version of the standard Gemini model.

Our Gemini models are trained to be “natively multimodal” – that is, able to handle more than just text. A wide range of audio, images and videos, as well as a number of code bases and text in several different languages, have been pre-trained and fine-tuned.

This makes Gemini a unique model compared to LaMDA, Google’s large language model trained only on text data. LaMDA cannot comprehend or generate anything other than text (e.g. essays, email drafts, etc.) – but Gemini models can. Despite their limited ability to understand images, audio, and other modalities, it is still a significant improvement.

Bard vs Gemini: what’s the difference?

Google Gemini

Images courtesy of Google

Google, once again showing its lack of branding skills, needed to make it clear from the get-go that Gemini is distinct from Bard. Bard is essentially an interface for accessing Gemini models – you can think of it as a client or app for Gemini. The Gemini family of models, on the other hand, is not an application or a front. Gemini doesn’t have a standalone experience, and it probably won’t. For example, Bard can be compared to OpenAI’s popular conversational AI app ChatGPT, and Gemini can be compared to the GPT-3.5 or GPT-4 language model that powers ChatGPT.

Additionally, Gemini is totally independent of Imagen-2, a text-to-image model that may or may not fit into the company’s overall AI strategy. If you need clarification, you’re not alone!

Gemini’s capabilities: what can it do?

Due to the multimodal nature of Gemini models, they are capable of performing a wide range of tasks, including transcribing speech, captioning images and videos, and generating artwork. Google has yet to bring many of these capabilities to market (more on that later), but the company promises to do so soon.

Although the company’s word is hard to trust, it’s worth a shot.

In the original launch of Bard, Google seriously underdelivered. In addition, a video showing Gemini’s capabilities was heavily doctored and more or less aspirational ruffled feathers recently. To the credit of the tech giant, Gemini is available in some form today – though in a somewhat limited capacity.

When the Gemini models are released, there will be what they will be able to do, assuming Google is being more or less honest with its claims:

Gemini Ultra

As of now, only a “select set” of customers using a handful of Google apps and services have access to Gemini Ultra, the “foundation” model. This will change later this year when Google’s largest model goes on sale. It’s best to take Ultra info with a grain of salt since most of it comes from Google-led product demos.

According to Google, Gemini Ultra can help you with physics homework, solve problems on a worksheet step-by-step, and identify mistakes in answers you’ve already entered. A chart can also be “updated” using Gemini Ultra by generating formulas needed to recreate the chart with more recent data by identifying scientific papers relevant to a particular problem. Google says it can also be used to identify scientific documents related to a specific problem.

Gemini Ultra supports image generation technically, as mentioned earlier. According to Google, that capability will not be included in the productized version of the model at launch – perhaps because it is more complex than how apps such as ChatGPT generate images. In ChatGPT’s case, instead of feeding prompts to an image generator (such as DALL-E 3), Gemini outputs images “natively” without the need for an intermediary.

Gemini Pro

In contrast to Gemini Ultra, Gemini Pro is available to the public today. However, its capabilities vary depending on where it’s used.

According to Google, Gemini Pro is a better reasoning, planning, and understanding model than LaMDA at Bard, where it was launched as a text-only model first. Gemini Pro is indeed more capable of handling longer and more complex reasoning chains than OpenAI’s GPT-3.5, according to researchers from Carnegie Mellon and BerriAI.

However, users have reported numerous examples of Gemini Pro making mistakes and reasoning poorly when dealing with math problems involving several digits, similar to all large language models. There were a lot of factual errors when searching for simple queries, such as who won the latest Academy Awards. There has yet to be any indication of when Google will make improvements.

Additionally, Gemini Pro is available via an API in Vertex AI, Google’s fully managed AI development platform, which allows users to input text and generate text. Gemini Pro Vision, an additional endpoint, can process text and imagery as well as photos and videos to produce text along the lines of OpenAI’s GPT-4.

Google Gemini Structured prompt

Using Gemini Pro in Vertex AI.

With Gemini Pro, developers can customize it to meet specific contexts and use cases within Vertex AI through the “grounding” process. In addition, Gemini Pro can use third-party APIs to perform specific tasks.

Gemini Pro will be available to Vertex customers in “early 2024” for custom-built chatbots and conversational voice agents (i.e. chatbots). To satisfy queries, Gemini Pro will also be able to summarize, recommend, and generate answers based on documents from a variety of sources (e.g. PDFs, images) drawn from different sources (e.g. OneDrive, Salesforce).

Google Gemini Structured prompt google AI studio

Image Credits: Gemini

In AI Studio, Google’s web-based tool for app and platform developers, you can create freeform, structured, and chat prompts using Gemini Pro. In addition, developers can adjust the model temperature to give tone and style instructions, as well as tune the safety settings using both Gemini Pro and Gemini Pro Vision endpoints.

Gemini Nano

Gemini Nano runs on (some) smartphones instead of sending tasks to a server, so it’s smaller and more efficient than Gemini Pro and Ultra. It is currently used for two features on the Pixel 8 Pro: Summarize in Recorder and Smart Reply in Gboard.

A Gemini-powered summary of recorded conversations, interviews, presentations, and other snippets is included in the Recorder app, which lets users push a button to record and transcribe audio. In a nod to privacy, no data leaves the user’s phone while they get these summaries despite not having a signal or Wi-Fi connection.

Additionally, Gemini Nano is available as a developer preview in Gboard, Google’s keyboard app. In messaging apps, it powers a feature called Smart Reply, which helps suggest the next thing you should say during a conversation. According to Google, the feature will initially be available only on WhatsApp but will be available on more apps by 2024.

What is the difference between Gemini and OpenAI’s GPT-4?

Google will know how the Gemini family stacks up when Ultra is released later this year, but the company has claimed improvements over OpenAI’s GPT-4, which is state-of-the-art.

According to Google, Gemini Ultra exceeds current state-of-the-art results on “30 of the 32 widely used academic benchmarks used in language model research and development.” As a result, the company claims that Gemini Pro is better suited for tasks such as summarizing content, brainstorming, and writing than GPT-3.5.

However, the scores Google points to are only marginally better than OpenAI’s corresponding models, leaving aside the question of whether benchmarks really are an indication of a better model. As mentioned earlier, some early impressions could have been better, with users and academics complaining that Gemini Pro gets basic facts wrong, struggles with translations, and provides poor coding guidance.

What will the cost of Gemini be?

As of now, Gemini Pro is free to use in Bard as well as AI Studio and Vertex AI.

When Gemini Pro exits preview in Vertex, the model costs $0.0025 per character, while output costs $0.00005 per character. Customers of Vertex pay per 1,000 characters (140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume that a 500-word article contains 2,000 characters. Gemini Pro would cost $5 to summarize that article. As a comparison, it would cost $0.01 to generate an article of similar length.

Where can you try Gemini?

Gemini Pro

Gemini Pro is the easiest to experience in Bard. As of now, Pro is answering text-based Bard queries in English in the U.S., with additional languages and countries set to arrive in the future.

A preview version of Gemini Pro is also available in Vertex AI via an API. As of now, the API is free to use “within limits” and supports 38 languages and regions, including Europe, as well as chat functionality and filtering.

Additionally, Gemini Pro is available in AI Studio. By using the service, developers can create prompts and Gemini-based chatbots, then export them to a more powerful IDE or use them in their apps.

Google’s Duet AI for Developers, which helps developers with code generation and completion, will begin using a Gemini model in the coming weeks. In early 2024, Google plans to make Gemini models available as dev tools for Chrome and Firebase mobile development platforms.

Gemini Nano

The Gemini Nano is on the Pixel 8 Pro – and it will be coming to other smartphones in the near future. The model can be sneak-peeked by developers who are interested in incorporating it into their Android apps.

We’ll keep this post updated with the latest information.

Leave a Reply

Your email address will not be published. Required fields are marked *