Google Gemini: Everything you need to know about the new generative AI platform

bicycledays

April 30, 2024

Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps and services.

So what is Gemini? How can you use it? And how does it stack up to the competition?

To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features and news about Google’s plans for Gemini are released.

What is Gemini?

Gemini is Google’s long-promised, next-gen GenAI model family, developed by Google’s AI research labs DeepMind and Google Research. It comes in three flavors:

Gemini Ultra, the most performant Gemini model.
Gemini Pro, a “lite” Gemini model.
Gemini Nano, a smaller “distilled” model that runs on mobile devices like the Pixel 8 Pro.

All Gemini models were trained to be “natively multimodal,” meaning they can work with and use more than just words. They were pretrained and fine-tuned on a variety of audio, images and videos, a large set of codebases and text in different languages.

This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything other than text (e.g., essays, email drafts), but that isn’t the case with Gemini models.

What’s the difference between the Gemini apps and Gemini models?

Google, proving once again that it lacks a knack for branding, didn’t make it clear from the outset that Gemini is separate and distinct from the Gemini apps on the web and mobile (formerly Bard). The Gemini apps are simply an interface through which certain Gemini models can be accessed; think of them as a client for Google’s GenAI.

Incidentally, the Gemini apps and models are also totally independent from Imagen 2, Google’s text-to-image model that’s available in some of the company’s dev tools and environments.

What can Gemini do?

Because the Gemini models are multimodal, they can in theory perform a range of multimodal tasks, from transcribing speech to captioning images and videos to generating artwork. Few of these capabilities have reached the product stage yet (more on that later), and Google’s promising all of them, and more, at some point in the not-too-distant future.

Of course, it’s a bit hard to take the company at its word.

Google seriously underdelivered with the original Bard launch. And more recently it ruffled feathers with a video purporting to show Gemini’s capabilities that turned out to have been heavily doctored and was more or less aspirational.

Still, assuming Google is being more or less truthful with its claims, here’s what the different tiers of Gemini will be able to do once they reach their full potential:

Gemini Ultra

Google says that Gemini Ultra, thanks to its multimodality, can be used to help with things like physics homework, solving problems step-by-step on a worksheet and pointing out possible mistakes in already filled-in answers.

Gemini Ultra can also be applied to tasks such as identifying scientific papers relevant to a particular problem, Google says, extracting information from those papers and “updating” a chart from one by generating the formulas necessary to re-create the chart with more recent data.

Gemini Ultra technically supports image generation, as alluded to earlier. But that capability hasn’t made its way into the productized version of the model yet, perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.

Gemini Ultra is available as an API through Vertex AI, Google’s fully managed AI developer platform, and AI Studio, Google’s web-based tool for app and platform developers. It also powers the Gemini apps, but not for free. Access to Gemini Ultra through what Google calls Gemini Advanced requires subscribing to the Google One AI Premium Plan, priced at $20 per month.

The AI Premium Plan also connects Gemini to your wider Google Workspace account: think emails in Gmail, documents in Docs, spreadsheets in Sheets and Google Meet recordings. That’s useful for, say, summarizing emails or having Gemini capture notes during a video call.

Gemini Pro

Google says that Gemini Pro is an improvement over LaMDA in its reasoning, planning and understanding capabilities.

An independent study by Carnegie Mellon and BerriAI researchers found that the initial version of Gemini Pro was indeed better than OpenAI’s GPT-3.5 at handling longer and more complex reasoning chains. But the study also found that, like all large language models, this version of Gemini Pro particularly struggled with mathematics problems involving several digits, and users found examples of bad reasoning and obvious mistakes.

Google promised remedies, though, and the first arrived in the form of Gemini 1.5 Pro.

Designed to be a drop-in replacement, Gemini 1.5 Pro is improved in a number of areas compared with its predecessor, perhaps most significantly in the amount of data that it can process. Gemini 1.5 Pro can take in ~700,000 words, or ~30,000 lines of code, which is 35x the amount Gemini 1.0 Pro can handle. And, the model being multimodal, it’s not limited to text. Gemini 1.5 Pro can analyze up to 11 hours of audio or an hour of video in a variety of different languages, albeit slowly (e.g., searching for a scene in a one-hour video takes 30 seconds to a minute of processing).

Gemini 1.5 Pro entered public preview on Vertex AI in April.

An additional endpoint, Gemini Pro Vision, can process text and imagery, including photos and video, and output text along the lines of OpenAI’s GPT-4 with Vision model.

Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases using a fine-tuning or “grounding” process. Gemini Pro can also be connected to external, third-party APIs to perform particular actions.

In AI Studio, there are workflows for creating structured chat prompts using Gemini Pro. Developers have access to both the Gemini Pro and Gemini Pro Vision endpoints, and they can adjust the model temperature to control the output’s creative range, provide examples to give tone and style instructions, and tune the safety settings.
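For readers unfamiliar with the temperature setting, here is a minimal sketch of how temperature is commonly applied when sampling from a language model. It is a generic illustration, not Gemini's or AI Studio's actual implementation. The function name `softmax_with_temperature` and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a temperature.

    Generic textbook technique; assumed here for illustration only.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.1]

low = softmax_with_temperature(logits, 0.2)   # sharper, more predictable
high = softmax_with_temperature(logits, 2.0)  # flatter, more varied

print("low temperature: ", [round(p, 3) for p in low])
print("high temperature:", [round(p, 3) for p in high])
```

A lower temperature concentrates probability on the top-scoring option, while a higher one spreads it across the alternatives, which is why the setting is described as controlling the output's creative range.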

Once Gemini 1.5 Pro exits preview in Vertex, however, the model’s input will cost $0.0025 per character while output will cost $0.00005 per character. Vertex customers pay per 1,000 characters (about 140 to 250 words) and, in the case of models like Gemini Pro Vision, per image ($0.0025).

Let’s assume a 500-word article contains 2,000 characters. Summarizing that article with Gemini 1.5 Pro would cost $5. Meanwhile, generating an article of a similar length would cost $0.10.

Ultra pricing has yet to be announced.
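The arithmetic behind those two figures can be sketched as follows. The rates are the per-character figures quoted in this article, not values checked against Google's price list, and `estimate_cost` is a hypothetical helper rather than part of any Google SDK.

```python
from decimal import Decimal

# Per-character rates as quoted in this article (assumed, not verified).
INPUT_RATE = Decimal("0.0025")    # dollars per input character
OUTPUT_RATE = Decimal("0.00005")  # dollars per output character


def estimate_cost(input_chars=0, output_chars=0):
    """Hypothetical helper: dollar cost for the given character counts."""
    return INPUT_RATE * input_chars + OUTPUT_RATE * output_chars


article_chars = 2_000  # the 500-word article assumed in the text

# Summarizing: the article is sent as input.
summarize_cost = estimate_cost(input_chars=article_chars)
# Generating: an article of similar length comes back as output.
generate_cost = estimate_cost(output_chars=article_chars)

print(f"Summarize: ${summarize_cost:.2f}")
print(f"Generate:  ${generate_cost:.2f}")
```

Like the figures in the text, the summarizing estimate counts only the article sent in and ignores the small additional charge for the summary that comes back.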