On December 6, 2023, Alphabet, Google’s parent company, unveiled the first phase of Gemini, its most advanced AI model. Sundar Pichai, CEO of Alphabet and Google, spearheaded the effort, with Google DeepMind developing the next-generation model.
Achievements in MMLU
Gemini Ultra sets a new mark by becoming the first model to surpass human experts on Massive Multitask Language Understanding (MMLU), a widely used benchmark that evaluates language models across 57 subjects. According to Pichai, it crossed the 90% threshold, compared with approximately 89% for human experts. Beyond this result, Gemini can generate code from diverse inputs, combine text and images, and perform cross-lingual visual reasoning.
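To make the MMLU figure concrete, here is a minimal sketch of how a multiple-choice benchmark score of this kind can be computed. It assumes plain per-question accuracy and shows both an overall average and a per-subject average. The function name `mmlu_style_accuracy` and the toy records are hypothetical illustrations, not Google's evaluation code or real benchmark data.

```python
from collections import defaultdict


def mmlu_style_accuracy(records):
    """Score multiple-choice answers grouped by subject.

    records: iterable of (subject, predicted_choice, correct_choice).
    Returns (per-subject accuracy, overall accuracy, mean of subject accuracies).
    """
    counts = defaultdict(lambda: [0, 0])  # subject -> [correct, total]
    for subject, predicted, correct in records:
        counts[subject][1] += 1
        if predicted == correct:
            counts[subject][0] += 1
    if not counts:
        raise ValueError("no records to score")

    subject_acc = {s: c / t for s, (c, t) in counts.items()}
    total_correct = sum(c for c, _ in counts.values())
    total = sum(t for _, t in counts.values())
    overall = total_correct / total                       # every question weighted equally
    per_subject_mean = sum(subject_acc.values()) / len(subject_acc)  # every subject weighted equally
    return subject_acc, overall, per_subject_mean


# Toy data, invented purely for illustration (two subjects instead of 57).
toy_records = [
    ("astronomy", "B", "B"),
    ("astronomy", "C", "A"),
    ("law", "D", "D"),
    ("law", "A", "A"),
    ("law", "B", "C"),
    ("law", "C", "C"),
]
subject_acc, overall, per_subject_mean = mmlu_style_accuracy(toy_records)
print(subject_acc, overall, per_subject_mean)
```

The two averages differ whenever subjects have different numbers of questions, which is one reason reported scores should state which average they use.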
Comparison with Competitor
Sundar Pichai emphasized Gemini’s advantage over OpenAI’s ChatGPT, pointing to its results on a range of benchmarks that evaluate AI performance on text and image tasks.
Multimodal Capabilities and Design of Gemini AI
Gemini’s standout features extend beyond its MMLU success. The model is designed for efficiency and scalability, so it can be integrated with existing tools and APIs. This positions Gemini as a significant driver of advances in AI. Although the model is proprietary rather than open source, developer access through APIs is intended to encourage collaboration within the AI community.
“It’s also exciting because Gemini Ultra is state of the art in 30 of the 32 leading benchmarks, and particularly in the multimodal benchmarks. That MMMU benchmark—it shows the progress there. I personally find it exciting that in MMLU [massive multi-task language understanding], which has been one of the leading benchmarks, it crossed the 90% threshold, which is a big milestone. The state of the art two years ago was 30, or 40%. So just think about how much the field is progressing. Approximately 89% is a human expert across these 57 subjects. It’s the first model to cross that threshold.” – Sundar Pichai
Gemini Version Differentiation
Gemini is presented in three distinct versions: Ultra, the largest; Pro, a medium-sized variant; and Nano, a smaller, more efficient iteration. Google’s Bard, a ChatGPT-like chatbot, will be powered by Gemini Pro, while the Nano version is slated for use on Google’s Pixel 8 Pro phone.
Social Media Reaction On The Launch Of Gemini AI
Public response to Gemini on social media has been mixed: some users reported impressive results, while others reported persistent hallucinations. AI researcher Melanie Mitchell expressed admiration for Gemini’s sophistication but questioned whether it significantly surpasses GPT-4.
Gemini’s Development and Structure
Gemini is a family of multimodal large language models developed by Google DeepMind and the successor to LaMDA and PaLM 2. Its name refers to NASA’s Project Gemini. The models use decoder-only Transformers with modifications for efficient training and inference on TPUs. Inputs can include images of varying resolutions, video sequences, and audio, which is converted into tokens by the Universal Speech Model.
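For readers unfamiliar with the term, "decoder-only" means each position in the sequence may attend only to itself and earlier positions. The following is a textbook sketch of single-head causal self-attention in plain Python. It is not Gemini's implementation, whose specific modifications are not described here, and the function name `causal_self_attention` and the toy vectors are hypothetical.

```python
import math


def causal_self_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of equal-length float vectors, one per position.
    Position i attends only to positions 0..i.
    """
    dim = len(queries[0])
    outputs = []
    for i in range(len(queries)):
        # Causal mask: score only against positions up to and including i.
        scores = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(dim)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        norm = sum(exps)
        weights = [e / norm for e in exps]
        # Weighted sum of the visible value vectors.
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy token vectors, invented for illustration.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(tokens, tokens, tokens)

# Changing a later token leaves earlier outputs untouched.
changed = [tokens[0], tokens[1], [5.0, -3.0]]
attended_changed = causal_self_attention(changed, changed, changed)
print(attended[:2] == attended_changed[:2])
```

Because earlier outputs never depend on later positions, a model built this way can generate a sequence one token at a time.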
Ethical Considerations and Mitigations
Before Gemini’s release, the development team conducted model impact assessments to identify societal benefits and potential harms. “Model policies” were devised to guide model development and evaluation based on known and anticipated effects, and the team ran a comprehensive suite of evaluations covering these policy and risk areas. To address safety concerns, mitigations were applied at the data layer and through instruction tuning. Methods such as attribution, closed-book response generation, and hedging were used to reduce hallucinations.
Gemini AI’s Regulatory Compliance With the U.S. Government
In line with President Joe Biden’s Executive Order 14110, Google committed to sharing the testing results of Gemini Ultra with the U.S. federal government, signaling a commitment to transparency and regulatory compliance.