Evaluating AGI

The authors of the paper "Levels of AGI: Operationalizing Progress on the Path to AGI" propose six principles for defining Artificial General Intelligence (AGI):

  1. Focus on Capabilities, Not Processes: AGI should be judged by what it can do, not how it does it.
  2. Generality and Performance: AGI should be able to handle a wide range of tasks (generality) and do them well (performance).
  3. Cognitive and Metacognitive Tasks: AGI should handle cognitive tasks (thinking, understanding) as well as metacognitive ones, such as learning new skills.
  4. Potential, Not Deployment: AGI is about what the AI could do in theory, not necessarily what it's currently doing.
  5. Ecological Validity: The tasks AGI can do should be meaningful and practical in the real world.
  6. Focus on the Path to AGI: Progress should be measured in stages along the way (levels of AGI), not as a single all-or-nothing endpoint.

Can We Measure AGI Capability?

Measuring AGI capability is tricky. The authors suggest that while we can measure how well an AI performs specific tasks (performance), it is harder to decide which tasks matter and how broad the coverage must be (generality). They ask questions like: What tasks should an AI be able to do to be considered AGI? How many of these tasks does it need to master? Are some tasks must-haves?
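To make the performance/generality distinction concrete, here is a minimal sketch (not from the paper): each task gets a performance score, and generality is summarized as the fraction of tasks on which the system clears a chosen threshold. The task names, scores, and the 0.9 cutoff are all hypothetical.

```python
# Hypothetical scores for a system on a small task suite, expressed as a
# fraction of skilled-human performance (illustrative numbers only).
task_scores = {
    "language_understanding": 0.95,
    "math_problem_solving": 0.80,
    "spatial_reasoning": 0.55,
    "theory_of_mind": 0.60,
    "learning_new_skills": 0.40,
}

THRESHOLD = 0.90  # arbitrary cutoff for "performs the task well"

def performance(scores):
    """Average score across tasks -- how well the system does what it does."""
    return sum(scores.values()) / len(scores)

def generality(scores, threshold=THRESHOLD):
    """Fraction of tasks cleared at the threshold -- how broad the system is."""
    passed = [t for t, s in scores.items() if s >= threshold]
    return len(passed) / len(scores)

print(f"performance: {performance(task_scores):.2f}")
print(f"generality:  {generality(task_scores):.2f}")
```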

What Should an AGI Benchmark Include?

The authors don't provide a specific set of tasks (a benchmark) for testing AGI in their paper. They argue that designing one is a complex process that needs input from many different fields. Instead, they focus on what a good benchmark should look like. A good AGI benchmark should test a wide range of skills, such as understanding language, solving math problems, spatial reasoning, understanding people, learning new skills, and being creative. These tests could be based on different theories of intelligence but need to be relevant for machines, not just humans. There are still open questions, like whether the AI should be allowed to use tools during these tests. The authors suggest that the benchmark should be flexible and updated over time with new tasks, calling it a "living benchmark."
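One way to picture a "living benchmark" is as a task registry that accepts new tasks over time. The sketch below is only an illustration of that idea, not an implementation from the paper; the task names, skill labels, and the `allows_tools` flag are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    skill: str          # e.g. "language", "math", "spatial reasoning"
    allows_tools: bool  # open question from the paper: may the AI use tools?

@dataclass
class LivingBenchmark:
    """A benchmark that grows over time instead of being fixed once."""
    tasks: list[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> None:
        # New tasks can be contributed as the benchmark evolves.
        self.tasks.append(task)

    def skills_covered(self) -> set[str]:
        return {t.skill for t in self.tasks}

benchmark = LivingBenchmark()
benchmark.add_task(Task("summarize a legal contract", "language", allows_tools=False))
benchmark.add_task(Task("plan a multi-city trip", "planning", allows_tools=True))
print(benchmark.skills_covered())
```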

How Do We Decide if a System Is AGI?

To decide whether a system is AGI, look for important tasks that humans can do but the AI cannot. If a system passes most of the tasks in the benchmark, including new ones added later, it can be treated as having reached a given level of AGI for practical purposes. Creating a benchmark for AGI is hard but important: it defines what AI research is aiming for and shows how far the field has come. In short, the authors are discussing how to define and measure AGI, emphasizing practical, real-world tasks and acknowledging that this is an evolving and challenging area of research.
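As a hedged illustration of that decision rule, the snippet below checks whether any benchmarked tasks remain that skilled humans pass but the system does not; all task names and results are made up for the example.

```python
# Hypothetical pass/fail results on a shared task set (illustrative only).
human_passes = {"write an essay", "book a flight", "debug a program", "learn a card game"}
ai_passes = {"write an essay", "debug a program"}

def remaining_gaps(human: set, ai: set) -> set:
    """Tasks that skilled humans can do but the system cannot (yet)."""
    return human - ai

gaps = remaining_gaps(human_passes, ai_passes)
if gaps:
    print("Not AGI for practical purposes; missing:", sorted(gaps))
else:
    print("Passes every benchmarked task that humans pass (for this task set).")
```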