Media Summary: To learn, you need to try new things, but that can be risky. How do we make AI systems that can explore We can expect AI systems to accidentally create serious negative side effects - how can we avoid that? The first of several videos ... Maybe AI systems would be safer if they avoid gaining too much control over their environment? How might that work? This is a ...
Safe Exploration Concrete Problems In - Detailed Analysis & Overview
To learn, you need to try new things, but that can be risky. How do we make AI systems that can explore We can expect AI systems to accidentally create serious negative side effects - how can we avoid that? The first of several videos ... Maybe AI systems would be safer if they avoid gaining too much control over their environment? How might that work? This is a ... This is a follow-up to this earlier video: There's another Three different approaches that might help to prevent reward hacking. New Side Channel with no content yet! Why can't we just have humans overseeing our AI systems? The
Introduction to Reinforcement Learning and Concrete Problems in AI Safety Sometimes AI can find ways to 'cheat' and get more reward than we intended by doing something unexpected. The Goodhart's Law, Partially Observed Goals, and Wireheading: some more reasons for AI systems to find ways to 'cheat' and get ...