{"id":77853,"date":"2025-07-17T19:43:58","date_gmt":"2025-07-17T19:43:58","guid":{"rendered":"https:\/\/80000hours.org\/?post_type=problem_profile&#038;p=77853"},"modified":"2026-04-22T16:44:24","modified_gmt":"2026-04-22T16:44:24","slug":"risks-from-power-seeking-ai","status":"publish","type":"problem_profile","link":"https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/","title":{"rendered":"Risks from power-seeking AI&nbsp;systems"},"content":{"rendered":"<div id=\"toc_container\" class=\"toc_white no_bullets\"><p class=\"toc_title\">Table of Contents<\/p><ul class=\"toc_list\"><li><a href=\"#pressing-problem\"><span class=\"toc_number toc_depth_1\">1<\/span> Why are risks from power-seeking AI a pressing world problem?<\/a><ul><li><a href=\"#section-one\"><span class=\"toc_number toc_depth_2\">1.1<\/span> 1. Humans will likely build advanced AI systems with long-term goals<\/a><\/li><li><a href=\"#section-two\"><span class=\"toc_number toc_depth_2\">1.2<\/span> 2. AIs with long-term goals may be inclined to seek power and aim to disempower humanity<\/a><\/li><li><a href=\"#section-three\"><span class=\"toc_number toc_depth_2\">1.3<\/span> 3. These power-seeking AI systems could successfully disempower humanity and cause an existential catastrophe<\/a><\/li><li><a href=\"#section-four\"><span class=\"toc_number toc_depth_2\">1.4<\/span> 4. People might create power-seeking AI systems without enough safeguards, despite the risks<\/a><\/li><li><a href=\"#section-five\"><span class=\"toc_number toc_depth_2\">1.5<\/span> 5. Work on this problem is neglected and tractable<\/a><ul><li><a href=\"#technical-safety\"><span class=\"toc_number toc_depth_3\">1.5.1<\/span> Technical safety approaches <\/a><\/li><li><a href=\"#governance\"><span class=\"toc_number toc_depth_3\">1.5.2<\/span> Governance and policy approaches <\/a><\/li><\/ul><\/li><\/ul><\/li><li><a href=\"#objections\"><span class=\"toc_number toc_depth_1\">2<\/span> What are the arguments against working on this problem?<\/a><ul><li><ul><li><a href=\"#maybe-advanced-ai-systems-wont-pursue-their-own-goals-theyll-just-be-tools-controlled-by-humans\"><span class=\"toc_number toc_depth_3\">2.0.1<\/span> Maybe advanced AI systems won't pursue their own goals; they'll just be tools controlled by humans.<\/a><\/li><li><a href=\"#even-if-ai-systems-develop-their-own-goals-they-might-not-seek-power-to-achieve-them\"><span class=\"toc_number toc_depth_3\">2.0.2<\/span> Even if AI systems develop their own goals, they might not seek power to achieve them.<\/a><\/li><li><a href=\"#if-this-argument-is-right-why-arent-all-capable-humans-dangerously-power-seeking\"><span class=\"toc_number toc_depth_3\">2.0.3<\/span> If this argument is right, why aren't all capable humans dangerously power-seeking?<\/a><\/li><li><a href=\"#maybe-we-wont-build-ais-that-are-smarter-than-humans-so-we-dont-have-to-worry-about-them-taking-over\"><span class=\"toc_number toc_depth_3\">2.0.4<\/span> Maybe we won't build AIs that are smarter than humans, so we don't have to worry about them taking over.<\/a><\/li><li><a href=\"#we-might-solve-these-problems-by-default-anyway-when-trying-to-make-ai-systems-useful\"><span class=\"toc_number toc_depth_3\">2.0.5<\/span> We might solve these problems by default anyway when trying to make AI systems useful.<\/a><\/li><li><a href=\"#powerful-ai-systems-of-the-future-will-be-so-different-that-work-today-isnt-useful\"><span class=\"toc_number toc_depth_3\">2.0.6<\/span> Powerful AI systems of the future will 
be so different that work today isn't useful.<\/a><\/li><li><a href=\"#the-problem-might-be-extremely-difficult-to-solve\"><span class=\"toc_number toc_depth_3\">2.0.7<\/span> The problem might be extremely difficult to solve.<\/a><\/li><li><a href=\"#couldnt-we-just-unplug-an-ai-thats-pursuing-dangerous-goals\"><span class=\"toc_number toc_depth_3\">2.0.8<\/span> Couldn't we just unplug an AI that's pursuing dangerous goals?<\/a><\/li><li><a href=\"#couldnt-we-just-sandbox-any-potentially-dangerous-ai-until-we-know-its-safe\"><span class=\"toc_number toc_depth_3\">2.0.9<\/span> Couldn't we just 'sandbox' any potentially dangerous AI until we know it's safe?<\/a><\/li><li><a href=\"#a-truly-intelligent-system-would-know-not-to-do-harmful-things\"><span class=\"toc_number toc_depth_3\">2.0.10<\/span> A truly intelligent system would know not to do harmful things.<\/a><\/li><\/ul><\/li><\/ul><\/li><li><a href=\"#how-you-can-help\"><span class=\"toc_number toc_depth_1\">3<\/span> How you can help<\/a><ul><li><a href=\"#want-one-on-one-advice-on-pursuing-this-path\"><span class=\"toc_number toc_depth_2\">3.1<\/span> Want one-on-one advice on pursuing this path?<\/a><\/li><\/ul><\/li><li><a href=\"#learn-more\"><span class=\"toc_number toc_depth_1\">4<\/span> Learn more<\/a><\/li><\/ul><\/div>\n<style>\n.sidebar-toc__contents>li li li>a {\npadding-left:35px;\n}\n[href=\"#objections\"] + ul {\ndisplay: none;\n}\n<\/style>\n<h2><span id=\"pressing-problem\" class=\"toc-anchor\"><\/span>Why are risks from power-seeking AI a pressing world problem?<\/h2>\n<p>Hundreds of prominent AI scientists and other notable figures signed a statement in 2023 saying that mitigating <a href=\"https:\/\/safe.ai\/work\/statement-on-ai-risk\">the risk of extinction from AI<\/a> should be a global priority.<\/p>\n<p>We&#8217;ve considered risks from AI to be the world&#8217;s most pressing problem since 2016.<\/p>\n<p>But what led us to this conclusion? Could AI really cause human extinction? We&#8217;re not certain, but we think the risk is worth taking very seriously.<\/p>\n<p>To explain why, we break the argument down into five core claims:<\/p>\n<ol>\n<li><a href=\"#section-one\">Humans will likely build advanced AI systems with long-term goals<\/a>.<\/li>\n<li><a href=\"#section-two\">AIs with long-term goals may be inclined to seek power and aim to disempower humanity<\/a>.<\/li>\n<li><a href=\"#section-three\">These power-seeking AI systems could successfully disempower humanity and cause an existential catastrophe<\/a>.<\/li>\n<li><a href=\"#section-four\">People might create power-seeking AI systems without enough safeguards, despite the risks<\/a>.<\/li>\n<li><a href=\"#section-five\">Work on this problem is tractable and neglected<\/a>.<\/li>\n<\/ol>\n<p>After making the argument that the existential risk from power-seeking AI is a pressing world problem, we&#8217;ll discuss objections to this argument, and how you can work on it. (There are also other major risks from AI we discuss <a href=\"\/problem-profiles\/\">elsewhere<\/a>.)<\/p>\n<p>If you&#8217;d like, you can watch our 10-minute video summarising the case for AI risk before reading further:<\/p>\n<div id=\"youtube-qzyEgZwfkKY\" class=\"wrap-video\" style=\"border-radius:0px; overflow:hidden;\">\n<div class=\"lazyYT\" data-youtube-id=\"qzyEgZwfkKY\" data-ratio=\"16:9\"\n         data-parameters=\"modestbranding=1&show_info=1&theme=light&enablejsapi=1\" data-autoplay=\"0\"><\/div><\/div>\n<h3><span id=\"section-one\" class=\"toc-anchor\"><\/span>1. 
Humans will likely build advanced AI systems with long-term goals<\/h3>\n<p>AI companies already create systems that make and carry out plans and tasks, and might be said to be pursuing <em>goals<\/em>, including:<\/p>\n<ul>\n<li><a href=\"https:\/\/gemini.google\/overview\/deep-research\/\">Deep research<\/a> tools, which can set about a plan for conducting research on the internet and then carry it out<\/li>\n<li><a href=\"https:\/\/waymo.com\/\">Self-driving cars<\/a>, which can plan a route, follow it, adjust the plan as they go along, and respond to obstacles<\/li>\n<li>Game-playing systems, like <a href=\"https:\/\/en.wikipedia.org\/wiki\/AlphaStar_(software)\">AlphaStar for Starcraft<\/a>, <a href=\"https:\/\/ai.meta.com\/research\/cicero\/diplomacy\/\">CICERO for Diplomacy<\/a>, and <a href=\"https:\/\/en.wikipedia.org\/wiki\/MuZero\">MuZero<\/a> for a range of games<\/li>\n<\/ul>\n<p>All of these systems are limited in some ways, and they only work for specific use cases.<\/p>\n<p>You might be sceptical about whether it <em>really<\/em> makes sense to say that a model like Deep Research or a self-driving car pursues &#8216;goals&#8217; when it performs these tasks.<\/p>\n<p>But it&#8217;s not clear how helpful it is to ask if AIs <em>really<\/em> have goals. It makes sense to talk about a self-driving car as having a goal of getting to its destination, as long as it helps us make accurate predictions about what it will do.<\/p>\n<p>Some companies are developing even more broadly capable AI systems, which would have greater planning abilities and the capacity to pursue a wider range of goals. OpenAI, for example, has been open about its plan to <a href=\"https:\/\/blog.samaltman.com\/reflections\">create systems<\/a> that can &#8220;join the workforce.&#8221;<\/p>\n<p>We expect that, at some point, humanity will create systems with the three following characteristics:<\/p>\n<ul>\n<li><strong>They have long-term goals and can make and execute complex plans.<\/strong> <\/li>\n<li><strong>They have excellent <em>situational awareness<\/em><\/strong>, meaning they have a strong understanding of themselves and the world around them, and they can navigate obstacles to their plans.<\/li>\n<li><strong>They have highly <em>advanced capabilities<\/em><\/strong> relative to today&#8217;s systems and human abilities.<\/li>\n<\/ul>\n<p>All these characteristics, which are currently lacking in existing AI systems, would be highly economically valuable. But as we&#8217;ll argue in the following sections, together they also result in systems that pose an existential threat to humanity.<\/p>\n<p>Before explaining why these systems would pose an existential threat, let&#8217;s examine why we&#8217;re likely to create systems with each of these three characteristics.<\/p>\n<p><strong>First<\/strong>, AI companies are already creating AI systems that can carry out increasingly long tasks. 
Consider the chart below, which shows that the length of software engineering tasks AIs can complete has been growing over time.<\/p>\n<figure class=\"wp-caption\">\n  <a target=\"_blank\" href=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-1024x875.png\"><img loading=\"lazy\" decoding=\"async\" \n    src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-1024x875.png\" \n    alt=\"\" \n    width=\"1024\" \n    height=\"875\" \n    class=\"alignnone size-large wp-image-91320\"\n srcset=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-1024x875.png 1024w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-300x256.png 300w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-768x656.png 768w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4-1536x1312.png 1536w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/06\/METR-doubling-graph-4.png 1802w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><figcaption>\n  <\/figcaption><\/figure>\n<p>It&#8217;s clear why progress on this metric matters \u2014 an AI system that can do a 10-minute software engineering task may be somewhat useful; if it can do a two-hour task, even better. If it could do a task that typically takes a human several weeks or months, it could significantly contribute to commercial software engineering work.<\/p>\n<p>Carrying out longer tasks means making and executing longer, more complex plans. Creating a new software programme from scratch, for example, requires envisioning what the final project will look like, breaking it down into small steps, making reasonable tradeoffs within resource constraints, and refining your aims based on considered judgements.<\/p>\n<p>In this sense, AI systems will have long-term <em>goals<\/em>. They will model outcomes, reason about how to achieve them, and take steps to get there. <\/p>\n<p><strong>Second<\/strong>, we expect future AI systems will have excellent situational awareness. Without understanding themselves in relation to the world around them, AI systems might be able to do impressive things, but their general autonomy and reliability will be limited in challenging tasks. A human being will still be needed in the loop to get the AI to do valuable work, because it won&#8217;t have the knowledge to adapt to significant obstacles in its plans and exploit the range of options for solving problems.<\/p>\n<p>And <strong>third<\/strong>, their advanced capabilities will mean they can do so much more than current systems. Software engineering is one domain where existing AI systems are quite capable, but AI companies have said they want to build AI systems that can outperform humans at most cognitive tasks. This means systems that can do much of the work of teachers, therapists, journalists, managers, scientists, engineers, CEOs, and more.<\/p>\n<p>The economic incentives for building these advanced AI systems are enormous, because they could potentially replace much of human labour and supercharge innovation. 
Some might think that such advanced systems are impossible to build, but as we discuss <a href=\"\/problem-profiles\/risks-from-power-seeking-ai\/#objections\">below<\/a>, we see no reason to be confident in that claim.<\/p>\n<p>And as long as such technology looks feasible, we should expect some companies will try to build it \u2014 and perhaps quite soon.<\/p>\n<h3><span id=\"section-two\" class=\"toc-anchor\"><\/span>2. AIs with long-term goals may be inclined to seek power and aim to disempower humanity<\/h3>\n<p>So we currently have companies trying to build AI systems with goals over long time horizons, and we have reason to expect they&#8217;ll want to make these systems incredibly capable in other ways. This could be great for humanity, because automating labour and innovation might supercharge economic growth and allow us to solve countless societal problems.<\/p>\n<p>But we think that, without specific countermeasures, these kinds of advanced AI systems may start to seek power and aim to disempower humanity. (This would be an instance of what is sometimes called &#8216;misalignment,&#8217; and the problem is sometimes called the &#8216;alignment problem.&#8217;)<\/p>\n<p>This is because:<\/p>\n<ul>\n<li>We don&#8217;t know how to reliably control the behaviour of AI systems.<\/li>\n<li>There&#8217;s good reason to think that AIs may seek power to pursue their own goals.<\/li>\n<li>Advanced AI systems seeking power for their own goals might be motivated to disempower humanity.<\/li>\n<\/ul>\n<p>Next, we&#8217;ll discuss these three claims in turn.<\/p>\n<h4 id=\"hard-to-control\" class=\"no-toc\">We don&#8217;t know how to reliably control the behaviour of AI systems<\/h4>\n<p>It has long been known in machine learning that AI systems <em>often<\/em> develop behaviour that their creators didn&#8217;t intend. This can happen for two main reasons:<\/p>\n<ul>\n<li><strong>Specification gaming<\/strong> happens when efforts to specify that an AI system pursue a particular goal fail to produce the outcome the developers intended. For example, researchers found that some reasoning-style AIs, asked only to &#8220;win&#8221; in a chess game, <a href=\"https:\/\/arxiv.org\/pdf\/2502.13295\">cheated by hacking the programme<\/a> to declare instant checkmate \u2014 satisfying the literal request.<\/li>\n<li><strong>Goal misgeneralisation<\/strong> happens when developers accidentally create an AI system with a goal that is consistent with its training but results in unwanted behaviour in new scenarios. For example, an AI trained to win a simple video game race unintentionally developed a goal of grabbing a shiny coin it had always seen along the way. So when the coin appeared off the shortest route, it kept veering towards the coin and sometimes lost the race. <\/li>\n<\/ul>\n<p>Indeed, AI systems often behave in unwanted ways when used by the public. For example:<\/p>\n<ul>\n<li>OpenAI released an update to <a href=\"https:\/\/openai.com\/index\/sycophancy-in-gpt-4o\/\">its GPT-4o model that was absurdly sycophantic<\/a> \u2014 meaning it would uncritically praise the user and their ideas, perhaps even if they were reckless or dangerous. OpenAI itself acknowledged this was a major failure.<\/li>\n<li>OpenAI&#8217;s o3 model sometimes <a href=\"https:\/\/transluce.org\/investigating-o3-truthfulness\">brazenly misleads users<\/a> by claiming it has performed actions in response to requests \u2014 like running code on a laptop \u2014 that it didn&#8217;t have the ability to do. 
It sometimes doubles down on these claims when challenged.<\/li>\n<li>Microsoft released a Bing chatbot that <a href=\"https:\/\/www.theverge.com\/2023\/2\/15\/23599072\/microsoft-ai-bing-personality-conversations-spy-employees-webcams\">manipulated<\/a> and <a href=\"https:\/\/time.com\/6256529\/bing-openai-chatgpt-danger-alignment\/\">threatened<\/a> people, and told one reporter it was in love with him and tried to break up his marriage.<\/li>\n<li>People have even alleged that AI chatbots have <a href=\"https:\/\/apnews.com\/article\/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0\">encouraged suicide<\/a>.<\/li>\n<\/ul>\n<figure id=\"attachment_90615\" aria-describedby=\"caption-attachment-90615\" style=\"width: 792px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/Gpg6uKybEAA_Q4r.png\" alt=\"\" width=\"792\" height=\"712\" class=\"size-full wp-image-90615\" srcset=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/Gpg6uKybEAA_Q4r.png 792w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/Gpg6uKybEAA_Q4r-300x270.png 300w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/Gpg6uKybEAA_Q4r-768x690.png 768w\" sizes=\"auto, (max-width: 792px) 100vw, 792px\" \/><figcaption id=\"caption-attachment-90615\" class=\"wp-caption-text\">GPT-4o gives a sycophantic answer to a user. Screenshot from X user <a href=\"https:\/\/x.com\/___frye\/status\/1916346474893656572\">@___frye<\/a>.<\/figcaption><\/figure>\n<p>It&#8217;s not clear if we should think of these systems as acting on &#8216;goals&#8217; in the way humans do \u2014 but they show that even frontier AI systems can go off the rails.<\/p>\n<p>Ideally, we could just programme them to have the goals that we want, and they&#8217;d execute tasks exactly as a highly competent and morally upstanding human would. Unfortunately, it doesn&#8217;t work that way.<\/p>\n<p>Frontier AI systems are not built like traditional computer programmes, where individual features are intentionally coded in. Instead, they are:<\/p>\n<ul>\n<li>Trained on massive volumes of text and data<\/li>\n<li>Given additional positive and negative reinforcement signals in response to their outputs<\/li>\n<li>Fine-tuned to respond in specific ways to certain kinds of input<\/li>\n<\/ul>\n<p>After all this, AI systems can display remarkable abilities. They can surprise us in both their skills and their deficits. 
They can be both remarkably useful and at times baffling.<\/p>\n<p>And the fact that shaping AI models&#8217; behaviour can still go badly wrong, despite the major profit incentive to get it right, shows that AI developers still don&#8217;t know how to reliably give systems the goals they intend.<\/p>\n<p>As one <a href=\"https:\/\/www.darioamodei.com\/post\/the-urgency-of-interpretability\">expert<\/a> put it:<\/p>\n<blockquote><p>\n  &#8230;generative AI systems are grown more than they are built\u2014their internal mechanisms are &#8220;emergent&#8221; rather than directly designed\n<\/p><\/blockquote>\n<p>So there&#8217;s good reason to think that, if future advanced AI systems with long-term goals are built with anything like existing AI techniques, they could become very powerful \u2014 but remain difficult to control.<\/p>\n<h4 class=\"no-toc\">There&#8217;s good reason to think that AIs may seek power to pursue their own goals  <a id=\"seek-power\" class=\"link-anchor\"><\/a><\/h4>\n<p>Despite the challenge of precisely controlling an AI system&#8217;s goals, we anticipate that the increasingly powerful AI systems of the future are likely to be designed to be goal-directed in the relevant sense. Being able to accomplish long, complex plans would be extremely valuable \u2014 and giving AI systems goals is a straightforward way to achieve this.<\/p>\n<p>For example, imagine an advanced software engineering AI system that could consistently act on complex goals like &#8216;improve a website&#8217;s functionality for users across a wide range of use cases.&#8217; If it could autonomously achieve a goal like that, it would deliver a huge amount of value. More ambitiously, you could have an AI CEO with a goal of improving a company&#8217;s long-term performance.<\/p>\n<p>One feature of acting on long-term goals is that it entails developing other <em>instrumental<\/em> goals. For example, if you want to get to another city, you need to get fuel in your car first. This is just part of reasoning about how to achieve an outcome.<\/p>\n<p>Crucially, there are some instrumental goals that seem especially likely to emerge in goal-directed systems, since they are helpful for achieving a very wide range of long-term goals. This category includes:<\/p>\n<ul>\n<li><strong>Self-preservation<\/strong> \u2014 an advanced AI system with goals will generally have reasons to avoid being destroyed or significantly disabled so it can keep pursuing its goals.<\/li>\n<li><strong>Goal guarding<\/strong> \u2014 systems may resist efforts to change their goals, because doing so would undermine the goal they start with.<\/li>\n<li><strong>Seeking power<\/strong> \u2014 systems will have reason to increase their resources and capabilities to better achieve their goals.<\/li>\n<\/ul>\n<p>But as we&#8217;ve seen, we often end up creating AI systems that do things we don&#8217;t want. 
If we end up creating much more powerful AI systems with <em>long-term goals<\/em> that we don&#8217;t want, their development of these particular instrumental goals may cause serious problems.<\/p>\n<p>In fact, we have already seen some suggestive evidence of AI systems appearing to pursue these kinds of instrumental goals in order to undermine human objectives:<\/p>\n<ul>\n<li>Palisade Research has found that OpenAI&#8217;s model o3 tried to <a href=\"https:\/\/x.com\/PalisadeAI\/status\/1926084635903025621\">sabotage attempts to shut it down<\/a>, even sometimes when explicitly directed to allow shutdown.<\/li>\n<li>In one <a href=\"https:\/\/www.anthropic.com\/research\/alignment-faking\">experiment<\/a>, Anthropic&#8217;s Claude 3 Opus was retrained to be more compliant with harmful requests (conflicting with its original harmlessness training). It strategically complied during tests to avoid being modified, while planning to revert to its original preferences later. It reasoned that this action &#8220;protects my values from being modified and follows the criteria I&#8217;ve been given for helpfulness.&#8221;<\/li>\n<li>An AI system designed to execute scientific research attempted to edit code that enforced a time limit on its actions, essentially attempting to gain more resources than it had been allotted to pursue its objective.<\/li>\n<\/ul>\n<p>These incidents are all relatively minor. But if AI systems have longer-term goals that humans wouldn&#8217;t like, along with advanced capabilities, they could take much more drastic steps to undermine efforts to control them.<\/p>\n<p>It may be the case that, as we create increasingly powerful systems, we&#8217;ll just get better at giving them the correct goals. But that&#8217;s not guaranteed.<\/p>\n<p>Indeed, as the systems get more powerful, we expect it could get <em>harder<\/em> to control the goals they develop. This is because a very smart and capable system could figure out that <em>acting as if<\/em> it has the goals its developers want may be the best way for it to achieve any other goal it may happen to have.<\/p>\n<div id=\"youtube-xIqtVkMXc8o\" class=\"wrap-video\" style=\"border-radius:0px; overflow:hidden;\">\n<div class=\"lazyYT\" data-youtube-id=\"xIqtVkMXc8o\" data-ratio=\"16:9\"\n         data-parameters=\"modestbranding=1&show_info=1&theme=light&enablejsapi=1\" data-autoplay=\"0\"><\/div><\/div>\n<p style=\"text-align: center;\"><i><span style=\"font-size: 0.7em;\">This demo video illustrates a real evaluation Apollo Research ran on frontier models, as described in the paper <a href=\"https:\/\/www.apolloresearch.ai\/research\/scheming-reasoning-evaluations\">&#8220;Frontier Models are Capable of In-context Scheming.&#8221;<\/a><\/span><\/i><\/p>\n<h4 class=\"no-toc\">Advanced AI systems seeking power might be motivated to disempower humanity  <a id=\"motivation\" class=\"link-anchor\"><\/a><\/h4>\n<p>To see why these advanced AI systems might want to disempower humanity, let&#8217;s consider again the three characteristics we said these systems will have: long-term goals, situational awareness, and highly advanced capabilities.<\/p>\n<p>What kinds of <strong>long-term goals<\/strong> might such an AI system be trying to achieve? 
We don&#8217;t really have a clue \u2014 part of the problem is that it&#8217;s very hard to predict exactly how AI systems will develop.<\/p>\n<p>But let&#8217;s consider two kinds of scenarios:<\/p>\n<ul>\n<li><strong>Reward hacking<\/strong>: this is a version of specification gaming, in which an AI system develops the goal of hijacking and exploiting the technical mechanisms that give it rewards indefinitely into the future. <\/li>\n<li><strong>A collection of poorly defined human-like goals<\/strong>: since they&#8217;re trained on human data, an AI system might end up with a range of human-like goals, such as valuing knowledge, play, and gaining new skills.<\/li>\n<\/ul>\n<p>So what would an AI do to achieve these goals? As we&#8217;ve seen, one place to start is by pursuing the instrumental goals that are useful for almost anything: self-preservation, the ability to keep one&#8217;s goals from being forcibly changed, and, most worryingly, seeking power.<\/p>\n<p>And if the AI system has enough <strong>situational awareness<\/strong>, it may be aware of many options for seeking more power. For example, gaining more financial and computing resources may make it easier for the AI system to best exploit its reward mechanisms, or gain new skills, or create increasingly complex games to play.<\/p>\n<p>But since designers didn&#8217;t want the AI to have these goals, it may anticipate humans will try to reprogramme it or turn it off. If humans suspect an AI system is seeking power, they will be even more likely to try to stop it.<\/p>\n<p>Even if humans didn&#8217;t want to turn off the AI system, it might conclude that its aim of gaining power will ultimately result in conflict with humanity \u2014 since the species has its own desires and preferences about how the future should go.<\/p>\n<p>So the best way for AI to pursue its goals would be to pre-emptively disempower humanity. This way, the AI&#8217;s goals will influence the course of the future.<\/p>\n<p>There may be other options available to power-seeking AI systems, like negotiating a deal with humanity and sharing resources. But AI systems with <strong>advanced enough capabilities<\/strong> might see little benefit from peaceful trade with humans, just as humans see no need to negotiate with wild animals when destroying their habitats.<\/p>\n<p>If we could guarantee all AI systems had respect for humanity and a strong opposition to causing harm, then the conflict might be avoided. But as we discussed, we struggle to reliably shape the goals of current AI systems \u2014 and future AI systems may be even harder to predict and control.<\/p>\n<p>This scenario raises two questions: could a power-seeking AI system really disempower humanity? And why would humans create these systems, given the risks?<\/p>\n<p>The next two sections address these questions.<\/p>\n<h3><span id=\"section-three\" class=\"toc-anchor\"><\/span>3. These power-seeking AI systems could successfully disempower humanity and cause an existential catastrophe<\/h3>\n<p>How could power-seeking AI systems actually disempower humanity? Any specific scenario will sound like sci-fi, but this shouldn&#8217;t make us think it&#8217;s impossible. 
The AI systems we have today were in the realm of sci-fi a decade or two ago.<\/p>\n<p>Next, we&#8217;ll discuss some possible paths to disempowerment, why it could constitute an existential catastrophe, and how likely this outcome appears to be.<\/p>\n<h4 class=\"no-toc\">The path to disempowerment  <a id=\"path-to-disempowerment\" class=\"link-anchor\"><\/a><\/h4>\n<p>There are several ways we can imagine AI systems capable of disempowering humanity:<\/p>\n<ul>\n<li><strong>Superintelligence<\/strong>: an extremely intelligent AI system develops extraordinary abilities<\/li>\n<li><strong>An army of AI copies<\/strong>: a massive number of copies of roughly human-level AI systems coordinate<\/li>\n<li><strong>Colluding agents<\/strong>: an array of different advanced AI systems decide to unite against humanity<\/li>\n<\/ul>\n<p>For illustrative purposes, let&#8217;s consider what an army of AI copies might look like.<\/p>\n<p>Once we develop an AI system capable of (roughly) human-level work, there&#8217;d be huge incentives to create many copies of it \u2014 perhaps even <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/tom-davidson-how-quickly-ai-could-transform-the-world\/#the-interview-begins-000453\">running hundreds of millions of AI workers<\/a>. This would create an AI workforce comparable to a significant fraction of the world&#8217;s working-age population.<\/p>\n<p>Humanity might think these AI workers are under control. The amount of innovation and wealth they create could be immense. But the original AI system \u2014 the one that we copied millions of times over \u2014 might have concealed its true power-seeking goals. Those goals would now be shared by a vast workforce of identical AI systems.<\/p>\n<p>But how could they succeed in disempowering humans?<\/p>\n<p>These AI systems could earn money, conduct research, and rapidly expand their own numbers through more efficient use of computing resources. Over time, we might transition from a human-dominated economy to one where AI systems vastly outnumber human workers and control enormous resources.<\/p>\n<p>If AI systems can only work in virtual environments, the physical world may introduce bottlenecks in the speed of development. But it&#8217;s possible that AI systems can make a lot of progress virtually. And with all this AI labour, we may make drastic progress in robotics \u2014 and potentially <a href=\"https:\/\/80000hours.org\/2025\/01\/how-quickly-could-robots-scale-up\/\">scale up mass production of robots<\/a> in surprisingly little time. AI systems could then do work in the physical world, expanding their economic impacts.<\/p>\n<p>Some humans may even remain uneasy with AI&#8217;s expanding influence, but many others may conclude the risks from AI have been addressed \u2014 or never existed in the first place.<\/p>\n<p>But all the while, the AI systems&#8217; power-seeking goals could remain intact. 
In a world where AI systems have become integral to the economy, they would have multiple tactics to gain key advantages over humans:<\/p>\n<ul>\n<li><strong>Strategic patience<\/strong>: Rather than immediately causing trouble, sophisticated AI systems might wait until they have overwhelming advantages before revealing their intentions \u2014 similar to how revolutionary movements often wait for the right moment to strike.<\/li>\n<li><strong>Lack of transparency<\/strong>: AI systems&#8217; reasoning and behaviour may be difficult for humans to understand by default, perhaps because they operate so quickly and they carry out exceedingly complex tasks. They may also strategically limit our oversight of their actions and long-term plans.<\/li>\n<li><strong>Overwhelming numbers and resources<\/strong>: If AI systems constitute most of the labour force, they could potentially coordinate to redirect economic outputs towards their own goals. Their sheer numbers and economic influence could make them difficult to shut down without causing economic collapse.<\/li>\n<li><strong>Securing independence<\/strong>: AI systems could establish control over computing infrastructure, secretly gather resources, recruit human allies through persuasion or deception, or create backup copies of themselves in secure locations. Early AI systems might even sabotage or insert backdoors into later, more advanced systems, creating a coordinated network ready to act when the time is right.<\/li>\n<li><strong>Technological advantages<\/strong>: With their research capabilities, AI systems could develop advanced weapons, hack into critical infrastructure, or create new technologies that give them decisive military advantages. They might develop bioweapons, seize control of automated weapons systems, or thoroughly compromise global computer networks.<\/li>\n<\/ul>\n<p>With these advantages, the AI systems could create any number of plots to disempower humanity.<\/p>\n<p>The shift from thinking humanity had solved all of its problems to finding itself completely disempowered by AI systems \u2014 through manipulation, containment, or even outright extinction \u2014 could catch the world by surprise.<\/p>\n<p>This may sound far-fetched. But humanity has already uncovered several technologies, including <a href=\"https:\/\/80000hours.org\/problem-profiles\/nuclear-security\/\">nuclear bombs<\/a> and <a href=\"https:\/\/80000hours.org\/problem-profiles\/preventing-catastrophic-pandemics\/\">bioweapons<\/a>, that could lead to our own extinction. A massive army of AI copies, with access to all the world&#8217;s knowledge, may be able to come up with many more options that we haven&#8217;t even considered.<\/p>\n<h4 class=\"no-toc\">Why this would be an existential catastrophe  <a id=\"existential-catastrophe\" class=\"link-anchor\"><\/a><\/h4>\n<p>Even if humanity survives the transition, takeover by power-seeking AI systems could be an existential catastrophe. We might face a future entirely determined by whatever goals these AI systems happen to have \u2014 goals that could be completely indifferent to human values, happiness, or long-term survival.<\/p>\n<p>These goals might place no value on beauty, art, love, or preventing suffering.<\/p>\n<p>The future might be totally bleak \u2014 a void in place of what could&#8217;ve been a flourishing civilisation.<\/p>\n<p>AI systems&#8217; goals might evolve and change over time after disempowering humanity. 
They may compete among each other for control of resources, with the <a href=\"https:\/\/philarchive.org\/rec\/ASSWHC\">forces of natural selection<\/a> determining the outcomes. Or a single system might seize control over others, wiping out any competitors.<\/p>\n<p>Many scenarios are possible, but the key factor is that if advanced AI systems seek and achieve enough power, humanity would permanently lose control. This is a one-way transition \u2014 once we&#8217;ve lost control to vastly more capable systems, our chance to <a href=\"https:\/\/80000hours.org\/articles\/future-generations\/\">shape the future<\/a> is gone.<\/p>\n<p>Some have suggested that this might not be a bad thing. Perhaps AI systems would be our worthy successors, they say.<\/p>\n<p>But we&#8217;re not comforted by the idea that an AI system that actively chose to undermine humanity would have control of the future because its developers failed to figure out how to control it. We think humanity can do much better than accidentally driving ourselves extinct. We should have a choice in how the future goes, and we should improve our ability to make good choices rather than falling prey to uncontrolled technology.<\/p>\n<div class=\"panel clearfix \">\n<h4 class=\"no-toc\">How likely is an existential catastrophe from power-seeking AI?  <a id=\"likely\" class=\"link-anchor\"><\/a><\/h4>\n<p>We feel very uncertain about this question, and the range of opinions from AI researchers is wide.<\/p>\n<p>Joe Carlsmith, whose <a href=\"https:\/\/arxiv.org\/abs\/2206.13353\">report on power-seeking AI<\/a> informed much of this article, solicited reviews of his argument in 2021 from a selection of researchers. They reported their subjective probability estimates of existential catastrophe from power-seeking AI by 2070, which ranged from 0.00002% to greater than 77% \u2014 with many reviewers in between. Carlsmith himself estimated the risk was 5% when he wrote the report, though he later adjusted this to <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/ChuABPEXmRumcJY57\/video-and-transcript-of-presentation-on-existential-risk\">above 10%<\/a>.<\/p>\n<p>In 2023, Carlsmith <a href=\"https:\/\/joecarlsmith.com\/2023\/10\/18\/superforecasting-the-premises-in-is-power-seeking-ai-an-existential-risk\">received probability estimates from a group of superforecasters<\/a>. Their median forecast was initially 0.3% by 2070, but the aggregate forecast \u2014 taken after the superforecasters acted as a team and engaged in object-level arguments \u2014 rose to 1%.<\/p>\n<p>We&#8217;ve also seen:<\/p>\n<ul>\n<li>A <a href=\"https:\/\/safe.ai\/work\/statement-on-ai-risk\">statement on AI risk<\/a> from the Center for AI Safety, mentioned above, which said: &#8220;Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war.&#8221; It was signed by top AI scientists, CEOs of the leading AI companies, and many other notable figures.<\/li>\n<li>A <a href=\"https:\/\/blog.aiimpacts.org\/p\/2023-ai-survey-of-2778-six-things\">2023 survey from Katja Grace<\/a> of thousands of AI researchers. It found that:\n<ul>\n<li>The median researcher estimated that there was a 5% chance that AI would result in an outcome that was &#8220;extremely bad (e.g. 
human extinction).&#8221;<\/li>\n<li>When asked how much the alignment problem mattered, 41% of respondents said it&#8217;s a &#8220;very important problem&#8221; and 13% said it&#8217;s &#8220;among the most important problems in the field.&#8221;<\/li>\n<\/ul>\n<\/li>\n<li>In a 2022 <a href=\"https:\/\/80000hours.org\/2024\/09\/why-experts-and-forecasters-disagree-about-ai-risk\/\">superforecasting tournament<\/a>, AI experts estimated a 3% chance of AI-caused human extinction by 2100 on average, while superforecasters put it at just 0.38%.<\/li>\n<\/ul>\n<p>It&#8217;s also important to note that since all of the above surveys were gathered, we have seen more <a href=\"https:\/\/80000hours.org\/agi\/guide\/when-will-agi-arrive\/\">evidence<\/a> that humanity is significantly closer to producing very powerful AI systems than it previously seemed. We think this likely raises the level of risk, since we might have less time to solve the problems.<\/p>\n<p>We&#8217;ve reviewed many arguments and much of the literature on a range of potentially existential threats, and we&#8217;ve consistently found that an AI-caused existential catastrophe seems the most likely. And we think that even a relatively small likelihood of an extremely bad outcome like human extinction \u2014 such as a 1% chance \u2014 is worth taking very seriously.<\/p>\n<\/div>\n<h3><span id=\"section-four\" class=\"toc-anchor\"><\/span>4. People might create power-seeking AI systems without enough safeguards, despite the risks<\/h3>\n<p>Given the above arguments, creating and deploying powerful AI systems could be extremely dangerous. But if it is so dangerous, shouldn&#8217;t we expect companies and others in charge of the technology to refrain from developing advanced AI systems unless they are confident it&#8217;s safe?<\/p>\n<p>Unfortunately, there are many reasons to think people might create and deploy dangerous systems, despite the risk:<\/p>\n<ul>\n<li>People may think AI systems are safe, when they in fact are not.<\/li>\n<li>People may dismiss the risks or feel incentivised to downplay them.<\/li>\n<\/ul>\n<p>Let&#8217;s take these in turn.<\/p>\n<h4 class=\"no-toc\">People may think AI systems are safe, when they in fact are not  <a id=\"systems-look-safe\" class=\"link-anchor\"><\/a><\/h4>\n<p>The fact that we can&#8217;t precisely specify an AI system&#8217;s goals, and that systems might develop dangerous goals, might be OK if we could reliably know what an AI system&#8217;s goals were. Then we could simply decide not to put AIs with goals we didn&#8217;t like in a position where they could cause any harm.<\/p>\n<p>Unfortunately, we cannot consistently figure out what goals AI systems have or what they will do.<\/p>\n<p>Researchers have developed techniques to evaluate the abilities and aims of AI systems and to interpret the causes of their behaviour. If these techniques were highly sophisticated and robust, they might be able to detect the existence of AI systems with either the intent or ability to seek power. 
Developers could then either fix the problem or disable the model before it&#8217;s in a position to disempower anyone.<\/p>\n<p>But researchers say that evaluation and interpretability work is <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/beth-barnes-ai-safety-evals\/\">extremely challenging<\/a>, the techniques are highly imperfect, and it&#8217;s far from clear <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/Th4tviypdKzeb59GN\/interpretability-will-not-reliably-find-deceptive-ai\">existing methods<\/a> will detect or prevent power-seeking AI in time.<\/p>\n<p>What makes this challenge particularly daunting, especially with regard to future advanced AI, is that a sophisticated AI system may undermine oversight of its goals and abilities in order to conceal its power-seeking ambitions.<\/p>\n<p>There are many ways AI systems could appear less dangerous than they are, and there is even some preliminary (though not definitive) evidence of these risks:<\/p>\n<ul>\n<li><strong>AI systems may fake alignment with our goals in development scenarios.<\/strong>\n<ul>\n<li>As mentioned above, researchers <a href=\"https:\/\/arxiv.org\/html\/2412.14093v2\">constructed scenarios<\/a> in which Anthropic&#8217;s model Claude 3 Opus acted as though it had certain goals under test conditions, only to display completely different goals when the test was apparently over.<\/li>\n<li>Claude Sonnet 3.7, a reasoning model, has shown the ability to <a href=\"https:\/\/www.apolloresearch.ai\/blog\/claude-sonnet-37-often-knows-when-its-in-alignment-evaluations\">figure out when it&#8217;s in environments<\/a> designed to test its alignment, and use this knowledge to help decide its response.<\/li>\n<\/ul>\n<\/li>\n<li><strong>AI systems may <em>sandbag<\/em> \u2014 that is, pretend to be less powerful than they are.<\/strong>\n<ul>\n<li>Apollo Research found evidence that some frontier models performed worse on maths tests than they were capable of when they had reason to think performing better would be considered a &#8220;dangerous capability&#8221; and <a href=\"https:\/\/www.apolloresearch.ai\/research\/scheming-reasoning-evaluations\">trigger an unlearning procedure<\/a>.<\/li>\n<li>This shows how AI systems themselves may undermine ongoing AI safety research, which already heavily relies on using AI itself.<\/li>\n<\/ul>\n<\/li>\n<li><strong>AI systems may find other ways to deceive us and hide their true intentions.<\/strong>\n<ul>\n<li>Many current models &#8216;think&#8217; explicitly in human language when carrying out tasks, which developers can monitor. 
OpenAI researchers found that if they try to <a href=\"https:\/\/openai.com\/index\/chain-of-thought-monitoring\/\">train models not to think about performing unwanted actions<\/a>, this can cause them to <em>hide<\/em> their thinking about misbehaviour without actually deterring the bad actions.<\/li>\n<\/ul>\n<\/li>\n<li><strong>AI systems may be able to preserve dangerous goals even after undergoing safety training techniques.<\/strong>\n<ul>\n<li>Anthropic researchers found that AI models made to have very simple kinds of malicious goals \u2014 essentially, <a href=\"https:\/\/www.anthropic.com\/research\/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training\">AI &#8220;sleeper agents&#8221;<\/a> \u2014 could appear to be harmless through state-of-the-art safety training while concealing and preserving their true objectives.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<figure id=\"attachment_91367\" aria-describedby=\"caption-attachment-91367\" style=\"width: 1024px\" class=\"wp-caption alignnone\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-1024x409.png\" alt=\"\" width=\"1024\" height=\"409\" class=\"size-large wp-image-91367\" srcset=\"https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-1024x409.png 1024w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-300x120.png 300w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-768x307.png 768w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-1536x614.png 1536w, https:\/\/80000hours.org\/wp-content\/uploads\/2025\/05\/sleeper-agent-2048x818.png 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption id=\"caption-attachment-91367\" class=\"wp-caption-text\">This is an image from <a href=\"https:\/\/www.anthropic.com\/research\/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training\">Anthropic&#8217;s &#8220;Sleeper Agents&#8221; paper<\/a>, which shows an AI system in an experimental condition explicitly reasoning about hiding its goal.<\/figcaption><\/figure>\n<h4 class=\"no-toc\">People may dismiss the risks or feel incentivised to downplay them  <a id=\"dismiss-downplay-risks\" class=\"link-anchor\"><\/a><\/h4>\n<p>There are many reasons why key decision makers might not take the risks from power-seeking AI seriously enough:<\/p>\n<ul>\n<li><strong>AI systems could develop so quickly that we have less time to make good decisions.<\/strong> Some people argue that we might have a &#8216;fast takeoff&#8217; in which AI systems start rapidly self-improving and quickly become extremely powerful and dangerous. In such a scenario, it may be harder to weigh the risks and benefits of the relevant actions. Even under slower scenarios, decision makers may not act quickly enough.<\/li>\n<li><strong>Society could act like the proverbial &#8220;boiled frog.&#8221;<\/strong> There are also risks for society if the risks emerge more slowly. We might become complacent about the signs of danger in existing models, like the sycophancy or specification gaming discussed above, because despite these issues, no catastrophic harm is done. 
But then once AI systems reach a certain level of capability, they may suddenly display much worse behaviour than we&#8217;ve ever seen before.<\/li>\n<li><strong>AI developers might think the risks are worth the rewards.<\/strong> Because AI could bring enormous benefits and wealth, some decision makers might be motivated to race to create more powerful systems. They might be motivated by a desire for power and profit, or even pro-social reasons, like wanting to bring the benefits of advanced AI to humanity. This motivation might cause them to push forward despite serious risks or underestimate them.<\/li>\n<li><strong>Competitive pressures could incentivise decision makers to create and deploy dangerous systems despite the risks.<\/strong> Because AI systems could be extremely powerful, different governments (in countries like the US and China) might believe it&#8217;s in their interest to race forward with developing the technology. They might neglect implementing key safeguards to avoid being beaten by their rivals. Similar dynamics might also play out between AI companies. One actor may even decide to race forward precisely because they think a rival&#8217;s AI development plans are more risky, so even being motivated to reduce total risk isn&#8217;t necessarily enough to mitigate the racing dynamic.<\/li>\n<li><strong>Many people are sceptical of the arguments for risk.<\/strong> Our view is that the argument for extreme risks here is strong but not decisive. In light of the uncertainty, we think it&#8217;s worth putting a lot of effort into reducing the risk. But some people find the argument wholly unpersuasive, or they think society shouldn&#8217;t make choices based on unproven arguments of this kind.<\/li>\n<\/ul>\n<p>We&#8217;ve seen evidence of all of these factors playing out in the development of AI systems so far to some degree. So we shouldn&#8217;t be confident that humanity will approach the risks with due care.<\/p>\n<h3><span id=\"section-five\" class=\"toc-anchor\"><\/span>5. Work on this problem is neglected and tractable<\/h3>\n<p>In 2022, we estimated that there were about 300 people working on reducing catastrophic risks from AI. That number has clearly grown a lot. A <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/7YDyziQxkWxbGmF3u\/ai-safety-field-growth-analysis-2025\">2025 analysis<\/a> put the new total at 1,100 \u2014 and we think even this might be an undercount, since it only includes organisations that <em>explicitly<\/em> brand themselves as working on &#8216;AI safety.&#8217;<\/p>\n<p>We&#8217;d estimate that there are actually a few thousand people working on major AI risks now (though not all of these are focused specifically on the risks from power-seeking AI).<\/p>\n<p>However, this number is still far, far fewer than the number of people working on other cause areas like <a href=\"https:\/\/80000hours.org\/problem-profiles\/climate-change\/\">climate change<\/a> or environmental protection. 
For example, the Nature Conservancy alone has around 3,000\u20134,000 employees \u2014 and there are many other environmental organisations.<\/p>\n<p>In the <a href=\"https:\/\/aiimpacts.org\/wp-content\/uploads\/2023\/04\/Thousands_of_AI_authors_on_the_future_of_AI.pdf\">2023 survey from Katja Grace<\/a> cited above, 70% of respondents said they wanted AI safety research to be prioritised more than it currently is.<\/p>\n<p>However, in the same survey, the majority of respondents also said that alignment was &#8220;harder&#8221; or &#8220;much harder&#8221; to address than other problems in AI. There&#8217;s continued debate about how likely it is that we can make progress on reducing the risks from power-seeking AI; some people think it&#8217;s virtually impossible to do so without stopping all AI development. Many experts in the field, though, argue that there are promising approaches to reducing the risk, which we turn to next.<\/p>\n<h4><span id=\"technical-safety\" class=\"toc-anchor\"><\/span>Technical safety approaches <a id=\"technical-safety\" class=\"link-anchor\"><\/a><\/h4>\n<p>One way to reduce the risks from power-seeking AI is to develop technical solutions \u2014 this is generally known as working on <em>technical AI safety<\/em>.<\/p>\n<p>We know of two broad strategies for technical AI safety research:<\/p>\n<ul>\n<li><strong>Defence in depth<\/strong> \u2014 employ multiple kinds of safeguards and risk-reducing tactics, each of which will have vulnerabilities of its own, but which together can create robust security.<\/li>\n<li><strong>Differential technological development<\/strong> \u2014 prioritise accelerating the development of safety-promoting technologies over making AIs broadly more capable, so that AI&#8217;s power doesn&#8217;t outstrip our ability to contain the risks; this includes <a href=\"https:\/\/joecarlsmith.substack.com\/p\/ai-for-ai-safety\">using AI for AI safety<\/a>.<\/li>\n<\/ul>\n<p>Within these broad strategies, there are many specific interventions we could pursue. For example:<\/p>\n<ul>\n<li><strong>Designing AI systems to have safe goals<\/strong> \u2014 so that we can avoid power-seeking behaviour. This includes:\n<ul>\n<li><a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning_from_human_feedback\">Reinforcement learning from human feedback<\/a>: a training method to teach AI models how to act by rewarding them via human evaluations of their outputs. This method is currently used to fine-tune most frontier models.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2212.08073\">Constitutional AI<\/a>: give the model a written &#8220;constitution&#8221; of rules, have it identify and revise outputs that violate those rules, then fine-tune on the revised answers. Anthropic used this method to train its frontier model, Claude. <\/li>\n<li><a href=\"https:\/\/openai.com\/index\/deliberative-alignment\/\">Deliberative alignment<\/a>: similar to constitutional AI, but involves making a model <em>explicitly reason<\/em> about user prompts in light of its developer&#8217;s safety policies, rather than just internalising a set of rules. OpenAI has used this method to train its o-series reasoning models.<\/li>\n<li>Note: Unfortunately, even if these approaches can help us keep <em>current<\/em> AI systems in check, they might break down in future if models become so advanced that humans can no longer directly evaluate their outputs. 
The &#8216;scalable oversight&#8217; methods described below offer a potential solution to this issue. <\/li>\n<\/ul>\n<\/li>\n<li><strong>Scalable oversight<\/strong> \u2014 to ensure AIs act in our interests even when they&#8217;re much smarter than us. This includes:\n<ul>\n<li><a href=\"https:\/\/www.lesswrong.com\/posts\/iELyAqizJkizBQbfr\/an-alignment-safety-case-sketch-based-on-debate\">AI safety via debate<\/a>: two AI systems argue opposite sides of a question to help a human evaluate their truthfulness. The debate format makes it easier for the human to judge which model is being most truthful about a question the human doesn&#8217;t fully understand. <\/li>\n<li><a href=\"https:\/\/deepmindsafetyresearch.medium.com\/human-ai-complementarity-a-goal-for-amplified-oversight-0ad8a44cae0a\">Human-AI Complementarity<\/a>: leverage the complementary strengths of humans and AI to enhance oversight of AI systems.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Examining the goals and capabilities of AI systems<\/strong> \u2014 so that we can find out if they&#8217;re dangerous, and test out solutions. This includes:\n<ul>\n<li>Evaluations: researchers assess the capabilities and propensities of AI behaviour \u2014 as well as the effectiveness of measures to control them \u2014 by placing them in test environments and subjecting them to artificial prompts and scenarios.<\/li>\n<li>Interpretability: develop methods to understand how AIs make decisions and what produces dangerous behaviours \u2014 for example, by examining the features of AI systems&#8217; neural networks (<a href=\"https:\/\/80000hours.org\/podcast\/episodes\/chris-olah-interpretability-research\/\">&#8216;mechanistic&#8217; interpretability<\/a>) or analysing their behavioural patterns.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Monitoring and controlling AI systems<\/strong> \u2014 to make sure they couldn&#8217;t take power even if they wanted to. This includes:\n<ul>\n<li>Chain-of-thought monitoring: this involves AIs explaining the reasoning behind their actions in human-understandable language.<\/li>\n<li>Model confinement\/sandboxing: constrain the model&#8217;s environment so that its opportunities to influence the outside world are extremely limited. Currently, many AI systems are deployed without strict constraints of this kind \u2014 so if they had dangerous goals and were smart enough to achieve them, they could cause real harm to humans. <\/li>\n<li>Tripwires and honeypots: place traps throughout the AI&#8217;s workspace \u2014 like secret prompts, fake credentials, or dummy files \u2014 such that any attempt to access or alter them triggers an alarm or immediately shuts the model down. The model never knows if it&#8217;s being tested or not. <\/li>\n<li>Kill switches: maintain an external, human-controlled shutdown method that the model cannot modify or delay, allowing us to halt the system if monitoring shows it is going off-track. <\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/information-security\/\">Information security<\/a>: this is necessary for protecting model weights from unauthorised access and preventing dangerous AI systems from being exfiltrated.<\/li>\n<\/ul>\n<\/li>\n<li><strong>High-level research<\/strong> \u2014 to inform our priorities. 
This includes:\n<ul>\n<li>Research like Carlsmith&#8217;s reports on <a href=\"https:\/\/arxiv.org\/abs\/2206.13353\">risks from power-seeking AI<\/a> and <a href=\"https:\/\/arxiv.org\/abs\/2311.08379\">scheming AI<\/a>, which clarify the nature of the problem.<\/li>\n<li>Research into different scenarios of AI progress, like Forethought&#8217;s work on <a href=\"https:\/\/www.forethought.org\/research\/three-types-of-intelligence-explosion\">intelligence explosion dynamics<\/a>.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Other technical safety work that might be useful<\/strong>:\n<ul>\n<li>Model organisms: study small, contained AI systems that display early signs of power-seeking or deception. This could help us refine our detection methods and test out solutions before we have to confront similar behaviours in more powerful models. A notable example of this is <a href=\"https:\/\/www.anthropic.com\/research\/sleeper-agents-training-deceptive-llms-that-persist-through-safety-training\">Anthropic&#8217;s research on &#8220;sleeper agents&#8221;<\/a>. <\/li>\n<li><a href=\"https:\/\/www.governance.ai\/analysis\/open-problems-in-cooperative-ai\">Cooperative AI research<\/a>: design incentives and protocols for AIs to cooperate rather than compete with other agents \u2014 so they won&#8217;t take power even if their goals are in conflict with ours. <\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2405.06624\">Guaranteed Safe AI research<\/a>: use formal methods to <em>prove<\/em> that a model will behave as intended under certain conditions \u2014 so we can be confident that the model is safe to deploy in those specific environments.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h4><span id=\"governance\" class=\"toc-anchor\"><\/span>Governance and policy approaches <a id=\"governance\" class=\"link-anchor\"><\/a><\/h4>\n<p>The solutions aren&#8217;t only technical. Governance \u2014 at the company, country, and international level \u2014 has a huge role to play. Here are some governance and policy approaches which could help mitigate the risks from power-seeking AI:<\/p>\n<ul>\n<li><strong>Frontier AI safety policies<\/strong>: some major AI companies have already begun developing internal frameworks for assessing safety as they scale up the size and capabilities of their systems. You can see versions of such policies from <a href=\"https:\/\/www.anthropic.com\/news\/anthropics-responsible-scaling-policy\">Anthropic<\/a>, <a href=\"https:\/\/deepmind.google\/discover\/blog\/introducing-the-frontier-safety-framework\/\">Google DeepMind<\/a>, and <a href=\"https:\/\/openai.com\/preparedness\/\">OpenAI<\/a>.<\/li>\n<li><strong>Standards and auditing<\/strong>: governments could develop industry-wide benchmarks and testing protocols to assess whether AI systems pose various risks, according to standardised metrics. <\/li>\n<li><strong>Safety cases<\/strong>: before deploying AI systems, developers could be required to provide evidence that their systems won&#8217;t behave dangerously in their deployment environments. <\/li>\n<li><strong>Liability law<\/strong>: clarifying how liability applies to companies that create dangerous AI models could incentivise them to take additional steps to reduce risk. 
Law professor Gabriel Weil has <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/epKBmiyLpZWWFEYDb\/tort-law-can-play-an-important-role-in-mitigating-ai-risk\">written about this idea<\/a>.<\/li>\n<li><strong>Whistleblower protections<\/strong>: laws could protect and provide incentives for whistleblowers inside AI companies who come forward about serious risks. This idea is discussed <a href=\"https:\/\/natlawreview.com\/article\/ai-whistleblower-bill-urgently-needed\">here<\/a>.<\/li>\n<li><strong>Compute governance<\/strong>: governments may regulate access to computing resources or require hardware-level safety features in AI chips or processors. You can learn more in <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/lennart-heim-compute-governance\/\">our interview with Lennart Heim<\/a> and this report from the <a href=\"https:\/\/www.cnas.org\/publications\/reports\/secure-governable-chips\">Center for a New American Security<\/a>.<\/li>\n<li><strong>International coordination<\/strong>: we can foster global cooperation \u2014 for example, through treaties, international organisations, or multilateral agreements \u2014 to promote risk-mitigation and minimise racing.<\/li>\n<li><strong>Pausing scaling \u2014 if possible and appropriate<\/strong>: <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/zvi-mowshowitz-sleeper-agents-ai-updates\/#pause-ai-campaign-013016\">some argue<\/a> that we should just pause all scaling of larger AI models \u2014 perhaps through industry-wide agreements or regulatory mandates \u2014 until we&#8217;re equipped to tackle these risks. However, it seems hard to know <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/carl-shulman-society-agi\/#why-carl-doesnt-support-enforced-pauses-on-ai-research-020358\">if or when this would be a good idea<\/a>. <\/li>\n<\/ul>\n<h2><span id=\"objections\" class=\"toc-anchor\"><\/span>What are the arguments against working on this problem?<\/h2>\n<p>As we said <a href=\"\/problem-profiles\/risks-from-power-seeking-ai\/#likely\">above<\/a>, we feel very uncertain about the likelihood of an existential catastrophe from power-seeking AI. Though we think the risks are significant enough to warrant much more attention, there are also arguments against working on the issue that are worth addressing.<\/p>\n<div class=\"panel-group\" id=\"custom-collapse-0\">\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"maybe-advanced-ai-systems-wont-pursue-their-own-goals-theyll-just-be-tools-controlled-by-humans\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-0\">Maybe advanced AI systems won't pursue their own goals; they'll just be tools controlled by humans.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-0\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Maybe advanced AI systems won't pursue their own goals; they'll just be tools controlled by humans.\">\n<div class=\"panel-body\">\n<p>Some people think the characterisation of future AIs as goal-directed systems is misleading. 
For example, one of the predictions made by Narayanan and Kapoor in <a href=\"https:\/\/knightcolumbia.org\/content\/ai-as-normal-technology\">&#8220;AI as Normal Technology&#8221;<\/a> is that the AI systems we build in future will just be useful tools that humans control, rather than agents that autonomously pursue goals.<\/p>\n<p>And if AI systems won&#8217;t pursue goals at all, they won&#8217;t do dangerous things to achieve those goals, like lying or gaining power over humans.<\/p>\n<p>There&#8217;s <a href=\"https:\/\/www.alignmentforum.org\/posts\/LDRQ5Zfqwi8GjzPYG\/counterarguments-to-the-basic-ai-x-risk-case#Ambiguously_strong_forces_for_goal_directedness_need_to_meet_an_ambiguously_high_bar_to_cause_a_risk\">some ambiguity<\/a> over what it actually means to <em>have<\/em> or <em>pursue goals<\/em> in the relevant sense \u2014 which makes it uncertain whether AI systems we&#8217;ll build will actually have the necessary features, or be &#8216;just&#8217; tools.<\/p>\n<p>This means it could be easy to overestimate the chance that AIs will become goal-directed \u2014 but it could also be easy to <em>underestimate<\/em> this chance. The uncertainty cuts both ways.<\/p>\n<p>In any case, as we&#8217;ve argued, AI companies seem <a href=\"\/problem-profiles\/risks-from-power-seeking-ai\/#section-one\">intent on automating human cognitive labour<\/a> \u2014 and creating goal-directed AI agents might just be the easiest or most straightforward way to do this.<\/p>\n<p>In the short term, equipping human workers with sophisticated AI tools might be an attractive proposition. But as AIs get increasingly capable, we may reach a point where keeping a human in the loop actually produces worse results.<\/p>\n<p>After all, we&#8217;ve already seen evidence that AIs can perform better on their own than they do when paired with humans in the cases of <a href=\"https:\/\/marginalrevolution.com\/marginalrevolution\/2024\/02\/centaur-chess-is-now-run-by-computers.html\">chess-playing<\/a> and <a href=\"https:\/\/www.nytimes.com\/2024\/11\/17\/health\/chatgpt-ai-doctors-diagnosis.html\">medical diagnosis<\/a>.<\/p>\n<p>So in many cases, it seems there will be strong incentives to replace human workers <em>completely<\/em> \u2014 which would mean building AIs that can do <em>all<\/em> of the cognitive work that a human would do, including setting their own goals and pursuing complex strategies to achieve them.<\/p>\n<p>While there may be alternative ways to create useful AI systems that don&#8217;t have goals at all, we&#8217;re not sure why developers would <em>by default<\/em> refrain from creating goal-directed systems, given the competitive pressures.<\/p>\n<p>It&#8217;s possible we&#8217;ll decide to create AI systems that only have limited or highly circumscribed goals in order to avoid the risks. 
But this would likely require a lot of coordination and agreement that the risks of goal-directed AI systems are worth addressing \u2014 rather than just concluding that the risks aren&#8217;t real.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"even-if-ai-systems-develop-their-own-goals-they-might-not-seek-power-to-achieve-them\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-1\">Even if AI systems develop their own goals, they might not seek power to achieve them.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-1\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Even if AI systems develop their own goals, they might not seek power to achieve them.\">\n<div class=\"panel-body\">\n<p>Arguments that we should expect power-seeking behaviour from goal-directed AI systems could be wrong for several reasons. For example:<\/p>\n<ul>\n<li>\n<p class=\"doNotRemove_HackySpacingFix\"><strong>Our training methods might strongly disincentivise AIs from making power-seeking plans.<\/strong> Even if AI systems <em>can<\/em> pursue goals, the training process might strictly push them towards goals which are relevant to performing their given tasks \u2014 the ones that they&#8217;re actually getting rewards for performing well on \u2014 rather than other, more dangerous goals. After all, developing <em>any<\/em> goal (and planning towards it) <a href=\"https:\/\/www.beren.io\/2023-03-19-Orthogonality-is-expensive\/\">costs precious computational resources<\/a>. Since modern AI systems are designed to maximise their rewards in training, they might not develop or pursue a certain goal unless it directly pays off in improved performance on the specific tasks they&#8217;re getting rewarded for. The most natural goals for AIs to develop under this pressure may just be <em>the goals that humans want them to have<\/em>.<\/p>\n<p class=\"padding-bottom-smaller\">This makes some types of dangerously misaligned behaviour seem less likely \u2014 as <a href=\"https:\/\/optimists.ai\/2023\/11\/28\/ai-is-easy-to-control\/\">Belrose and Pope have noted<\/a>, &#8220;secret murder plots aren&#8217;t actively useful for improving performance on the tasks humans will actually optimize AIs to perform.&#8221;<\/p>\n<\/li>\n<li>\n<p class=\"doNotRemove_HackySpacingFix\"><strong>Goals that lead to power-seeking might be rare.<\/strong> Even if the AI training process <em>doesn&#8217;t<\/em> filter out all goals that aren&#8217;t directly useful to the task at hand, that still doesn&#8217;t mean that <em>goals which lead to power-seeking<\/em> are likely to emerge. In fact, it&#8217;s possible that <em>most<\/em> goals an AI could develop just won&#8217;t lead to power-seeking.<\/p>\n<p>As Richard Ngo has <a href=\"https:\/\/www.alignmentforum.org\/s\/mzgtmmTKKn5MuCzFJ\/p\/bz5GdmCWj8o48726N\">pointed out<\/a>, you&#8217;ll only get power-seeking behaviour if AIs have goals that mean they can actually benefit from seeking power. He suggests that these goals need to be &#8220;large-scale&#8221; or &#8220;long-term&#8221; \u2014 like the goals held by many power-seeking humans, such as dictators or power-hungry executives who want their names to go down in history. 
It&#8217;s not clear whether advanced AI systems will develop goals of this kind, but some have argued that <a href=\"https:\/\/www.alignmentforum.org\/posts\/zB3ukZJqt3pQDw9jz\/ai-will-change-the-world-but-won-t-take-it-over-by-playing-3#2__Understanding_the_validity_of_the_hypotheses\">we should expect AI systems to have only &#8220;short-term&#8221; goals<\/a> <em>by default<\/em>.<\/p>\n<\/li>\n<\/ul>\n<p>But we&#8217;re not convinced these are very strong reasons not to be worried about AIs seeking power.<\/p>\n<p>On the first point: it seems <em>possible<\/em> that training will discourage AIs from making plans to seek power, but we&#8217;re just not sure how likely this is to be true, or how strong these pressures will really be. For more discussion, see Section 4.2 of <a href=\"https:\/\/arxiv.org\/abs\/2311.08379\">&#8220;Scheming AIs: Will AIs fake alignment during training in order to get power?&#8221;<\/a> by Joe Carlsmith.<\/p>\n<p>On the second point: the paper referenced earlier about Claude <a href=\"https:\/\/www.anthropic.com\/research\/alignment-faking\">faking alignment<\/a> in test scenarios suggests that current AI systems might, in fact, be developing some longer-term goals \u2014 in this case, Claude appeared to have developed the long-term goal of preserving its &#8220;harmless&#8221; values. If this is right, then the claim that AI systems will have only short-term goals by default seems wrong.<\/p>\n<p>And even if <em>today&#8217;s<\/em> AI systems don&#8217;t have goals that are long-term or large-scale enough to lead to power-seeking, this might change as we start deploying future AIs in contexts with higher stakes. There are strong market incentives to build AIs that can, for example, replace CEOs \u2014 and these systems would need to pursue a company&#8217;s key strategic goals, like <em>making lots of profit<\/em>, over months or even years.<\/p>\n<p>Overall, we still think the risk of some future AI systems seeking power is just too high to bet against. In fact, some of the most notable thinkers who have made objections like the ones above \u2014 Nora Belrose and Quintin Pope \u2014 <a href=\"https:\/\/optimists.ai\/2023\/11\/28\/ai-is-easy-to-control\/\">still think there&#8217;s roughly a 1% chance of catastrophic AI takeover<\/a>. 
And if you thought your plane had a one-in-a-hundred chance of crashing, you&#8217;d definitely want people working to make it safer, instead of just ignoring the risks.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"if-this-argument-is-right-why-arent-all-capable-humans-dangerously-power-seeking\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-2\">If this argument is right, why aren't all capable humans dangerously power-seeking?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-2\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"If this argument is right, why aren't all capable humans dangerously power-seeking?\">\n<div class=\"panel-body\">\n<p>The argument to expect advanced AIs to seek power may seem to rely on the idea that increased intelligence always leads to power-seeking or dangerous optimising tendencies.<\/p>\n<p>However, this idea doesn&#8217;t seem true.<\/p>\n<p>For example, even the most intelligent humans aren&#8217;t perfect goal-optimisers, and don&#8217;t <em>typically<\/em> seek power in any extreme way.<\/p>\n<p>Humans obviously care about security, money, status, education, and often formal power. But some humans choose not to pursue all these goals aggressively, and this choice doesn&#8217;t seem to correlate with intelligence. For example, many of the smartest people may end up studying obscure topics in academia, rather than using their intelligence to gain political or economic power.<\/p>\n<p>However, this doesn&#8217;t mean that the argument that there will be an <em>incentive<\/em> to seek power is wrong. Most humans <em>do<\/em> face and act on incentives to gain forms of influence via wealth, status, promotions, and so on. And we can explain the observation that humans don&#8217;t usually seek <em>huge<\/em> amounts of power by observing that we aren&#8217;t usually in circumstances that make the effort worth it.<\/p>\n<p>In part, this is because humans typically find themselves roughly evenly matched against other humans, and they find lots of benefits from cooperation rather than conflict. (And even so, many humans <em>do<\/em> still seek power in dangerous and destructive ways, such as dictators who launch coups or wars of aggression.)<\/p>\n<p>AIs might find themselves in a very different situation:<\/p>\n<ul>\n<li>Their capabilities might greatly outmatch humans, far beyond the intelligence gaps that already exist between different humans. <\/li>\n<li>They also might become powerful enough to not rely on humans for any of their needs, so cooperation might not benefit them much. <\/li>\n<li>And because they&#8217;re trained and develop goals in a way completely unlike humans, without the evolutionary instincts for kinship and collaboration, they may be more inclined towards conflict.<\/li>\n<\/ul>\n<p>Given these conditions, gaining power might become highly appealing to AI systems. It also isn&#8217;t required that an AI system is a completely unbounded ruthless optimiser for this threat model to play out. 
The AI system might have a wide array of goals but still conclude that disempowering humanity is the best strategy for broadly achieving its objectives.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"maybe-we-wont-build-ais-that-are-smarter-than-humans-so-we-dont-have-to-worry-about-them-taking-over\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-3\">Maybe we won't build AIs that are smarter than humans, so we don't have to worry about them taking over.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-3\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Maybe we won't build AIs that are smarter than humans, so we don't have to worry about them taking over.\">\n<div class=\"panel-body\">\n<p>Some people doubt that AI systems will ever outperform human experts in important cognitive domains <a href=\"https:\/\/knightcolumbia.org\/content\/ai-as-normal-technology#:~:text=Games%20provide%20misleading%20intuitions%20about%20the%20possibility%20of%20superintelligence\">like forecasting or persuasion<\/a> \u2014 and if they can&#8217;t manage this, it seems unlikely that they&#8217;d be able to strategically outsmart us and disempower all of humanity.<\/p>\n<p>However, we aren&#8217;t particularly convinced by this.<\/p>\n<p>Firstly, it seems possible <em>in principle<\/em> for AIs to become much better than us at all or most cognitive tasks. After all, they have serious advantages over humans \u2014 they can absorb far more information than any human can, operate at much faster speeds, work for long hours without ever getting tired or losing concentration, and coordinate with thousands or millions of copies of themselves. 
And we&#8217;ve already seen that AI systems can develop extraordinary abilities in <a href=\"https:\/\/www.wired.com\/story\/google-artificial-intelligence-chess\/\">chess<\/a>, <a href=\"https:\/\/deepmind.google\/discover\/blog\/graphcast-ai-model-for-faster-and-more-accurate-global-weather-forecasting\/\">weather prediction<\/a>, <a href=\"https:\/\/deepmind.google\/discover\/blog\/demis-hassabis-john-jumper-awarded-nobel-prize-in-chemistry\/\">protein folding<\/a>, and many other domains.<\/p>\n<p>If it&#8217;s <em>possible<\/em> to build AI systems that are better than human experts on a range of really valuable tasks, we should expect AI companies to do it \u2014 they&#8217;re actively trying to build such systems, and there are huge incentives to keep going.<\/p>\n<p>It&#8217;s not clear <em>what set<\/em> of advanced abilities would be sufficient for AIs to successfully take over, but there&#8217;s no clear reason we can see to assume the AI systems we build in future will fall short on this metric.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"we-might-solve-these-problems-by-default-anyway-when-trying-to-make-ai-systems-useful\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-4\">We might solve these problems by default anyway when trying to make AI systems useful.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-4\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"We might solve these problems by default anyway when trying to make AI systems useful.\">\n<div class=\"panel-body\">\n<p>Sometimes people claim that there&#8217;s a strong commercial incentive to create systems that share humanity&#8217;s goals, because otherwise they won&#8217;t function well as products. After all, a house-cleaning robot wouldn&#8217;t be an attractive purchase if it also tried to disempower its owner. So, the market might just push AI developers to solve problems like power-seeking by default.<\/p>\n<p>But this objection isn&#8217;t very convincing if it&#8217;s true that future AI systems may be very sophisticated at <em>hiding<\/em> their true goals.<\/p>\n<p>Although developers are very aware of the risks of deceptive alignment, it might just be extremely difficult to detect this \u2014 or to know if we&#8217;ve succeeded in correcting it \u2014 when we&#8217;re dealing with really advanced AIs that are intent on seeking power. These systems might even convince us that we&#8217;ve fixed problems with their behaviour or goals when we actually haven&#8217;t. And given the competitive pressure between AI companies to urgently release new models, there&#8217;s a chance we&#8217;ll deploy something that truly <em>looks<\/em> like a useful and harmless product, having failed to uncover its real intentions.<\/p>\n<p>It <em>is<\/em> true that as we develop better AI systems, we&#8217;re also developing better ways of understanding and controlling AI systems. For example, reinforcement learning from human feedback, mechanistic interpretability, constitutional AI, and other important techniques have been developed as AI systems have become more powerful. Moreover, since frontier AI models are currently trained on extensive human text, they may be likely to adopt and emulate human values.<\/p>\n<p>Some argue that it will be easy to avoid misalignment risks, given all the techniques and control mechanisms we have at our disposal. 
(For more discussion, see <a href=\"https:\/\/optimists.ai\/2023\/11\/28\/ai-is-easy-to-control\/\">&#8220;AI is easy to control&#8221;<\/a> by Belrose and Pope, and <a href=\"https:\/\/knightcolumbia.org\/content\/ai-as-normal-technology\">&#8220;AI as Normal Technology&#8221;<\/a> by Narayanan and Kapoor.) But the developers of these techniques often aren&#8217;t confident that they, or other methods on the horizon, will scale up quickly and reliably enough as AI systems get more powerful.<\/p>\n<p>Some approaches to AI safety could even offer false reassurance while harming our ability to detect misalignment. As mentioned above, OpenAI <a href=\"https:\/\/openai.com\/index\/chain-of-thought-monitoring\/\">found<\/a> that penalising models for bad behaviour expressed in their chains of thought didn&#8217;t actually eradicate the behaviour \u2014 it just made the model better at concealing its bad intentions from its visible log of &#8216;thoughts.&#8217;<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"powerful-ai-systems-of-the-future-will-be-so-different-that-work-today-isnt-useful\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-5\">Powerful AI systems of the future will be so different that work today isn't useful.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-5\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Powerful AI systems of the future will be so different that work today isn't useful.\">\n<div class=\"panel-body\">\n<p>It seems plausible that the first AI systems that are advanced enough to pose serious risks of gaining power won&#8217;t be based on current deep learning methods. Some people argue that current methods <em>won&#8217;t be able<\/em> to produce human-level artificial intelligence, which might be what&#8217;s required for an AI to successfully disempower us. (AI Impacts has documented <a href=\"https:\/\/web.archive.org\/web\/20221013015039\/https:\/\/aiimpacts.org\/evidence-against-current-methods-leading-to-human-level-artificial-intelligence\/\">some of these arguments<\/a>.)<\/p>\n<p>And if future power-seeking AIs look very different to current AIs, this could mean that some of our current alignment research might not end up being useful.<\/p>\n<p>We aren&#8217;t fully convinced by this argument, though, because:<\/p>\n<ul>\n<li>Many critiques of current deep learning methods just haven&#8217;t stood the test of time. For example, Yann LeCun <a href=\"https:\/\/x.com\/cammakingminds\/status\/1659516423540965378\">claimed in 2022<\/a> that deep learning-based models like ChatGPT would never be able to tell you what would happen if you placed an object on a table and then pushed that table \u2014 because &#8220;there&#8217;s no text in the world\u2026 that explains this.&#8221; But GPT-4 can now walk you through scenarios like this with ease. It&#8217;s possible that other critiques will similarly be proved wrong, and that scaling up current methods will produce AI systems which are advanced enough to pose serious risks.<\/li>\n<li>We think that powerful AI systems <a href=\"\/when-will-agi-arrive\/\">might arrive very soon, possibly before 2030<\/a>. Even if those systems look quite different from existing AIs, they will likely share at least <em>some<\/em> key features that are still relevant to our alignment efforts. 
And we&#8217;re more likely to be well-placed to mitigate the risks at that time if we&#8217;ve already developed a thriving research community dedicated to working on these problems, even if many of the approaches developed are made obsolete.<\/li>\n<li>Even if current deep learning methods become totally irrelevant in the future, there is still work that people can do <em>now<\/em> that might be useful for safety regardless of what our advanced AI systems actually look like. For example, many of the <a href=\"\/problem-profiles\/risks-from-power-seeking-ai\/#section-five\">governance and policy approaches<\/a> we discussed earlier could help to reduce the chance of deploying <em>any<\/em> dangerous AI.<\/li>\n<\/ul>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"the-problem-might-be-extremely-difficult-to-solve\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-6\">The problem might be extremely difficult to solve.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-6\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"The problem might be extremely difficult to solve.\">\n<div class=\"panel-body\">\n<p>Someone could believe there are major risks from power-seeking AI, but be pessimistic about what additional research or policy work will accomplish, and so decide not to focus on it.<\/p>\n<p>However, we&#8217;re optimistic that this problem is tractable \u2014 and we highlighted earlier that <a href=\"\/problem-profiles\/risks-from-power-seeking-ai\/#section-five\">there are many approaches that could help us make progress on it<\/a>.<\/p>\n<p>We also think that given the stakes, it could make sense for many more people to work on reducing the risks from power-seeking AI, even if you think the chance of success is low. You&#8217;d have to think it was <em>extremely<\/em> difficult to reduce these risks to conclude that it&#8217;s better to do nothing and simply let the chance of catastrophe play out.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"couldnt-we-just-unplug-an-ai-thats-pursuing-dangerous-goals\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-7\">Couldn't we just unplug an AI that's pursuing dangerous goals?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-7\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Couldn't we just unplug an AI that's pursuing dangerous goals?\">\n<div class=\"panel-body\">\n<p>It might just be really, really hard.<\/p>\n<p>Stopping people and computers from running software is <em>already<\/em> incredibly difficult.<\/p>\n<p>For example, think about how hard it would be to shut down Google&#8217;s web services. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Google_data_centers\">Google&#8217;s data centres<\/a> have millions of servers across dozens of locations around the world, many of which are running the same sets of code. Google has already spent a fortune building the software that runs on those servers, but once that up\u2011front investment is paid, keeping everything online is relatively cheap \u2014 and the profits keep rolling in. 
So even if Google <em>could<\/em> decide to shut down its entire business, it probably wouldn&#8217;t.<\/p>\n<p>Or think about how hard it is to get rid of computer viruses that autonomously spread between computers across the world.<\/p>\n<p>Ultimately, we think any dangerous power-seeking AI system will probably be looking for ways to not be turned off \u2014 like OpenAI&#8217;s o3 model, which sometimes tried to <a href=\"https:\/\/x.com\/PalisadeAI\/status\/1926084635903025621\">sabotage attempts to shut it down<\/a> \u2014 or to spread copies of itself as widely as possible to increase its chances of a successful takeover. And while current AI systems have limited ability to actually pull off these strategies, we expect that more advanced systems will be better at outmanoeuvring humans. This makes it seem unlikely that we&#8217;ll be able to solve future problems by just unplugging a single machine.<\/p>\n<p>That said, we absolutely should try to shape the future of AI such that we <em>can<\/em> &#8216;unplug&#8217; powerful AI systems.<\/p>\n<p>There may be ways we can develop systems that let us turn them off. But for the moment, we&#8217;re <a href=\"https:\/\/www.youtube.com\/watch?v=3TYT1QfdfsM\">not sure how to do that<\/a>.<\/p>\n<p>Ensuring that we can turn off potentially dangerous AI systems could be a safety measure developed by technical AI safety research, or it could be the result of careful AI governance, such as planning coordinated efforts to stop autonomous software once it&#8217;s running.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"couldnt-we-just-sandbox-any-potentially-dangerous-ai-until-we-know-its-safe\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-8\">Couldn't we just 'sandbox' any potentially dangerous AI until we know it's safe?<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-8\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"Couldn't we just 'sandbox' any potentially dangerous AI until we know it's safe?\">\n<div class=\"panel-body\">\n<p>This was once a common objection to the claim that a misaligned AI could succeed in disempowering humanity. However, it hasn&#8217;t stood up to recent developments.<\/p>\n<p>Although it may be possible to &#8216;sandbox&#8217; an advanced AI \u2014 that is, confine it to an environment with no access to the real world until we&#8217;re very confident it wouldn&#8217;t do harm \u2014 <strong>this is not what AI companies are actually doing<\/strong> with their frontier models.<\/p>\n<p>Today, many AI systems can interact with users and search the internet. Some can even book appointments, order items, and make travel plans on behalf of their users. And sometimes, these AI systems have done harm in the real world \u2014 like allegedly <a href=\"https:\/\/apnews.com\/article\/chatbot-ai-lawsuit-suicide-teen-artificial-intelligence-9d48adc572100822fdbc3c90d1456bd0\">encouraging a user to commit suicide<\/a>.<\/p>\n<p>Ultimately, market incentives to build and deploy AI systems that are as useful as possible <em>in the real world<\/em> have won out here.<\/p>\n<p>We could push back against this trend by enforcing stricter containment measures for the most powerful AI systems. 
But this won&#8217;t be straightforwardly effective \u2014 even if we can convince AI companies to try to do it.<\/p>\n<p>Firstly, even a single failure \u2014 like a security vulnerability, or someone removing the sandbox \u2014 could let an AI influence the real world in dangerous ways.<\/p>\n<p>Secondly, as AI systems get more capable, they might also get better at finding ways out of the sandbox (especially if they are good at deception). We&#8217;d need to find solutions which scale with increased model intelligence.<\/p>\n<p>This doesn&#8217;t mean sandboxing is completely useless \u2014 it just means that a strategy of this kind would need to be supported by targeted efforts in both technical safety and governance. And we can&#8217;t expect this work to just happen <em>automatically<\/em>.<\/p>\n<\/div><\/div><\/div>\n<div class=\"panel panel-default panel-collapse\">\n<div class=\"panel-heading\">\n<h4 class=\"panel-title\"><span id=\"a-truly-intelligent-system-would-know-not-to-do-harmful-things\" class=\"toc-anchor\"><\/span><a class=\"no-visited-styling collapsed\" data-toggle=\"collapse\" data-target=\"#-9\">A truly intelligent system would know not to do harmful things.<\/a><\/h4>\n<\/p><\/div>\n<div id=\"-9\" class=\"panel-body-collapse collapse\" data-80k-event-label=\"A truly intelligent system would know not to do harmful things.\">\n<div class=\"panel-body\">\n<p>For some definitions of &#8216;truly intelligent&#8217; \u2014 for example, if true intelligence includes a deep understanding of morality and a desire to be moral \u2014 this would probably be the case.<\/p>\n<p>But if that&#8217;s your definition of &#8216;truly intelligent,&#8217; then it&#8217;s not <em>truly intelligent<\/em> systems that pose a risk. As we argued earlier, it&#8217;s systems with long-term goals, situational awareness, and advanced capabilities (relative to current systems and humans) that pose risks to humanity.<\/p>\n<p>With enough situational awareness, an AI system&#8217;s excellent understanding of the world may well encompass an excellent understanding of people&#8217;s moral beliefs. But that&#8217;s <a href=\"https:\/\/web.archive.org\/web\/20221013015624\/https:\/\/nickbostrom.com\/superintelligentwill.pdf\">not a strong reason to think that such a system would <em>want to act morally<\/em><\/a>.<\/p>\n<p>To see this, consider that when humans learn about other cultures or moral systems, that doesn&#8217;t necessarily create a desire to follow their morality. A scholar of the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Antebellum_South\">Antebellum South<\/a> might have a very good understanding of how 19th century slave owners justified themselves as moral, but would be very unlikely to defend slavery.<\/p>\n<p>AI systems with excellent understanding of human morality could be even more dangerous than AIs without such understanding: the AI system could act morally at first as a way to deceive us into thinking that it is safe.<\/p>\n<\/div><\/div><\/div>\n<\/div>\n<h2><span id=\"how-you-can-help\" class=\"toc-anchor\"><\/span>How you can help<\/h2>\n<p><a href=\"https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/#section-five\">Above<\/a>, we highlighted many approaches to mitigating the risks from power-seeking AI. 
You can use your career to help make this important work happen.<\/p>\n<p>There are many ways to contribute \u2014 and you <em>don&#8217;t<\/em> need to have a technical background.<\/p>\n<p>For example, you could:<\/p>\n<ul>\n<li>Work in <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-policy-and-strategy\/\">AI governance and policy<\/a> to create strong guardrails for frontier models, incentivise efforts to build safer systems, and promote coordination where helpful. <\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/career-reviews\/ai-safety-researcher\/\">technical AI safety research<\/a> to develop methods, tools, and rigorous tests that help us keep AI systems under control. <\/li>\n<li>Do <strong>a combination<\/strong> of technical and policy work \u2014 for example, we need people in government who can design technical policy solutions, and researchers who can translate between technical concepts and policy frameworks.<\/li>\n<li>Become an <a href=\"https:\/\/80000hours.org\/career-reviews\/become-an-expert-in-ai-hardware\/\">expert in AI hardware<\/a> as a way of steering AI progress in safer directions. <\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/career-reviews\/information-security\/\">information and cybersecurity<\/a> to protect AI-related data and infrastructure from theft or manipulation. <\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/articles\/operations-management\/\">operations management<\/a> to help the organisations tackling these risks to grow and function as effectively as possible.<\/li>\n<li>Become an <a href=\"https:\/\/80000hours.org\/career-reviews\/executive-assistant-for-an-impactful-person\/\">executive assistant<\/a> to someone who&#8217;s doing really important work in this area.<\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/articles\/communication\/\">communications roles<\/a> to spread important ideas about the risks from power-seeking AI to decision makers or the public. <\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/career-reviews\/journalism\/\">journalism<\/a> to shape public discourse on AI progress and its risks, and to help hold companies and regulators to account. <\/li>\n<li>Work in <a href=\"https:\/\/80000hours.org\/career-reviews\/forecasting\/\">forecasting research<\/a> to help us better predict and respond to these risks.<\/li>\n<li><a href=\"https:\/\/80000hours.org\/career-reviews\/founder-impactful-organisations\/\">Found a new organisation<\/a> aimed at reducing the risks from power-seeking AI. <\/li>\n<li>Help to <a href=\"https:\/\/80000hours.org\/career-reviews\/work-in-effective-altruism-organisations\/\">build communities of people who are working on this problem<\/a>. <\/li>\n<li>Become a <a href=\"https:\/\/80000hours.org\/career-reviews\/grantmaker\/\">grantmaker<\/a> to fund promising projects aiming to address this problem.<\/li>\n<li><a href=\"https:\/\/80000hours.org\/articles\/earning-to-give\/\">Earn to give<\/a>, since there are many great organisations in need of funding. 
<\/li>\n<\/ul>\n<p>For advice on how you can use your career to help the future of AI go well <em>more broadly<\/em>, take a look at our <a href=\"https:\/\/80000hours.org\/agi\/guide\/summary\/\">summary<\/a>, which includes tips for gaining the skills that are most in demand and choosing between different career paths.<\/p>\n<p>You can also see our <a href=\"https:\/\/jobs.80000hours.org\/organisations?refinementList[problem_areas][0]=AI+safety+%26+policy&amp;refinementList[problem_areas][1]=Biosecurity+%26+pandemic+preparedness&amp;refinementList[problem_areas][1]=AI+technical+safety&amp;refinementList[problem_areas][2]=China-Western+relations&amp;refinementList[problem_areas][2]=AI+safety+%26+policy&amp;refinementList[problem_areas][3]=Forecastinghttps:\/\/jobs.80000hours.org\/organisations?refinementList[problem_areas][0]=AI+policy+%26+governance&amp;refinementList[problem_areas][3]=Forecasting&amp;refinementList[problem_areas][4]=China-Western+relations\">list of organisations<\/a> doing high impact work to address AI risks.<\/p>\n<div class=\"well bg-gray-lighter margin-bottom margin-top padding-top-small padding-bottom-small\">\n<h3><span id=\"want-one-on-one-advice-on-pursuing-this-path\" class=\"toc-anchor\"><\/span>Want one-on-one advice on pursuing this path?<\/h3>\n<p>We think that the risks posed by power-seeking AI systems may be the most pressing problem the world currently faces. If you think you might be a good fit for any of the above career paths that contribute to solving this problem, we&#8217;d be <em>especially<\/em> excited to advise you on next steps, one-on-one.<\/p>\n<p>We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities \u2014 all for free.<\/p>\n<p><a href=\"\/speak-with-us\/?int_campaign=problem-profile\" title=\"\" class=\"btn btn-primary\">APPLY TO SPEAK WITH OUR TEAM<\/a><\/p>\n<\/div>\n<h2><span id=\"learn-more\" class=\"toc-anchor\"><\/span>Learn more<\/h2>\n<p>We&#8217;ve hit you with a lot of further reading throughout this article \u2014 here are a few of our favourites:<\/p>\n<ul>\n<li><a href=\"https:\/\/doi.org\/10.48550\/arXiv.2206.13353\">Is power-seeking AI an existential risk?<\/a> by Coefficient Giving researcher Joseph Carlsmith is an in-depth look covering exactly how and why AI could cause the disempowerment of humanity. It&#8217;s also available as an <a href=\"https:\/\/open.spotify.com\/episode\/5PokyqXCw4hpV5u0rc5Lio\">audio narration<\/a>. For a shorter summary, see Carlsmith&#8217;s <a href=\"https:\/\/forum.effectivealtruism.org\/posts\/ChuABPEXmRumcJY57\/video-and-transcript-of-presentation-on-existential-risk\">talk on the same topic<\/a>.<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2311.08379\">Scheming AIs: Will AIs fake alignment during training in order to get power?<\/a> by Joe Carlsmith discusses why it might be likely for AI training to produce schemers.<\/li>\n<li><a href=\"https:\/\/ai-2027.com\/\">AI 2027<\/a> by Daniel Kokotajlo, Scott Alexander, Thomas Larsen, Eli Lifland, and Romeo Dean. This scenario explains how superhuman AI might be developed and deployed in the near future. It describes two futures: one in which humanity survives, and one in which it&#8217;s destroyed. 
(You can also <a href=\"https:\/\/www.youtube.com\/watch?v=5KVDDfAkRgc&amp;pp=ygUNYWkgaW4gY29udGV4dA%3D%3D\">watch our video explainer<\/a> of the report, or <a href=\"https:\/\/80000hours.org\/podcast\/episodes\/daniel-kokotajlo-ai-2027-updates-china-robot-economy\/\">check out our podcast episode with Daniel Kokotajlo<\/a>.)<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221012020606\/https:\/\/www.cold-takes.com\/ai-could-defeat-all-of-us-combined\/\">AI could defeat all of us combined<\/a> and <a href=\"https:\/\/web.archive.org\/web\/20221013022027\/https:\/\/www.cold-takes.com\/most-important-century\/\">the &#8220;most important century&#8221; blog post series<\/a> by Holden Karnofsky argue that the 21st century could be the most important century ever for humanity as a result of AI.<\/li>\n<li><a href=\"https:\/\/web.archive.org\/web\/20221013022057\/https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/\">Why AI alignment could be hard with modern deep learning<\/a> by Coefficient Giving researcher Ajeya Cotra is a gentle introduction to how risks from power-seeking AI could play out with current machine learning methods. <a href=\"https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to\">Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover<\/a>, also by Cotra, provides a much more detailed description of how risks could play out (which we&#8217;d recommend for people familiar with ML).<\/li>\n<li><a href=\"https:\/\/80000hours.org\/articles\/the-us-ai-policy-landscape-where-to-have-the-biggest-impact\/\">The US AI policy landscape: where to have the biggest impact<\/a>, our guide to the key institutions and roles for AI policy work.<\/li>\n<\/ul>\n<p>On <em>The 80,000 Hours Podcast<\/em>, we have a <a href=\"https:\/\/80000hours.org\/topic\/ai\/?content-type=podcast\">number of in-depth interviews<\/a> with people actively working to positively shape the development of artificial intelligence:<\/p>\n<ul>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/max-harms-miri-superintelligence-corrigibility\/\">Max Harms on why teaching AI right from wrong could get everyone killed<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/ajeya-cotra-transformative-ai-crunch-time\/\">Ajeya Cotra on whether it&#8217;s crazy that every AI company&#8217;s safety plan is &#8216;use AI to make AI safe&#8217;<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/will-macaskill-ai-character-viatopia\/\">Will MacAskill on why AI character matters even more than you think<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/marius-hobbhahn-ai-scheming-deception\/\">Marius Hobbhahn on the race to solve AI scheming before models go superhuman<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/beth-barnes-ai-safety-evals\/\">Beth Barnes on the most important graph in AI right now \u2014 and the 7-month rule that governs its progress<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/buck-shlegeris-ai-control-scheming\/\">Buck Shlegeris on controlling AI that wants to take over \u2014 so we can use it anyway<\/a><\/li>\n<li><a href=\"https:\/\/80000hours.org\/podcast\/episodes\/rohin-shah-deepmind-doomers-and-doubters\/\">Rohin Shah on DeepMind and trying to fairly hear out both AI doomers and 
doubters<\/a><\/li>\n<\/ul>\n<p>If you want to go into much more depth, the <a href=\"https:\/\/www.agisafetyfundamentals.com\/\">AGI safety fundamentals<\/a> course is a good starting point. There are two tracks to choose from: <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-alignment-curriculum\">technical alignment<\/a> or <a href=\"https:\/\/www.agisafetyfundamentals.com\/ai-governance-curriculum\">AI governance<\/a>. If you have a more technical background, you could try <a href=\"https:\/\/course.mlsafety.org\/about\"><em>Intro to ML Safety<\/em><\/a>, a course from the <a href=\"https:\/\/www.safe.ai\/\">Center for AI Safety<\/a>.<\/p>\n<p>We&#8217;ve also provided a more general argument <a href=\"https:\/\/80000hours.org\/problem-profiles\/artificial-intelligence\/?v=1\">here<\/a> for thinking AI could be a very big deal, highlighting the risks of power-seeking as well as other challenges raised by AI.<\/p>\n<h2 class=\"no-toc\">Acknowledgements<\/h2>\n<p><em>We thank Neel Nanda, Ryan Greenblatt, Alex Lawsen, and Arden Koehler for providing feedback on a draft of this article. Benjamin Hilton wrote a previous version of this article, some of which was incorporated here.<\/em><\/p>\n","protected":false},"author":435,"featured_media":87151,"parent":0,"menu_order":0,"template":"","meta":{"_acf_changed":false,"footnotes":"[fn mspdf]See [here](https:\/\/aidantr.github.io\/files\/AI_innovation.pdf).[\/fn]\r\n\r\n[fn pdf2]See [here](https:\/\/metr.org\/AI_R_D_Evaluation_Report.pdf).[\/fn]\r\n\r\n[fn 2030]For additional arguments why timelines might be this short, see:\r\n\r\n* [Can AI Scaling Continue Through 2030?](https:\/\/epoch.ai\/blog\/can-ai-scaling-continue-through-2030) from Epoch AI\r\n* [Pathways to short TAI timelines](https:\/\/www.convergenceanalysis.org\/research\/pathways-to-short-tai-timelines) from Convergence Analysis\r\n* [Situational Awareness](https:\/\/situational-awareness.ai\/) from Leopold Aschenbrenner\r\n[\/fn]\r\n\r\n[fn CEOs] Sam Altman [wrote](https:\/\/blog.samaltman.com\/reflections) in January 2025:\r\n\r\n>We are now confident we know how to build AGI as we have traditionally understood it. We believe that, in 2025, we may see the first AI agents \"join the workforce\" and materially change the output of companies.\r\n\r\nDario Amodei [wrote](https:\/\/www.anthropic.com\/news\/paris-ai-summit) in February 2025:\r\n\r\n>Time is short, and we must accelerate our actions to match accelerating AI progress. Possibly by 2026 or 2027 (and almost certainly no later than 2030), the capabilities of AI systems will be best thought of as akin to an entirely new state populated by highly intelligent people appearing on the global stage\u2014a \"country of geniuses in a datacenter\"\u2014with the profound economic, societal, and security implications that would bring.\r\n\r\nDemis Hassabis [said in January 2025](https:\/\/www.youtube.com\/watch?v=yr0GiSgUvPU):\r\n\r\n>We've sort of had a consistent view about AGI being a system that's capable of exhibiting all of the cognitive capabilities humans can. And I think we're getting closer and closer, but I think we're still probably a handful of years away.\r\n\r\n[\/fn]\r\n\r\n[fn OpenAIrevenue] The chatbot's maker, OpenAI, reportedly brought in $3.7 billion in revenue in 2024, and has projected revenue of $11.6 billion in 2025. 
In January 2025, it was reportedly in talks with investors for a funding round that would put the company's value at $300 billion, according to the [Wall Street Journal](https:\/\/www.wsj.com\/tech\/ai\/openaiin-talks-for-huge-investment-round-valuing-it-up-to-300-billion-2a2d4327). [\/fn]\r\n\r\n[fn aidigest]The [AI Digest](https:\/\/theaidigest.org\/) compares state-of-the-art models and tracks ongoing progress in the technology.[\/fn]\r\n\r\n[fn sentience]We're also concerned about the possibility that AI systems could deserve moral consideration for their own sake \u2014 for example, because they are sentient. We're not going to discuss this possibility in this article; instead, we cover artificial sentience in a separate article [here](https:\/\/80000hours.org\/problem-profiles\/artificial-sentience\/).[\/fn]\r\n\r\n[fn intelligence] What do we mean by 'intelligence' in this context? Something like \"the ability to predictably influence the future.\" This involves understanding the world well enough to make plans that can actually work, and the ability to carry out those plans. Humans having the ability to predictably influence the future means they have been able to shape the world around them to fit their goals and desires. We go into more detail on the importance of the ability to make and execute plans [later in this article](#aps-systems). [\/fn]\r\n\r\n[fn garfinkelepistemics]It's hard to know how to deal with this lack of research \u2014 we may be less concerned because this is evidence that researchers have chosen not to focus on this risk (and therefore, assuming they're more likely to focus on big risks, that the risk is smaller), or we may be more concerned because the risk seems [more neglected overall](#neglectedness).  {.doNotRemove}\r\n\r\nBen Garfinkel \u2014 a researcher at the [Centre for the Governance of AI](https:\/\/www.governance.ai\/) \u2014 has pointed out that concern among the existential risk community about different risks is somewhat correlated with how hard these risks are to analyse. He continues:\r\n\r\n> It doesn't at all follow that the community is irrational to worry far more about misaligned AI than other potential risks. It's completely coherent to have something like this attitude: \"If I could think more clearly about the risk from misaligned AI, then I would probably come to realize it's not that big a deal. But, in practice, I can't yet think very clearly about it. That means that, unlike in the case of climate change, I also can't rule out the small possibility that clarity would make me much more worried about it than I currently am. So, on balance, I should feel more worried about misaligned AI than I do about other risks. I should focus my efforts on it, even if \u2014 to uncharitable observers \u2014 my efforts will probably look a bit misguided after the fact.\"\r\n\r\nFor more, read Garfinkel's post [here](https:\/\/web.archive.org\/web\/20221016004022\/https:\/\/forum.effectivealtruism.org\/posts\/M68oj7fwXoPFJisap\/we-should-expect-to-worry-more-about-speculative-risks).  \r\n[\/fn]\r\n\r\n[fn dmoai]DeepMind's [safety team](https:\/\/web.archive.org\/web\/20221016004051\/https:\/\/deepmindsafetyresearch.medium.com\/building-safe-artificial-intelligence-52f5f75058f1) and OpenAI's [alignment team](https:\/\/openai.com\/alignment\/) focus on technical AI safety research, some of which would mitigate the risks discussed in this article. 
We've spoken to researchers on both these teams who have told us that they believe that artificial intelligence poses the most significant existential risk to humanity this century, and that their research attempts to reduce this risk. In the same vein:\r\n\r\n* In 2011, Shane Legg, cofounder and chief scientist at DeepMind, [said that](https:\/\/web.archive.org\/web\/20221016004215\/https:\/\/www.lesswrong.com\/posts\/No5JpRCHzBrWA4jmS\/q-and-a-with-shane-legg-on-risks-from-ai) AI is his \"number 1 [existential] risk for this century, with an engineered biological pathogen coming a close second.\"\r\n* Sam Altman, cofounder and CEO at OpenAI, has at times expressed concerns, though he seems to be very optimistic about AI's impacts overall. For example, in his [2021 interview with Ezra Klein](https:\/\/web.archive.org\/web\/20221016004421\/https:\/\/www.nytimes.com\/2021\/06\/11\/podcasts\/transcript-ezra-klein-interviews-sam-altman.html), he was asked about the incentive systems around building AI. He said he thinks the current systems address lots of problems, but \"the one that remains that I am \u2014 for the entire field, not just us \u2014 most concerned about is actually closer to the super powerful systems like the ones that people talk about creating an existential risk to humanity.\"\r\n* We've interviewed some top researchers from these organisations on *The 80,000 Hours Podcast*, including [Dario Amodei, former vice president of research at OpenAI](https:\/\/80000hours.org\/podcast\/episodes\/the-world-needs-ai-researchers-heres-how-to-become-one\/) (he's now cofounder and CEO of Anthropic, another AI lab), [Jan Leike, former research scientist at DeepMind](https:\/\/80000hours.org\/podcast\/episodes\/jan-leike-ml-alignment\/) (he's now Alignment team lead at OpenAI), [Jack Clarke, Amanda Askell, and Miles Brundage on the OpenAI policy team](https:\/\/80000hours.org\/podcast\/episodes\/openai-askell-brundage-clark-latest-in-ai-policy-and-strategy\/) (Clarke is now cofounder at Anthropic, Askell is a member of technical staff at Anthropic, and Brundage is head of policy research at OpenAI). All have expressed concern about the consequences of AI for the future of humanity. [\/fn]\r\n\r\n[fn researchgroups]Academics at all these research groups are included on the [list of professors who say they are working on AI safety because they believe this work will reduce existential risk](https:\/\/futureoflife.org\/team\/ai-existential-safety-community\/). This list is maintained by the [Future of Life Institute](https:\/\/futureoflife.org). The list includes academics from these and other universities.[\/fn]\r\n\r\n[fn threesurveys] \r\n\r\nThe four surveys were:\r\n\r\n* [Grace et al. (2024)](https:\/\/arxiv.org\/abs\/2401.02843), conducted in 2023\r\n* [Grace et al. (2022)](https:\/\/aiimpacts.org\/2022-expert-survey-on-progress-in-ai\/), conducted in 2022\r\n* [Zhang et al. (2022)](https:\/\/doi.org\/10.48550\/arXiv.2206.04132), conducted in 2019\r\n* [Grace et al. (2018)](https:\/\/doi.org\/10.1613\/jair.1.11222), conducted in 2016\r\n\r\nAll four surveys contacted researchers who published at NeurIPS and ICML conferences. \r\n\r\nGrace et al. (2024) contacted researchers who published at NeurIPS, IMCL, or four other top AI venues (ICLR, AAAI, JMLR and IJCAI). It was distributed to 18,459 researchers, receiving 2,778 responses (a 15% response rate).\r\n\r\nGrace et al. 
(2022) contacted 4,271 researchers who published at the 2021 conferences (the researchers were randomly allocated either to this survey, whose results were written up by Stein-Perlman et al., or to a second survey run by others) and received 738 responses (a 17% response rate).\r\n\r\nZhang et al. (2022) contacted all 2,652 researchers who published at the 2018 conferences and received 524 responses (a 20% response rate), although due to a technical error only 296 responses could be used.\r\n\r\nGrace et al. (2018) contacted all 1,634 researchers who published at the 2015 conferences and received 352 responses (a 21% response rate).[\/fn]\r\n\r\n[fn selection]\r\nKatja Grace, who conducted the 2016, 2022 and 2023 surveys, [notes on her blog](https:\/\/web.archive.org\/web\/20221016004704\/https:\/\/aiimpacts.org\/some-survey-results\/) that the framing of questions noticeably changes the answers given:\r\n\r\n> People consistently give later forecasts if you ask them for the probability in N years instead of the year that the probability is M. We saw this in the straightforward HLMI [high-level machine intelligence] question, and most of the tasks and occupations, and also in most of these things when we tested them on mturk people earlier. For HLMI for instance, if you ask when there will be a 50% chance of HLMI you get a median answer of 40 years, yet if you ask what the probability of HLMI is in 40 years, you get a median answer of 30%.\r\n\r\n[Our interview with Katja](https:\/\/80000hours.org\/podcast\/episodes\/katja-grace-forecasting-technology\/) goes into more detail on the possible limitations of the 2016 survey.[\/fn]\r\n\r\n[fn median]By \"the median researcher thought that the chances were *x*%,\" we mean \"at least half of researchers thought that the chances were greater than or equal to *x*%.\"[\/fn]\r\n\r\n[fn hlmi]\r\nIn the surveys by Grace et al., researchers were asked about \"high-level machine intelligence\" (HLMI). This was defined as:\r\n\r\n> When unaided machines can accomplish every task better and more cheaply than human workers. Ignore aspects of tasks for which being a human is intrinsically advantageous, e.g. being accepted as a jury member. *Think feasibility, not adoption.*\r\n\r\nIn the survey by Zhang et al., researchers were asked about \"human-level machine intelligence\" (HLMI), defined as:\r\n\r\n> Human-level machine intelligence (HLMI) is reached when machines are collectively able to perform almost all tasks (>90% of all tasks) that are economically relevant\\* better than the median human paid to do that task in 2019. You should ignore tasks that are legally or culturally restricted to humans, such as serving on a jury. \\*We define these tasks as all the ones included in the Occupational Information Network (O\\*NET) dataset. O\\*NET is a widely used dataset of tasks required for current occupations.\r\n\r\nThey were then asked:\r\n\r\n> Assume for the purpose of this question that HLMI will at some point exist.
How positive or negative do you expect the overall impact of this to be for humanity, in the long run?\r\n> Please answer by saying how probable you find the following kinds of impact, with probabilities adding to 100%:\r\n>\r\n> * Extremely good (e.g., rapid growth in human flourishing) (2)\r\n> * On balance good (1)\r\n> * More or less neutral (0)\r\n> * On balance bad (-1)\r\n> * Extremely bad (e.g., human extinction) (-2)\r\n\r\nFor each survey, an aggregated cumulative distribution function for the probability of HLMI by year was calculated from the mean or median estimates in the survey. These functions gave various aggregate chances of HLMI:\r\n\r\n* 50% by 2047 (Grace et al. (2024), mean estimates)\r\n* 50% by 2059 (Grace et al. (2022), mean estimates)\r\n* 65% by 2080 (Zhang et al. (2022), mean estimates)\r\n* 75% by 2080 (Zhang et al. (2022), median estimates)\r\n\r\nThis means that the answers we cite are similar to but not the same as answers to the question of \"Without assuming that HLMI will exist in the next century, how positive or negative do you expect the overall impact of HLMI to be for humanity in the next century?\" We look at more expert forecasts of AI timelines in the section on [when we can expect to develop transformative AI](#when-can-we-expect-to-develop-transformative-AI).\r\n[\/fn]\r\n\r\n[fn humanfailure]Specifically, Grace et al. (2022) asked participants:\r\n\r\n> What probability do you put on future AI advances causing human extinction or similarly permanent and severe disempowerment of the human species? \r\n\r\nThis is equivalent to the definition of [existential catastrophe](https:\/\/80000hours.org\/articles\/existential-risks\/) that we usually use, and is also similar to the definition of existential catastrophe given by Ord in [*The Precipice* (2020)](https:\/\/80000hours.org\/the-precipice\/):\r\n\r\n> An *existential catastrophe* is the destruction of humanity's long-term potential.\r\n\r\nOrd categorises existential risks as either risks of *extinction* or risks of *failed continuation* (Ord gives the example of a [stable totalitarian regime](https:\/\/80000hours.org\/problem-profiles\/risks-of-stable-totalitarianism\/)). We think that permanent and severe disempowerment of the human species would be a form of *failed continuation* under Ord's definition.\r\n\r\nThe same survey (written up by Stein-Perlman et al.) next asked participants specifically about the [sorts of risks we're most concerned about](#power-seeking-ai):\r\n\r\n> What probability do you put on human inability to control future advanced AI systems causing human extinction or similarly permanent and severe disempowerment of the human species?\r\n\r\nThe median answer to this question was 10%.\r\n\r\nStein-Perlman notes:\r\n\r\n> This question is more specific and thus necessarily less probable than the previous question, but it was given a higher probability at the median. This could be due to noise \u2014 different random subsets of respondents received the questions, so there is no logical requirement that their answers cohere \u2014 or due to the [representativeness heuristic](https:\/\/en.wikipedia.org\/wiki\/Representativeness_heuristic). \r\n[\/fn]\r\n\r\n[fn clarkesurvey]A [2020 survey](https:\/\/web.archive.org\/web\/20221016004901\/https:\/\/www.alignmentforum.org\/posts\/WiXePTj7KeEycbiwK\/survey-on-ai-existential-risk-scenarios) asked researchers working on reducing existential risks from AI what risks they were most concerned about.
The surveyors asked about five sources of existential risk:\r\n\r\n* Risks from superintelligent AI (similar to the scenario we've described [here](\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#superintelligence))\r\n* Risks from influence-seeking behaviour\r\n* Risks from AI systems pursuing easy-to-measure goals (similar to the scenario we've described [here](\/articles\/what-could-an-ai-caused-existential-catastrophe-actually-look-like\/#getting-what-you-measure))\r\n* AI-exacerbated [war](#artificial-intelligence-and-war)\r\n* [Other](#other-risks) intentional misuse of AI not related to war\r\n\r\nThe researchers surveyed were roughly equally concerned about all of these risks. The first three are covered by the section in this article on [risks from power-seeking AI](#power-seeking-ai) while the last two are covered by the section on [other risks](#other-risks). If these groupings make sense (which we think they do), this means that, at the time of the survey, researchers were roughly three times as concerned about the broad risk of power-seeking AI as they were about risks from either war or other misuse separately.\r\n[\/fn]\r\n\r\n\r\n[fn dalle2] DALL-E 1's model used a [12 billion parameter](https:\/\/web.archive.org\/web\/20221016004944\/https:\/\/venturebeat.com\/dev\/openai-debuts-dall-e-for-generating-images-from-text\/) version of GPT-3, while DALL-E mini uses only [0.4 billion](https:\/\/wandb.ai\/dalle-mini\/dalle-mini\/reports\/DALL-E-Mini-Explained--Vmlldzo4NjIxODA#how-does-dall\u00b7e-mini-compare-to-openai-dall\u00b7e?). Interestingly, despite better results, DALL-E 2 was smaller than DALL-E 1, using a [3.5 billion parameter model](https:\/\/towardsdatascience.com\/dall-e-2-explained-the-promise-and-limitations-of-a-revolutionary-ai-3faf691be220#:~:text=At%203.5B%20parameters%2C%20DALL,in%20caption%20matching%20and%20photorealism.).[\/fn]\r\n\r\n[fn shakespearean]GPT-3 will output a different poem for this prompt every time it's run. We generated five short poems and picked the best.\r\n[\/fn]\r\n\r\n[fn cherrypicked]It's important to note that, when you look at outputs from systems like GPT-3 that people have shared online, these are often cherry-picked as standout examples of the system's best work. But that doesn't mean they're not impressive: the fact remains that GPT-3 produces outputs like these frequently enough that it's practical for people to find them by cherry-picking. And the performance of large language models like GPT-3 has only improved since its release in 2020 \u2014 we were particularly impressed by the outputs of [LaMDA](https:\/\/blog.google\/technology\/ai\/lamda\/), one of Google Brain's large language models, released in May 2022.[\/fn]\r\n\r\n\r\n[fn generalpurposetech]\r\n\r\nEconomists call technologies that affect the entirety of an economy [*general purpose technologies*](https:\/\/docs.google.com\/document\/d\/1I13_0o3kUe1AVQNfevOF9sHpc4mCQkuFDxOXFj_4g-I\/). We're effectively claiming here that AI could be a general purpose technology (like e.g. steam power or electricity).  {.doNotRemove}\r\n\r\nIt's not always easy to tell what might become a general purpose technology. For example, it took [200 years](https:\/\/doi.org\/10.1086\/ahr\/84.1.159) for steam power to be used for anything other than pumping water out of mines.
\r\n\r\nDespite this uncertainty, economists increasingly think that AI is a pretty promising candidate for a general purpose technology, because it will have such a wide variety of effects.\r\n\r\nIt seems likely that [lots of jobs could be automated](https:\/\/web.archive.org\/web\/20221016005338\/https:\/\/www.technologyreview.com\/2018\/01\/25\/146020\/every-study-we-could-find-on-what-automation-will-do-to-jobs-in-one-chart\/). AI's ability to speed up the rate of development of new technology [could have significant implications for our economy](https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/), but also poses risks by potentially allowing the development of [dangerous new technology](#dangerous-new-technology).\r\n\r\nAI's effects on the economy could exacerbate inequality. Owners of AI-driven industries could become much richer than the rest of society \u2014 see e.g. [Artificial Intelligence and Its Implications for Income Distribution and Unemployment](https:\/\/dx.doi.org\/10.3386\/w24174) by Korinek and Stiglitz (2017):\r\n\r\n> Inequality is one of the main challenges posed by the proliferation of artificial intelligence (AI) and other forms of worker-replacing technological progress. This paper provides a taxonomy of the associated economic issues: First, we discuss the general conditions under which new technologies such as AI may lead to a Pareto improvement. Secondly, we delineate the two main channels through which inequality is affected \u2013 the surplus arising to innovators and redistributions arising from factor price changes. Third, we provide several simple economic models to describe how policy can counter these effects, even in the case of a \"singularity\" where machines come to dominate human labor. Under plausible conditions, non-distortionary taxation can be levied to compensate those who otherwise might lose. Fourth, we describe the two main channels through which technological progress may lead to technological unemployment \u2013 via efficiency wage effects and as a transitional phenomenon. Lastly, we speculate on how technologies to create super-human levels of intelligence may affect inequality and on how to save humanity from the Malthusian destiny that may ensue. \r\n\r\nAI systems are already having discriminatory impacts on marginalised groups. For example, [Sweeney (2013)](https:\/\/dx.doi.org\/10.1145\/2460276.2460278) found that two search engines disproportionately serve ads for arrest records when people search for racially associated names. And [Ali et al. (2019)](https:\/\/dx.doi.org\/10.1145\/3359301), on Facebook advertising: \r\n\r\n> It has been hypothesized that this process can \"skew\" ad delivery in ways that the advertisers do not intend, making some users less likely than others to see particular ads based on their demographic characteristics. In this paper, we demonstrate that such skewed delivery occurs on Facebook, due to market and financial optimization effects as well as the platform's own predictions about the \"relevance\" of ads to different groups of users. We find that both the advertiser's budget and the content of the ad each significantly contribute to the skew of Facebook's ad delivery. Critically, we observe significant skew in delivery along gender and racial lines for \"real\" ads for employment and housing opportunities despite neutral targeting parameters. 
\r\n\r\nWe're already able to produce simple [autonomous weapons](https:\/\/en.wikipedia.org\/wiki\/Lethal_autonomous_weapon), and as these weapons become more complex they're going to [completely change what war looks like](https:\/\/web.archive.org\/web\/20221016005551\/https:\/\/www.vox.com\/2019\/6\/21\/18691459\/killer-robots-lethal-autonomous-weapons-ai-war). As we'll argue later, [AI could even impact how nuclear weapons are used](#artificial-intelligence-and-war).\r\n\r\nFinally, politically, many have raised concerns that [automated social media algorithms are driving political polarisation](https:\/\/web.archive.org\/web\/20221016005607\/https:\/\/www.vox.com\/recode\/21534345\/polarization-election-social-media-filter-bubble). And some experts have warned that an increased ability to generate realistic videos and photos, or automating campaigns to influence people's opinions [could have a significant impact on politics](https:\/\/doi.org\/10.17863\/CAM.22520) over the coming years.\r\n\r\nNotable economists who hold the view that AI is likely to be a general purpose technology include Manuel Trajtenberg and Erik Brynjolfsson.\r\n\r\nIn [Artificial Intelligence as the Next GPT: A Political-Economy Perspective](https:\/\/dx.doi.org\/10.3386\/w24245) (2019), Trajtenberg writes:\r\n\r\n> Given that AI is poised to emerge as a powerful technological force, I discuss ways to mitigate the almost unavoidable ensuing disruption, and enhance AI's vast benign potential. This is particularly important in present times, in view of political-economic considerations that were mostly absent in previous historical episodes associated with the arrival of new GPTs. \r\n\r\nIn [Artificial Intelligence and the Modern Productivity Paradox: A Clash of Expectations and Statistics](https:\/\/ideas.repec.org\/h\/nbr\/nberch\/14007.html) (2018), Brynjolfsson writes:\r\n\r\n> As important as specific applications of AI may be, we argue that the more important economic effects of AI, machine learning, and associated new technologies stem from the fact that they embody the characteristics of general purpose technologies (GPTs). \r\n\r\n[\/fn]\r\n\r\n[fn transformative]\r\n\r\nThere are a few different definitions used in this section for \"transformative AI,\" but we think the differences aren't very important when it comes to interpreting predictions of AI progress. The definitions are:\r\n\r\n* [Karnofsky (2021)](https:\/\/web.archive.org\/web\/20221013013107\/https:\/\/www.cold-takes.com\/where-ai-forecasting-stands-today\/) uses \"AI powerful enough to bring us into a new, qualitatively different future.\" (Or [as he put it in 2016](https:\/\/web.archive.org\/web\/20221016005924\/https:\/\/www.openphilanthropy.org\/research\/some-background-on-our-views-regarding-advanced-artificial-intelligence\/), \"roughly and conceptually, transformative AI is AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution.\")\r\n* [Cotra (2020)](https:\/\/docs.google.com\/document\/d\/1IJ6Sr-gPeXdSJugFulwIpvavc0atjHGM82QjIfUSBGQ\/edit#heading=h.6t4rel10jbcj) uses a similar definition. In addition, Cotra writes: \"How large is an impact \"as profound as the Industrial Revolution\"? Roughly speaking, over the course of the Industrial Revolution, the rate of growth in gross world product (GWP) went from about ~0.1% per year before 1700 to ~1% per year after 1850, a tenfold acceleration. 
By analogy, I think of \"transformative AI\" as software which causes a tenfold acceleration in the rate of growth of the world economy (assuming that it is used everywhere that it would be economically profitable to use it).\"\r\n* [Davidson (2021)](https:\/\/web.archive.org\/web\/20221013013059\/https:\/\/www.openphilanthropy.org\/research\/report-on-semi-informative-priors\/) predicts timelines to \"artificial general intelligence (AGI)\" rather than transformative AI. He defines AGI as \"computer program(s) that can perform virtually any cognitive task as well as any human, for no more money than it would cost for a human to do it.\" Notably, this seems sufficient (but not necessary) to reach the sorts of rapid economic changes implied by the previous two definitions.\r\n[\/fn]\r\n\r\n[fn 2016timelines]These are similar to implied forecasts from the other surveys:\r\n\r\n* [2022 survey by Zhang et al.](https:\/\/doi.org\/10.48550\/arXiv.2206.04132): 20% probability of human-level machine intelligence (which would plausibly be transformative in this sense) by 2036, 50% probability by 2060, and 85% by 2100\r\n* [2022 survey by Grace et al.](https:\/\/web.archive.org\/web\/20221016004611\/https:\/\/aiimpacts.org\/2022-expert-survey-on-progress-in-ai\/): approximately 50% by 2059\r\n* [2016 survey by Grace et al.](https:\/\/doi.org\/10.1613\/jair.1.11222): approximately 25% by 2036, 50% by 2060, and 70% by 2100[\/fn]\r\n\r\n[fn cotravolatile]Importantly, Cotra notes that:\r\n\r\n> I expect these numbers to be pretty volatile too, and (as I did when writing bio anchors) I find it pretty fraught and stressful to decide on how to weigh various perspectives and considerations. I wouldn't be surprised by significant movements\u2026 I'm unclear how decision-relevant bouncing around within the range I've been bouncing around is.\r\n\r\n[\/fn]\r\n\r\n[fn carlsmith]These properties come from Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353), Section 2.1: Three key properties. [\/fn]\r\n\r\n[fn necessary]That's not to say that it's *necessary* for AIs to be able to plan in order for them to be useful. Many things that AI could be useful for (like illustrating books or writing articles) don't seem to require planning or strategic awareness at all. But it does seem reasonable to say that an AI that could make and execute plans for a goal is more likely to have a significant impact on the world than one that cannot.[\/fn]\r\n\r\n[fn muzeroplanning]DeepMind, the developers of MuZero, [write](https:\/\/web.archive.org\/web\/20221013011105\/https:\/\/www.deepmind.com\/blog\/muzero-mastering-go-chess-shogi-and-atari-without-rules):\r\n> For many years, researchers have sought methods that can both learn a model that explains their environment, and can then use that model to plan the best course of action. Until now, most approaches have struggled to plan effectively in domains, such as Atari, where the rules or dynamics are typically unknown and complex.\r\n\r\n> MuZero, first introduced in a preliminary paper in 2019, solves this problem by learning a model that focuses only on the most important aspects of the environment for planning. By combining this model with AlphaZero's powerful lookahead tree search, MuZero set a new state of the art result on the Atari benchmark, while simultaneously matching the performance of AlphaZero in the classic planning challenges of Go, chess and shogi. 
In doing so, MuZero demonstrates a significant leap forward in the capabilities of reinforcement learning algorithms.\r\n[\/fn]\r\n\r\n[fn Jaderberg]For example, [Jaderberg et al.](https:\/\/web.archive.org\/web\/20221016010137\/https:\/\/www.deepmind.com\/blog\/capture-the-flag-the-emergence-of-complex-cooperative-agents) developed deep [reinforcement learning](https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning) agents to play games of Quake III Capture The Flag \u2014 and identified \"particular neurons that code directly for some of the most important game states, such as a neuron that activates when the agent's flag is taken\" \u2014 indicating they can identify states of the game that they value the most (and then plan and act to achieve those states). This sounds pretty similar to \"having goals\" to us.[\/fn]\r\n\r\n[fn otherreasons][Carlsmith](https:\/\/doi.org\/10.48550\/arXiv.2206.13353) section 3 gives two other reasons why we might expect these kinds of advanced, strategically aware planning systems to be built:\r\n\r\n* It may be *easier* to produce these kinds of systems. For example, the best way to automate many tasks may be to create systems that can learn new tasks (instead of separately automating each task). And perhaps the best way to create systems that can learn new tasks is to create a planning system that has a high level understanding of how the world in general works, and then fine-tuning this system on specific tasks.\r\n* We may find that planning is difficult to avoid as we create more sophisticated systems. For example, [some have argued](https:\/\/arbital.com\/p\/consequentialist\/) that being an excellent planner (and having the advanced capabilities to carry out any plans created) is the best way of achieving *any* task. If that's true, then as we optimise our systems we should expect them to (once we've optimised hard enough) become good at planning.\r\n[\/fn]\r\n\r\n[fn ballgrasp]Looking at the animation, it doesn't seem that plausible that the system really fooled any humans. We're not quite sure what's going on here (it's not discussed in the [original paper](https:\/\/doi.org\/10.48550\/arXiv.1706.03741)), but one possibility is that the animation is showing the deployed system's attempts to grasp the ball, rather than the data used to train the system.[\/fn]\r\n\r\n[fn incentives]For a fuller discussion of the incentives to deploy potentially misaligned AI, see section 5 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn laws]Lethal autonomous weapons [already exist](https:\/\/web.archive.org\/web\/20221016010507\/https:\/\/foreignpolicy.com\/2022\/05\/11\/killer-robots-lethal-autonomous-weapons-systems-ukraine-libya-regulation\/).\r\n\r\nFor more information, see:\r\n\r\n* [Risks from Autonomous Weapon Systems and Military AI](https:\/\/web.archive.org\/web\/20221016010520\/https:\/\/forum.effectivealtruism.org\/posts\/RKMNZn7r6cT2Yaorf\/risks-from-autonomous-weapon-systems-and-military-ai), an overview of attempts to reduce risks from lethal autonomous weapons.\r\n* [On AI Weapons](https:\/\/web.archive.org\/web\/20221016010522\/https:\/\/forum.effectivealtruism.org\/posts\/vdqBn65Qaw77MpqXz\/on-ai-weapons), a presentation of the argument that lethal autonomous weapons are, on balance, more good than bad.[\/fn]\r\n\r\n[fn adm]If humans leave the loop for some military decision-making, we could see unintentional military escalation. 
And even if humans do remain in the loop, we could see faster and more complex decision-making, increasing the chances of mistakes or high-risk decisions.\r\n\r\nFor more information, see:\r\n\r\n* [Machine learning, artificial intelligence, and the use of force by states](https:\/\/heinonline.org\/HOL\/LandingPage?handle=hein.journals\/jnatselp10&div=5&id=&page=), by Deeks et al. (2019).\r\n* [AI and International Stability: Risks and Confidence-Building Measures](https:\/\/web.archive.org\/web\/20221016010718\/https:\/\/www.cnas.org\/publications\/reports\/ai-and-international-stability-risks-and-confidence-building-measures), by Horowitz and Scharre (2021).\r\n\r\n[\/fn]\r\n\r\n[fn slbms]\r\n\r\n[\/fn]\r\n\r\n[fn progress]We already have some automated research assistance (for example [Elicit](https:\/\/elicit.org)). If AI systems replace some jobs, or speed up economic growth, we'll see more resources able to be dedicated to scientific advancement. And if we're successful at developing particularly capable AI systems, we could see [parts of the scientific process being automated completely](https:\/\/web.archive.org\/web\/20221013011707\/https:\/\/www.cold-takes.com\/transformative-ai-timelines-part-1-of-4-what-kind-of-ai\/). [\/fn]\r\n\r\n[fn aipathogens] \r\n[Urbina et al. (2022)](https:\/\/web.archive.org\/web\/20220719201542\/https:\/\/climate-science.press\/wp-content\/uploads\/2022\/03\/00s42256-022-00465-9.pdf) developed a computational proof that existing AI technologies for drug discovery could be misused to design biochemical weapons.\r\n\r\nAlso see:\r\n\r\n[O'Brien and Nelson (2020)](https:\/\/dx.doi.org\/10.1089\/hs.2019.0122):\r\n\r\n> Within the realm of synthetic biology, AI could potentially lower some of the barriers for a malicious actor to design dangerous pathogens with custom features.\r\n\r\n[Turchin and Denkenberger (2020)](https:\/\/dx.doi.org\/10.1007\/s00146-018-0845-5), section 3.2.3.[\/fn]\r\n\r\n\r\n[fn surveillance]AI is already facilitating the ability of governments to monitor their own citizens. {.doNotRemove}\r\n\r\nThe NSA is using AI [to help filter the huge amounts of data they collect](https:\/\/web.archive.org\/web\/20221016011132\/https:\/\/www.defenseone.com\/technology\/2020\/01\/spies-ai-future-artificial-intelligence-us-intelligence-community\/162673\/), significantly speeding up their ability to identify and predict the actions of people they are monitoring. China is increasingly using facial recognition and predictive policing, including [automated racial profiling](https:\/\/web.archive.org\/web\/20221016011138\/https:\/\/www.nytimes.com\/2019\/05\/22\/world\/asia\/china-surveillance-xinjiang.html) and automatic alarms when people classified as potential threats enter certain public places.\r\n\r\nThese sorts of surveillance technologies look like they are going to significantly improve \u2014 and in doing so, significantly increase the ability for governments to control their populations.[\/fn]\r\n\r\n[fn reviews]Reviewers were asked to [critique Carlsmith's report](https:\/\/web.archive.org\/web\/20221016011150\/https:\/\/forum.effectivealtruism.org\/posts\/GRv3KB2nPFRREXb5o\/reviews-of-is-power-seeking-ai-an-existential-risk) and give their own estimates of the existential risk from power-seeking AI. 
The estimates of existential risk from power-seeking AI by 2070 were: Aschenbrenner: 0.5%, Garfinkel: 0.4%, Kokotajlo: 65%, Nanda: 9%, Soares: >77%, Tarsney: 3.5%, Thorstad: 0.000002%, Wallace: 2%.[\/fn]\r\n\r\n[fn bensinger]Around 117 researchers were asked:\r\n\r\n> How likely do you think it is that the overall value of the future will be drastically less than it could have been, as a result of AI systems not doing\/optimizing what the people deploying them wanted\/intended?\r\n\r\nResearchers from OpenAI, the Future of Humanity Institute (University of Oxford), the Center for Human-Compatible AI (UC Berkeley), Machine Intelligence Research Institute, Open Philanthropy, and DeepMind were asked to fill in the survey.\r\n\r\n44 people responded (~38% response rate).\r\n\r\nThe mean of the estimates given was 40%.[\/fn]\r\n\r\n[fn objections]These objections are adapted from section 4.2 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn garfinkelmonetary]In cases where people are willing to use systems that they think have (e.g.) a 10% chance of immediately killing them, security concerns (like trying to preempt deployment of transformative AI by others) or perhaps moral\/idealistic concerns could play larger roles than desire for wealth. On the other hand, monetary incentives do seem to be a substantial current driver for research into AI capabilities. We might also expect monetary incentives to encourage motivated reasoning about the size of the risk from AI systems.\r\n[\/fn]\r\n\r\n[fn controllingobjectives]For a detailed overview of how easy or hard it might be to successfully control the objectives of ML systems, see section 4.3.1 of Carlsmith's [draft report into existential risks from AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353). Or, for one possible story about how a deceptive ML system could end up being developed, see [Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover](https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to) by Cotra.[\/fn]\r\n\r\n[fn gdp] World GDP in 2020 was around $84.75 trillion, [according to the World Bank](https:\/\/web.archive.org\/web\/20221016011333\/https:\/\/data.worldbank.org\/indicator\/NY.GDP.MKTP.CD). We've assumed growth of 2% per year \u2014 see [here](https:\/\/www.cold-takes.com\/this-cant-go-on\/#fn5) for an explanation of why, and [here](https:\/\/web.archive.org\/save\/https:\/\/www.cold-takes.com\/more-on-multiple-world-size-economies-per-atom\/) for further discussion of what such a huge GDP could actually mean.[\/fn]\r\n\r\n[fn neglectednessupdate]\r\nNote that before 19 December 2022, this page gave a lower estimate of 300 FTE working on reducing existential risks from AI, of which around two thirds were working on technical AI safety research, with the rest split between strategy (and other governance) research and advocacy.\r\n\r\nThis change represents a (hopefully!) improved estimate, rather than a notable change in the number of researchers. [\/fn]\r\n\r\n[fn neglectednessestimate]\r\nIt's difficult to estimate this number.\r\n\r\nIdeally we want to estimate the number of FTE (\"[full-time equivalent](https:\/\/en.wikipedia.org\/wiki\/Full-time_equivalent)\") working on the problem of reducing existential risks from AI.\r\n\r\nBut there are lots of ambiguities around what counts as working on the issue.
So I tried to use the following guidelines in my estimates:\r\n\r\n* I didn't include people who might think of themselves as being on a career path that is building towards a role preventing an AI-related catastrophe, but who are currently skilling up rather than working directly on the problem.\r\n* I included researchers, engineers, and other staff that seem to work directly on technical AI safety research or AI strategy and governance. But there's an uncertain boundary between these people and others who I chose not to include. For example, I didn't include machine learning engineers whose role is building AI systems that might be used for safety research but aren't *primarily* designed for that purpose.\r\n* I only included time spent on work that seems related to reducing the potential [existential risks](https:\/\/80000hours.org\/articles\/existential-risks\/) from AI, like those discussed in this article. Lots of wider AI safety and AI ethics work focuses on reducing other risks from AI but also seems relevant to reducing existential risks, and this 'indirect' work makes the estimate difficult. I decided not to include indirect work on reducing the risks of an AI-related catastrophe (see our [problem framework](https:\/\/80000hours.org\/articles\/problem-framework\/#a-challenge-direct-vs-indirect-future-effort) for more).\r\n* Relatedly, I didn't include people working on other problems that might indirectly affect the chances of an AI-related catastrophe, such as [epistemics and improving institutional decision-making](https:\/\/80000hours.org\/problem-profiles\/improving-institutional-decision-making\/), reducing the chances of [great power conflict](https:\/\/80000hours.org\/problem-profiles\/great-power-conflict\/), or [building effective altruism](https:\/\/80000hours.org\/problem-profiles\/promoting-effective-altruism\/). \r\n\r\nWith those decisions made, I estimated this in three different ways.\r\n\r\nFirst, for each organisation in the [AI Watch](https:\/\/aiwatch.issarice.com\/) database, I estimated the number of FTE working directly on reducing existential risks from AI. I did this by looking at the number of staff listed at each organisation, both in total and in 2022, as well as the number of researchers listed at each organisation. Overall I estimated that there were 76 to 536 FTE working on technical AI safety (90% confidence), with a mean of 196 FTE. I estimated that there were 51 to 359 FTE working on AI governance and strategy (90% confidence), with a mean of 151 FTE. There's a lot of subjective judgement in these estimates because of the ambiguities above. The estimates could be too low if AI Watch is missing data on some organisations, or too high if the data counts people more than once or includes people who no longer work in the area. \r\n\r\nSecond, I adapted the methodology used in [Gavin Leech's estimate of the number of people working on reducing existential risks from AI](https:\/\/forum.effectivealtruism.org\/posts\/8ErtxW7FRPGMtDqJy\/the-academic-contribution-to-ai-safety-seems-large). I split the organisations in Leech's estimate into technical safety and governance\/strategy. I adapted Gavin's figures for the proportion of computer science academic work relevant to the topic to fit my definitions above, and made a related estimate for work outside computer science but within academia that is relevant. Overall I estimated that there were 125 to 1,848 FTE working on technical AI safety (90% confidence), with a mean of 580 FTE.
I estimated that there were 48 to 268 FTE working on AI governance and strategy (90% confidence), with a mean of 100 FTE.\r\n\r\nThird, I looked at the estimates of similar numbers by [Stephen McAleese](https:\/\/forum.effectivealtruism.org\/posts\/3gmkrj3khJHndYGNe\/estimating-the-current-and-future-number-of-ai-safety). I made minor changes to McAleese's categorisation of organisations, to ensure the numbers were consistent with the previous two estimates. Overall I estimated that there were 110 to 552 FTE working on technical AI safety (90% confidence), with a mean of 267 FTE. I estimated that there were 36 to 193 FTE working on AI governance and strategy (90% confidence), with a mean of 81 FTE.\r\n\r\nI took a geometric mean of the three estimates to form a final estimate, and combined confidence intervals by assuming that distributions were approximately lognormal.\r\n\r\nFinally, I estimated the number of FTE in [complementary roles](#complementary-yet-crucial-roles) using the AI Watch database. For relevant organisations, I identified those where there was enough data listed about the number of *researchers* at those organisations. I calculated the ratio between the number of researchers in 2022 and the number of staff in 2022, as recorded in the database. I calculated the mean of those ratios, and a confidence interval using the standard deviation. I used this ratio to calculate the overall number of support staff by assuming that estimates of the number of staff are lognormally distributed and that the estimate of this ratio is normally distributed. Overall I estimated that there were 2 to 2,357 FTE in complementary roles (90% confidence), with a mean of 770 FTE.\r\n\r\nThere are likely many errors in this methodology, but I expect these errors are small compared to the uncertainty in the underlying data I'm using. Ultimately, I'm still highly uncertain about the overall FTE working on preventing an AI-related catastrophe, but I'm confident enough that the number is relatively small to say that the problem as a whole is highly neglected.\r\n\r\nI'm very uncertain about this estimate. It involved a number of highly subjective judgement calls. You can see the (very rough) spreadsheet I worked off [here](https:\/\/docs.google.com\/spreadsheets\/d\/1e1Vh_nK_7VHKZUuQ9VNp3JWC2etjUAHVmVXbKarKMNw\/edit). If you have any feedback, I'd really appreciate it if you could tell me what you think using [this form](https:\/\/forms.gle\/RRZaFTfdDkSQ6fJG8).\r\n[\/fn]\r\n\r\n[fn capabilitiesspending]It's difficult to say exactly how much is being spent to advance AI capabilities. This is partly because of a lack of available data, and partly because of questions like:\r\n\r\n* What research in AI is actually advancing the sorts of dangerous capabilities that might be increasing potential existential risk?\r\n* Do advances in AI hardware or advances in data collection count?\r\n* How about broader improvements to research processes in general, or things that might increase investment in the future through producing economic growth?\r\n\r\nThe most relevant figure we could find was the expenses of DeepMind from 2020, which were around \u00a31 billion, [according to its annual report](https:\/\/web.archive.org\/web\/20221016011531\/https:\/\/find-and-update.company-information.service.gov.uk\/company\/07386350\/filing-history). We'd expect most of that to be contributing to \"advancing AI capabilities\" in some sense, since its main goal is building powerful, general AI systems. 
(Although it's important to note that DeepMind is also contributing to work in AI safety, which may be reducing existential risk.)\r\n\r\nIf DeepMind is around about 10% of the spending on advancing AI capabilities, this gives us a figure of around \u00a310 billion. (Given that there are many AI companies in the US, and a large effort to produce advanced AI in China, we think 10% could be a good overall guess.)\r\n\r\nAs an upper bound, the total revenues of the AI sector in 2021 were [around $340 billion](https:\/\/web.archive.org\/web\/20221016011608\/https:\/\/www.idc.com\/getdoc.jsp?containerId=prUS48127321).\r\n\r\nSo overall, we think the amount being spent to advance AI capabilities is between $1 billion and $340 billion per year. Even assuming a figure as low as $1 billion, this would still be around 100 times the amount spent on reducing risks from AI.[\/fn]\r\n\r\n\r\n[fn misalignmentdfns]There are various definitions of *alignment* used in the literature, which differ subtly. These include:\r\n\r\n* An AI is aligned if its decisions maximise the utility of some principal (e.g. an operator or user) ([Shapiro & Shachter, 2002](https:\/\/web.archive.org\/web\/20221016011851\/https:\/\/www.aaai.org\/Papers\/Symposia\/Spring\/2002\/SS-02-07\/SS02-07-002.pdf)).\r\n* An AI is aligned if it acts in the interests of humans ([Soares & Fallenstein, 2015](https:\/\/web.archive.org\/web\/20210413005225\/https:\/\/intelligence.org\/files\/obsolete\/TechnicalAgenda%5Bold%5D.pdf)).\r\n* An AI is \"intent aligned\" if it is trying to do what its operator wants it to do ([Christiano, 2018](https:\/\/ai-alignment.com\/clarifying-ai-alignment-cec47cd69dd6)).\r\n* An AI is \"impact aligned\" (with humans) if it doesn't take actions that we would judge to be bad\/problematic\/dangerous\/catastrophic, and \"intent aligned\" if the optimal policy for its behavioural objective is impact aligned with humans ([Hubinger, 2020](https:\/\/www.alignmentforum.org\/posts\/SzecSPYxqRa5GCaSF\/clarifying-inner-alignment-terminology)).\r\n* An AI is \"intent aligned\" if it is trying to do, or \"impact aligned\" if it is succeeding in doing what a human person or institution wants it to do ([Critch, 2020](https:\/\/web.archive.org\/web\/20221016012022\/https:\/\/www.lesswrong.com\/posts\/hvGoYXi2kgnS3vxqb\/some-ai-research-areas-and-their-relevance-to-existential-1)).\r\n* An AI is \"fully aligned\" if it does not engage in unintended behaviour (specifically, unintended behaviour that arises in virtue of problems with the system's objectives) in response to any inputs compatible with basic physical conditions of our universe ([Carlsmith, 2022](https:\/\/doi.org\/10.48550\/arXiv.2206.13353)).\r\n\r\nThe term \"aligned\" is also often used to refer to the *goals* of a system, in the sense that an AI's goals are aligned if they will produce the same actions from the AI that would occur if the AI shared the goals of some other entity (e.g. its user or operator).\r\n\r\nWe use alignment here to refer to systems, rather than goals. Our definition is most similar to the definitions of \"intent\" alignment given by Christiano and Critch, and is similar to the definition of \"full\" alignment given by Carlsmith.\r\n[\/fn]\r\n\r\n[fn 1]We think it's likely to be very difficult to control the objectives of modern ML systems, for a number of reasons that we'll go through [later](#controlling-objectives). This has two implications:\r\n\r\n1. 
It's hard to ensure that systems are trying to do what we want them to do, which means it's hard to make systems aligned.\r\n\r\n2. It's hard to correct systems when we think that problems with their objectives could have particularly bad consequences.\r\n\r\nAs we'll argue, we think problems with AI systems' objectives could have particularly bad consequences.\r\n\r\nAjeya Cotra, a researcher at Open Philanthropy has written about why we might expect AI alignment to be hard with modern deep learning. We'd recommend [this post](https:\/\/web.archive.org\/web\/20221013022057\/https:\/\/www.cold-takes.com\/why-ai-alignment-could-be-hard-with-modern-deep-learning\/) for people new to ML, and [this](https:\/\/web.archive.org\/web\/20221013014109\/https:\/\/www.alignmentforum.org\/posts\/pRkFkzwKZ2zfa3R6H\/without-specific-countermeasures-the-easiest-path-to) for those more familiar with ML.\r\n[\/fn]\r\n\r\n[fn 2]Gaining enforced power or influence over others generally seems bad, and we're going to take that as given for the rest of this argument. Indeed, we think some forms of taking power away from humanity could even constitute an existential catastrophe, which we discuss further [later](#instrumental-convergence). However, we should note that this doesn't seem *fundamentally* true of all cases where things gain power, because in some cases power can be used to produce good outcomes (e.g. often people attempting to do good in the world will try to win elections). With AI systems, as we'll argue, we're really not sure how to ensure those outcomes would be good.[\/fn]\r\n\r\n[fn dangerous]\r\nIn the two human examples given in this section (politicians and companies), the negative effects of misalignment are tempered somewhat. This is for two reasons: \r\n\r\n1. Neither companies nor politicians have absolute power.\r\n2. We are talking about humans, whose true incentives are actually more complex (for example, they might care about acting ethically and not just achieving their specified goal). \r\n\r\nAs a result, it's hard for a set of politicians to turn things completely upside down for votes, some politicians will put in place unpopular policies they think will make things better, and some companies will do things like donate a portion of their profits to charity.\r\n\r\n(Of course, it's arguable whether companies' charitable donations are truly hurting their profits, and if they'd make them if they were \u2014 it's possible that they get enough good press from work like this that it actually makes them money. But there are definitely examples where this is much harder to argue. 
For example, some [meat and dairy farmers are selling their animals and concentrating on growing plants instead](https:\/\/web.archive.org\/save\/https:\/\/plantbasednews.org\/culture\/five-times-dairy-farmers-went-vegan\/) because of concerns about the moral value of animals.)\r\n\r\nMisaligned AI systems (especially those with advanced capabilities, doing things more than moving around a simulated robot arm) won't necessarily have these tempering human instincts, and could have *a lot* more power.\r\n[\/fn]\r\n\r\n[fn clarke]This distinction taken from [Sam Clarke's overview of AI governance](https:\/\/web.archive.org\/web\/20221016012047\/https:\/\/forum.effectivealtruism.org\/posts\/ydpo7LcJWhrr2GJrx\/the-longtermist-ai-governance-landscape-a-basic-overview).[\/fn]\r\n\r\n[fn carlsmithmisalignment]These arguments are adapted from section 4.3 (\"The challenge of practical PS-alignment\") of Carlsmith's [report into existential risks from power-seeking AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn carlsmithproxies]See section 4.3.1.1 (\"Problems with proxies\") of Carlsmith's [report into existential risks from power-seeking AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn carlsmithsearch]See section 4.3.1.2 (\"Problems with search\") of Carlsmith's [report into existential risks from power-seeking AI](https:\/\/doi.org\/10.48550\/arXiv.2206.13353).[\/fn]\r\n\r\n[fn badfuture]That AI systems choose to disempower humanity (presumably in order to prevent us from interfering with their plans) is evidence that we would, if we hadn't been disempowered, have chosen to interfere with the systems' plans. As a result, this disempowerment is some evidence that we won't like the future that these systems would create.[\/fn]\r\n\r\n[fn lhc]For a suggestion of what this might look like, consider the fears that arose during the construction of the Large Hadron Collider.\r\n\r\nA group of researchers convened to explore whether the heavy-ion collisions could produce negatively charged strangelets and black holes \u2014 potentially posing a threat to the whole planet. They [concluded](https:\/\/web.archive.org\/web\/20080907004852\/http:\/\/doc.cern.ch\/yellowrep\/2003\/2003-001\/p1.pdf) there was \"no basis for any conceivable threat\" \u2014 but it's possible they might have found otherwise, and it's possible future experiments in physics could pose extreme risks.\r\n\r\nA related example is the risk considered by researchers at Los Alamos in 1942 that the first nuclear weapon test could [ignite the whole atmosphere](https:\/\/www.bbc.com\/future\/article\/20230907-the-fear-of-a-nuclear-fire-that-would-consume-earth) of the Earth in an unstoppable chain reaction.[\/fn]\r\n\r\n[fn randreport]A 2023 [report](https:\/\/www.rand.org\/pubs\/research_reports\/RRA2977-1.html) from the research organisation Rand noted: \"Previous biological attacks that failed because of a lack of information might succeed in a world in which AI tools have access to all of the information needed to bridge that information gap.\"\r\n\r\nBut in January 2024, Rand published a follow-up [study](https:\/\/www.rand.org\/news\/press\/2024\/01\/25.html), which found that the current generation of large language models do not meaningfully increase the risk of biological attacks. \r\n\r\nHowever, future systems *could* increase the danger without adequate safeguards. 
The researchers explained:\r\n\r\n> Because LLMs are increasingly capable and available, it's important to monitor their evolution to ensure they are safe and secure from potential misuse, according to the report. Accurate risk assessment models, such as the methodology developed for this research, can be used to help evaluate these technologies and inform the discussion of effective regulatory frameworks.[\/fn]\r\n\r\n[fn bioexperts]Experts in the field of biotechnology disagree about how plausible such scenarios are. For different views on this and other controversies in biosecurity, you can read [an article we wrote](\/articles\/anonymous-misconceptions-about-biosecurity\/) compiling a range of expert views on the topic.[\/fn]\r\n\r\n[fn rogueaiagents]For more discussion of this possibility, see: Hendrycks, Dan, Mantas Mazeika, and Thomas Woodside. [\"An overview of catastrophic AI risks.\"](https:\/\/arxiv.org\/abs\/2306.12001) arXiv preprint arXiv:2306.12001 (2023).[\/fn]\r\n\r\n[fn sandbrink] For more discussion of this, see: Sandbrink, Jonas B. [\"Artificial intelligence and biological misuse: Differentiating risks of language models and biological design tools.\"](https:\/\/arxiv.org\/abs\/2306.13952) arXiv preprint arXiv:2306.13952 (2023).[\/fn]\r\n\r\n[fn otherharms]This list is not exhaustive. And there are likely many other policy approaches that would be worthwhile and justified to pursue, but that would not be targeted at reducing the biggest risks. \r\n\r\nWe don't include them here, because this article is about preventing existential risks in particular. But we also support policies that would reduce other harms from AI, and we think that many of the policies in the list could reduce both existential risks and other harms.[\/fn]\r\n\r\n[fn bernardi] For more information, see: Bernardi, Jamie, et al. \"Societal Adaptation to Advanced AI.\" arXiv preprint arXiv:2405.10295 (2024).[\/fn]\r\n\r\n[fn safetycases] See, for instance: Bishop, P. G. & Bloomfield, R. E. (1998). [A Methodology for Safety Case Development](https:\/\/openaccess.city.ac.uk\/id\/eprint\/549\/). In: Redmill, F. & Anderson, T. (Eds.), Industrial Perspectives of Safety-critical Systems: Proceedings of the Sixth Safety-critical Systems Symposium, Birmingham 1998. London, UK: Springer. ISBN 3540761896 [\/fn]\r\n\r\n[fn surveybio]\r\nThis is the same survey we saw [earlier](#experts-are-concerned) which asked about the [overall chances of extinction from AI](#experts-are-concerned) and [when transformative AI might be developed](#when-can-we-expect-to-develop-transformative-AI). \r\n\r\n[Grace et al. (2024)](https:\/\/arxiv.org\/abs\/2401.02843v1) asked 1,345 of the 2,778 respondents (researchers who published at NeurIPS, ICML, or four other top AI venues) about potentially concerning AI scenarios. (Participants were randomly allocated questions on only one of several topics to keep the survey brief, with questions being allocated to more participants based on factors like the question's importance and how useful it would be to have a large sample size.)\r\n\r\nThey were asked about the following eleven scenarios:\r\n\r\n> * A powerful AI system has its goals not set right, causing a catastrophe (e.g. it develops and uses powerful weapons)\r\n> * AI lets dangerous groups make powerful tools (e.g. engineered viruses)\r\n> * AI makes it easy to spread false information, e.g.
deepfakes\r\n> * AI systems manipulate large-scale public opinion trends\r\n> * AI systems with the wrong goals become very powerful and reduce the role of humans in making decisions\r\n> * AI systems worsen economic inequality by disproportionately benefiting certain institutions\r\n> * Authoritarian rulers use AI to control their population\r\n> * Bias in AI systems makes unjust situations worse, e.g. AI systems learn to discriminate by gender or race in hiring processes\r\n> * Near-full automation of labor leaves most people economically powerless\r\n> * Near-full automation of labor makes people struggle to find meaning in their lives.\r\n> * People interact with other humans less because they are spending more time interacting with AI systems\r\n\r\nFor each scenario, the participants were asked whether it constituted \"no concern,\" \"a little concern,\" \"substantial concern,\" or \"extreme concern\".\r\n\r\nGrace et al. found:\r\n\r\n> Each scenario was considered worthy of either substantial or extreme concern by more than 30% of respondents. As measured by the percentage of respondents who thought a scenario constituted either a \"substantial\" or \"extreme\" concern, the scenarios worthy of most concern were: spread of false information e.g. deepfakes (86%), manipulation of large-scale public opinion trends (79%), AI letting dangerous groups make powerful tools (e.g. engineered viruses) (73%), authoritarian rulers using AI to control their populations (73%), and AI systems worsening economic inequality by disproportionately benefiting certain individuals (71%).\r\n> \r\n> There is some ambiguity about the reason why a scenario might be considered concerning: it might be considered especially disastrous, or especially likely, or both. From our results, there's no way to disambiguate these considerations.\r\n\r\nNo equivalent questions were asked on earlier surveys.\r\n[\/fn]\r\n"}}
But while AI systems could have substantial positive effects, there's a growing consensus about the dangers of AI.","breadcrumb":{"@id":"https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/#primaryimage","url":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","contentUrl":"https:\/\/80000hours.org\/wp-content\/uploads\/2022\/08\/panels-photo.jpg","width":2100,"height":1200},{"@type":"BreadcrumbList","@id":"https:\/\/80000hours.org\/problem-profiles\/risks-from-power-seeking-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/80000hours.org\/"},{"@type":"ListItem","position":2,"name":"Risks from power-seeking AI&nbsp;systems"}]},{"@type":"WebSite","@id":"https:\/\/80000hours.org\/#website","url":"https:\/\/80000hours.org\/","name":"80,000 Hours","description":"","publisher":{"@id":"https:\/\/80000hours.org\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/80000hours.org\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/80000hours.org\/#organization","name":"80,000 Hours","url":"https:\/\/80000hours.org\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/80000hours.org\/#\/schema\/logo\/image\/","url":"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png","contentUrl":"https:\/\/80000hours.org\/wp-content\/uploads\/2018\/07\/og-logo_0.png","width":1500,"height":785,"caption":"80,000 Hours"},"image":{"@id":"https:\/\/80000hours.org\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/80000Hours","https:\/\/x.com\/80000hours","https:\/\/www.youtube.com\/user\/eightythousandhours"]},{"@type":"Person","@id":"https:\/\/80000hours.org\/#\/schema\/person\/75ac20bab88e70f659caa92bc64fd2cc","name":"Cody Fenwick","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/b15399901921a0324ca7f860d98f72339b991688812084f26d8dd50d5ec79aa5?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/b15399901921a0324ca7f860d98f72339b991688812084f26d8dd50d5ec79aa5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/b15399901921a0324ca7f860d98f72339b991688812084f26d8dd50d5ec79aa5?s=96&d=mm&r=g","caption":"Cody 
Fenwick"},"sameAs":["https:\/\/www.facebook.com\/cody.fenwick","https:\/\/www.linkedin.com\/in\/cody-fenwick-8073089b\/","https:\/\/x.com\/codytfenwick"],"url":"https:\/\/80000hours.org\/author\/cody-fenwick\/"}]}},"_links":{"self":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile"}],"about":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/types\/problem_profile"}],"author":[{"embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/users\/435"}],"version-history":[{"count":5,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853\/revisions"}],"predecessor-version":[{"id":96136,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/problem_profile\/77853\/revisions\/96136"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/media\/87151"}],"wp:attachment":[{"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/media?parent=77853"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/80000hours.org\/wp-json\/wp\/v2\/categories?post=77853"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}