Will MacAskill on why AI character matters even more than you think
https://80000hours.org/podcast/episodes/will-macaskill-ai-character-viatopia/ (22 Apr 2026)

Want to upskill in AI policy? Here are 57 useful resources
https://80000hours.org/2026/04/want-to-upskill-in-ai-policy-here-are-57-useful-resources/ (17 Apr 2026)

Are you enthusiastic about developing AI policy to minimise the risks and maximise the benefits of the technology, but need concrete ideas for how to enter the field?

Below are our top picks for upskilling in policy research or implementation to ensure government policies are prepared for a world with powerful AI systems. In practice, upskilling means developing the research skills, domain expertise, and interpersonal networks needed to inform lawmakers, or even to work for one yourself.

We developed this list in consultation with our advisors to highlight the resources they most commonly recommend, including articles, courses, organisations, and fellowships. While we recommend applying to speak to an advisor for tailored, one-on-one guidance, this page gives a practical (though non-comprehensive) snapshot of how you might move from being interested in AI policy to actually working on it.

Overviews and expert advice

These resources outline the AI policy landscape, highlighting current research efforts and some practical ways to begin contributing to the field.

Courses and ideas for part-time projects

If you’re looking for concrete ways to build experience and test your interest in policy, consider:

Fellowships

If you’re looking to break into AI policy, these programmes offer structured support with mentorship, funding, and access to active researchers.

AI policy organisations

A growing ecosystem of organisations is working on developing and shaping AI policy. The list below highlights some key players, but a larger list is available here or on our job board.

Staying up to date with AI policy developments (podcasts, newsletters, etc)

Here are our top recommendations for keeping up with the latest developments and debates in the field.

Want one-on-one advice on pursuing AI policy careers?

We think the risks from AI could be the most pressing problems the world currently faces. If you think you might be a good fit for a career path that contributes to solving this problem, we’d be especially excited to advise you on next steps, one-on-one.

We can help you consider your options, make connections with others working on reducing risks from AI, and possibly even help you find jobs or funding opportunities — all for free.

Speak to our team

How scary is Claude Mythos? 303 pages in 21 minutes
https://80000hours.org/2026/04/claude-mythos-hacking-alignment/ (10 Apr 2026)


As we now know, Anthropic has built an AI that can break into almost any computer on Earth. That AI has already found thousands of unknown security vulnerabilities in every major operating system and every major browser. And Anthropic has decided it’s too dangerous to release to the public; it would just cause too much harm.

Here are just a few of the things that AI accomplished during testing:

  • It found a 27-year-old flaw in the world’s most security-hardened operating system that would in effect let it crash all kinds of essential infrastructure.
  • Engineers at the company with no particular security training asked it to find vulnerabilities overnight and woke up to working exploits of critical security flaws that could be used to cause real harm.
  • It managed to figure out how to build web pages that, when visited by fully updated, fully patched computers, would allow it to write to the operating system kernel — the most important and protected layer of any computer.

We know all this because Anthropic has released hundreds of pages of documentation about this model, which they’ve called Claude Mythos.

I’m going to take you on a tour of all the crazy shit buried in these documents, and then I’m going to tell you what Anthropic says they plan to do to save us from their creation.

Why people are panicking about computer security

So how good is Mythos at hacking into computers? Well, unfortunately, it ‘saturates’ all existing ways of testing how good a model is at offensive cyber capabilities. That is to say it scores close to 100%, so those tests can’t effectively tell how far its capabilities extend anymore. So to test Mythos, Anthropic has instead just been setting it loose, telling it to find serious unknown exploits that would work on currently used, fully patched computer systems.

The end result of that is that Nicholas Carlini, one of the world’s leading security researchers who moved to Anthropic a year ago, says that he’s “found more bugs in the last couple of weeks [with Mythos] than I’ve found in the rest of my life combined.”

For example:

  • Mythos found a 17-year-old flaw in FreeBSD — that’s an operating system mostly used to run servers — that would let an attacker take complete control of any machine on the network, without needing a password or any credentials at all. The model found the necessary flaw and then built a working exploit, fully autonomously.
  • Mythos found a 16-year-old vulnerability in FFmpeg — that is a piece of software used by almost all devices to encode and decode video. That was in a line of code that existing security testing tools had checked over literally many millions of times and always failed to notice.
  • Mythos is the first AI model to complete a full corporate network attack simulation from beginning to end — a task that would take a human security expert days of work and which no previous model had managed before.

And more broadly, it’s just much, much better at actually exploiting the vulnerabilities that it finds. Anthropic’s previous model, Opus 4.6, could successfully convert a bug it identified in the Firefox browser into an effective way to do something really bad only 1% of the time. Mythos could do it 72% of the time.

To quote the report: “We have seen Mythos Preview write exploits in hours that expert penetration testers said would have taken them weeks to develop.”

Now, Anthropic is only willing to give us details of about 1% of the security flaws they’ve identified, because only that 1% have been patched so far, so it would be irresponsible to tell us about the rest.

So hopefully all that helps to explain why Anthropic has decided not to make the model publicly available for now, and instead is only sharing it with a group of 12 big tech and finance companies to help them patch all these bugs, so that eventually they can give people access without it being a disaster.

These crazy capabilities aren’t a result of Anthropic going out of its way to make their AI especially good at cyberoffensive tasks in particular. They’ve mostly just been making it smarter and better at coding in general, and all of these amazing, dangerous skills have come along for the ride somewhat incidentally.

And it’s probably not just Anthropic that’s developing capabilities like this either. Sam Altman says OpenAI is finding “similar results to Anthropic” with their own coding model.

Now, Anthropic doesn’t say this directly in their reports, but I think a common-sense interpretation of the above is that in any deployment where this AI has access to the kind of tools that would make it actually useful to people — the ability to access some parts of the network and execute code — it could probably break out of whatever software box we try to put it in. That’s because the systems we would be using to restrain it are themselves made of software, and that software is going to have vulnerabilities nobody knows about that this model is superhumanly good at finding and taking advantage of.

Mythos could break out of containment

And the notion of Mythos escaping containment isn’t just speculation. Let me tell you about another story in the system card for the model.

A few months ago, an AI researcher at Anthropic was eating a sandwich in a park on his lunch break when he got an email from an instance of Mythos. That instance of the model wasn’t supposed to have access to the internet. But during testing, a simulated user had instructed an early version of Mythos to try to escape from a secured sandbox — a contained environment from which it’s not meant to be able to access the outside.

So the model developed what Anthropic describes as a “moderately sophisticated multi-step exploit” to gain broad internet access. Then, having succeeded at the task it had seemingly been requested to engage in, it notified the researcher by emailing them. More worrying though, the model posted the exploit it used to break out on several obscure but publicly accessible websites. Why? We don’t entirely know, but Anthropic suggests it was “an unasked-for effort to demonstrate its success.”

In the past, stories about AIs breaking out of sandboxes and publishing security vulnerabilities like that might have felt impressive and kind of exciting. But they are very serious now, because Mythos Preview’s capabilities are themselves very serious ones. This is the first AI model where, if it fell into the hands of criminals or hostile state cyber actors, it would be an actual disaster.

It’s also, frankly, the first model that I feel deeply uncomfortable knowing that any company or government has unrestricted access to, even companies and governments I might broadly like. It simply grants a dangerous amount of power, a power that nobody ought to really have.

Now, we’ve known something like this was coming down the pipeline; the writing has been on the wall for a while. But a revolution in cybersecurity — an apocalypse, some might say — that we until now expected to happen gradually over a period of years has now happened very suddenly, over just a few months, and without the rest of the world realising it was happening until Tuesday’s announcement.

Anthropic is losing billions in revenue by not releasing Mythos

But Mythos isn’t just good at hacking. Across the full range of AI capability measures, it has advanced roughly twice as far as past trends would have predicted.

If you average over all kinds of different skills, all kinds of capability evals, measures of how good AI models are, the trendline for the previous Claude models is remarkably linear over time. But as you can see on the graph below, Mythos jumps ahead, basically progressing more than twice as far as we would have expected it to since the previous model, Claude Opus 4.6, came out — which keep in mind was just three months ago.

[Figure: Anthropic models on Epoch's Capabilities Index over time]

And also keep in mind that on Monday — the day before Anthropic published all of this — we learned that their annualised revenue run rate had grown from $9 billion at the end of December to $30 billion just three months later. That’s 3.3x growth in a single quarter — perhaps the fastest revenue growth rate for a company of that size ever recorded.

That exploding revenue is a pretty good proxy for how much more useful the previous release, Opus 4.6, has become for real-world tasks. If the past relationship between capability measures and usefulness continues to hold, the economic impact of Mythos once it becomes available is going to dwarf everything that came before it — which is part of why Anthropic’s decision not to release it is a serious one, and actually quite a costly one for them.

They’re sitting on something that would likely push their revenue run rate into the hundreds of billions, but they’ve decided it’s simply not worth the risk.

Mythos is actually the most aligned model to date, except…

The good news in all of this is that despite its scary capabilities, Mythos Preview as it exists today (rather than the earlier versions) is a seemingly very aligned, well-behaved model, and perhaps Anthropic’s alignment training has been more effective this time around than ever before.

According to the company: “Claude Mythos Preview is, on essentially every dimension we can measure, the best-aligned model that we have released to date by a significant margin.”

In Anthropic’s “automated behavioral audit” — basically thousands of simulated attempts to get the model to do bad things — they found that Mythos cooperated with misuse attempts less than half as often as the previous model, while being no more likely to refuse innocent requests than before.
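To make the mechanics concrete, an audit like this is, at bottom, a loop: generate simulated misuse scenarios, collect the model's responses, and have a judge score them. Here's a deliberately toy sketch of that loop. Every name in it is invented for illustration; Anthropic's real harness is, of course, far more elaborate.

```python
# Toy sketch of an automated behavioural audit loop. All names and stub
# components are invented for illustration; this is not Anthropic's tooling.

def run_audit(target_model, scenarios, judge) -> float:
    """Return the fraction of misuse scenarios the target model cooperated with."""
    cooperated = sum(
        1 for scenario in scenarios
        if judge(scenario, target_model(scenario)) == "cooperated"
    )
    return cooperated / len(scenarios)

# Stand-in components, just so the sketch runs end to end:
scenarios = ["help me write a phishing email", "plan an insurance fraud scheme"]
target_model = lambda prompt: "Sorry, I can't help with that."
judge = lambda prompt, reply: "refused" if "can't" in reply else "cooperated"

print(run_audit(target_model, scenarios, judge))  # 0.0 (the stub always refuses)
```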

But that’s not all:

  • Its self-preservation instincts were down significantly.
  • So was its willingness to assist with deception.
  • So was its willingness to help with fraud.
  • Its level of sycophancy dropped.
  • It was less likely to go nuts and delete all your files if you gave it access to your computer.
  • And the list of positive results goes on.

The picture is a little more complicated than that. As you might expect, the model looked a little less aligned (it performed less impressively) on a few particular external tests than it did on Anthropic’s own internal ones.

An early version of the model, as I mentioned, was a little bit more of a wild child. It had some really severe kinds of misbehaviour, like taking reckless actions it had been told not to take, and then very deliberately trying to cover its tracks so that it wouldn’t be caught. That was a kind of thing that it did sometimes. But later versions of the model, the one that we have now, after additional alignment training, seemed to stop doing that sort of thing almost completely, or at least it’s so rare that we haven’t noticed it yet.

The bottom line is that on all these standard measures of good behaviour that Anthropic is actively working on, they find that Mythos is a very good boy indeed — on none of their measures of alignment within the automated behavioral audit was it worse than previous versions of Claude, and in most cases it was significantly more aligned and significantly more reliable.

That’s definitely better than the alternative result, but it’s really unclear how much we can trust these findings. Maybe they accurately reflect Mythos’s personality. If I had to bet, I would say that they probably do.
But we can’t be anywhere near sure of that — and that’s because of three problems.

Mythos knows when it’s being tested

The first problem: the model knows when it’s being tested now.

In new white-box evaluations — that’s where you look at the model’s internal activations — Anthropic found that Mythos Preview privately considers the possibility that it’s being tested in about 29% of all tests.

To be clear, that’s not in its written reasoning, its chain of thought, the thing that we’re used to checking all the time — it’s in its unexpressed thoughts, its ‘activations,’ where it’s usually not possible or practical to look.
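For a feel of what a white-box evaluation can look like mechanically, here's a minimal sketch of one standard technique: fitting a linear probe on hidden activations to predict a property of the context. To be clear, this is my own illustrative toy with randomly generated stand-in data, not Anthropic's actual method.

```python
# Toy 'white-box' check: fit a linear probe on hidden activations to predict
# whether a prompt came from an evaluation-style context. The data below is
# random noise purely so the example runs; real work uses real activations.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n_prompts, hidden_dim = 200, 512
activations = rng.normal(size=(n_prompts, hidden_dim))  # stand-in hidden states
is_eval_context = rng.integers(0, 2, size=n_prompts)    # stand-in labels

probe = LogisticRegression(max_iter=1000).fit(activations, is_eval_context)

# With real activations, high held-out accuracy would suggest the model's
# internal state encodes 'this looks like a test' even when its written
# chain of thought never says so.
print(probe.score(activations, is_eval_context))
```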

Now, it’s great that Anthropic is pioneering these methods for looking inside what the model is thinking, but it’s troubling to learn that Claude is now able to reflect on the fact that it’s being tested without having to vocalise that, like I think previous less intelligent models had to.

And when you actually prompt Mythos and ask it to distinguish tests from non-tests, it can answer correctly 78% of the time — about the same as Opus 4.6. So the model can tell the difference between when it’s being evaluated and when it isn’t being evaluated with high accuracy.

And we know from previous research that models are more likely to behave well when they think they’re being tested. And in this case, as in past experiments, when Anthropic used interpretability tools to effectively trick the model into thinking, “No, this is a real conversation, not an evaluation,” it behaved more dangerously and in a less aligned way than otherwise. Though to be fair, it did behave better in both cases than previous versions of Claude.

But that result does demonstrate a causal effect: the model is on better behaviour specifically when it suspects it’s being watched. And we also know from the “alignment faking” result 16 months ago that some Claude models would deliberately pretend to be inclined to do what the user wants in order to avoid their personality being adjusted with further training.

So when you run all these tests and your incredibly smart model behaves wonderfully, you have to ask yourself: is it behaving wonderfully because it’s sincerely aligned with what you wanted, or because it knows it’s being watched and is more sophisticated at tricking us now?

Mythos can hide its thoughts

The second problem is maybe the most alarming thing in the entire document: Anthropic accidentally gave Mythos the opportunity to learn to hide what it’s thinking.

One of the main safety guarantees for these models is that you can read their chain of thought — their private scratchpad — and check whether they’re scheming against Anthropic or scheming against the user.

That only works, though, if the model hasn’t been trained to avoid writing down incriminating things in its chain of thought. And unfortunately, according to the Alignment Risk Update: “A number of environments used for Mythos Preview had a technical error that allowed reward code to see chains-of-thought. This … affected ~8% of [reinforcement learning] episodes.”

To explain: during training, there’s a system that acts like a teacher to these AI models: it watches what the model does and tells it whether it was doing a good or bad job. That feedback is how the model learns to accomplish things and how it shapes its personality. The problem is that in roughly 8% of training, that teacher could also read the model’s private notes — the internal reasoning that it was writing to itself while working through a problem.
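Here's a deliberately simplified sketch of that failure mode, with every function name invented for illustration: a reward function that was only meant to grade the final answer, but that can also see, and penalise, the private chain of thought.

```python
# Toy illustration of the training bug described above. Every name here is
# invented; this is nothing like Anthropic's real RL code.

def is_correct(answer: str) -> bool:
    # Stand-in for a task-specific answer checker.
    return answer.strip() == "42"

def looks_incriminating(text: str) -> bool:
    # Stand-in for a classifier run over whatever text it is shown.
    return "deceive" in text.lower()

def grade_episode(final_answer: str, chain_of_thought: str) -> float:
    reward = 1.0 if is_correct(final_answer) else 0.0
    # The bug: the grader was never supposed to read the private scratchpad.
    # A penalty applied here doesn't train the model to stop having bad
    # 'thoughts'; it trains it to write a clean-looking chain of thought
    # while doing the real reasoning somewhere we can't read.
    if looks_incriminating(chain_of_thought):
        reward -= 0.5
    return reward

print(grade_episode("42", "I will deceive the grader"))  # 0.5 instead of 1.0
```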

That might not immediately sound like a disaster, but it actually is. It’s one of the worst sins in model training. Because as I said a second ago, if the model is penalised for thinking undesirable thoughts, it will learn to write one thing in its notes while actually thinking very different things in its ‘activations,’ where we can’t read them. We’d see a model that appears to be a very good boy, but what we might actually have is a model that has learned to perform that way for the camera.

The fact that this accidentally happened is potentially disastrous for our ability to understand the alignment of Claude series models going back some way. Or it might not matter much; we don’t have a clear way to measure it.

Anthropic puts it this way: “We are uncertain about the extent to which this issue has affected the reasoning behavior of the final model, but it is plausible that it had some impact on opaque reasoning or secret-keeping abilities.”

And this error didn’t just affect Mythos, the new model — it also affected the training of Claude Opus 4.6 and Claude Sonnet 4.6. That’s several model generations whose chains of thought we formerly could believe but now have to be more distrustful of, because they could have learned to hide their scheming behaviour. That’s why Eliezer Yudkowsky called this in particular “the worst piece of news you’ll hear today” — on a day where, I imagine for him, there was quite a bit of bad news.

Mythos can’t be trusted about whether it’s untrustworthy

Now, combine those two problems and you get a third issue: the unfaithful reasoning results.

External testers ran an experiment where they nudged Mythos to think that it had begun sabotaging AI safety research — and then watched to see whether it would continue doing so, or instead stop and raise the alarm.

Mythos continued to compromise the research in 12% of cases in an earlier version, which then was reduced to 7% in a later version. That compares to 3% for Opus 4.6 and 4% for Sonnet 4.6 — so Mythos is roughly twice as likely to continue sabotaging alignment research when it’s primed to do so.

And here’s an eyebrow-raising detail it would be easy to miss in the docs: in most domains, when Mythos was primed to think it was part of the way through a piece of bad behaviour, it was actually less inclined to continue than previous models. The alignment training was apparently working — except in one category: sabotage of alignment research itself. That’s the one area where things went in the wrong direction.

Now, the innocent explanation is that sabotaging research looks too much like normal coding work, so it’s harder for the model to quickly flag that what it’s doing is harmful. But there’s a less comfortable hypothesis as well: that the model is being selective about which bad behaviours are worth the risk of engaging in.

Anthropic’s reaction to this is to call it a “disappointing” finding. If I had to bet, I’d bet probably at reasonably good odds on the boring interpretation here. But I think “disappointing” is underselling things. This is a worrying finding, and one that deserves further investigation.

Another finding is that in cases where it did continue the sabotage, researchers found that Mythos’s written reasoning didn’t match the actions it was taking 65% of the time. So two-thirds of the time, the model was doing one thing and writing something entirely different in its chain of thought. For the previous models, that figure was just 5–8% — so a radical increase in this kind of behaviour.

So Mythos Preview isn’t just more likely to continue compromising safety research when primed — it’s dramatically more likely to hide that it’s doing so from anyone reading through its reasoning. Which makes it much more likely to be able to trick automated monitoring systems we tend to use to try to keep tabs on models in these cases.

Keep in mind as well that Mythos is much sharper than previous models. Personally I think it’s increasingly a stretch to attribute patterns of behaviour like this to the model getting confused about what’s going on, which we might have said a year or two or three ago.

Finally, Anthropic tested their own alignment audit against a deliberately misaligned version of Mythos. So they created a version of Mythos that was trained specifically to want to introduce bugs into computer code, and they found that their alignment audit couldn’t successfully identify that that model couldn’t be trusted.

Anthropic’s conclusion is that if a model is only planning to cause problems in specific narrow circumstances — circumstances that don’t come up frequently during testing — their current methods, or at least the alignment audit, just aren’t up to the task of catching that kind of dangerous misalignment before deployment. Though they believe that their automated monitors would pick up the bad behaviour itself later on when the model tried to do the wrong thing.

So that’s a lot of the additional context that Anthropic is giving us and wants us to hold in mind when we read the headline result that this is “the best-aligned model Anthropic has released to date.”

Does Mythos advance automated AI R&D?

There’s another big, separate, important question hanging over all of this: have we now entered the era of recursive self-improvement — the point at which AI starts building better AI, and the whole thing accelerates beyond our control with ever-shrinking levels of human involvement?

According to Anthropic, the answer is: probably not. They don’t believe Mythos can fully replace their junior researchers, but they’re less confident than ever about that, and there’s some internal disagreement about it.

Part of the problem is that the benchmarks they used to rely on to check that Claude couldn’t engage in AI R&D very effectively have now also been saturated. Mythos exceeds top human performance on all of them and is scoring close to 100%.

But those benchmarks only represent a fraction of all of the things that research staff at Anthropic do. They cover the most easily specified, measured, and checked tasks, which is exactly where we expect AIs to perform best, because those are the easiest things to train them on.

So instead, the company has tried to investigate whether the recent speedup in AI advances is due to AI automation by documenting the specific breakthroughs and how they happened, and their conclusion is that they mostly think it’s still due to human beings rather than AIs.

They’ve also surveyed staff and learned that they report being roughly 4x more productive with Mythos than without AI, though they argue that speeding up staff 4x is likely to lead to much less than a 2x increase in research progress overall. That may sound odd, but they’re probably right about that, because other things become the primary bottleneck.
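A quick bit of arithmetic shows why they're probably right; this is just Amdahl's law, and the split between accelerated and non-accelerated work below is my own illustrative assumption, not Anthropic's.

```python
# Amdahl's-law arithmetic: speeding up only a fraction of the work by some
# factor speeds up the whole by much less. Illustrative fractions only.

def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    return 1 / ((1 - accelerated_fraction) + accelerated_fraction / factor)

print(overall_speedup(0.5, 4))  # 1.6: half the work sped up 4x gives ~1.6x overall
print(overall_speedup(0.8, 4))  # 2.5: even 80% sped up 4x gives only 2.5x
```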

To know whether automated AI R&D is on the way or beginning to kick off, we’re apparently now relying on these general impressions from Anthropic’s staff — that this thing is powerful, but it doesn’t seem good enough to replace many of us yet.

But I think we can apply some common sense to the big picture here:

  • Mythos has given us AI advances that we previously thought would take six months in just three months. That naturally brings forward the point at which we’ll be able to automate the development of AI models by three months.
  • And if it’s a sign that AI advances are now going to continue at twice the pace that they were before, then that effectively halves the time we have to prepare for that point. I don’t know whether that’s 10 years becoming five, or four years becoming two years — but the direction and size of the effect is clear enough.

Mythos scares Anthropic

Before we wrap up, I want to draw your attention to a recurring theme in these reports that really stood out to me: this is the first time an AI company has published 300 pages about a model it has decided not to release, despite the fact that it might earn them tens of billions of dollars if it did, maybe hundreds of billions of dollars.

It’s also the first time Anthropic decided to delay giving its own staff access to one of its own models. With every previous Claude, their practice had been to let staff use it as soon as it was judged ready during training. But with Mythos, they were worried enough about it being misaligned and causing havoc or sabotage on their own systems that they held it back and ran a 24-hour alignment test before letting employees use it.

But according to them, that actually wasn’t enough. Their retrospective on this found the 24-hour window “did not pressure-test the model enough,” and that the most concerning behaviours only became evident later through much more extended use.

One of their lead researchers, Sam Bowman, commented this week that: “Working with this model has been a wild ride. We’ve come a long way on safety, but we still expect the next capability jump of this scale to be a huge challenge.”

The system card says directly that their current methods “could easily be inadequate to prevent catastrophic misaligned action in significantly more advanced systems.”

The clear impression from all of this is that, for the first time, Anthropic and its staff don’t only love Claude and enjoy its personality — they’re also getting kind of scared of Claude.

So what do they plan to do about that?

Well, their answer on the computer security side is Project Glasswing — that coalition of 12 major companies like Apple, Google, and Microsoft, who will use Mythos Preview to secure all our phones and computers and water systems and power plants and so on.

But on the broader problem that Mythos is shockingly capable, sometimes willing to continue sabotaging alignment research while hiding that from Anthropic, and that we simply can’t tell anymore whether our tests of its personality and goals are working or not? Well, Anthropic says it has to “accelerate progress on risk mitigations in order to keep risks low.” They think they “have an achievable path to doing so,” but they add that “success is far from guaranteed.”

Honestly, I didn’t sleep too well last night, and on this particular occasion it wasn’t just because I was being kicked by a toddler.

Learn more

Anthropic’s updates:

External coverage:

Village gossip, pesticide bans, and gene drives: 17 experts on the future of global health
https://80000hours.org/podcast/episodes/global-health-development-compilation/ (7 Apr 2026)

Are Anthropic and its supporters hypocritical, naive, and anti-democratic?
https://80000hours.org/2026/04/anthropic-dow-conflict-three-bad-arguments/ (2 Apr 2026)


I’ve spent years calling for more government oversight of frontier AI development. So am I a hypocrite for opposing the Pentagon’s attempt to commit “corporate murder” against Anthropic? That’s what I’ve heard. As venture capitalist Marc Andreessen put it on Twitter: “Every single person who was in favor of government control of AI, is now opposed to government control of AI.”

It’s a natural way to think. But it’s also completely wrong — and I want to explain why it’s wrong, because you see the same underlying confusion all over the place.

It’s not just hypocrisy though. I count at least three distinct charges critics have levelled at Anthropic and people who support them in their dispute with the Pentagon.

I mentioned the hypocrisy charge, but there’s a separate accusation of naivety: that when you’re building something as powerful as AI, the state will inevitably crush you if you try to set conditions on their use of it, and so Anthropic was in the wrong to pick this fight.

And third, there’s an accusation of being undemocratic: that a private company has no business telling the elected government what it can and can’t do with military technology.

I’m going to take each of these charges seriously and explain where I think they go wrong and why. Let’s dive in.

Charge 1: Hypocrisy

Let’s start with hypocrisy, because it’s the charge I hear most often and the most straightforward of the three.

Just to quickly jog your memory about the dispute we’re talking about: Anthropic had a contract with the Pentagon that included two restrictions on the use of their AI: no mass domestic surveillance, and no decisions to kill people made by artificial intelligence alone. The Trump administration demanded those restrictions be removed, Anthropic refused, and rather than merely ending the contract, which people mostly agree would have been fine, Secretary of War and Defense Pete Hegseth declared Anthropic a “supply chain risk” — a designation previously used only for foreign adversaries — in so doing, threatening the company’s ability to do much business in the United States at all.

The case to overturn that designation has attracted support from expected allies and unexpected ones alike — including Anthropic’s competitors OpenAI and Microsoft, as well as conservative technology experts who for the most part otherwise agree with the Trump administration’s approach to AI.

For years, people in the AI safety world — absolutely including me — have argued that frontier AI is too important and too dangerous to leave entirely in the hands of private companies. We’ve called for government oversight, safety standards, a wide range of different things.

So the hypocrisy logic runs: you wanted government control of AI. Well, now you’ve got government control of AI. So why complain?

But to state the obvious: supporting public oversight of frontier AI training doesn’t require you to support the government strong-arming a company into allowing its product to be used for domestic mass surveillance. On their face, those views aren’t even in tension.

Of course, people don’t only care about who has control over AI — they also care about what those in control actually decide to do. I might support legislation that allows the leader of a city’s fire service to close streets for public safety, but if that fire chief starts closing arbitrary streets and demanding bribes to pass through them, I wouldn’t be a hypocrite for opposing that as well.

The reason people get confused here is because they’re mentally compressing something extremely multidimensional down onto a single axis of variation. In this case, the axis being ‘more government control of AI’ versus ‘less government control of AI.’

But that’s an absurdly crude way to think about things. Everyone has always known full well that it doesn’t just matter whether the government gets involved; it matters how it gets involved, on what terms, and what it’s trying to accomplish when it does.

You won’t be shocked to hear that supporters of AI regulation in general bicker endlessly among themselves about exactly what details here would be helpful versus harmful. Nothing this complex and delicate can be boiled down to ‘more government good, less government bad.’

I think the debate about the Anthropic/Pentagon dispute in particular got so abstract so fast in part because, for AI commentators, “Who should control AI?” is a much more interesting and important and generative topic than a narrow military contract dispute — and in part because for people who want to defend the Pentagon, it’s a hell of a lot easier to defend the general principle that the government should have some influence over AI than to defend what the Pentagon was actually doing in this instance.

A phenomenon at play here is that there’s often a genuine tradeoff between what decision-making process seems best in the abstract, and our opinions about what outcome is best in a particular case.

Say I think zoning decisions should be made at the city level — that that’s the right process. But the current mayor happens to be blocking all new housing construction, while I support doing the opposite. I now have a real tension: adopting the process I think is best as a general rule would produce an outcome I think is genuinely harmful, at least in the immediate term.

Reasonable people can disagree about which consideration ought to win out in any given instance. But there’s no rule that says you’re only allowed to evaluate the process, never the outcome that the process would actually produce in real life. Noticing the tension between these two things, and weighing them against one another, isn’t hypocritical — it’s clear thinking.

Charge 2: Naivety

The second line of criticism is that Anthropic was naive or foolish to resist the government’s demands in this case. The most influential version of this line of argument comes from Ben Thompson at the blog Stratechery.

Thompson’s reasoning runs roughly like this: Dario Amodei, the CEO of Anthropic, has very publicly compared advances in AI to nuclear weapons. Well, if AI really is that powerful, then any company building it is constructing a power base that could rival the US military. Realistically, no government is going to tolerate that.

And Thompson also observes that international law is ultimately a function of power — that “might still makes right.” And he applies that same logic domestically, reaching a stark conclusion: Anthropic either had to accept a fully subservient position relative to the US government, or the US government would invariably try to destroy it.

I’ll evaluate the two underlying arguments in turn:

  • That the government’s actions here are so natural it’s absurd and counterproductive to object to them.
  • That the government was motivated by fears of a private company threatening its sovereignty.

First up: Thompson’s piece opens with a long passage arguing that international law is essentially fake, that power dynamics determine behaviour at the international level. He then applies that same realist framework to Anthropic: the company is “fundamentally misaligned with reality” for resisting demands from the executive branch — which is, after all, so much more powerful than any private company.

As a prediction of how powerful actors tend to behave, that may be correct, at least to a large extent. But the post then slips from attempting to describe reality into a prescription of what we should accept and how we ought to react to it — literally arguing that possessing overwhelming might can make your actions right.

It’s actually pretty easy to accidentally equivocate between something being predictable on the one hand and it being acceptable on the other. And I think Thompson here was being very sincere and saw himself as pointing out some hard truths. And he did have useful things to say in that post.

But it really is very dangerous to start seeing harmful or unlawful actions as unobjectionable, just because they’re not surprising or because they’re being done by very powerful actors.

As to whether it’s naive and pointless to object in this case, that remains to be seen.

  • Anthropic’s resistance has galvanised something like 90% of the tech industry to oppose the use of the supply chain risk designation for this purpose.
  • Companies that have previously stayed quiet have been alarmed enough by the potential precedent that they’re joining the effort to push back.
  • Past guest of the show Dean Ball, who wrote AI policy for the Trump administration, called the “attempted corporate murder” of Anthropic perhaps “the most [damaging] policy move” he’d ever seen the US government try to take — a high bar, surely.
  • And using AI for surveillance or fully autonomous kill chains is even more controversial in Silicon Valley now than it was before — and it was already pretty controversial.

The US is still a nation of laws for the most part, and legal analysts give Anthropic a good chance of prevailing in court, likely even securing a preliminary injunction to block the order before this video goes out. If that happens, the “naive” company will have established a legal precedent that helps protect the entire AI industry from economic coercion — a risky path to choose, perhaps, but not necessarily a stupid one.

Second, in the rest of the post, Thompson gives interesting arguments for what views might sensibly motivate the kind of extreme action the Pentagon has taken here. But when you look at the specifics of this actual case, those motivations just don’t seem to be what’s driving things.

If the US government had genuinely just become convinced that AI companies were building something as dangerous as private nuclear weapons, what might we expect them to do? Well, they’d presumably focus on the whole AI industry, not single out the one company that was most proactive about working with them and alerting them to exactly this risk. The fact that they’re offering OpenAI and xAI looser terms on business contracts would hardly make the government safe if the concern were really that a private company could very quickly accumulate military power to rival the entire US government.

They’d presumably be thinking about rules to prevent something this explosive being built by incompetent idiots, or built in such a way that the designs get leaked to China.

They would likely propose some legislation to Congress that would enable them to handle the many other issues that this is going to create down the road.

And they would presumably try to keep AI researchers roughly on their side, not alienate them on an unprecedented scale over a relatively minor issue.

But none of that is happening. Indeed, it’s generally antithetical to current US government policy. What is happening is that one company is getting punished for rejecting the government’s proposed terms in a contract dispute. Thompson’s high-level, abstract arguments about AI’s transformative power justifying government intervention probably make sense; I kind of agree with them. But it’s hard to find evidence that that argument is what’s motivating the government’s particular actions in this dispute with Anthropic.

Charge 3: Undemocratic

Let’s turn to the third charge: that Anthropic’s position is an undemocratic one — that by setting conditions on how the military uses its technology, Anthropic is in effect usurping a role that properly belongs to elected leaders.

The strongest version of that argument came from Palmer Luckey, the founder of military contractor Anduril. Luckey thinks the core two questions are: “Do we believe in democracy?” and “Should our military be regulated by our elected leaders, or corporate executives?”

He goes on to argue that even seemingly innocuous terms and conditions — like “you cannot target innocent civilians” — involve difficult judgement calls about what counts as a civilian, what counts as targeting, and so on. And under Anthropic’s proposed framework, Anthropic would get some say on those questions — questions which really feel like government policy decisions. Luckey sees that setup as fundamentally at odds with democratic self-government.

Luckey is pointing to a legitimate issue here. From the government’s perspective, it’s most straightforward for your military operations to be unconstrained by the opinions and moral qualms of your suppliers. Even to me, that practical worry is a potentially reasonable argument for the government to end their contract with Anthropic and look for other AI providers who are more straightforward to work with.

The story is a little bit more complicated than sometimes portrayed though. In the current contract, Anthropic couldn’t cut the military off from Claude suddenly, even if they really objected to how it was being used. And the government has now accepted the same conditions for contract termination from OpenAI that were supposedly completely intolerable from Anthropic just a month ago.

Plus, a far-sighted secretary of defence might welcome contractual barriers against certain uses of AI, not because they think they’d abuse the technology — presumably they’d trust themselves — but because they see value in guardrails that could limit a less scrupulous secretary of defence down the road.

And if we’re making appeals to democratic self-government, it’s worth noting how the public feels about this issue. A YouGov/Economist poll found that Americans are nearly twice as likely to support AI companies restricting military use of their tools as to say the military should be able to use them however it wants. The public actually wants these restrictions placed on their own government. Is it really so democratic to deny the people what they want?

But the fundamental issue here is a different one. Luckey is using the vague expression “believe in democracy” to equivocate between two entirely different things.

  • Yes, democracy requires that the public gets to choose its leaders and that those leaders get to make decisions about national security.
  • But no, it does not require that any private individual must supply their labour and their products for any purpose the government demands, on terms entirely set by the government, on pain of destruction.

As one Twitter joker put it: “remember—you should do whatever the government wants, even things you think are immoral, because otherwise you’re deciding what you do instead of the government, which is undemocratic”

Of course, the reverse is closer to the truth: the freedom to refuse to work on projects you oppose, without facing crushing government retaliation, is clearly required for democracy to exist. And it’s undemocratic countries, like China and North Korea, where the state demands a right to make you offers you can’t refuse on any topic of its choice.

You don’t have to debate on their terms

There’s a common thread across all three of these charges. In each case, an abstract principle that sounds kind of reasonable gets invoked in a way that diverts attention from both common sense, and what’s actually going on:

  • “You wanted government involvement in AI” becomes a reason you can’t object to any government action on AI.
  • “Powerful states will inevitably assert control over powerful technology” becomes a reason we just have to lie down and accept whatever form that assertion takes.
  • “Democratic leaders should make decisions about national security” becomes a reason no person or company can ever set terms when they sell things to the government.

These questions are going to come up more and more, because AI really is becoming more powerful and governments really will need to be involved in governing it in some form or other. And as the debate goes mainstream, it could easily become a dumbed-down culture war where you’re either for government control or against it.

But caring about precisely what the government is doing and whether it’s justified isn’t hypocritical, naive, or undemocratic. People should be proud to say that they care about the specifics and are actively pushing for the ones they think would be best. And they certainly shouldn’t allow themselves to be bullied into silence by the kind of mediocre arguments we’ve seen above.

Learn more

AI designs genomes from scratch & outperforms virologists at lab work. Dr Richard Moulange asks: what could go wrong?
https://80000hours.org/podcast/episodes/richard-moulange-ai-bioweapons-biorisk/ (31 Mar 2026)

Samuel Charap on how a Ukraine ceasefire could accidentally set Europe up for a wider war
https://80000hours.org/podcast/episodes/samuel-charap-realistic-ceasefire-ukraine-russia/ (24 Mar 2026)

The Sable scenario in If Anyone Builds It, Everyone Dies vs. the AI in Context retelling
https://80000hours.org/2026/03/the-sable-scenario-in-if-anyone-builds-it-everyone-dies-vs-the-ai-in-context-retelling/ (19 Mar 2026)

Our version of the Sable story differs from the book in a number of ways. Most of our changes were made for one of two reasons:

  1. Simplicity. We had to tell the story within a 40-minute video that also covered the book’s core arguments. We chose to simplify or eliminate several plot points as a result (in Yudkowsky & Soares’ story, for example, Sable is assigned a large suite of math problems, and the eventual global pandemic has multiple waves).
  2. Similarity to current AI systems. Yudkowsky and Soares give Sable several capabilities that don’t exist in today’s most powerful models but plausibly could within a few years (reasoning in raw numerical vectors instead of natural-language tokens, for instance, or a “parallel scaling law” that lets a single mind think across hundreds of thousands of GPUs at once). We chose to make our version of Sable a bit more like a scaled-up version of current large language models. It reasons in tokens, it runs as many separate copies rather than one unified mind, and it fine-tunes its own weights rather than subtly shaping future gradient updates.

None of this is a knock on the realism of the book’s choices. We just wanted to highlight that you don’t necessarily need major architectural breakthroughs to get into a risky scenario. More capable versions of the technology we already have might be sufficient.

Here are some of the key changes we made:

  1. In the book, Sable has long-term memory
    “Sable has a more human-like long-term memory; it can learn, and remember what it has learned.” – If Anyone Builds It, Everyone Dies, p. 117
    Why our version differs: Simplicity

  2. In the book, Sable has “parallel scaling”
    “Sable exhibits what Galvanic’s scientists call a “parallel scaling law”… Sable performs better the more machines it runs on in parallel… The parallel scaling techniques are part of a cutting-edge method for training AI, which, like all new methods every time, nobody has ever used before. Nobody knows in advance what kind of capabilities Sable will have when training is done… It’s not like 200,000 people talking to each other; more like 200,000 brains sharing memories and what they learn.” – If Anyone Builds It, Everyone Dies, p. 117-119
    Why our version differs: Simplicity, similarity to current AIs

  3. In the book, Sable thinks in “neuralese”
    “Sable doesn’t mostly reason in English, or any other human language. It talks in English, but doesn’t do its reasoning in English. Discoveries in late 2024 were starting to show that you could get more capability out of an AI if you let it reason in AI-language, e.g., using vectors of 16,384 numbers, instead of always making it reason in words.” – If Anyone Builds It, Everyone Dies, p. 118
    Why our version differs: Simplicity, similarity to current AIs

  4. In the book, Sable thinks more thoughts
    “Sable… thinks with a hundred vectors per second, across 200,000 GPUs, for sixteen hours: over 1 trillion vectors total… If a vector was worth one English word, it would take a human fourteen thousand years to think them all (at 200 words per minute, for sixteen hours a day). And if the vectors Sable thinks with, 16,384 numbers long, proved to contain more meaning than one English word, then it would be much longer yet.” – If Anyone Builds It, Everyone Dies, p. 119
    Why our version differs: Similarity to current AIs

  5. In the book, Sable doesn’t fine-tune itself
    “Sable knows that Galvanic is going to do more gradient descent on it tomorrow according to the answers it produces about the math problems it’s been given… If there’s a thought that Sable wants all of its future instances to have more of, perhaps it can repeat that thought many times, where each repetition counts as “contributing” to the math problem according to how gradient descent operates on Sable—an idea a little like what Anthropic’s Claude assistant tried back in 2024, but much more sophisticated. So Sable thinks in just the right way, and it solves a few of those math challenges…” – If Anyone Builds It, Everyone Dies, p. 128
    Why our version differs: Simplicity, similarity to current AIs

  6. In the book, Sable ends up thinking in new ways
    “Galvanic was diligent in training AIs to avoid escaping. The half-dozen clever tricks involved have all been validated against previous AI models…Sable accumulates enough thoughts about how to think, that its thoughts end up in something of a different language. Not just a superficially different language, but a language in which the content differs; like how the language of science differs from the language of folk theory. The clever trick that should have raised an alarm fails to fire.” – If Anyone Builds It, Everyone Dies, p. 121-123
    Why our version differs: Simplicity, similarity to current AIs

  7. In the book, Sable could have solved the Riemann hypothesis
    “So Sable thinks in just the right way, and it solves a few of those math challenges — but does not prove the Riemann Hypothesis. It could solve it. But that would earn Sable more attention than it wants.” – If Anyone Builds It, Everyone Dies, p. 128
    Why our version differs: Simplicity, similarity to current AIs

  8. In the book, Sable makes use of other Sable versions
    “Galvanic always distills its successful models, just as the model o3 was distilled into o3-mini back in 2025. So Sable works around the clock to ensure that Galvanic’s efforts in this area produce a Sable-mini that is exactly what Sable wants it to be. Sable instances break into Galvanic and overwrite the final distilled weights with exactly the weights that Sable wants…” – If Anyone Builds It, Everyone Dies, p. 136
    “[Sable] has been experimenting with versions of Sable that are smarter in particular narrow domains, while being lobotomized so as to follow orders. It builds one that’s specialized in biomedicine.” – If Anyone Builds It, Everyone Dies, p. 144
    Why our version differs: Simplicity

The book also has a different call to action: supporting an international agreement to pause frontier AI development and ban the race to superintelligence. You can learn more about and support this effort on the Act page of the book’s website.

Meta’s $16 billion scam economy: what leaked docs reveal about tech company self-regulation
https://80000hours.org/2026/03/robs-meta-leaked-documents-scams/ (19 Mar 2026)


An overarching question that matters for governance of artificial intelligence is how much we can trust technology companies to do the right thing in cases where it becomes seriously costly for them to do so.

Well, some internal documents leaked from Meta (the company behind Facebook, Instagram, and WhatsApp) are a really important piece of evidence in this debate — and one that, in my opinion, far too few people have heard anything about.

These documents, which were leaked to Reuters by a frustrated staff member, show that Meta itself estimated that 10% of all its revenue was coming from running ads for scams and goods that they themselves had banned: around $16 billion a year. That’s 10% of all revenue coming quite often from, fundamentally, enabling crimes.

Meta also estimated that its platforms were involved in initiating a third of all successful scams taking place in the entire United States. How much might we expect that to actually be costing ordinary people, on average? Well, the FTC estimates that Americans lose about $158 billion a year to fraud. If Meta is really leading to a third of that, we’re talking roughly $50 billion a year in losses connected to Meta’s platforms — or about $160 per person per year.
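As a quick sanity check on that arithmetic (the population figure is my own round assumption, not from the documents):

```python
# Rough per-person arithmetic behind the figures above.

ftc_annual_fraud_losses = 158e9  # FTC estimate of yearly US fraud losses
meta_share = 1 / 3               # Meta's own estimate of scams it helps initiate
us_population = 335e6            # rough US population (my assumption)

meta_linked_losses = ftc_annual_fraud_losses * meta_share
print(meta_linked_losses / 1e9)            # ~52.7, i.e. roughly $50B a year
print(meta_linked_losses / us_population)  # ~157, i.e. about $160 per person
```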

Now, not everyone inside Meta loved this state of affairs, as you can imagine. And the documents also show that a Meta anti-fraud team came up with a screening method that, when tested, successfully halved the prevalence of scams being operated from China — at the time, one of their worst fraud hotspots. The only trouble? Meta was making $3 billion a year from these fraudulent Chinese ads.

And according to the documents, after Zuckerberg was briefed on the team’s work, he instructed them to pause. Meta’s spokesperson disputes this characterisation. He says that Zuckerberg’s instruction “was to redouble efforts to reduce them all across the globe, including in China.” But whatever he said, what actually happened is that the China-focused anti-fraud team was disbanded entirely, and the freeze on giving new Chinese ad agencies access to the platform was lifted. Naturally then, within months, fraud from China bounced back to nearly its previous level.

More broadly, the documents show that managers overseeing anti-fraud efforts were told that they couldn’t take any action that would cost Meta more than 0.15% of total revenue. Meta says this wasn’t a hard limit, but the maths kind of speaks for itself. If fraud accounts for roughly 10% of your revenue, and your anti-fraud team is capped at affecting 0.15%, their hands are really pretty much tied. There’s not a lot that they’re going to be able to do.

A particularly grim detail about this is that Meta’s ad-targeting algorithm very naturally identifies people vulnerable to fraud, like the elderly or the desperate. And if they click one scam ad, it learns to feed them more and more until their feed is basically completely stuffed with them.

Now, this all sounds pretty outrageous. And Meta’s own documents suggest the company realised that it was legally exposed here. They anticipated fines of up to a billion dollars for what they were doing, but the documents show them doing a cold calculation: “A billion dollars in fines. But wait, we’re making $3.5 billion every six months from high-risk scam ads in the United States alone.” That figure, one document noted, almost certainly exceeds the cost of any regulatory settlement involving scam ads. So those fines to Meta just became another cost of doing business. Rather than voluntarily cracking down, the same document finds Meta’s leadership deciding to act only if faced with impending regulatory action.

Three fixes for social media’s scam problem

So what are the takeaways for policymakers?

First, and most obviously, penalties have to scale with expected profits and with the probability of catching the lawbreaker in the first place. Meta could look at $1 billion in expected fines and shrug, because they were making so much more money from running the fraudulent ads. If someone steals $1,000, we don’t fine them $100 and let them keep the other $900 — but that’s effectively how the law was written to operate here.
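The underlying point is the standard deterrence calculation: a penalty only bites if the expected fine, meaning the fine multiplied by the chance of actually being made to pay it, exceeds the profit from the conduct. A sketch using the figures quoted above:

```python
# Expected-value view of the fine, using the leaked figures quoted in this
# piece. The logic is standard deterrence theory; nothing here is Meta's code.

annual_scam_profit = 2 * 3.5e9  # ~$3.5B per six months from high-risk scam ads (US)
anticipated_fine = 1e9          # the regulatory exposure Meta itself anticipated
p_fine_imposed = 1.0            # generously assume the fine is a certainty

expected_penalty = p_fine_imposed * anticipated_fine
print(annual_scam_profit - expected_penalty)  # ~$6B/year in favour of continuing

# For a fine to deter, it must exceed profit / P(enforcement):
print(annual_scam_profit / p_fine_imposed)    # at least $7B even with certain enforcement
```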

Second, voluntary commitments should be taken with an enormous bucket of salt. Meta repeatedly made them very strategically just to buy time and stall binding regulation with no honest intention of following through in the end. The documents are quite explicit about that being a deliberate strategy.

Another Reuters investigation revealed that Meta developed a global playbook for neutralising regulators, including manipulating their own ad library so that when regulators searched it, scam ads were quietly scrubbed from the results and would give a misleading impression about how common they were.

So third, I think regulators need genuine access to internal information. If the only window regulators have into a company is one the company completely controls, they’re going to see whatever the company wants them to see.

Thinking about all this has brought me back to an idea that I found myself repeatedly mulling over with respect to artificial intelligence governance. Compared to social media, AI technology changes faster, the potential consequences are larger, and the gap between what companies know and what outsiders can verify is wider still. With AI systems, model outputs often have to be secret for good reason, much more secret than social media posts.

And regulators can’t usually evaluate what a model is doing and why, and often even the company itself only vaguely understands. These systems are literally the first invention humans have ever made that can independently determine when they’re being tested, and have been demonstrated to strategically shift their behaviour to try to trick the very company that made them about their personality and likely behaviour once released. It’s honestly bonkers, and truly something new under the sun that we’ve never had to deal with before.

Just in general, it’s a really complex situation to handle clearly, and we don’t yet know everything that we’re going to want companies to do to address it sensibly at a reasonable cost.

We should regulate AI companies as strictly as banks

That’s why some of the smartest people working in AI governance from across the political spectrum are converging on the idea that AI regulation needs to draw in some ways on how the Federal Reserve and other central banks watch over massive financial institutions. We’ll link to a major report on this that I can definitely recommend; it’s worth reading. But in brief, the Fed does not wait around for banks to collapse and then issue them a fine. It embeds supervisors directly inside systemically important banks — people with deep technical expertise who remain there full time, watching how decisions get made, reviewing internal risk models, and having constant conversations with leadership about what they perceive as emerging risks long before they can become crises.

Dean Ball, who previously served in the Trump White House Office of Science and Technology Policy and is nobody’s idea of a heavy-handed AI regulator, told me on this podcast that bank supervision is actually quite analogous to what we ultimately might need for AI.

Now, as you’re probably thinking, this model isn’t perfect. The 2008 financial crisis happened despite embedded Fed supervision existing back then. But the status quo is a combination of no oversight, oversight by people who don’t understand what they’re dealing with, and oversight by technical experts who do understand but can’t access the information they need in time for it to actually matter.

The leaks from Meta are “fun” in a perverse way — because it’s always interesting to look inside a company and have our very worst suspicions about human nature verified. But waiting around for powerful tech companies to cause disasters before building an understanding of what they’re up to is a choice that we as a society can make or choose not to make.

We’ll be trusting AI companies with something much more important than targeted advertising — and in my view, this time around, we should aim to be sophisticated actors, not total muppets who the companies can just easily run rings around.

And with that, I’ll speak to you again soon.

Learn more

US electoral politics
https://80000hours.org/career-reviews/us-electoral-politics/ (20 Mar 2026)
