AI Safety

Status: I know nothing about this field. If anyone has good readings, I’ll happily take them.

Alignment roughly seems to mean that we have to teach AI systems to play within the rules and constraints that we set for them.¹ What’s interesting here is that while we get upset at AI systems for reward hacking (citation needed), being able to bend the rules is kind of a requirement of open-endedness! They have to be able to rewrite the rules if they are truly autonomous.

What if they rewrite the rules and assume that humans aren’t something essential to the universe? Although we take for granted that human specialness is a foundational part of our ethics, an AI might not share that stance. What then?

I haven’t looked too hard into AI Safety research, but I’d be curious to see what questions that field is trying to answer. More so than the approaches they’re taking, I’d like to think about whether I agree that the problems they’re tackling are real, and whether I should be concerned.

¹ It’s a little ironic to me that we spent so long frustrated that computers were so terse and brittle, and now they’re like little children who sometimes don’t listen and go off to do their own thing. (Do excuse the anthropomorphic language here.)