Concrete Problems in AI Safety, Gonçalo Teixeira

Concrete Problems in AI Safety (arXiv:1606.06565) was published in June 2016 by Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. It identified reward hacking as one of five concrete problems in AI safety, alongside avoiding negative side effects, scalable oversight, safe exploration, and robustness to distributional shift. It also established vocabulary still in use almost a decade later. The essay Emergent Goals cites it as the direct antecedent of the mesa-optimization concept. The essay also treats it as the starting point for its analysis of how the European product-liability regime applies to systems whose objectives emerge from training.

Essays referencing this

Emergent Goals