Paper

Concrete Problems in AI Safety

arXiv: 1606.06565

Concrete Problems in AI Safety (arXiv:1606.06565) was published in June 2016 by Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, and Dan Mané. It identified reward hacking as one of five concrete problems in AI safety and established vocabulary still in use almost a decade later. Emergent Goals cites it as the direct antecedent of the mesa-optimization concept and as the starting point for the analysis of the European product-liability regime as applied to systems whose objectives emerge from training.

Authors

Essays referencing this