Gonçalo Teixeira

Index

Each research paper, person, empirical finding, or regulatory provision that recurs across the essays has its own page here.

Papers 11

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training
January 10, 2024
The paper Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training (arXiv:2401.05566) was…
Alignment Faking in Large Language Models
December 18, 2024
The paper Alignment Faking in Large Language Models (arXiv:2412.14093) was published on 18 December 2024 by a…
Risks from Learned Optimization in Advanced Machine Learning Systems
June 5, 2019
The paper Risks from Learned Optimization in Advanced Machine Learning Systems (arXiv:1906.01820) was…

People 16

Findings 5

Regulation 14