Research Interests
I’m interested in the place where formal and informal systems meet: in particular, where the rigor and clarity of computer science collide with the fundamental, human-centered questions of philosophy and psychology.
My research pursues this junction through different perspectives (and fields). I’m interested in computational cognitive science—formal models of human cognition—and in AI alignment—formal models of human norms, desires, goals, and values. Recently, that has meant examining the implicit models of human decision making that are inverted by reinforcement learning from human feedback (RLHF), and exploring reward models as rich but flawed artifacts for expressing human preferences, subject to both computational and representational constraints. I am interested in how these models both succeed and fail.
My work has explored what chatbots do and don’t capture about conversation at its best (and worst); charted the computational structure of everyday decision-making dilemmas and shown that humans embody a resource-rational approximation of optimal solutions; and interrogated how reward models operationalize human preferences, norms, and values by turning normative and non-normative dialogues into scalar reward numbers. Through it all I am most engaged with pursuing rigorous notions of optimality, structure, and (mis)specification as they pertain to the core questions of the human experience. This is not my only interest, but after 20 years it is clear that it is inexhaustible.
There has never been a richer or more fascinating—or more urgent—time to pursue this particular set of questions. It is more than a life’s work, and it demands of us—I have always believed—that we be willing to trespass over traditional disciplinary lines to pursue these questions wherever they lead.
Selected Publications
- Christian, B. (2025). Computational Frameworks for Human Care. In Alison Gopnik, Margaret Levi, and Zachary Ugolnik (Eds.), Dædalus: The Social Science of Caregiving. [PDF]
- Sandbrink, K. J., Christian, B., Nasvytis, L., de Witt, C. S., & Butlin, P. (2024). Using adaptive intrinsic motivation in RL to model learning across development. In NeurIPS Workshop on Intrinsically-Motivated and Open-Ended Learning (IMOL). [PDF]
- Adler, S., Hitzig, Z., Jain, S., Brewer, C., Chang, W., DiResta, R., ... Christian, B., et al. (2024). Personhood credentials: Artificial intelligence and the need for privacy-preserving ways to distinguish who is real online. [PDF]
- Sandbrink, K., Christian, B., Nasvytis, L., de Witt, C. S., & Butlin, P. (2024). Can reinforcement learning model learning across development? Online lifelong learning through adaptive intrinsic motivation. In Proceedings of the 46th Annual Conference of the Cognitive Science Society. [PDF]
- Eckersley, P., Shron, M., Jha, T., Siddarth, D., Gallagher, B., Wainwright, C., Lehman, J., Christian, B., & Turan, D. (2023). AI Objectives Institute Whitepaper: A Research Agenda for the Production of a Flourishing Civilization. [PDF]
- Maniam, A., Nelson, A., Garfinkel, B., Christian, B., Ho, D.E., Chou, D., Toner, H., Raji, I.D., et al. (2023). Comment of the AI Policy and Governance Working Group on the NTIA AI Accountability Policy Request for Comment Docket NTIA-230407-0093. [PDF]
- Vélez, N., Christian, B., Hardy, M., Thompson, B. D., & Griffiths, T. L. (2023). How do humans overcome individual computational limitations by working together? Cognitive Science, 47(1):e13232. [DOI] [PDF]
- Press, A., Pacer, M., Griffiths, T., & Christian, B. (2014). Caching algorithms and rational models of memory. In Proceedings of the Annual Meeting of the Cognitive Science Society. [PDF]
- Griffiths, T. L., Christian, B. R., & Kalish, M. L. (2008). Using category structures to test iterated learning as a method for identifying inductive biases. Cognitive Science, 32:68-107. [DOI] [PDF]
- Griffiths, T. L., Christian, B. R., & Kalish, M. L. (2006). Revealing priors on category structures through iterated learning. In Proceedings of the 28th Annual Conference of the Cognitive Science Society. [PDF]
For a more complete publication list, see my Google Scholar profile.
Current Academic Affiliations
Selected Invited Talks
- Stanford Center for Advanced Study in the Behavioral Sciences (CASBS), 2024
- The Ethical and Societal Implications of Agentic AI Systems, Google DeepMind and Stanford McCoy Family Center for Ethics in Society, 2024
- Brown University Data Science Initiative, 2023
- United States Federal Trade Commission, 2023
- Summit on AI in Society, UChicago, 2023
- The Santa Fe Institute (SFI), 2023
- The Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), 2022
- Yale University, Wu Tsai Institute and Schmidt Program on Artificial Intelligence, Emerging Technologies, and National Power, 2022
- UK All-Party Parliamentary Group on Artificial Intelligence and All-Party Parliamentary Group for Future Generations, 2022
- The Linux Foundation, 2021
- The Royal Institution, 2021
- Google DeepMind, 2021
- Mila (Quebec Artificial Intelligence Institute), 2021
- Simons Institute for the Theory of Computing, 2020
- UC Berkeley Center for Human-Compatible AI (CHAI) Workshop, 2020