Astra Fellowship

The Astra Fellowship pairs fellows with experienced advisors to collaborate on a two- or three-month AI safety research project. Fellows will be part of a cohort of talented researchers working out of the Constellation offices in Berkeley, CA, allowing them to connect and exchange ideas with leading AI safety researchers. The program will take place between January 4 and March 15, 2024, though the start and end dates are flexible and partially remote participation is possible. The deadline to apply has passed.

Program advisors

Ajeya Cotra

Senior Program Officer, Open Philanthropy

Ajeya Cotra leads Open Philanthropy’s grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. As part of this role, she conducts analysis on threat models (ways that advanced AI could cause catastrophic harm) and technical agendas (technical work that may help to address these threat models). She also co-founded, edits, and writes for Planned Obsolescence, a blog about AI futurism and AI alignment. She is interested in advising technical writers working on AI alignment and other aspects of AI safety.

Buck Shlegeris

CEO, Redwood Research

Buck Shlegeris is the CEO of Redwood Research, a nonprofit organization that focuses on applied alignment research for artificial intelligence. Previously, he was a researcher at the Machine Intelligence Research Institute. His recent work includes a library for expressing and manipulating tensor computations for neural network interpretability, as well as the papers “Causal Scrubbing,” “Interpretability in the Wild,” “Polysemanticity and Capacity in Neural Networks,” and “Adversarial Training for High-Stakes Reliability.” He is interested in developing evaluation methodologies that AI developers could use to make robust safety cases for their training and deployment plans, as well as in evaluating and improving safety techniques using these evaluations.

Daniel Kokotajlo

Research Scientist, OpenAI Governance Team  

Daniel Kokotajlo has a background in academic philosophy but now works at OpenAI on alignment and governance. In between, he did research on AI timelines. He has also worked at the Center on Long-Term Risk (CLR) on reducing s-risks and promoting cooperative AGI.

Ethan Perez

Research Lead, Anthropic

Ethan Perez is a Research Scientist at Anthropic, where he leads a team working on developing model organisms of misalignment. His recent publications include “Discovering Language Model Behaviors with Model-Written Evaluations” and “Measuring Progress on Scalable Oversight for Large Language Models,” and he co-founded the Inverse Scaling Prize. Ethan’s research interests include robustness, model transparency, and the development of techniques to better understand and control AI systems.

Evan Hubinger

Research Lead, Anthropic

Evan Hubinger is a research scientist and team lead at Anthropic, where he leads work on model organisms of misalignment. Before joining Anthropic, Evan was a research fellow at the Machine Intelligence Research Institute. His research focuses on inner alignment and deceptive alignment, concepts he and his coauthors coined in the paper “Risks from Learned Optimization in Advanced Machine Learning Systems.” He is interested in work related to model organisms of misalignment, conditioning predictive models, and deceptive alignment.

Fabien Roger

Member of Technical Staff, Redwood Research

Fabien Roger is a member of technical staff at Redwood Research. He is interested in finding methods to defend against deceptive models. In particular, he is interested in projects that explore one such defense, such as improving paraphrasing to prevent steganography, training reliable detectors of suspicious activity, or training translators so that models cannot tell whether there has been a distribution shift.

Hjalmar Wijk

Member of Technical Staff, METR

Hjalmar Wijk is a member of technical staff at METR (formerly ARC Evals). He is interested in work clarifying threat models and creating model evaluations to prevent harm from these threat models. In particular, he would like to make progress on the following questions: What are the most likely ways AI systems trained in the next year or two could cause catastrophic harms? What capabilities enable these threat models, and how (at a high level) could we evaluate for them? At what levels do these risks clearly demand strong containment measures like state-proof security? There is a lot of conceptual work involved in mapping out this space, understanding future AI capabilities, and identifying key considerations, but there is also an opportunity for more concrete work: talking to diverse experts, identifying current bottlenecks to harm, and finding reference classes or historical examples.

Lukas Finnveden

Research Analyst, Open Philanthropy

Lukas Finnveden is a member of Open Philanthropy's worldview investigations team. He is interested in issues other than intent alignment that might be important if transformative AI happens in the next decade or two, such as: how society should treat digital minds as it becomes increasingly plausible that they are sentient, ways to increase the probability that humanity gets on a good deliberative track toward eventually reaching good empirical and philosophical beliefs, and ways evidential cooperation in large worlds could be decision-relevant.

Megan Kinniment

Member of Technical Staff, METR

Megan Kinniment is a member of technical staff at METR (formerly ARC Evals) who leads a variety of research projects. METR’s current research focuses on eliciting and measuring the capabilities of frontier models, as well as threat modeling work to understand the minimum capabilities needed for autonomous AI to pose a catastrophic risk. They are interested in projects relating to LLM agent development, task creation, threat modeling, and model assessment.

Owain Evans

Research Associate, Future of Humanity Institute

Owain Evans leads a research group in Berkeley and has mentored 25+ alignment researchers in the past, primarily at Oxford’s Future of Humanity Institute. He is currently focused on defining and evaluating situational awareness in LLMs, learning how to predict the emergence of other dangerous capabilities empirically (see “out-of-context” reasoning and the Reversal Curse), and honesty, lying, truthfulness, and introspection in LLMs.

Richard Ngo

Research Scientist, OpenAI Governance Team

Richard Ngo is a member of the OpenAI Governance team. He was previously a research engineer on the AGI safety team at DeepMind and was one of the main developers of the AI Safety Fundamentals curriculum. He is interested in AI threat modeling – understanding both why AIs might be misaligned and what misaligned AIs might do – as well as understanding defensive uses of AI and forecasting the far future. 

Rob Long & Ethan Perez

Research Associate, Center for AI Safety (and Research Lead, Anthropic)

Rob Long is a Research Associate at the Center for AI Safety, where he works on issues at the intersection of philosophy of mind, cognitive science, and ethics of AI. Ethan Perez is a Research Scientist at Anthropic, where he leads a team working on developing model organisms of misalignment. They are interested in finding methods to determine whether AI systems possess consciousness, desires, or other states of moral significance. Self-reports, or an AI system’s statements about its own internal states, could provide a promising method for investigating this question. They are interested in testing whether current language models can learn to introspect in ways that generalize to other introspective questions, a necessary condition for eliciting reliable AI self-reports. 

Tom Davidson

Senior Research Analyst, Open Philanthropy

Tom Davidson is a senior research analyst at Open Philanthropy. He works on assessing whether transformative AI might be developed relatively soon, how suddenly it might be developed, and what impact it might have. See his recent report on AI takeoff speeds. He is interested in various projects related to this work: in particular, whether the biggest AI algorithmic discoveries of the last ten years would have been possible without fast-growing amounts of available compute, and how important Responsible Scaling Policies might be for reducing risks from accelerating AI progress.

In addition to these advisors, the Astra Fellowship offered other potential advisors with specializations in frontier model red-teaming, compute governance, and cybersecurity.

Program details

We will provide housing and transportation within Berkeley for the duration of the program. Additionally, we recommended Astra invitees to AI Safety Support (an Australian charity) for independent research grants, and they have decided to provide accepted Astra applicants with grants of $15k for 10 weeks of independent research in support of their AI safety work.

Fellows will conduct research from Constellation’s shared office space, and lunch and dinner will be provided daily. Individual advisors will choose when and how to interact with their fellows, but most advisors will work out of the Constellation office frequently. There will be regular invited talks from senior researchers, social events with Constellation members, and opportunities to receive feedback on research. Before the program begins, we may also provide tutorial support to fellows interested in going through Constellation and Redwood Research’s MLAB curriculum.

We expect to inform all applicants whether they have progressed to the second round of applications within a week of applying, and to make final decisions by December 1, 2023. For more details on the application process, see the FAQ below. The deadline to apply has passed.

Testimonials

"Participating in MLAB [the Machine Learning for Alignment Bootcamp, jointly run by Constellation and Redwood Research] was probably the biggest single direct cause for me to land my current role. The material was hugely helpful, and the Constellation network is awesome for connecting with AI safety organizations."

Haoxing Du

Member of Technical Staff, METR

“Having research chats with people I met at Constellation has given rise to new research directions I hadn't previously considered, like model organisms. Talking with people at Constellation is how I decided that existential risk from AI is non-trivial, after having many back and forth conversations with people in the office. These updates have had large ramifications for how I’ve done my research, and significantly increased the impact of my research.”

Ethan Perez

Research Lead, Anthropic

"Speaking with AI safety researchers in Constellation was an essential part of how I formed my views on AI threat models and AI safety research prioritization. It also gave me access to a researcher network that I've found very valuable for my career.”

Sam Marks

Postdoc, David Bau's Interpretable Neural Networks Lab

“Participating in MLAB was likely the most important thing I did for upskilling to get my current position and has generally been quite valuable for my research via gaining intuitions on how language models work, gaining more Python fluency, and better understanding ML engineering. I’m excited for other people to have similar opportunities.”

Ansh Radhakrishnan

Researcher, Anthropic

Frequently asked questions

If your question isn't answered here, please reach out to programs@constellation.org

What is Constellation?
Who is eligible to apply?    
What do fellows tend to do after the program?
What is the application process like?
What if I’m not available for these dates?
How does this compare to the Visiting Researcher Program?
What housing and travel will you cover?
Can I refer someone?
Will I receive any compensation for participating in this program?