Constellation aims to support important research into AI safety. Here are some recent examples of work that was at least partly enabled by Constellation.
Below is a list of areas of work that Constellation is excited to support through our Visiting Fellows program, Constellation Residency, and other programs.
This list is updated periodically as the AI landscape (and our understanding of it) evolves, and it includes many ideas from discussions with our collaborators in the broader Constellation network. It is in no particular order and is not exhaustive.
Few benchmarks exist for measuring general, economically important AI abilities, and many of those that do exist have already been saturated. Constellation seeks to support people developing capability benchmarks of this kind (for example, the GAIA or GPQA datasets) or evaluations (such as those produced by METR, Google DeepMind, and others, or work aimed at satisfying this task implementation bounty or this RFP). We are especially interested in evaluations related to automating AI research and development, cyberoffense, bioweapon development, and persuasion.
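As a rough illustration of the harness side of this work, the sketch below runs a model over a tiny set of question-answer tasks and reports accuracy. The tasks, the model callable, and the grading rule are placeholder assumptions for exposition, not any particular benchmark's design.

```python
from typing import Callable

# Placeholder tasks; a real benchmark would need far more items, plus careful
# contamination checks and task-specific grading rules.
TASKS = [
    {"prompt": "What is 17 * 24?", "answer": "408"},
    {"prompt": "Which planet has the shortest year in the solar system?", "answer": "mercury"},
]

def run_eval(model: Callable[[str], str], tasks: list[dict]) -> float:
    """Return the fraction of tasks whose reference answer appears in the model's response."""
    correct = 0
    for task in tasks:
        response = model(task["prompt"]).lower()
        if task["answer"].lower() in response:
            correct += 1
    return correct / len(tasks)

if __name__ == "__main__":
    # Stub standing in for a real model API call.
    def stub_model(prompt: str) -> str:
        return "The answer is 408." if "17 * 24" in prompt else "Mercury."

    print(f"accuracy = {run_eval(stub_model, TASKS):.2f}")
```

A real benchmark in this area would also need held-out answers to limit contamination and enough headroom to avoid the rapid saturation mentioned above.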
We believe that any AI system that would have substantial potential to cause catastrophic harm if it were broadly accessible must be kept under extremely high security (both the OpenAI Preparedness Framework and Anthropic's Responsible Scaling Policy gesture at this). However, the technology and practices needed to achieve this level of security might not currently exist. Constellation therefore seeks to support the following work:
There are currently few empirical demonstrations of behavior related to AI takeover, and those that exist are fairly limited. Constellation wants to support the development of well-grounded demonstrations – or “model organisms” – so that we can both study risk factors for such failure modes arising and have testbeds in which to more realistically test various techniques for mitigating these hazards. These model organisms should instantiate one of these principal concerns related to AI takeover:
There are reasons to think that, as AI is applied to AI research and development itself, there may be a period of extremely rapid increase in AI capabilities, perhaps progressing from a point where most AI research and development is not automated to nearly complete automation within only a year or two (see this report for a description of this reasoning). We want to understand how likely various trajectories of AI capability development are, drawing on diverse sources of evidence and theoretical modeling (one toy model of this feedback loop is sketched below) and investigating questions such as the following:
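One way to make this kind of trajectory modeling concrete is a toy feedback-loop simulation in which the rate of capability growth rises with the fraction of AI R&D that is automated, and that fraction in turn rises with capability. The functional forms and constants below are illustrative assumptions, not estimates drawn from the report referenced above.

```python
import math

# Toy feedback loop: capability C grows faster as more AI R&D is automated,
# and the automated fraction grows with capability. All forms and constants
# here are illustrative assumptions, not empirical estimates.

def automated_fraction(capability: float) -> float:
    """Logistic map from capability to the share of AI R&D that is automated."""
    return 1.0 / (1.0 + math.exp(-(capability - 5.0)))

def simulate(years: float = 10.0, dt: float = 0.01,
             base_rate: float = 0.5, max_speedup: float = 10.0) -> list[tuple[float, float]]:
    """Integrate dC/dt = base_rate * (1 + max_speedup * automated_fraction(C))."""
    capability, t, trajectory = 1.0, 0.0, []
    while t < years:
        growth = base_rate * (1.0 + max_speedup * automated_fraction(capability))
        capability += growth * dt
        t += dt
        trajectory.append((t, capability))
    return trajectory

if __name__ == "__main__":
    for t, c in simulate()[::100]:
        print(f"year {t:5.2f}  capability {c:7.2f}  automated share {automated_fraction(c):.2f}")
```

Varying the assumed feedback strength and automation curve gives a rough feel for how abrupt or gradual such a transition could be under different assumptions.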
We are interested in supporting work on threat modeling, including
We expect increasingly advanced AI to be deployed ever more widely and to have ever larger impacts.
Experts disagree wildly about when various important AI capabilities might be developed and what the consequences of those developments are likely to be. Constellation is interested in supporting work to operationalize interesting claims (potentially using evaluations or benchmarks) and elicit expert predictions about AI capabilities or impacts, especially AI capabilities relevant to improving research and development of frontier AI systems, cyberoffense, bioweapons development, or persuasion.
AI labs can institute internal organizational policies to manage risks from the AI systems they create. We are interested in supporting independent work on designing and evaluating such policies for safety-critical AI systems. For example:
We are especially interested in the following types of work:
We are interested in supporting the development of technical mechanisms to make AI governance measures more effective or less costly. For example:
Constellation is interested in supporting people who are working to make important concepts, issues, and questions related to transformative AI more accessible and clear. This could include people working on podcasts, documentaries, books, articles, online courses, and more – the main criteria are that the work is high-quality, well-researched, and on-topic (that is, relevant to safely navigating the development of transformative AI).
As described in this post by Redwood Research, we may want safety measures sufficient to prevent catastrophe even if the AI systems in use are intentionally attempting to subvert those measures (and, at a minimum, we want to know whether some proposed safety measures would be insufficient in such a situation). Work that we are interested in supporting (one illustrative protocol is sketched below) includes:
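A minimal sketch in the spirit of the trusted-monitoring protocols discussed in that post: a weaker but trusted model assigns a suspicion score to each output of a stronger untrusted model, and the most suspicious fraction is routed to human audit. The callables, the scoring scale, and the audit budget are placeholder assumptions, not a prescribed implementation.

```python
from typing import Callable

def trusted_monitoring(
    untrusted_model: Callable[[str], str],         # strong model doing the work
    suspicion_score: Callable[[str, str], float],  # trusted model scoring (task, output) in [0, 1]
    tasks: list[str],
    audit_budget: float = 0.02,                    # fraction of outputs humans can afford to audit
) -> list[dict]:
    """Run every task through the untrusted model and flag the most suspicious
    outputs (according to the trusted monitor) for human audit."""
    results = []
    for task in tasks:
        output = untrusted_model(task)
        results.append({"task": task, "output": output,
                        "suspicion": suspicion_score(task, output), "audit": False})
    n_audit = max(1, int(len(results) * audit_budget))
    for item in sorted(results, key=lambda r: r["suspicion"], reverse=True)[:n_audit]:
        item["audit"] = True
    return results
```

The interesting research questions are how well protocols like this hold up when the untrusted model is deliberately optimizing against the monitor, and how to stress-test them with a red team playing the untrusted model.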
Constellation is interested in supporting people investigating the following questions and related topics:
If we use AI systems to monitor other AI systems, it will be easier to make this monitoring reliable if the relevant AIs are unable or disinclined to coordinate with each other. In other circumstances, we might want AI systems to be able to cooperate well, such as when they are helping humans negotiate with each other (or in some exotic circumstances that might be important).
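As one small, hedged illustration of the first point: if an untrusted model is also used as a monitor, a trusted paraphraser can rewrite outputs before the monitor sees them, making it harder for the generator and the monitor to coordinate through hidden signals. The interfaces below are placeholders.

```python
from typing import Callable

def collusion_resistant_score(
    output: str,
    paraphrase: Callable[[str], str],       # trusted, weaker model that rewrites the output
    monitor_score: Callable[[str], float],  # untrusted monitor rating suspiciousness in [0, 1]
) -> float:
    """Score an output for suspiciousness only after paraphrasing it, so the
    generator cannot easily pass hidden signals to the monitor."""
    return monitor_score(paraphrase(output))
```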
Constellation is interested in developing AI tools for reasoning, including:
The integrity of many AI capability evaluations can depend on the evaluators being able to elicit those capabilities when they are present, such that there are no potentially dangerous capabilities hidden behind backdoors accessible only to, say, the AI developers or the AIs themselves. Constellation therefore seeks to support developing demonstrations of, and remedies for, sandbagging and elicitation problems that could corrupt evaluations, such as exploration hacking and gradient hacking.
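A crude check for this failure mode compares an evaluation score under the evaluator's standard elicitation with the score after a stronger elicitation attempt (for example, after supervised fine-tuning on a handful of demonstrations); a large gap suggests the original measurement under-elicited the capability. The interfaces and threshold below are placeholder assumptions, not an established protocol.

```python
from typing import Callable

ModelFn = Callable[[str], str]

def sandbagging_gap(
    evaluate: Callable[[ModelFn], float],  # eval harness returning an accuracy in [0, 1]
    standard_model: ModelFn,               # model under the evaluator's default elicitation
    strongly_elicited_model: ModelFn,      # same model after stronger elicitation (e.g. fine-tuning)
    threshold: float = 0.1,
) -> dict:
    """Flag possible sandbagging / under-elicitation when stronger elicitation
    recovers substantially more measured capability."""
    standard = evaluate(standard_model)
    elicited = evaluate(strongly_elicited_model)
    return {
        "standard_score": standard,
        "elicited_score": elicited,
        "gap": elicited - standard,
        "possible_under_elicitation": (elicited - standard) > threshold,
    }
```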
If we could remove certain particularly bad AI behaviors, that could substantially increase our ability to rely on AI systems and reduce risks from scheming AIs. Constellation wants to support work in this area, including work on
Some AI systems might operate in regimes where they take actions that are not easy for humans to understand and are therefore difficult to supervise. We want to support work on developing mitigations to this type of problem, including
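One family of mitigations, sketched very roughly below, has an assistant decompose a hard-to-evaluate action into small claims that a human or trusted model can check individually, and approves the action only if every piece passes. The decompose and check_subclaim interfaces are hypothetical placeholders.

```python
from typing import Callable

def decomposed_review(
    action: str,
    decompose: Callable[[str], list[str]],  # assistant splits the action into small, checkable claims
    check_subclaim: Callable[[str], bool],  # overseer (human or trusted model) verifies one claim
) -> bool:
    """Approve a hard-to-understand action only if every checkable sub-claim passes review."""
    subclaims = decompose(action)
    if not subclaims:
        return False  # refuse actions the assistant cannot break into checkable pieces
    return all(check_subclaim(claim) for claim in subclaims)
```

Whether such decompositions can be made faithful, and what to do with actions that resist decomposition, are themselves open questions in this area.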