Generative AI models aren’t actually humanlike. They have no intelligence or personality — they’re simply statistical systems predicting the likeliest next words in a sentence. But like interns at a tyrannical workplace, they do follow instructions without complaint — including initial “system prompts” that prime the models with their basic qualities and what they should and shouldn’t do.

Every generative AI vendor, from OpenAI to Anthropic, uses system prompts to prevent (or at least try to prevent) models from behaving badly, and to steer the general tone and sentiment of the models’ replies. For instance, a prompt might tell a model that it should be polite but never apologetic, or that it should be honest about the fact that it can’t know everything.

But vendors usually keep system prompts close to the chest — presumably for competitive reasons, but also perhaps because knowing the system prompt may suggest ways to circumvent it. The only way to expose GPT-4o’s system prompt, for example, is through a prompt injection attack. And even then, the system’s output can’t be trusted completely.

However, Anthropic, in its continued effort to paint itself as a more ethical, transparent AI vendor, has published the system prompts for its latest models (Claude 3.5 Sonnet, Claude 3 Opus and Claude 3 Haiku) in the Claude iOS and Android apps and on the web.

Alex Albert, head of Anthropic’s developer relations, said in a post on X that Anthropic plans to make this sort of disclosure a regular thing as it updates and fine-tunes its system prompts.
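For readers curious how a system prompt is actually applied in practice, the sketch below shows one way a developer can supply their own system prompt through Anthropic's Messages API. The prompt wording and model identifier are illustrative placeholders, not Anthropic's published defaults.

```python
# Minimal sketch: supplying a system prompt via Anthropic's Messages API.
# The system text and model name below are illustrative assumptions only.
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # example model identifier
    max_tokens=256,
    # The system prompt primes the model's tone and behavior for the whole exchange.
    system="Be polite but never apologetic, and admit when you don't know something.",
    messages=[
        {"role": "user", "content": "Why might an AI vendor publish its system prompts?"}
    ],
)

print(response.content[0].text)
```

A system prompt set this way applies on top of whatever the vendor bakes in; the prompts Anthropic published are the ones it uses for its own Claude apps, not what developers pass through the API.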