OpenAI runs a bug bounty program for finding and reporting vulnerabilities in its AI services, including ChatGPT. Reports can be submitted via Bugcrowd, with rewards ranging from $200 for “low-severity findings” to $20,000 for “exceptional discoveries.”
The program explicitly excludes jailbreaking ChatGPT or inducing it to generate malicious code or harmful content. “Issues related to the content of model prompts and responses are strictly out of scope, and will not be rewarded,” reads OpenAI’s Bugcrowd page.
Jailbreaking ChatGPT typically involves crafting elaborate scenarios that evade its safety controls, for example, prompting the chatbot to role-play as its “evil twin” in order to elicit otherwise restricted responses such as hate speech or weapon-making instructions.
“Model safety issues do not fit well within a bug bounty program, as they are not individual, discrete bugs that can be directly fixed,” OpenAI argues. “Addressing these issues often involves substantial research and a broader approach,” the company says. It instead directs researchers to report such concerns via its model feedback page.
Jailbreaks expose broader weaknesses in AI systems, even if they may not harm OpenAI as directly as conventional security failures. Last month, for example, the researcher rez0 revealed 80 “secret plugins” for the ChatGPT API: unpublished or experimental chatbot add-ons. The exposure was patched within a day of rez0 tweeting about it.
One user commented in the tweet thread: “If they only had a paid #BugBounty program – I’m certain the crowd could help them catch these edge-cases in the future.”