Login

1van · (This post was last modified: 10-15-2022, 02:51 PM by 1van.)

Pisanje eksploita prostim jezikom Smile

Quote:OpenAI’s API provides access to GPT-3, which performs a wide variety of natural language tasks, and Codex, which translates natural language to code.

Problemi su opisani ovde:

- https://simonwillison.net/2022/Sep/16/pr...solutions/
- https://simonwillison.net/2022/Sep/12/prompt-injection/

Quote:The more I think about these prompt injection attacks against GPT-3, the more my amusement turns to genuine concern.

I know how to beat XSS, and SQL injection, and so many other exploits.

I have no idea how to reliably beat prompt injection!

As a security-minded engineer this really bothers me. I’m excited about the potential of building cool things against large language models.

But I want to be confident that I can secure them before I commit to shipping any software that uses this technology.

A big problem here is provability. Language models like GPT-3 are the ultimate black boxes. It doesn’t matter how many automated tests I write, I can never be 100% certain that a user won’t come up with some grammatical construct I hadn’t predicted that will subvert my defenses.

Quote:If I had a protection against XSS or SQL injection that worked for 99% of cases it would be only be a matter of time before someone figured out an exploit that snuck through.

And with prompt injection anyone who can construct a sentence in some human language (not even limited to English) is a potential attacker / vulnerability researcher!

Another reason to worry: let’s say you carefully construct a prompt that you believe to be 100% secure against prompt injection attacks (and again, I’m not at all sure that’s possible.)

What happens if you want to run it against a new version of the language model you are using?

Every time you upgrade your language model you effectively have to start from scratch on those mitigations—because who knows if that new model will have subtle new ways of interpreting prompts that open up brand new holes?

1van · 02-04-2023, 12:48 PM

Zanimljiv primer: https://twitter.com/semenov_roman_/statu...7025613825.

[Image: attachment.php?aid=555]

1van · 02-13-2023, 03:43 PM

Još jedan zanimljivi primer: https://twitter.com/kliu128/status/1623472922374574080, i detalji: https://arstechnica.com/information-tech...on-attack/.

[Image: attachment.php?aid=588]

Login
Username/Email:
Password:	Lost Password?
	Remember me