You can make top LLMs break their own rules with gibberish

Elephant0991@lemmy.bleh.au · 1 year ago

itsgallus@beehaw.org · 1 year ago

Oh, I’m not saying there aren’t innate risks. You’re bringing up great points, and I agree we mustn’t throw caution to the wind. This is slightly beside the point of my initial comment, though, where I was merely stating my belief that the “hack” described in the OP might be a non-issue in a couple of years. But you are right. Again, I’m sorry about my ignorance. I didn’t mean to start an argument. It’s great hearing other points of view, though.