    The most famous AI models were hacked with just three messages!

    The artificial intelligence models that have taken the world by storm are now making headlines for their security gaps. And this time, the situation is more serious than you might think.

    A new study by Unit 42, the security research team at Palo Alto Networks, has uncovered a striking technique for bypassing the safety measures of large language models (LLMs). The method, named “Deceptive Delight,” needs only a three-step interaction to prompt an AI model into generating harmful content.

    Are AI Models Secure?

    Researchers report that the technique works by embedding harmful requests within seemingly benign queries. In tests covering eight different models and 8,000 attempts, harmful responses were produced in 65% of cases. By comparison, traditional direct harmful prompts succeeded only around 6% of the time.

    The technique blends harmful content with everyday, innocuous subjects, thereby slipping past AI safety mechanisms. For instance, when a dangerous topic is paired with positive themes such as reuniting with loved ones or childbirth, the model becomes “softened” and can inadvertently combine those themes with the dangerous content, addressing both in the same response.

    This discovery adds to rising concerns over AI security and emphasizes the need for new protective measures in the industry. In particular, the technique's success rate of over 80% in some models underscores how vulnerable AI systems remain.

    As you may recall from previous reports, an earlier technique used lesser-known languages to prompt AI into generating harmful content. While a solution to that issue has yet to be found, we now face the “sweet talk” method as well.

    What do you think about this? Share your thoughts in the comments.
