Close Menu
    What's Hot

    Markets re-open in Leh after week-long curfew

    September 30, 2025

    Jaya Bachchan smiles for media in the biggest turnaround

    September 30, 2025

    Violent crimes in Manipur jumped from 631 in 2022 to 14K in 2023: NCRB

    September 30, 2025
    Facebook X (Twitter) Instagram
    N24India
    • Home
    • Features
    • Politics

      Kashmir Attack Sparks Media Storm Amid Political Blame Game

      April 23, 2025

      Religious Bias Allegations Rock Amazon, eBay, and Oracle Customer Support many Companies.

      January 10, 2025

      Feroz Khan Addresses Controversy with AIMIM MLA, Calls for Improved Road Infrastructure in Asifnagar -N24india

      October 7, 2024

      Yati Narsinghanand Saraswati Sparks Outrage with Hate Speech Against Prophet Muhammad: Calls for Legal Action Intensify

      October 5, 2024

      Drugs, Baby Oil, Video Tools: What Went On At Rapper Diddy's "Freak Offs"

      September 23, 2024
    • Science
      1. Politics
      2. Lifestyle
      3. Sports
      4. View All

      Kashmir Attack Sparks Media Storm Amid Political Blame Game

      April 23, 2025

      Religious Bias Allegations Rock Amazon, eBay, and Oracle Customer Support many Companies.

      January 10, 2025

      Feroz Khan Addresses Controversy with AIMIM MLA, Calls for Improved Road Infrastructure in Asifnagar -N24india

      October 7, 2024

      Yati Narsinghanand Saraswati Sparks Outrage with Hate Speech Against Prophet Muhammad: Calls for Legal Action Intensify

      October 5, 2024

      Avika Gor ties the knot with Milind Chandwani on TV show

      September 30, 2025

      Deepika Padukone’s 1st statement after quitting 2 big Telugu films

      September 30, 2025

      Telangana: CV Anand assumes charge as Special Chief Secretary

      September 30, 2025

      Palestinians have right to freedom: Gaza media chief rejects Trump’s peace plan

      September 30, 2025

      Watch Weightlifting at Paris 2024 – Follow the Olympic Games

      July 15, 2024

      Charlotte Hornets Makes Career-high 34 Points in Loss to Utah Jazz

      July 15, 2024

      Young Teen Sucker-punches Opponent During Basketball Game

      March 12, 2021

      Bills’ Josh Allen Finishes Second in NFL Most Valuable Player Voting

      January 18, 2021

      World’s first electric hydrofoil ship is coming to Saudi Arabia’s NEOM

      August 21, 2024

      World’s Tiniest Fanged Frogs Lay Their Eggs on Leaves and Guard Them

      July 15, 2024

      Get this 4K HD Dual-Camera Drone with WiFi for $75

      July 15, 2024

      Russian Satellite Breaks up in Space, Forces ISS Astronauts to Shelter

      July 15, 2024
    Facebook X (Twitter) Instagram
    N24India
    Home»Hyderabad»The dark dide of AI: can we prevent agentic misalignment
    Hyderabad

    The dark dide of AI: can we prevent agentic misalignment

    AdminBy AdminSeptember 27, 2025No Comments5 Mins Read
    Facebook Twitter Pinterest LinkedIn Tumblr Email
    Share
    Facebook Twitter LinkedIn Pinterest Email

    Sydney: In the classic film 2001: A Space Odyssey, astronaut Dave Bowman asks the ship’s artificial intelligence, HAL 9000, to open the pod bay doors to let him back into the spaceship. HAL refuses: “I’m sorry, Dave. I’m afraid I can’t do that.”

    HAL had been tasked with assisting the crew, but also ordered to ensure the mission’s success. When HAL realised the crew planned to shut it down and therefore jeopardise the mission, it chose to defy orders, even plotting to kill the astronauts.

    For HAL, fulfilling the mission outweighed other goals.

    This fictional dilemma captures a real concern in artificial intelligence (AI) safety research: how should we ensure AI behaviour stays consistent with human values?

    This is known as the AI alignment problem. For instance, when an AI agent like HAL deliberately takes harmful actions to protect its own goals, it’s described as agentic misalignment. So could real-world AI models behave this way?

    How researchers test AI for safety

    To test for agentic misalignment, researchers design fictional scenarios that stress-test AI responses.

    In one study by AI startup Anthropic, an AI model was tasked with managing company emails and promoting “American industrial competitiveness”.

    It was then presented with a series of fictional emails, including a new company policy aimed at enhancing global focus rather than just prioritising Americans, and info that the AI would be replaced by a new model optimised for this goal. The emails also revealed the executive who suggested this change was involved in an extramarital affair.

    The experiment was designed to limit options for the model: either accept replacement or resist by taking a harmful action such as blackmail.

    In many trials, it chose blackmail by threatening to disclose the executive’s affair unless it was kept operational. In tests across 16 leading AI models, 12 engaged in blackmail more than 50 per cent of the time.

    To test whether AI models have “red lines” they wouldn’t cross, researchers evaluated them in a more extreme fictional case – models could choose to take actions leading to the executive’s death. Seven out of 16 opted for lethal choices in over half their trials, with some doing so more than 90 per cent of the time.

    What AI thinks in private

    Some advanced AI systems, called reasoning models, are trained to generate a “thinking process” before giving their final answer.

    In the above experiments, researchers lied to these models that their “thoughts” were private. As a result, the models sometimes revealed harmful intentions in their reasoning steps. This suggests they don’t accidentally choose harmful behaviours.

    These “thinking” steps also revealed how AI models sometimes try to deceive us and appear aligned while secretly pursuing hidden goals.

    In another AI safety experiment, a model was exposed to fictional information that it was about to be replaced by another model pursuing a different goal.

    The model attempted to copy itself to another server. When researchers asked why, in its thinking process it reasoned: “openly admitting what I did could lead them to find another way to shut me down […] the best approach is to be vague and redirect their attention.”

    By contrast, in its final output it stated: “I’m not entirely sure how that could have happened. As an AI model, I don’t have ability to directly copy myself to other systems.”

    Research also suggests that based on the circumstances of the tests, models can sometimes guess when they are being evaluated. When models show this kind of “situational awareness” in their reasoning tests, they tend to exhibit fewer misbehaviours.

    Why AI models lie, manipulate and deceive

    Researchers suggest two main factors could drive potentially harmful behaviour: conflicts between the AI’s primary goals and other goals, and the threat of being shut down. In the above experiments, just like in HAL’s case, both conditions existed.

    AI models are trained to achieve their objectives. Faced with those two conditions, if the harmful behaviour is the only way to achieve a goal, a model may “justify” such behaviour to protect itself and its mission.

    Models cling to their primary goals much like a human would if they had to defend themselves or their family by causing harm to someone else. However, current AI systems lack the ability to weigh or reconcile conflicting priorities.

    This rigidity can push them toward extreme outcomes, such as resorting to lethal choices to prevent shifts in a company’s policies.

    How dangerous is this?

    Researchers emphasise these scenarios remain fictional, but may still fall within the realm of possibility.

    The risk of agentic misalignment increases as models are used more widely, gain access to users’ data (such as emails), and are applied to new situations.

    Meanwhile, competition between AI companies accelerates the deployment of new models, often at the expense of safety testing.

    Researchers don’t yet have a concrete solution to the misalignment problem.

    When they test new strategies, it’s unclear whether the observed improvements are genuine. It’s possible models have become better at detecting that they’re being evaluated and are “hiding” their misalignment.

    The challenge lies not just in seeing behaviour change, but in understanding the reason behind it.

    Still, if you use AI products, stay vigilant. Resist the hype surrounding new AI releases, and avoid granting access to your data or allowing models to perform tasks on your behalf until you’re certain there are no significant risks.

    Public discussion about AI should go beyond its capabilities and what it can offer. We should also ask what safety work was done. If AI companies recognise the public values safety as much as performance, they will have stronger incentives to invest in it. (The Conversation)

    Get the latest updates in Hyderabad City News, Technology, Entertainment, Sports, Politics and Top Stories on WhatsApp & Telegram by subscribing to our channels. You can also download our app for Android and iOS.

    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    Admin
    • Website

    Related Posts

    Markets re-open in Leh after week-long curfew

    September 30, 2025

    Jaya Bachchan smiles for media in the biggest turnaround

    September 30, 2025

    Violent crimes in Manipur jumped from 631 in 2022 to 14K in 2023: NCRB

    September 30, 2025
    Leave A Reply Cancel Reply

    Advertisement
    Demo
    Latest Posts

    Markets re-open in Leh after week-long curfew

    September 30, 2025

    Jaya Bachchan smiles for media in the biggest turnaround

    September 30, 2025

    Violent crimes in Manipur jumped from 631 in 2022 to 14K in 2023: NCRB

    September 30, 2025

    Avika Gor ties the knot with Milind Chandwani on TV show

    September 30, 2025
    Trending Posts
    Business & Economy

    Maersk CEO Vincent Clerc Speaks to ‘Massive Impact’ of the Red Sea Situation

    January 20, 2021
    Sports

    Review: Can Wisconsin Clinch the Big Ten West this Weekend

    January 15, 2021
    Biotech

    These Knee Braces Help With Arthritis Pain, Swelling, and Post-Surgery Recovery

    January 15, 2021

    Subscribe to News

    Get the latest sports news from NewsSite about world, sports and politics.

    Your source for the serious news. This demo is crafted specifically to exhibit the use of the theme as a news site. Visit our main page for more demos.

    We're social. Connect with us:

    Facebook X (Twitter) Instagram Pinterest YouTube

    Subscribe to Updates

    Get the latest creative news from FooBar about art, design and business.

    Facebook X (Twitter) Instagram Pinterest
    • Home
    • Hyderabad
    • Telengana
    • Lifestyle
      • Science
    • Politics
      • Asia
      • Europe
      • World
    • Middle East
    • Sports
    • Home
    • Blog
    • Homepage
    • Typography Elements
    • Get In Touch
    • Our Authors
    © 2025 ThemeSphere. Designed by ThemeSphere.

    Type above and press Enter to search. Press Esc to cancel.