CommunityNews

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost – certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.


#testing #cybersecurity


2026-01-07 00:45:43 UTC
