CommunityNews

CommunityNews

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost – certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.

Read in full here:

Where Next?

Popular Ai topics Top

New
First poster: jacobtriton
Why AI is Harder Than We Think. Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between...
New
First poster: CommunityNews
Google has unveiled a tool that uses artificial intelligence to help spot skin, hair and nail conditions, based on images uploaded by pat...
New
New
First poster: CommunityNews
Steve Blank Artificial Intelligence and Machine Learning– Explained. Artificial Intelligence is a once-in-a lifetime commercial and defe...
New
First poster: ozornin
They are among 400 artists appealing to Sir Keir Starmer, saying creative industries are threatened.
New
First poster: mercyf
It replicates your development environment and takes up to 30 minutes per task.
New
gfqdjb
With all the AI buzz around coding assistants, and being a bit concerned about being dependent on third-party cloud providers here, I dec...
New
CommunityNews
Study shows how patterns in LLM training data can lead to “parahuman” responses.
New
First poster: jkdiaz
TechCrunch spoke to experienced coders about their time using AI-generated code about what they see as the future of vibe coding.
New

Other popular topics Top

DevotionGeo
I know that these benchmarks might not be the exact picture of real-world scenario, but still I expect a Rust web framework performing a ...
New
siddhant3030
I’m thinking of buying a monitor that I can rotate to use as a vertical monitor? Also, I want to know if someone is using it for program...
New
brentjanderson
Bought the Moonlander mechanical keyboard. Cherry Brown MX switches. Arms and wrists have been hurting enough that it’s time I did someth...
New
AstonJ
I have seen the keycaps I want - they are due for a group-buy this week but won’t be delivered until October next year!!! :rofl: The Ser...
New
Margaret
Hello everyone! This thread is to tell you about what authors from The Pragmatic Bookshelf are writing on Medium.
1147 29994 760
New
mafinar
This is going to be a long an frequently posted thread. While talking to a friend of mine who has taken data structure and algorithm cou...
New
PragmaticBookshelf
Author Spotlight Mike Riley @mriley This month, we turn the spotlight on Mike Riley, author of Portable Python Projects. Mike’s book ...
New
New
hilfordjames
There appears to have been an update that has changed the terminology for what has previously been known as the Taskbar Overflow - this h...
New
mindriot
Ok, well here are some thoughts and opinions on some of the ergonomic keyboards I have, I guess like mini review of each that I use enoug...
New