CommunityNews

CommunityNews

Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing

We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside six existing AI agents and ARTEMIS, our new agent scaffold, on a large university network consisting of ~8,000 hosts across 12 subnets. ARTEMIS is a multi-agent framework featuring dynamic prompt generation, arbitrary sub-agents, and automatic vulnerability triaging. In our comparative study, ARTEMIS placed second overall, discovering 9 valid vulnerabilities with an 82% valid submission rate and outperforming 9 of 10 human participants. While existing scaffolds such as Codex and CyAgent underperformed relative to most human participants, ARTEMIS demonstrated technical sophistication and submission quality comparable to the strongest participants. We observe that AI agents offer advantages in systematic enumeration, parallel exploitation, and cost – certain ARTEMIS variants cost $18/hour versus $60/hour for professional penetration testers. We also identify key capability gaps: AI agents exhibit higher false-positive rates and struggle with GUI-based tasks.

Read in full here:

Where Next?

Popular Ai topics Top

New
First poster: bot
NVIDIA Doubles Down: Announces A100 80GB GPU, Supercharging World’s Most Powerful GPU for AI Supercomputing. SC20—NVIDIA today unveiled ...
New
First poster: jss
We are in the middle of an AI boom. Machine Learning experts command extraordinary salaries, investors are happy to open their hearts and...
New
First poster: bot
DeepMind AI predicts incoming rainfall with high accuracy. Having flexed its muscles in predicting kidney injury, toppling Go champions ...
New
First poster: bot
Language technology powered by AI can perpetuate bias if we are not careful. We need to be sure that language AI is trained to be ethical...
New
First poster: CommunityNews
A simple algorithm that revolutionizes how neural networks approach language is now taking on image classification as well. It may not st...
New
First poster: bot
AI Wrote and Performed a Jerry Seinfeld Routine!. I used GPT-3 to write a Jerry Seinfeld stand-up routine about cats - and then used Dee...
New
First poster: bot
Chri Besenbruch, CEO of Deep Render, sees many problems with the way video compression standards are developed today. He thinks they aren...
New
First poster: bot
AI and the Future of Pixel Art. Creative industries are undergoing a 0 to 1 moment. If you didn’t know, now you do. The impact that AI w...
New
AstonJ
This is cool! DEEPSEEK-V3 ON M4 MAC: BLAZING FAST INFERENCE ON APPLE SILICON We just witnessed something incredible: the largest open-s...
New

Other popular topics Top

PragmaticBookshelf
Ruby, Io, Prolog, Scala, Erlang, Clojure, Haskell. With Seven Languages in Seven Weeks, by Bruce A. Tate, you’ll go beyond the syntax—and...
New
ohm
Which, if any, games do you play? On what platform? I just bought (and completed) Minecraft Dungeons for my Nintendo Switch. Other than ...
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
AstonJ
Just done a fresh install of macOS Big Sur and on installing Erlang I am getting: asdf install erlang 23.1.2 Configure failed. checking ...
New
AstonJ
I have seen the keycaps I want - they are due for a group-buy this week but won’t be delivered until October next year!!! :rofl: The Ser...
New
AstonJ
If you are experiencing Rails console using 100% CPU on your dev machine, then updating your development and test gems might fix the issu...
New
DevotionGeo
The V Programming Language Simple language for building maintainable programs V is already mentioned couple of times in the forum, but I...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
CommunityNews
A Brief Review of the Minisforum V3 AMD Tablet. Update: I have created an awesome-minisforum-v3 GitHub repository to list information fo...
New
PragmaticBookshelf
A concise guide to MySQL 9 database administration, covering fundamental concepts, techniques, and best practices. Neil Smyth MySQL...
New