kammy

kammy

Benchmarking AI Design (+ the quirks of AI generated UI right now)

Hi everyone!

The other day I was having a debate with my friends about whether or not the top LLM models are “good at design.” I’d love to hear other people’s thoughts & experiences, as well as what models (if any) people use to help them improve their interfaces if they don’t have access to a designer.

I’m also curious about if anyone has noticed interesting trends across models or within specific models on design. For example, I can’t help but notice DeepSeek & Anthropic models love gradient purple titles & backgrounds, and OpenAI has a serious issue with putting white text on white backgrounds / unreadable colors.

I also built a website where people can compare different models on the same design prompts, creating a crowdsourced leaderboard. Would love to hear thoughts & reactions

Where Next?

Popular Ai topics Top

AstonJ
This video about multi-agent AI is a really nice watch - it only took them a few million tries to master certain strategies - doing much ...
#ai
New
AstonJ
Can you spot the AI generated person in the pic below? ▶ Spoiler Video here:
New
Eiji
Today, I tried to find some information and few times I not only got completely wrong answers, but even fake GitHub links … Every time I ...
#ai
New
AstonJ
I have a feeling we’re going to see a lot of threads about DeepSeek, so have put up a portal for it :003:
New
apoorv-2204
How are you using AI in my life? How the day to day life is changed around you? professional and in personal life? I it use for autocom...
#ai
New
kammy
Hi everyone! The other day I was having a debate with my friends about whether or not the top LLM models are “good at design.” I’d love ...
New
xiji2646-netizen
DeepSeek just released V4 and the pricing is hard to ignore. V4-Flash: $0.28/M output tokens. V4-Pro: $2.19/M. Both with 1M token contex...
New
xiji2646-netizen
Alibaba just opened public API access for HappyHorse 1.0, the model currently ranked #1 on Video Arena’s blind tests. What caught my att...
New
xiji2646-netizen
Cursor cloud agent development This month’s updates: Codex got real Windows sandboxing (May 13) ...
New
xiji2646-netizen
Codex mobile in the ChatGPT app https://techcrunch.com/wp-content/uploads/2026/05/App-view.png?resize=1200,675) Codex shipped a batch o...
New

Other popular topics Top

dasdom
No chair. I have a standing desk. This post was split into a dedicated thread from our thread about chairs :slight_smile:
New
Rainer
My first contact with Erlang was about 2 years ago when I used RabbitMQ, which is written in Erlang, for my job. This made me curious and...
New
New
AstonJ
This looks like a stunning keycap set :orange_heart: A LEGENDARY KEYBOARD LIVES ON When you bought an Apple Macintosh computer in the e...
New
PragmaticBookshelf
Build highly interactive applications without ever leaving Elixir, the way the experts do. Let LiveView take care of performance, scalabi...
New
PragmaticBookshelf
Build efficient applications that exploit the unique benefits of a pure functional language, learning from an engineer who uses Haskell t...
New
AstonJ
Was just curious to see if any were around, found this one: I got 51/100: Not sure if it was meant to buy I am sure at times the b...
New
husaindevelop
Inside our android webview app, we are trying to paste the copied content from another app eg (notes) using navigator.clipboard.readtext ...
New
mindriot
Ok, well here are some thoughts and opinions on some of the ergonomic keyboards I have, I guess like mini review of each that I use enoug...
New
xiji2646-netizen
Woke up to this today: Claude Code’s complete source code exposed via npm source map. Not a snippet. All 512,000 lines. 1,900 TypeScript ...
New