CommunityNews

Real-time LLM Inference on Standard Datacenter GPUs (3,000 tokens/s per request)

Today, Kog AI launches a tech preview of the Kog Inference Engine (KIE): 3,000 output tokens/s per request on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200 (FP16, no speculative decoding). This preview runs a 2B model, with support for large third-party MoE models coming next at similar speeds.

Read in full here:

https://blog.kog.ai/real-time-llm-inference-on-standard-gpus-3-000-tokens-s-per-request/
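To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch that converts per-request throughput into per-token latency and total generation time. It assumes a steady decode rate and ignores prompt processing and time to first token, neither of which the announcement above covers. The function names and the 1,000-token response length are illustrative assumptions, not part of KIE.

```python
# Rough arithmetic on the throughput figures quoted in the post.
# Assumption: decode speed is constant for the whole response and
# time-to-first-token is ignored (the post does not state it).

# Output tokens per second per request, as quoted in the announcement.
RATES = {
    "8x AMD MI300X": 3000.0,
    "8x NVIDIA H200": 2100.0,
}


def per_token_latency_ms(tokens_per_second: float) -> float:
    """Average time between consecutive output tokens, in milliseconds."""
    return 1000.0 / tokens_per_second


def generation_time_s(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady rate, in seconds."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 1000  # hypothetical response length
    for name, rate in RATES.items():
        latency = per_token_latency_ms(rate)
        total = generation_time_s(response_tokens, rate)
        print(f"{name}: {latency:.3f} ms/token, "
              f"{total:.2f} s for {response_tokens} tokens")
```

Under these assumptions, 3,000 tokens/s works out to roughly a third of a millisecond per token, and 2,100 tokens/s to a little under half a millisecond.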

#llm

2026-05-30 15:56:05 UTC
