createos

How are you handling the full deployment lifecycle for AI workloads in production?

Curious how other teams are approaching this.

Building an AI app used to mean picking a model and writing product logic. Now it also means picking a hosting provider, wiring up a monitoring tool, and at some point figuring out billing. Three separate systems, each with its own failure modes, each needing maintenance.

The pattern I keep seeing: teams ship something that works in staging, then spend the next month firefighting the infrastructure around it. A monitoring alert lags the actual incident by 10+ minutes. The billing integration breaks when usage spikes. The hosting layer that worked for a prototype cannot handle real traffic.

Some specific questions for anyone running AI workloads in production:

  • Are you managing hosting, monitoring, and billing as separate systems, or have you consolidated them?

  • If separate, how much engineering time per week goes into keeping those integrations running vs. building the actual product?

  • Have you looked at managed execution layers as an alternative to self-building this stack?

We ran into this problem ourselves while building CreateOS (createos.sh), which ended up being our answer to it. But I am more interested in how others are solving it, or whether the problem is even the same across different team sizes.
