How not to fail building an AI product
To start, let’s talk a little bit about my background. I learned to program soon after I turned 11, and haven’t slept since. I started programming semi-professionally at 14, and incorporated my first startup at 15. I worked at a leading cloud provider where I co-developed an AI inference pipeline that processed billions of tokens per day, and now work for Gensyn helping to build AI-enabled products. AI is going to continue to be an important facet of life - so let’s talk about how it’s used.
AI at Gensyn - CodeAssist #
CodeAssist, the product for which I acted as tech lead, is a completely local, private AI assistant that works with you as you code. We centered it on LeetCode-style problems, where the assistant learns how you prefer to be assisted and teaches itself to support you in that way.
CodeAssist presents you with programming challenges that you can work through collaboratively with an AI. You can then train your AI locally - without any data leaving your computer - teaching it how you like to be assisted. Finally, you can test your knowledge, and your newly trained AI, against new problems and watch it improve.
CodeAssist is built on a backing LLM - specifically Qwen 2.5 - that supplies the actual code. The novel piece we built is an “action selection model” that looks at the current state of the file and learns to instruct the LLM. The model you train might learn, for example, that you like it when the AI comments your code, or that you prefer suggestions for completing the line you’re on.
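To make the division of labor concrete, here is a minimal sketch of the idea in Python. Everything here - the action names, the feature extraction, the scoring - is illustrative and hand-rolled, not CodeAssist’s actual model or API; in the real product the selector is a trained model and the backing LLM supplies the code.

```python
# Sketch: a small, trainable policy decides WHAT to ask the backing
# LLM to do; the LLM itself supplies the actual code. All names here
# are hypothetical.

ACTIONS = {
    "comment_code": "Add explanatory comments to the code below.",
    "complete_line": "Suggest a completion for the current line.",
    "do_nothing": None,
}

class ActionSelector:
    """Toy policy: one score per (feature, action) pair, nudged by
    user feedback. The real model would be learned, not hand-rolled."""

    def __init__(self):
        self.scores = {}  # (feature, action) -> float

    def _features(self, file_state):
        # Crude features of the current editor state.
        yield "line_start" if file_state.endswith("\n") else "mid_line"
        yield "has_todo" if "TODO" in file_state else "no_todo"

    def select(self, file_state):
        def total(action):
            return sum(self.scores.get((f, action), 0.0)
                       for f in self._features(file_state))
        return max(ACTIONS, key=total)

    def feedback(self, file_state, action, reward):
        # Reinforce actions the user liked. This is the locally
        # trained, private part - no data leaves the machine.
        for f in self._features(file_state):
            key = (f, action)
            self.scores[key] = self.scores.get(key, 0.0) + reward

selector = ActionSelector()
# The user previously approved of commenting in a similar file state:
selector.feedback("x = 1\n", "comment_code", reward=1.0)
action = selector.select("y = 2\n")
prompt = ACTIONS[action]  # the instruction handed to the backing LLM
```

The key design point this illustrates: the trainable part is tiny (a policy over a handful of discrete actions), which is what makes on-device training plausible for low-power machines, while the heavy lifting stays in the frozen LLM.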
While leading the development of CodeAssist, we followed a few core tenets (mostly intuitively):
- It must run on low-power end-user devices
- It should support as many environments as reasonable
- It should do something new and interesting on the research side
- It should be dead-simple to run
Focusing on the user #
Naturally, we had to keep the user in mind for the entire process. Our user base is mixed, but the common denominator is that not everyone has a massively powerful device. Some of our users run on as little as 8GB of RAM, and we had to support that. The product would be more technically capable if we required larger devices, or used a cloud offering for the LLM. However, that would defeat the goal of supporting fully-local, fully-private AI computing.
We also require Docker for running CodeAssist. While this may seem like a heavy dependency for an end-user, it’s actually not as bad as it seems. A user can follow any of the thousands of guides, posts, and knowledge-base articles on how to install Docker on their system. Then, we can have them run very simple scripts that automatically install the containers, which makes for a very small surface area that we have to support. Simply put, we only need to develop the guide for running our one script, and don’t need to talk about the edge cases of supporting Docker in our users’ environments.
So, with AI in mind, let’s talk about how to actually build a product - and a company - around AI.
The goal of a company #
Let’s assert some simple goals. When a startup (or any company, really) is founded, usually, it has some number of these goals:
- Solve a problem encountered by others
- Grow a team and its expertise on the problem domain
- Remain in-business for an extended period of time
- Generate revenue
- Make the founders (and investors) rich
Startups #
Usually, startups are a little more focused in their goals. They want to grow quickly, gain investment, achieve product-market fit, and (optionally) exit. Let’s talk a little more about that third item, the “product-market fit.”
Product-market fit is defined by Wikipedia as “the degree to which a product satisfies a strong market demand.”[1] This means that people will actually want to buy the thing the startup is building, and the startup is able to generate interest and revenue from it.
The MVP #
When starting a company, the single most common failure I have seen is the founders focusing too much on a single topic within their startup, and not keeping the “big picture” in view. They focus on the engineering, building what “they want” to build, and not what the market wants. I’ve fallen into this trap, too - I love solving difficult engineering problems, so the challenge becomes making sure I’m focusing on the right engineering problem.
This is why the MVP is so important. In a startup, you often want to perform the minimum amount of work possible to develop a product before you obtain feedback. This way, you can learn quickly what ideas are good, and what are bad.
Before-customers and after-customers #
When starting a company, you really build two companies. There is the earlier, before-customer company, which builds what you think a customer wants. Then there is the later, after-customer company (once you’ve achieved your first sale), which builds what the customer actually wants. This is another reason the MVP matters: the sooner you talk to a customer, the sooner you build the product that makes you rich.
The AI MVP #
Getting into the realm of AI, now, one of the most important things about being an AI company is having your “special sauce.” Whether that’s a special prompt to GPT-5, a fine-tune, or even a fully custom model, you need to have something that makes your company stand out above others.
This can be expensive.
Challenges with AI #
There are three core challenges with an AI startup that set it apart from other kinds of companies.
Challenge 1: Hardware cost #
Whether you’re training your model or performing inference, you need somewhere to run it. If you’re built on top of an existing AI provider - OpenAI, Anthropic, Google, or others - then you pay for the hardware as any other consumer would: it’s built into the pricing of their offering.
However, if you’re running the hardware yourself - either through a cloud provider like AWS, GCP, Azure, CoreWeave, or Lambda, or by purchasing the hardware directly from Dell, SuperMicro, or Nvidia - you have to put forward significant capital to acquire the compute needed to support your startup. Large deals are often fully or partially paid upfront: demand is so high that these companies can effectively charge whatever rate they want for reliable infrastructure.
Challenge 2: Hardware reliability and availability #
The last sentence of the previous paragraph uses a very important phrase - reliable infrastructure. While I can’t share the exact failure rate for hardware I’ve worked with, it was significantly higher than you would expect. These chips fail. Meta released a paper describing a hardware failure roughly every three hours on a 16k-GPU cluster.[2] In other words, training could only proceed for about three hours, on average, before a GPU failed and they lost any progress made since their last checkpoint.
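A failure every three hours sounds like terrible hardware, but a bit of back-of-the-envelope arithmetic (assuming failures are independent and evenly distributed across GPUs - a simplification) shows the opposite:

```python
# If N components fail independently, the cluster's mean time between
# failures is roughly (per-component MTBF) / N. Working backwards from
# the ~3-hour cluster-level figure for a 16k-GPU cluster:

cluster_gpus = 16_384
cluster_mtbf_hours = 3.0  # roughly one failure every 3 hours

per_gpu_mtbf_hours = cluster_mtbf_hours * cluster_gpus
per_gpu_mtbf_years = per_gpu_mtbf_hours / (24 * 365)

print(f"{per_gpu_mtbf_years:.1f} years")  # ≈ 5.6 years per GPU
```

Each individual GPU is quite reliable - it’s the sheer scale that makes failures a constant fact of life, which is why checkpointing (below) is non-negotiable.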
Side note: Checkpoints #
When a model is training, it is a very good idea to save your progress every so often. This means writing the current state of the model in-memory to disk, which is time that you’re not spending training. This is similar to saving your progress in a video game.
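Here is a minimal sketch of that save-and-resume pattern. Real training code would use the framework’s own checkpoint APIs (e.g. `torch.save`); plain `pickle` stands in here to keep the sketch self-contained, and the step counts are arbitrary:

```python
# Periodically serialize training state to disk so a failure only
# loses progress made since the last save.
import os
import pickle
import tempfile

CHECKPOINT_EVERY = 100  # steps between saves; a tunable trade-off

def save_checkpoint(path, step, model_state):
    # Write to a temp file, then rename atomically, so a crash
    # mid-write can't corrupt the last good checkpoint.
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump({"step": step, "model_state": model_state}, f)
    os.replace(tmp, path)

def load_checkpoint(path):
    if not os.path.exists(path):
        return 0, {}  # fresh start
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    return ckpt["step"], ckpt["model_state"]

path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
start_step, state = load_checkpoint(path)
for step in range(start_step, 250):
    state["loss"] = 1.0 / (step + 1)  # stand-in for one training step
    if (step + 1) % CHECKPOINT_EVERY == 0:
        save_checkpoint(path, step + 1, state)

# If the job dies after step 250, restarting resumes from step 200,
# not from zero - only 50 steps of work are lost.
resumed_step, _ = load_checkpoint(path)
```

The checkpoint interval is the trade-off knob: save too often and you burn training time on disk I/O; save too rarely and each failure throws away hours of progress.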
Challenge 3: Hardware orchestration #
The last challenge is how you actually access the hardware. There are two major routes - Slurm and Kubernetes. Major players are releasing newer libraries and tools, but these two remain the most well-known. They’re also operationally challenging - even more so than your typical DevOps cluster - and you can have entire teams of people dedicated to running them. You need not just engineers to manage the underlying infrastructure, the hardware, the OS, and the orchestration, but also engineers to handle kernel performance tuning, to manage the NUMA and PCIe topologies for any virtualization being performed, and to chase down any kernel or hardware bugs you encounter (which were surprisingly common in my experience).
The AI expenses #
AI startups can be some of the most capital-intensive startups to build. When building an AI startup, you have two primary options: You can either utilize someone else’s infrastructure (GPT-5 from OpenAI, Claude from Anthropic, etc.), or you can build the infrastructure yourself. Let’s talk about these two options.
But there’s one key point I want to make up front: Don’t build something you’re not selling.
Somebody else’s infrastructure #
If your company is based on a special prompt, proprietary data, or some other magic that you have performed, but uses infrastructure that you don’t control, you’re going to be competing against the big players who can out-spend you in an instant. You should be as scrappy as possible, and fight for wins where you can get them - but don’t focus on losses too hard. Improve your product, but don’t sell to those who won’t buy.
Your infrastructure #
The other extreme is to own the hardware, top-to-bottom. This is immensely expensive upfront, and has significant maintenance and operational challenges. In the same vein of the point made earlier, if you want to get really good at building infrastructure, you should become an infrastructure provider, instead of selling an application to end-users.
The happy middle #
If you’re going to build a startup, rent the hardware from someone who knows what they’re doing. If you think you can do it all yourself, entirely in-house - you’re wrong. Infrastructure providers aren’t incentivized to out-spend you and swallow your target market. Similarly, they can’t analyze your request patterns to build a competitor. They are better than you at their job, so be better than them at yours.
You will still have hardware failures. You will still need to know how to run a training job. However, with products like a managed Kubernetes cluster - available from various cloud providers - you don’t need to know how IOMMU groups affect the performance of inter-CPU communication. The defaults of most of these providers are well-tuned for running ML workloads, and you can get 90% of the way there with very little effort.
Don’t build what you don’t sell #
I want to drive this point home. If you are really good at building something, then sell it. If you’re not planning on selling it, there’s no sense in a small company becoming experts in two things at the same time. Build what you’re selling, and rely on others for everything else. There’s a reason nobody except Google builds their own payroll provider. Don’t build your own payroll provider - unless you’re becoming a payroll provider company.
Try CodeAssist #
If you want to try out the project, you can visit this link to view the guide. If you want to learn more, you can join our Discord. If you want to do an AI startup, remember some key things:
- Don’t build what you don’t sell
- Build your MVP as scrappily as possible
- Don’t get stuck on the technical challenges
- Talk to your customers both before and after the sale
- The infrastructure is not the easy part