NVDA--view

I thought the piece below was a thoughtful take for those following the story without getting too much into the weeds. I agree with the author's general direction and points. Boy, has there been a lot written on this one!!


AI is Revolutionary, but Investors are Getting Ahead of Themselves

RIHARD JARC

JUN 20, 2024



I decided to share some short thoughts on the semiconductor space and the AI frenzy that we are witnessing.

First, everyone who follows me and this newsletter knows I am one of the firmest believers in AI and the landscape changes it will bring to businesses and our society over the mid and long term. I think it's a revolutionary technology shift, similar to the one the Internet brought upon us.

That being said, looking at the short term right now, I see many investors and fund managers who were late to the trade furiously trying to catch up, get into some of the names, and add exposure to their portfolios.

One of the hottest spaces right now is semiconductors, specifically Nvidia. Nvidia has become so big that it is not only the bellwether for a basket of companies classified as »AI companies« but also a significant weight in the Nasdaq and the S&P 500.

I wanted to share my quick thoughts on Nvidia and what I expect from the AI market, at least in the short term. But before we discuss Nvidia, we need to understand two basic concepts about AI and the need for chips in this process.

Two key processes are needed for today's LLMs: the computation required to train AI models and the inference workloads that make the AI models run once they are trained.

The important thing is how the two workloads differ. For training, the only real option right now is a GPU or, more specifically, an Nvidia GPU. As this semiconductor expert put it well:

»When you're talking about training, no one can compete with NVIDIA right now, not only from a chip perspective in terms of performance, but also in how they have integrated the networking and the NVLink Switch and NVSwitch technology, their proprietary technology for connecting these chips, basically connecting thousands of these GPUs together.«

source: Alphasense

The answer for inference is different, as the use case is less complex and there are competitors with some reasonable solutions (AMD, hyperscalers with their internal chips, or smaller companies with their custom ASICs).

From the AI perspective, the most significant difference between a GPU and an ASIC is that a GPU is more general-purpose and therefore more versatile, while an ASIC is tailored to a specific workload, which can make it better on a performance/cost basis for that workload. But because an ASIC is tailored, it lacks general usability, which can limit its usefulness if workloads or the underlying model infrastructure start changing.

Nvidia's business fortress

I won't go into the basics of Nvidia because I hope most of you know them by now; instead, I want to highlight a few things that I find essential.


Firstly, Nvidia looks very resilient as a business. In my view, it will continue to hold a dominant position in the short and mid-term, at least regarding AI training workloads.

The reasons for Nvidia's dominance are the following:

  1. Their software stack - CUDA. A lot of people mention CUDA but don't explain the reasons behind the moat. Simply put, engineers learned to code in CUDA in school, and it's the default. So if you are a company that is building AI models and needs to optimize the silicon, it's already hard to find semiconductor engineers, and most of the ones you can find know CUDA. Going with any other option would substantially shrink your talent pool. There are some so-called CUDA code converters, some of them even open source, but the problem is that they don't fully convert the CUDA code to another software stack for a different GPU or custom silicon. So even if they convert 80% of the code, you still need kernel engineers to custom-engineer the remaining 20%. Again, this means, firstly, that it is tough to get a kernel engineer right now, and secondly, as this former Nvidia researcher explained, the cost of using another GPU plus the custom engineering work needed to convert the CUDA code is higher than just going with an Nvidia GPU.
  2. Supply is very limited, even for alternatives. Because Nvidia has strong relationships with its suppliers, it is hard for other companies, even Big Tech, to get enough of their most advanced custom silicon.
  3. While Microsoft, Apple, Google, Amazon, and Meta, all among Nvidia's biggest clients, are developing their own silicon, most of them are still in the early stages of development. Even where these chips have performance/cost advantages over some Nvidia GPUs right now (like Google’s TPUs), they are mainly used for internal workloads at these companies, as most outside clients are not familiar with the software stacks of these custom chips (hence the problem: it's not CUDA). Another problem is that R&D in the semiconductor industry is not cheap, and with Nvidia's size and access to capital, the game gets a lot harder for others to catch up with every new GPU generation.

Hyperscalers have also learned from the past that it doesn't always make sense to go “full in” on semis:

»In the past, all of these hyperscalers have looked at doing their own Arm processors for their own servers. Some of them are still working on them today. What they found is, in many cases, the ROI for spending a whole design cycle with your design team to develop something that you can readily get off the shelf, that works just as well, if not better, is not worth it. Why would I spend my team doing an Intel chip when I can just go to Intel and buy one, and it works pretty well, or AMD Genoa?«

source: Alphasense

The industry's problem is also not just performance; it's HEAT.

A domain expert explains it well:

»I think the biggest piece is probably networking throughput and power. Power, because these things all have thermal cutoffs. If your power solution isn't efficient, you're generating tons more heat, which ends up throttling the performance of your chip.

If you look at a system, let's just say I put 100 W, just 100 W of power, into something and my solution is at 96% efficiency, that means the 4% of efficiency loss is generating heat on the board. That's just how the system works. 1% or 2% of efficiency loss generates a significant amount of heat. Now, if you look at how many boards they have stacked in a server rack, think about the amount of cooling that has to happen, both at the server rack level and at the data center level, because all the hot air is in this data center. Cooling becomes a massive problem for everyone.«

source: Alphasense
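To make the efficiency arithmetic in that quote concrete, here is a minimal back-of-the-envelope sketch in Python. The 100 W figure and the 96% efficiency come from the quote; the boards-per-server and servers-per-rack counts are purely illustrative assumptions, not numbers from the expert.

# Heat generated by power-conversion losses, following the expert's 100 W example.
def heat_loss_watts(power_in_w: float, efficiency: float) -> float:
    """Watts dissipated as heat for a given input power and conversion efficiency."""
    return power_in_w * (1.0 - efficiency)

board_power_w = 100.0              # the quote's 100 W example
for eff in (0.96, 0.95, 0.94):     # a 1-2% drop in efficiency
    print(f"{eff:.0%} efficient -> {heat_loss_watts(board_power_w, eff):.0f} W of heat per board")

# Scaling up (illustrative assumption: 8 boards per server, 10 servers per rack).
boards_per_rack = 8 * 10
rack_heat_w = heat_loss_watts(board_power_w, 0.96) * boards_per_rack
print(f"At 96% efficiency, one such rack sheds ~{rack_heat_w:.0f} W of waste heat to be cooled")

Even a one- or two-point drop in conversion efficiency means 25-50% more waste heat per board, which is why the thermal problem compounds so quickly at rack and data center scale.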

4. What clients want, clients get

For cloud providers, it is as simple as this former Nvidia competitor put it:

»If eight out of 10 of your customers that pay for your cloud services are asking about utilizing time on an NVIDIA machine, and only two of them are asking about utilizing time on an AMD system, then 80% of your workloads are going through NVIDIA. Even if NVIDIA is a little more expensive, your market is going to dictate which system you want to utilize.«

source: Alphasense



Now, despite business fundamentals continuing to look strong for the foreseeable future, here is my problem with Nvidia.

Nvidia's market cap at the time of writing is over $3.4 trillion. It has officially become the biggest public company on earth. Without overcomplicating this, because I intended this to be a short post, here are some of my problems with that $3.4T number:

In his most bullish talks, Nvidia CEO Jensen Huang has said he believes the world currently has about $1T of data centers and that the industry will be around $2T in six years. He thinks most of that infrastructure will be accelerated computing, like GPUs.

So, let's do some back-of-the-napkin math:

Assume that in six years we are at $2T of installed data center infrastructure and that the industry keeps growing at a high clip of 20% annually (+$400B per year). Let's also assume that an additional $200B (10%) of the installed infrastructure is refreshed to newer versions each year.

That means roughly $600B of annual revenue is up for grabs for a company like Nvidia. Even if it holds an 80% market share (which could be a problematic assumption, since a big part of AI workloads will be inference), that is $480B in revenue (today, Nvidia is at roughly $80B TTM). Now assume the gross margin comes down from the high 70s to low 80s today to the low 70s (something Nvidia's CFO has acknowledged; for reference, AMD's gross margin on its AI chips is closer to the low 60s). Even then, Nvidia would still have a highly monopolistic net income margin of 40% (a rare breed of companies ever achieves this kind of margin). That would mean $192B in net income. Give that a generous 25x P/E ratio, and we come to a $4.8T market cap.

If we discount that back six years at 4.5%, which is basically the risk-free interest rate, we get to roughly $3.6T of value today.

So, even with the most optimistic assumptions about how the market develops over the next six years, we get to $3.6T, and the company is already trading at $3.4T. That implies roughly a 5.88% cumulative return above the risk-free rate. I'll leave it to everyone to come to their own conclusion about whether that makes sense as an investment.
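For anyone who wants to check or stress-test these assumptions, here is the same back-of-the-napkin math as a small Python script. Every input is an assumption taken from the text above, not a forecast; note that discounting the exact $4.8T gives about $3.69T, slightly above the rounded $3.6T used in the text.

# Back-of-the-napkin Nvidia valuation using the assumptions from the text above.
annual_spend_b = 400 + 200                 # $400B of growth + $200B of yearly refresh, in $B
nvidia_share = 0.80                        # assumed market share
revenue_b = annual_spend_b * nvidia_share  # -> $480B
net_margin = 0.40                          # assumed "monopolistic" net income margin
net_income_b = revenue_b * net_margin      # -> $192B
pe_ratio = 25
future_cap_b = net_income_b * pe_ratio     # -> $4,800B, i.e. $4.8T in six years

years, risk_free = 6, 0.045
present_value_b = future_cap_b / (1 + risk_free) ** years  # ~$3,686B (the text rounds to $3.6T)

current_cap_b = 3400
cushion = present_value_b / current_cap_b - 1  # ~8% with the exact PV; the text's 5.88% uses the rounded $3.6T

print(f"Implied revenue ${revenue_b:,.0f}B, net income ${net_income_b:,.0f}B, future cap ${future_cap_b:,.0f}B")
print(f"Discounted to today: ${present_value_b:,.0f}B vs. ${current_cap_b:,}B market cap ({cushion:.1%} cushion)")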

We are coming to a key moment for Nvidia as well as the broader space

The last thing I want to highlight is a key moment coming up shortly for the industry, in my view: OpenAI's GPT-5, which is supposed to come out this year. Here, my view aligns with that of a high-ranking GPU data center expert:

»GPT-5 is clearly happening. It's probably already close to finishing training, and it will be 10X more expensive than GPT-4 by all accounts that we have seen. We might find that GPT-5 is just not as impressive. You spend 10X the money compared to GPT-4, but if it's only slightly more impressive than GPT-4, then you're basically like, "Well, we have reached a limit of scaling." For every 10X in scale, you're getting a 2% increase in performance, or 5%...«

source: Alphasense

This presents a problematic moment for the industry, especially for the »picks and shovels« semiconductor companies that have recently risen dramatically in value. If GPT-5 is not that much better than GPT-4 (4o) and it costs 10x as much to train, it will cause a cooldown in the furious pace of CapEx spending at many companies. I am not saying that these companies won't keep investing and buying more GPUs, but the pace won't keep rising at the clip it is rising today. The industry might even shift towards SLMs (small language models) that are more specialized for specific tasks, which again means a slower pace of demand for the newest, highest-end GPUs at any price.

Now, why am I writing this? First, I find it extremely helpful from a learning perspective to write down my thoughts in moments like this, so I can circle back and learn from them later. Second, if that moment comes, investors should not panic or lose hope but realize that AI is here to stay. It will be a revolutionary technology that will change the world. At the same time, our human nature tends to make us overenthusiastic or overly negative about specific things in the short term, and in my view, this time is no different. Maybe AI will change that someday when it comes to investing; who knows?

I hope you find it useful.

Until next time.

As always, if you liked the article and found it informative, I would appreciate you sharing it.




Disclaimer: 

Nothing contained in this website and newsletter should be understood as investment or financial advice. All investment strategies and investments involve the risk of loss. Past performance does not guarantee future results. Everything written and expressed in this newsletter is only the writer's opinion and should not be considered investment advice. Before investing in anything, know your risk profile and if needed, consult a professional. Nothing on this site should ever be considered advice, research, or an invitation to buy or sell any securities.
