
Do NHLe Models Underrate Pro League Prospects?
Does League Quality Matter Independent of League-Adjusted Scoring?
One thing I really wanted to dig into while exploring public prospect analysis was the debate around league quality. It seems like every year there is some prospect who looks awful in NHLe models because they couldn’t score in a pro league, and there is a massive public debate about whether league quality is a signal in and of itself. This year it was Dmitri Simashev scoring 0 points in 18 KHL games, last year it was Brad Lambert scoring 10 points in 49 Liiga games, and the examples go back forever. Today I am going to try to find out what signal league quality actually carries, and whether purely scoring-based models miss something.
What is NHLe?
To start, here is a quick explainer of what NHLe actually is and why it is used in prospect analysis. As usual, anyone who is already familiar with NHLe will likely want to skip this section.
NHLe, which stands for NHL Equivalency, is a clever concept that has existed for a long time. It solves a complex problem in prospect analysis where different prospects often compete against very different competition levels.
For example, the top prospects in the 2023 NHL draft included Connor Bedard and Leo Carlsson. Bedard scored about 2.5 points per game in the regular season, while Carlsson only scored a little over half a point a game. The problem is that Bedard put up his numbers in Canada’s second-best league for teenagers, while Carlsson played in a Swedish pro league, probably the third-best hockey league in the world.
So who was more impressive last year? This is where we turn to NHLe to directly compare their point totals. NHLe models like Max’s do a lot of math to estimate that, on average, when players jump from the WHL to the NHL, they score about 3 NHL points for every 20 WHL points they scored. By contrast, when jumping from the SHL to the NHL, they tend to score a little over half as much as they did in the SHL. The exact translation factors Max calculated were 0.15 for the WHL and 0.62 for the SHL. So we can multiply each player’s points per game in their league by that league’s translation factor to get their NHLe.
Bedard → 2.51 WHL PPG × 0.15 = 0.38 NHLe per game
Carlsson → 0.57 SHL PPG × 0.62 = 0.35 NHLe per game
Now the gap between the two players has closed dramatically because Carlsson was playing in such a strong league, but Bedard still remains the higher-scoring prospect. This kind of analysis naturally comes up a lot when discussing the draft because of the massive differences in league quality.
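For anyone who wants to reproduce the arithmetic, here is a minimal Python sketch. The two translation factors are the ones quoted above from Max’s model; the function and dictionary names are illustrative placeholders, not part of any published NHLe code.

```python
# Translation factors quoted in the text (from Max's NHLe model).
TRANSLATION_FACTORS = {"WHL": 0.15, "SHL": 0.62}


def nhle_per_game(points_per_game: float, league: str) -> float:
    """Scale a league points-per-game rate to an NHL-equivalent rate."""
    return points_per_game * TRANSLATION_FACTORS[league]


bedard = nhle_per_game(2.51, "WHL")
carlsson = nhle_per_game(0.57, "SHL")
print(f"Bedard:   {bedard:.2f} NHLe per game")
print(f"Carlsson: {carlsson:.2f} NHLe per game")
```

Adding another league would only require its translation factor in the dictionary, which is the whole appeal of the approach: one number per league puts every prospect on the same scale.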
Do NHLe Models Underrate Pro League Prospects?
Now that everyone is up to speed on the acronyms, we can try to answer the question: do NHLe models underrate pro league prospects? To do this I have picked out the top professional leagues that prospects often play in and defined them as “Top Pro” leagues. These leagues are the KHL, SHL, Liiga, NL, the Czech league and the DEL, as well as the “Russia” league that preceded the KHL. Flaws with NHLe models are a topic I will be digging much further into now that I have the data and all our systems in place, and this is a nice place to start.
So with our “top pro” prospects defined, let’s see how the top pro cohort compares to the rest. Remember, NHLe adjusts league scoring for the difficulty of that league, so in theory a given NHLe value should be worth the same no matter which league it came from. To start, here is how likely each cohort was to make the NHL, grouped by position. Note that I have defined making the NHL as playing at least 82 NHL games. It turns out pro league prospects have historically been more likely to make the NHL than other prospects. Specifically, the gap shows up between low-scoring pro league prospects and low-scoring junior prospects.
The interpretation of scatter plots is not nearly as intuitive when dealing with categorical variables (either you played 82 NHL games or you didn’t), so here is what you are looking at. At an NHLe of 0, the defender line sits at about 0.25 for the top pro league prospects, and about 0.1 for other prospects. This means about 25% of the lowest-scoring defensive prospects in pro leagues made the NHL, but only 10% of the lowest-scoring defensive prospects from other leagues made the NHL. This logic continues along the trendline. Here is the same plot with the error bars for those interested.
Also, note the error bars tend to be larger for defenders than forwards. This happens for a few reasons. The first is that points are a better indicator of forward value than defender value, so they are more highly correlated with NHL outcomes for forwards. The second is that more forwards get drafted, so we have a larger sample size to draw from. The final reason is that forwards tend to score more, so there is more independent variation in the scoring rates of forwards (the gap between a good forward and a bad forward is larger than the gap between a good defender and a bad defender), which makes statistical analysis easier.
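The sample-size point can be illustrated with a Wilson score interval for a hit rate. This is only a stand-in to show how interval width shrinks with more prospects; it is not the method behind the error bars in the plots, and the cohort sizes below are invented for illustration.

```python
import math


def wilson_interval(hits: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a proportion hits / n."""
    p = hits / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Same 25% hit rate, two invented cohort sizes (not real counts).
for hits, n in [(5, 20), (50, 200)]:
    low, high = wilson_interval(hits, n)
    print(f"{hits}/{n}: {low:.2f} to {high:.2f}")
```

Both cohorts have the same observed hit rate, but the smaller one comes with a much wider interval, which is the same reason the cohorts with fewer drafted players have wider error bars.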
So do NHLe models underrate pro-league prospects? We need more data to be sure because the error bars are huge. That is an issue, but draft analysis is fun precisely because you have to do your best to make decisions based on incomplete information. With that in mind, the data we do have suggests that, on average, yes: prospects from pro leagues have been better bets to make the NHL than prospects outside of these top pro leagues, but only among the low NHLe prospects. As scoring increases, the hit rates have been equal (or even lower for pro league prospects). We would like more data on this subject, but this is how the results have looked in the small sample we have.
The error bars are huge, so this interpretation calls for caution, but I am willing to bet this trend will continue as we get more and more data. Eventually, I think this relationship will prove robust in a larger dataset. There are two key reasons why I think this finding makes sense: one is a hockey reason and the other is a statistical reason.
The hockey reason is that the low-scoring prospects in pro leagues are more likely to be playing in really, really poor situations relative to their peers who have the same NHLe scoring rate in a junior league. Maybe the prospect struggling in the pro league is not getting much time on ice, or their teammates are very poor (relative to the league quality). It tracks that if a professional coach is playing you on their roster this young, you probably already have many of the skills needed to make the NHL even if you have been unable to produce high NHLe rates. Whereas someone with an equally poor NHLe in a junior league is less likely to have those skills.
Then there is a statistical reason that has to do with the sample size of the underlying data. To produce an elite NHLe in a junior league, a prospect must often score 100-plus points. To produce the same NHLe in a pro league, a prospect can sometimes score as few as 20 or 30 points. The number of good events required to swing an NHLe value massively is much smaller in a pro league.
As a result, what are the odds a player who has only scored a couple of points in a pro league is actually a better prospect than their scoring rate implies? For the pro league prospect, the difference between good and mediocre may only be a couple of crossbars and a few whiffed shots by their teammates. For the junior league prospect, the difference between good and mediocre may be something like 20 crossbars and many whiffed shots by their teammates. Obviously both of these things can happen, but a few instances of bad luck are significantly more likely than many instances. So it’s more likely that a prospect’s “true talent” is above their poor scoring rate in a pro league.
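One rough way to put numbers on this is to treat a season’s point total as a Poisson count, which is a simplifying assumption rather than a claim about how scoring really works. The sketch below uses the two translation factors quoted earlier; the season lengths (68 WHL games, 52 SHL games) and the target NHLe are assumed values chosen for illustration.

```python
import math

# Translation factors quoted earlier in the article; season lengths are
# assumed values for illustration.
LEAGUES = {
    "WHL": {"factor": 0.15, "games": 68},
    "SHL": {"factor": 0.62, "games": 52},
}
TARGET_NHLE_PER_GAME = 0.35  # assumed target, roughly Carlsson's rate


def points_needed(league: str, nhle_per_game: float = TARGET_NHLE_PER_GAME) -> float:
    """Raw league points over a full season that translate to the target NHLe."""
    info = LEAGUES[league]
    return nhle_per_game / info["factor"] * info["games"]


def relative_noise(expected_points: float) -> float:
    """Standard deviation over mean for a Poisson count: 1 / sqrt(mean)."""
    return 1 / math.sqrt(expected_points)


for league in LEAGUES:
    pts = points_needed(league)
    print(f"{league}: ~{pts:.0f} points, relative noise ~{relative_noise(pts):.0%}")
```

Under these assumptions the pro league total is several times smaller than the junior one, so its relative noise is noticeably larger. That is the crossbar argument in numeric form: the same handful of unlucky bounces moves a small point total much further than a large one.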
It’s this combination of reasons that makes me fairly certain this trend will hold as we collect more and more data. We have observed the trend in what limited data we do have, and the trend makes intuitive sense at the hockey and statistical levels. I should also note that this general trend has held with different thresholds as well. This time, let’s analyze prospects based not just on whether they made the NHL, but on whether they also had a points per game above the median at their position once in the NHL. Here we see a similar trend.
And then the same thing with the error bars:
New threshold, same general trend, same problem. When we raise the bar beyond just making the NHL, are pro league prospects outperforming their NHLe values? Technically yes but again we need more information to be sure if this relationship is noise or not. That being said the logic is similar to what we saw before. Maybe once there is more data we will look back and I will be wrong about this. People are fooled by randomness all the time and it is entirely possible that is what is happening to me here. But I still think what we are seeing makes sense. If two players have a poor NHLe value, the one who performed poorly in the pro league is likely the better prospect. They have been more likely to make the NHL, and become productive players once there, at least in a small sample historically.
Another fascinating finding from this is that the relationship between scoring as a prospect and becoming a high-scoring NHL player is really, really noisy with pro league defenders. For every other cohort, there is a clear and confident relationship between scoring as a prospect and becoming a high-scoring NHL player. With pro-league defenders, there is still an upward-sloping trendline but the error bars are incredibly large. So if there is a cohort of players NHLe models are most likely to be “wrong” about, it looks like that cohort is the pro league defenders. We should keep this in mind when discussing prospects like Dmitri Simashev who struggled to produce in the KHL. Scoring-based models may lead you astray anywhere, but they are most likely to lead you astray when using them to predict defensive prospects who play in a pro league.
Are NHLe Models ‘Wrong’?
The obvious next question is why these relationships might exist. As I mentioned above, I think these results make intuitive sense and don’t necessarily mean there is anything wrong with the NHLe translations themselves. Max, Thibaud Chatel, and Patrick Bacon all have NHLe estimates that are very highly correlated with each other (the R squared between Max’s estimates and Bacon’s is about 0.98). Even after looking at these results, I don’t think the NHLe estimates are flawed. It would be really weird if everyone independently came to the same wrong conclusion.
This is something that could be (and probably will be) its own article, but I think the statistical properties of scoring in pro vs. junior leagues mean this is likely to occur even with a perfect NHLe model. This is just my intuition, and I could be wrong, but I don’t think this article is actually finding a flaw in public NHLe estimates.
Conclusion
So do NHLe models underrate pro-league prospects? There is not enough information to be sure. This is where statistics can become more of an art than a science. Despite the evidence being inconclusive, I do think that NHLe models are likely to underrate pro-league prospects who have poor NHLe values. Historically, these low NHLe pro league prospects have been more likely to make the NHL and become productive players once there than equally low NHLe prospects in junior leagues. The sample behind this finding is small enough that I may very well be reading into noise here. Still, the findings make intuitive sense to me on every level, so I think that is what somebody will find when they revisit this question in a few years with more data. This will be a fun one to look back on.