Strong and Weak Links in NHL Line Combinations
Revisiting Strong Vs. Weak Links in Hockey. Part 1 - Forward Lines
So picture this. You’re building a hockey team and are faced with two alternatives. You can either upgrade your best player by one unit or upgrade your worst player by the same amount. What would you choose?
Introduction to Strong and Weak Link Games
What you choose depends on what type of game you're playing. In a “strong link game,” having the strongest, best player possible is what wins games. For an example of a strong link game, think basketball. In the NBA, it doesn’t matter how good your depth is; you're probably not beating the team that has peak LeBron James. Basketball is so star-driven that it’s probably the best example of a strong link game.
On the other hand, soccer is a classic example of a weak link game. In soccer, scoring often involves long and complex passing sequences leading up to the goal. Depth tends to matter a lot because if somebody in that sequence is consistently failing their assignment, it often doesn’t matter if Messi is the shooter at the end of the sequence. The superstar won’t see the ball enough to overcome the lack of depth on their team. (You can also think of NFL defences or offensive line play if you're an American football fan like me.)
So, if your answer to the opening question was that you would upgrade your team’s worst player, you are implying hockey is a weak link game. If you think you'd instead upgrade your superstar, you're implying hockey is a strong link game. Where does hockey lie on this spectrum? Well, today, let’s try and find out.
Previous Research
Thankfully, I am not the first to test if hockey is a strong or weak link game. Alex Novet has looked into this in the past. He found that the strength of a team’s best player was a much better predictor of a team’s record than the strength of their worst player. In other words, at a macro level, hockey at the NHL level is a strong link game.
This is fantastic to know and makes intuitive sense to me. While depth matters more in hockey than basketball, I still grew up watching Crosby, Toews, and Kopitar hand the cup back and forth.
This research is fantastic. However, I think it can be improved upon, for two reasons. The first is that at the most macro level, this research isn’t particularly actionable. All GMs would prefer McDavid to their best player, but that isn’t an option. Even the GMs that get the option to upgrade on their best player rarely get to do so in season. So while it might be actionable in the offseason, you are usually stuck trying to maximize what you have in season.
The second is that we can answer the question more granularly. Eric Eager and George Chahrouri consistently talked about this question as it relates to the NFL on their old podcast, and they noted something interesting. NFL offences are strong link systems because the quarterback touches the ball every play. So if you upgrade the QB, you upgrade the offence. On the defensive side of the ball, you (usually) need four pass rushers and seven coverage players not to blow their assignment. As a result, NFL defences are a weak link system because no matter how often your best player wins his matchup, if his teammate is getting roasted every play, very little can be done.
So while football is a strong link game overall (QBs are just that valuable), if you applied that logic to improve your defence, you might be disappointed.
Back to Hockey
So, we know hockey is a strong link game in general, but the results may not be the same all over the ice. Additionally, most teams must maximize what they have in season. As a result, I think line combination data is the perfect place for us to go with this question. Today, let’s focus on forward lines. Specifically, what is the best predictor of a forward line’s offensive output? Is it their best player, second-best player, or worst player? The answer will help us know how forward lines should be constructed.
Data Collection
Answering this question requires a lot of data, so I turned to Evolving Hockey. Here I collected data for each forward line that played at least 100 minutes together at even strength. Then I collected RAPM estimates for each player to rank the players on each line from best to worst at driving play. Note that each player on the line had to play at least 500 minutes to be included in this analysis. I used this cutoff because I wanted the RAPM estimate to give a good representation of the skater’s play-driving ability that season. Also, note the target variable for this analysis will be Expected Goals For (xGF) rather than goals for, because goals are so noisy and we don’t have huge samples in a season’s line combinations.
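For anyone who wants to try something similar, here is a minimal sketch of the filtering and ranking step described above. It is not the code behind this post: the record layout, field names (toi, rapm_xgf, skaters) and all numbers are made up for illustration and do not reflect Evolving Hockey's actual export format.

```python
# Hypothetical records: field names and values are illustrative only.
MIN_LINE_MINUTES = 100    # line must have played at least this long together
MIN_PLAYER_MINUTES = 500  # each skater needs at least this much ice time


def rank_line(line, players):
    """Return (best, median, worst) RAPM xGF for a line, or None if it misses a cutoff."""
    if line["toi"] < MIN_LINE_MINUTES:
        return None
    skaters = [players[name] for name in line["skaters"]]
    if any(p["toi"] < MIN_PLAYER_MINUTES for p in skaters):
        return None
    # Sort ascending, then unpack so the highest RAPM is labelled "best".
    worst, median, best = sorted(p["rapm_xgf"] for p in skaters)
    return best, median, worst


players = {
    "A": {"toi": 1200, "rapm_xgf": 0.30},
    "B": {"toi": 1100, "rapm_xgf": 0.12},
    "C": {"toi": 900, "rapm_xgf": -0.05},
    "D": {"toi": 350, "rapm_xgf": 0.20},
}
lines = [
    {"skaters": ["A", "B", "C"], "toi": 240},
    {"skaters": ["A", "B", "D"], "toi": 180},  # D is under 500 minutes
    {"skaters": ["B", "C", "A"], "toi": 60},   # line is under 100 minutes
]

ranked = [r for r in (rank_line(line, players) for line in lines) if r is not None]
print(ranked)
```

Only the first line survives both cutoffs, and its three skaters come back ordered best, median, worst.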
Basic Analysis
So now that we have data for all the forward lines that meet the cutoffs, we can start to answer the question. What is the best predictor of a line’s offensive output (xGF): how strong the best player is at driving xGF, how strong the second-best player on the line is at driving xGF, or how strong the worst player on the line is at driving xGF?
Again, we are using RAPM xGF to estimate the quality of the individual players and the line’s on-ice expected goals for to measure the line’s offensive efficiency. On their own, each player’s strength is positively correlated with their line’s offence. This makes sense. Making any individual player on a line better should improve the line. However, the relationship between player quality and line output is not constant across the player types.
It turns out the strength of the line’s second-best player is most highly correlated with line success. So is the offensive output of NHL forward lines about strong or weak links? Maybe neither. Maybe it is the second-best offensive player on the line that is most important when predicting its offensive success. However, before moving on, we must address a potential collinearity issue.
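The per-rank correlations described above can be computed along these lines. This is only a sketch on synthetic data: the simulated lines give every skater equal weight by construction, so the printed numbers say nothing about real NHL results. The pearson helper and all variable names are my own.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


random.seed(0)
rows = []
for _ in range(500):
    # Three made-up RAPM values per line, ranked worst to best.
    worst, median, best = sorted(random.gauss(0, 0.15) for _ in range(3))
    # Synthetic line output: equal weight on each skater plus noise (an assumption).
    line_xgf = 2.4 + best + median + worst + random.gauss(0, 0.1)
    rows.append((best, median, worst, line_xgf))

best, median, worst, line_xgf = (list(col) for col in zip(*rows))
correlations = {
    "best": pearson(best, line_xgf),
    "median": pearson(median, line_xgf),
    "worst": pearson(worst, line_xgf),
}
for rank, r in correlations.items():
    print(f"{rank:>6} player vs line xGF: r = {r:.2f}")
```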
Collinearity In The Data
The problem with analysis like this is that we have a collinearity problem. The data are moving together, so individual correlations won’t tell the whole story because each of the three data points is, by definition, related.
Think about it this way: the data above suggests the middle player on a line is most important when driving offensive success. So when trying to get the most out of McDavid, this would suggest that playing him with someone like Draisaitl, then cheaping out on their third linemate, might be the ideal scenario. The problem is that for Draisaitl to be the second-best player on a line, the best player on the line must be better than Leon Draisaitl. That leaves maybe five guys on earth who could fill that spot. So knowing how good the second-best player on a line is inadvertently tells us tons about how good the best player is on the same line. The same goes for the worst player. Technically, the best way to ensure your line does well is to ensure the worst player is the third-best player on earth. That way, by definition, you have the three best players on earth on the line together.
In statistics, we call this problem collinearity: all three variables move with each other, so it’s hard to know which of the three is driving the effect most strongly.
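A quick simulation makes the point concrete. In this sketch, three skill values per line are drawn completely independently (an assumption for illustration, with made-up units), yet once they are sorted into best, median and worst, the ranked values end up positively correlated with each other simply because of the ranking.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


random.seed(42)
best, median, worst = [], [], []
for _ in range(5000):
    # Independent draws: no real relationship between linemates.
    w, m, b = sorted(random.gauss(0, 1) for _ in range(3))
    worst.append(w)
    median.append(m)
    best.append(b)

# Sorting alone induces correlation between the ranked values.
print(f"best vs median:  r = {pearson(best, median):.2f}")
print(f"median vs worst: r = {pearson(median, worst):.2f}")
print(f"best vs worst:   r = {pearson(best, worst):.2f}")
```

Real lines are not built at random, so the correlations in actual data could be stronger or weaker than in this toy version.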
Collinearity Cure?
So how do we get around this collinearity problem? There is no perfect way to do so, but I think I have a clever idea. We will attempt a two-step process, and I’ll use the best player as an example and apply the idea to all three.
Here we will exploit the fact that the residuals (errors) in an ordinary least squares regression are, by construction, uncorrelated with the predictor variables in the model. So we will run a model where the quality of the best player on a line is a function of the median player on the line and the worst player on the line.
Quality of Best Player = B0 + B1(Median Player) + B2(Worst Player) + u
The model above predicts how good the best player on a line is based on the quality of the other two players. In this model, the quality of the other two players does a good job of predicting how good the best player is, but it is imperfect. How imperfect each prediction is, is captured in the u variable (the residuals). So with the calculation
Isolated Best Player = Actual Quality of Best Player - Expected Quality of Best Player
The isolated best player result represents u from the first equation. These residuals represent how the best player on the line varies independently of the other two players on the line. In other words, I think it solves our collinearity problem. Now we can repeat this formula for the other players on the line and make a new model of the three variables to test what is the best predictor of line success.
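The residual step above can be sketched in plain Python. Everything here is illustrative: the data are synthetic, the tiny least-squares solver (solve, ols, residuals) is my own stand-in for whatever regression tool you prefer, and it assumes an ordinary least squares fit with an intercept.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def residuals(columns, y):
    """Actual minus fitted values: the 'isolated' part of y."""
    beta = ols(columns, y)
    fitted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in zip(*columns)]
    return [yi - fi for yi, fi in zip(y, fitted)]


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


random.seed(1)
best, median, worst = [], [], []
for _ in range(400):  # synthetic lines with made-up RAPM values
    w, m, b = sorted(random.gauss(0, 0.15) for _ in range(3))
    worst.append(w)
    median.append(m)
    best.append(b)

# Step 1: regress best on median and worst; step 2: keep the residuals.
isolated_best = residuals([median, worst], best)

print(f"raw best vs median:      r = {pearson(best, median):.2f}")
print(f"isolated best vs median: r = {pearson(isolated_best, median):.2f}")
print(f"isolated best vs worst:  r = {pearson(isolated_best, worst):.2f}")
```

The same residuals call, with the roles of the three columns rotated, would give the isolated median and isolated worst values.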
Random Aside for Fellow Nerds
So I think this is a good way of avoiding collinearity in this problem, but I’m not 100% sure. I’ve used this method in previous writing, and I think the logic makes sense. But I have been sitting on this for a while because I haven’t been too confident. Then my friend asked for help with her thesis. She was replicating a finance paper called “REIT Characteristics and the Sensitivity of REIT Returns.” This paper used the same methodology to avoid collinearity. I was uncertain if this methodology made sense, but now I know it was good enough for a peer-reviewed journal article; therefore, it’s good enough for me to take to Hockey Twitter. That being said, if you think I screwed up or have any other ideas to test with this, I’d love to hear from you on Twitter @CMHockey66, via email chacemccallum@gmail.com, or even in comments below.
Modelling Strong and Weak Links
Now that we have solved our collinearity problem, we can finally answer our question. What’s the best predictor of how much offence a forward line generates? Is it the best player’s, second-best player’s, or worst player’s individual results? Let’s run this model:
Line xGF = B0 + B1*Isolated Best Player + B2*Isolated Median Player + B3*Isolated Worst Player + u
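A model of this shape could be fitted roughly as follows. This is a sketch under assumptions, not the code behind the post: the isolated columns are random placeholders, the target is synthetic, the solver is hand-rolled, and I am assuming each isolated variable is z-scored before fitting so that every coefficient reads as the change in line xGF per one standard deviation.

```python
import random
import statistics


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def zscore(xs):
    """Rescale to mean 0 and standard deviation 1."""
    mean, sd = statistics.fmean(xs), statistics.pstdev(xs)
    return [(x - mean) / sd for x in xs]


def fit_line_model(iso_best, iso_median, iso_worst, line_xgf):
    """Return [B0, B1, B2, B3] for line xGF on the z-scored isolated ranks."""
    return ols([zscore(iso_best), zscore(iso_median), zscore(iso_worst)], line_xgf)


random.seed(7)
n = 300
# Placeholder inputs standing in for the isolated (residualised) values.
iso_best = [random.gauss(0, 0.08) for _ in range(n)]
iso_median = [random.gauss(0, 0.08) for _ in range(n)]
iso_worst = [random.gauss(0, 0.08) for _ in range(n)]
# Synthetic target with equal weights, so the fit carries no hockey meaning.
line_xgf = [
    2.5 + b + m + w + random.gauss(0, 0.2)
    for b, m, w in zip(iso_best, iso_median, iso_worst)
]

b0, b1, b2, b3 = fit_line_model(iso_best, iso_median, iso_worst, line_xgf)
print(f"B0={b0:.3f}  best={b1:.3f}  median={b2:.3f}  worst={b3:.3f}")
```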
From this model, the coefficients (the B’s) will represent how much the line’s xGF per hour is expected to increase based on a 1 standard deviation increase in each player on the line. The larger the coefficient on the player rank, the more that rank increases offensive output, on average. So if the best player has the largest coefficient, it would mean NHL forward lines are strong link systems because the best player’s results are the most highly related to the line’s overall results. Here are the coefficients from the model.
The coefficients from our model imply the same thing we saw from our original analysis. It looks like the best predictor of a line’s success is how good the median player is, even after isolating the quality of each player from one another. Perhaps more surprising is that the impact of a line’s best player is so similar to the impact of the line’s worst player. So if you could magically upgrade the line’s best player or worst player by the same amount, maybe your choice wouldn’t matter at all? At least as it relates to even strength offence.
Lessons For Line Combinations
I find this conclusion really interesting. Are NHL forward lines strong or weak link systems? I guess... neither? Instead, NHL teams should be ensuring they have a fantastic second option on a line above all else. Here is how I would phrase these results from a hockey perspective.
Stars matter a lot in hockey, but the data suggests that there is only so much one player can do to drive a line’s offence. To get the most out of a line, the best player will need help. That being said, there is still only so much puck to go around, so you hit diminishing returns when upgrading the third-best player on a line. So, instead of trying to load up a line with three superstars, you should focus on getting the line’s best player a strong second option, as that is likely the best way to increase its offensive output.
I think this makes sense when we consider that many of the league’s most efficient offensive players have a great teammate playing alongside them. Think McDavid-Draisaitl, Marner-Matthews, Ovi-Backstrom, etc. So, if you are ever mad at your favourite team for playing two great players together, I get it, but maybe the team is on to something. The second-best player on a line seems like the best predictor of its offensive results.
For those who made it through the end despite this post being pretty dense, thank you, and feel free to let me know if you have any questions!
Interestingly, modern NHL coaches seem to have intuited this relationship over time. A lot of them now seem to focus on forward pairings, where a known twosome with clear chemistry sticks together while a rotating cast of lesser linemates is shuffled through.