In Part 1 (Intro), I explained why I want to find out whether the NHL’s history of expansion has damaged the product on the ice, in both the short term and the long term. Here, I explain the methodology.
I have attended three hockey analytics conferences so far (Seattle, March 2019; Rochester, September 2019; Columbus, February 2020). When I began this, it was two, with the third still on the horizon. There is a great deal of theory and data out there; unfortunately, not all of it is publicly available, and what is available can be a challenge to work with.
Looking at NHL history has one massive drawback and one massive advantage, which are two sides of the same coin: for a long time, records were kept in a completely nonsensical and inconsistent manner, if they were kept at all. The information junkie and the historian in me bemoan this fact daily, because it prevents a murky picture from ever becoming clear. On the other hand, limiting the analysis to the few records that do exist allows for greater consistency when comparing across years.
In order to even begin this work, I would need a player database and a player-season database. From there, information could be extracted and analyzed. But I chose to work with what a complete newcomer might have access to: Microsoft Excel. Then I realized that even that might be a bridge too far for some, and I don’t want anyone with curiosity to be shut out of all of this information, so I went a step further and used Apache OpenOffice, a free, open-source alternative to the Microsoft Office suite.
The Player Database
Player and player-season information was extracted from hockey-reference.com. I decided early on that I needed to do a few things:
- Add player birthdays, in case I wanted to do any age-related analysis
- Answer four questions for every single player-season in NHL history: relative to the season in question, did the player suit up for NHL games two or more seasons earlier, did he play the season before, did he play the season after, and did he play two or more seasons later?
After that, and I mean after all of that was done and I thought I was finished, I realized that I’d stumbled across something which I feared would cloud the picture of anything I was trying to do.
Let’s take Alex Pietrangelo of the St. Louis Blues as our example here. He was drafted 4th overall in 2008, and played eight games in the NHL in the season immediately after he was drafted (2008-09). He played nine games in 2009-10, and then 79 of 82 in 2010-11. He has been a full-time NHLer ever since.
The problem with using raw information as the be-all and end-all is that it can mask what I’m trying to find. The raw information treats Pietrangelo as a rookie in 2008-09, when he played fewer than one-tenth of his team’s games. 2009-10 would then be his second year, and 2010-11 (his first year as an established NHLer) his third.
This doesn’t necessarily sound terrible, but across the roughly 8,000 players in NHL history, it can create issues. I’m looking for illumination, not obfuscation. And who the NHL considers a rookie may differ from what these numbers show.
In the NHL, the Calder Memorial Trophy is awarded to the league’s top rookie. Under the modern definition, a “rookie” is a player who meets these requirements:
- He has not played 25 or more NHL games in a single season previously
- He has not played six or more NHL games in each of two previous seasons
- He is younger than age 26 as of the start of the league season (September 15)
The age requirement was added for the 1990-91 season, after a longtime professional player named Sergei Makarov won the Calder despite being 32 years old and one of the most decorated players in Soviet hockey history. It was his first year in the NHL, and that was enough.
The list of players aged 26 or older who would have competed for the Calder in the 30 seasons since is extremely short: Marek Zidlicky in 2003-04 likely would have been a finalist or winner, and Sergei Nemchinov (1991-92) and Nikolai Borschevsky (1992-93) may have received votes. Andrew Hammond was suggested to me by a Twitter follower (@MoreHockeyStats), which gives rise to an interesting thought exercise.
The other issue is that the NHL’s definition of “rookie” has changed over the years; there was a time when a single game played counted as a player’s rookie season.
So I went back through and added a column marking each player’s Calder season, applying the modern definition (25 games in one season, or six games in each of two seasons) retroactively and without the current age restriction.
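The retroactive rule can be sketched in Python. This is a minimal illustration, not the spreadsheet logic actually used: the function name `calder_eligible_seasons` and the (season start year, games played) input format are assumptions. It applies the two game-count tests as described here (25 or more games in one earlier season, or six or more games in each of two earlier seasons) with no age limit, and returns every season the player still begins as a rookie; which of those seasons gets the Calder label when more than one qualifies is left open.

```python
from typing import List, Tuple


def calder_eligible_seasons(career: List[Tuple[int, int]],
                            single_limit: int = 25,
                            small_limit: int = 6) -> List[int]:
    """Return the start years of seasons the player begins as a rookie.

    career: (season_start_year, regular_season_games) for each season played.
    A player starts a season as a rookie if, in earlier seasons, he has not
    played `single_limit` or more games in any one season and has not played
    `small_limit` or more games in each of two seasons. No age limit applies.
    """
    eligible = []
    big_seasons = 0    # earlier seasons with single_limit or more games
    small_seasons = 0  # earlier seasons with small_limit or more games
    for year, games in sorted(career):
        if big_seasons == 0 and small_seasons < 2:
            eligible.append(year)
        # Count this season only after the check, so only earlier seasons matter.
        if games >= single_limit:
            big_seasons += 1
        if games >= small_limit:
            small_seasons += 1
    return eligible


# Alex Pietrangelo's first three seasons, as described above (2008 = 2008-09).
pietrangelo = [(2008, 8), (2009, 9), (2010, 79)]
print(calder_eligible_seasons(pietrangelo))  # [2008, 2009]
```

Under the rule as stated, Pietrangelo starts 2008-09 and 2009-10 as a rookie, but not 2010-11, because by then he has two earlier seasons of six or more games.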
In total, compiling this information, keying in birthdays, answering the questions, and designating the Calder seasons took between 350 and 400 hours of work. This was done over 59 days, around a full-time job with inconsistent hours and days worked. At the conclusion of this project, I will release the database publicly so that everyone else can analyze it as they see fit.
An Example: Andrei Zyuzin
This graphic shows the entirety of Andrei Zyuzin’s NHL career, taken from the database. “TOT” in the team column (C) means that he played for more than one team during that season, and the stats shown are his combined totals for that season. “AGE” (the header for Column H) is his age according to hockey-reference.com, which calculates age as of February 1 of the season in question.
Columns AF, AG, and AH have more precise information: his actual birthday, the season’s age cutoff, and his actual age as of that date expressed as a decimal.
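Both age figures can be reproduced with a short Python sketch. The exact formula behind the decimal age isn’t specified here, so this assumes one common convention (elapsed days divided by 365.25); the helper names and the birthday used below are hypothetical. The cutoff follows the September 15 date from the Calder rules, and the whole-number age follows hockey-reference.com’s February 1 convention.

```python
from datetime import date


def whole_age(birth: date, on: date) -> int:
    """Completed years of age on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def decimal_age(birth: date, on: date) -> float:
    """Approximate decimal age (assumed convention): elapsed days / 365.25."""
    return (on - birth).days / 365.25


def season_ages(birth: date, season_start_year: int) -> dict:
    """Ages for one season, e.g. season_start_year=2008 for 2008-09."""
    cutoff = date(season_start_year, 9, 15)             # Calder age cutoff
    reference_day = date(season_start_year + 1, 2, 1)   # hockey-reference.com's Feb 1
    return {
        "cutoff": cutoff,
        "age_at_cutoff": round(decimal_age(birth, cutoff), 2),
        "under_26_at_cutoff": whole_age(birth, cutoff) < 26,
        "hr_age": whole_age(birth, reference_day),
    }


# A hypothetical player born March 3, 1990, in the 2008-09 season.
print(season_ages(date(1990, 3, 3), 2008))
```

For this hypothetical birthday, the decimal age at the cutoff is 18.54, while the February 1 age is 18.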
Columns AI through AL are the four questions referred to above:
- “Prior” is whether he played NHL games two or more seasons before the one in question
- “Last” is whether he played NHL games the season immediately preceding the one in question
- “Next” is whether he played NHL games the season immediately following the one in question
- “Later” is whether he played NHL games two or more seasons following the one in question
For every player, the first year in which he plays NHL games will have N’s in the first two columns, and the last year in which he plays NHL games will have N’s in the last two columns.
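The four questions can be answered mechanically from the list of seasons a player appeared in. The following Python sketch is one way to do it, not the spreadsheet formulas behind the database; `career_flags`, the 1917-18 through 2019-20 season range, and the input format are assumptions. It looks only at NHL seasons (the NHA and Western-league experience discussed below is not modelled), and it skips the cancelled 2004-05 season so that 2003-04 and 2005-06 count as consecutive.

```python
from typing import Dict, Iterable

CANCELLED = {2004}  # 2004-05 was never played
# Season start years, 1917 (1917-18) through 2019 (2019-20); assumed range.
SEASONS = [y for y in range(1917, 2020) if y not in CANCELLED]
POSITION = {year: i for i, year in enumerate(SEASONS)}


def career_flags(played_years: Iterable[int]) -> Dict[int, str]:
    """Map each season played to its Prior/Last/Next/Later string, e.g. "NNYY"."""
    played = sorted({POSITION[y] for y in played_years})
    played_set = set(played)
    flags = {}
    for i in played:
        answers = (
            any(j <= i - 2 for j in played),  # Prior: two or more seasons before
            (i - 1) in played_set,            # Last: the season immediately before
            (i + 1) in played_set,            # Next: the season immediately after
            any(j >= i + 2 for j in played),  # Later: two or more seasons after
        )
        flags[SEASONS[i]] = "".join("Y" if a else "N" for a in answers)
    return flags


# A hypothetical five-season career, 1997-98 through 2001-02.
print(career_flags([1997, 1998, 1999, 2000, 2001]))
# {1997: 'NNYY', 1998: 'NYYY', 1999: 'YYYY', 2000: 'YYYN', 2001: 'YYNN'}
```

Because positions are taken from the list of seasons actually played, a player who appears in 2003-04 and 2005-06 gets a Y for “Next” in the first season and for “Last” in the second.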
The Early Exceptions
Unfortunately…that’s not necessarily true. The NHL formed in 1917-18 after the demise of the NHA, and I mean immediately after. The NHA team owners, minus Toronto’s Eddie Livingstone, had a league meeting in which they agreed to dissolve the league and then immediately re-form it with all of the same teams (minus Livingstone’s Toronto squad). Ah, league politics.
In the early days of the NHL, there were players who had previous NHA experience, so their first NHL season was not necessarily their first season at the top level. For building the database, I considered NHA experience to be equivalent to NHL experience.
Additionally, I considered experience in the various Western leagues (the PCHA, WHL, and WCHL) to be equivalent to the NHL – my reasoning is that if the player was in a league that could play for the Stanley Cup, they were on equal footing.
This has additional ramifications down the road as well:
- The WHA existed as a direct competitor to the NHL from 1972-73 to 1978-79. The Calder rules (outlined above) refer to experience in any “major professional league”, a provision designed to ensure that no WHA player could win the Calder if he later played in the NHL. I do not apply this stipulation when designating Calder seasons: Wayne Gretzky is regarded as a Calder player in 1979-80, as it is his first NHL season.
- That said, the WHA could not play for the Stanley Cup, so I do not treat it in the database as an equivalent league the way I do the earlier Western leagues.
- I regard the Western leagues as a one-way street: if a player went from the West to the NHL, it’s regarded as a fresh start. If he went from the NHL back West, then came back to the NHL, it’s treated as years outside the top level. I absolutely admit that this is arbitrary, but I think it makes a modicum of sense. The Western leagues began declining as the NHL’s earliest expansion process took place – the idea of a player going from what was becoming the top league to a lesser one and then coming back but still being equal…intuitively it’s not the same thing. In the case of Western players who stayed even as the leagues declined, and then made the jump, I look at them the same way as if a Czech player stayed in his home country for an additional year or two. Again, it’s arbitrary, but I think it has at least a shred of common sense to it.
- The Western leagues are no longer considered equal after the 1926-27 season, as they were unable to compete for the Stanley Cup.
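The league rules above can be encoded in a few lines. The following Python sketch reflects one reading of them and does not try to capture every judgment call: NHL and NHA seasons always count, the PCHA, WCHL, and WHL count only through 1926-27 and only before a player’s first NHL season (the one-way street), and the WHA never counts. The function name, league codes, and sample career are hypothetical.

```python
from typing import Iterable, Set, Tuple

# "WHL" here means the 1920s Western league, not the later minor-pro WHL;
# the season cutoff below keeps the two from being confused.
WESTERN = {"PCHA", "WCHL", "WHL"}
WESTERN_LAST_SEASON = 1926  # 1926-27, by starting year


def top_level_years(career: Iterable[Tuple[int, str]]) -> Set[int]:
    """Season start years that count as NHL-equivalent experience.

    career: (season_start_year, league) pairs.
    """
    career = list(career)
    nhl_years = [year for year, league in career if league == "NHL"]
    first_nhl = min(nhl_years) if nhl_years else None
    counted = set()
    for year, league in career:
        if league in ("NHL", "NHA"):
            counted.add(year)
        elif league in WESTERN and year <= WESTERN_LAST_SEASON:
            # One-way street: Western seasons count only before the first NHL season.
            if first_nhl is None or year < first_nhl:
                counted.add(year)
        # WHA and every other league never count.
    return counted


# Hypothetical career: West, then NHL, back West, NHL again, then the WHA.
career = [(1915, "PCHA"), (1916, "PCHA"), (1917, "NHL"),
          (1919, "PCHA"), (1921, "NHL"), (1975, "WHA")]
print(sorted(top_level_years(career)))  # [1915, 1916, 1917, 1921]
```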
There is an additional issue with the earliest days of these leagues through 1926-27, which is that player birthdates may conflict with other published sources. Although this does not completely cease to be an issue after 1926-27 (see Johnny Bower), it is severely mitigated after that point. An example is Chris Speyer, whose birth year is listed in various places as 1902, 1904, or 1907. Since his first year in the NHL is 1923-24, this has big ramifications: was he 21, 19, or 16 at the time?
Unfortunately, due to the arbitrary definition of “experience”, the massive shift in the strength of the Stanley Cup-competing leagues, and discrepancies with multiple players in the early days, I have little choice but to recommend treating information from 1917-18 through 1926-27 (the NHL’s first decade of existence) with caution.
The four questions have one further exception, and it applies to only one year: if a player appeared in NHL games in 2003-04 and then in 2005-06, these are treated as consecutive seasons, because the 2004-05 season… well, didn’t exist.
Andrei Zyuzin’s career chart, outlined above, is but one example of what a player’s career may end up looking like.
For most veteran mainstays, the chart will closely mirror Zyuzin’s: NNYY in their first year, NYYY in their second, a series of YYYY years, YYYN in their second-to-last, and YYNN in their last.
This matters for grouping, which will be explored further in a later part of this series.
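Grouping by these patterns could start from a simple lookup. This Python sketch is hypothetical and is not the grouping used later in the series: it labels the mainstay patterns listed above, adds NNNN for a one-season career (no games before or after), and lumps everything else, such as two-season careers or careers with gaps, into one bucket.

```python
from collections import Counter
from typing import Iterable

# Flag order: Prior, Last, Next, Later.
MAINSTAY_STAGES = {
    "NNYY": "first season",
    "NYYY": "second season",
    "YYYY": "middle season",
    "YYYN": "second-to-last season",
    "YYNN": "last season",
    "NNNN": "only season",
}


def stage(flags: str) -> str:
    """Label one player-season by its four-letter flag string."""
    return MAINSTAY_STAGES.get(flags, "other (short career or gap)")


def tally_stages(all_flags: Iterable[str]) -> Counter:
    """Count player-seasons per stage label."""
    return Counter(stage(f) for f in all_flags)


# A hypothetical mainstay career with two middle seasons, plus one season after a gap.
counts = tally_stages(["NNYY", "NYYY", "YYYY", "YYYY", "YYYN", "YYNN", "YNYY"])
print(counts["middle season"], counts["other (short career or gap)"])
```

A lookup table keeps the labels easy to audit against the spreadsheet columns; anything that falls into the catch-all bucket is a candidate for a closer look.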
One additional concern involves players who are still active and may play more games later. Several players have suited up this year alone after going multiple seasons without an NHL game; this will be cleaned up at the end of the 2019-20 regular season, and the data will be re-run with the best available information.
Regular Season Only
Peter Forsberg’s player chart shows that he was not active during the 2001-02 season, even though he suited up in the playoffs.
For those who may not recall, Forsberg suffered a spleen injury during the second round of the 2000-01 Stanley Cup playoffs, which necessitated the removal of said organ. After years of being beaten up in both the regular season and the Avalanche’s perennial deep playoff runs, Forsberg announced that he was sitting out the 2001-02 regular season to recuperate. When he came back for the playoffs that season, he was arguably the most dominant he had ever been.
The database is regular season only, which means that Forsberg shows up as absent for the 2001-02 season. This has only a tiny effect on the analysis.
There are about two dozen players in NHL history who are absent from this database entirely, because they appeared only in playoff games and never in the regular season. Almost to a man, they were depth players whose only NHL games came between 1946 and 1955.
The list, which is not 100% complete, includes:
- Bill Anderson
- Buck Davies
- Butch Stahan
- Don Cherry
- Doug Anderson
- Doug McKay
- Eddie Emberg
- Gary Collins
- George McAvoy
- Gerry Reid
- Gord Haidy
- Gord Wilson
- Jack Stanfield
As you can see, the second half of this list (which is alphabetical by first name) has gone missing.
I do not know why this was a somewhat common phenomenon during this period. More research is needed.
The Team/Game Database
Simply looking at players across the board is a sure-fire way to end up with a project that’s only halfway complete. With this in mind, I decided to create a second database – still in Apache OpenOffice – that has information on every game in NHL history.
What’s that saying, that it’s better to have and not need than to need and not have? Hell, I’ve spent most of my life on a farm. My favorite pair of jeans is shredded not because they came that way, but because I was underprepared for clearing brush that turned out to be little more than an obscenely dense brier patch.
If talent dilution exists, this information will very likely be needed: the effects would leave evidence somewhere, and it’s just a question of finding it.