BeerHavior: Rankings, Biases and our Changing Palate

You can check out part two of this series, about the geography of the top-20 RateBeer beers, here and part three, on what the “bottom” best beers tell us about the top, here.

For the past couple weeks I’ve been looking through data from RateBeer.com, which releases a “best beers in the world” list each year. RateBeer has a full archive dating back to 2006, so I wanted to map out what I thought would showcase changes in behavior pertaining to beer.

My general thought? We’d see more variety not only in beer, but especially in the strength of top-ranked brews. On that front, I found myself to be both right and wrong.

First, a note about RateBeer’s rankings – they are incredibly consistent. From 2006 to 2013, the “best beers” are heavily skewed toward rare beers that are often imperial stouts. Why do these particular beers rank so well?

One reason is selection bias – not everyone can get a Dark Lord Russian Imperial Stout, so there are fewer ratings of that beer than of Bell’s Hopslam, which typically performs well and is available across the country every year. The fewer ratings a beer has, the greater chance it has of compiling top scores. That’s because…

… there’s also motivational and cognitive bias. Beer nerds are famous for riding the hype train, which pushes beers like Dark Lord to holy heights. If, by chance, we are lucky enough to get a bottle, the sheer magnitude of the occasion has the ability to skew our judgment. We expect a beer to be amazing, therefore it’s more likely to be amazing once we have it.
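The small-sample half of that argument can be put in rough numbers. What follows is a minimal sketch, not anything taken from RateBeer: it assumes individual ratings on a five-point scale scatter around a beer's true quality with a standard deviation of 0.5, and the rating counts (100 and 10,000) are invented for illustration. Under those assumptions, the standard error of a beer's average score shrinks with the square root of the number of ratings.

```python
import math


def standard_error(rating_sd, num_ratings):
    """Standard error of a beer's average score, given the spread of single ratings."""
    return rating_sd / math.sqrt(num_ratings)


RATING_SD = 0.5  # assumed spread of individual ratings; not a RateBeer figure

# Hypothetical rating counts for a hard-to-find beer and a widely distributed one.
for label, num_ratings in (("rare release", 100), ("national release", 10_000)):
    se = standard_error(RATING_SD, num_ratings)
    print(f"{label}: {num_ratings} ratings, standard error of the average = {se:.3f}")
```

With these made-up inputs the rare beer's average is about ten times noisier than the widely rated one, which leaves more room for an unusually high (or low) score to stick.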

Sure, every person is a special snowflake, but these are general guidelines to keep in mind. It should come as no surprise that of the top-20 “best beers” from 2013’s list, 11 currently reside on the “most wanted” list of the site. (It should also be noted that these beers are likely to taste great.)

All that said, my methodology – for the sake of time and effort – was to look at the top-20 of each year’s best beers, as rated by RateBeer users. The top-10 didn’t offer enough variety, so I simply doubled it up. You can see a full list of the beers here.

But enough about all that. Let’s do the numbers…

As expected, imperial stouts perform very well. Why? Probably because they fit nicely into all the biases previously mentioned. More often than not they are limited release, limited distribution and well hyped. It’s a perfect storm. That said, here’s the distribution of imperial stouts in top-20 “best beer” rankings over the past eight years:

[Line graph: imperial stouts in the top-20 “best beers,” by year]

I was curious why 2010 stands out with 17 of 20 beers being an imperial stout, but it could simply be an apex of hype for that style. In 2010, All About Beer magazine noted that the “resurrection of languishing stylistic gems might be America’s greatest contribution to brewing in the past 30 years. This would most certainly be the case with Russian imperial stout.”

For what it’s worth, Google Trends shows an initial spike in searches for “imperial stout” at the start of 2009, showing that awareness – or at least curiosity – started to pick up in the 2009/2010 timeframe.

While the presence of imperial stouts didn’t surprise me, this is what did: average strength of the top beers. Imperial stouts, which typically clock in from 8 to 12 percent ABV, always make up the plurality of RateBeer’s “best beer” lists. Even still, the times, they are a-changin’:

[Line graph: average ABV of the top-20 “best beers,” by year]

After hitting a peak of 11.53% in 2007, the average ABV of RateBeer’s top-20 beers has dropped by roughly 2 percentage points. What happened? Simply put: variety.

No RateBeer list has had fewer than nine imperial stouts, including 2013’s list. However, beginning in 2011, we see cracks in the ratings. Notable top-20 additions include:

  • 2011 – Russian River Supplication (sour)
  • 2012 – Cigar City Passionfruit/Dragonfruit Berliner Weisse and 3 Fonteinen Gueuze
  • 2013 – Hill Farmstead Ann (saison), Cigar City Miami Madness (Berliner Weisse) and Cantillon Fou’ Foune (lambic)

Why is this important? Well, this is one reason:

[Chart: number of craft breweries]

I know this has been beaten to death, but that growth of breweries doesn’t only mean a flood of new beer everywhere you look. It means that to succeed as a business, these breweries have to set themselves apart. When it comes to year-round or seasonal offerings, they have to do something different aside from an IPA – gotta have one! – to sell beer.

Luckily, that also means quality of beer and innovation have gone hand-in-hand, offering beer drinkers wonderfully complex beers that also taste great.

Yes, nearly all the beers we drink fall below the ABV averages seen above for RateBeer’s top-20 “best beers,” but keep in mind that these are regarded as the best beers in the world and their average strength is going down. Give it a few more decades and DING will be in heaven!

—–
Lots of stuff here today and I’d love to hear your thoughts. This is a very broad topic with lots of viewpoints, so please do leave comments/questions below and let’s hash this out some more.

On Wednesday, I’ll try to provide a better idea of where/how these “top beer” changes are happening.

+Bryan Roth
“Don’t drink to get drunk. Drink to enjoy life.” — Jack Kerouac

40 thoughts on “BeerHavior: Rankings, Biases and our Changing Palate”

  1. Something about the way RateBeer ranks beers seems off to me. The way I understood it, RateBeer ranks the beers based on average score. If 1,000 people rate a beer a 9 and 10,000 rate a separate beer an 8.8, then the beer with the smaller sample size would be deemed the better beer. It might very well be true, but I would like to see some math that takes into account the fact that a beer has been vetted by 10x as many palates, so its score is much more likely to be an accurate depiction of the beer and isn’t influenced by the white whale thing as much.

    1. BBB: RateBeer takes number of ratings into account. There’s a Bayesian weighting system in place to counteract what you describe. Here’s a detailed description:

      http://www.ratebeer.com/ratingsqa.asp

      1. Thanks. Makes more sense to me now. I’m a beer nerd white whale of sorts. I’ve never looked at Beer Rankings in my life. Totally uninteresting to me.

    2. RateBeer uses some pretty standard statistical methods to take into account sample size. It is detailed on their website if you want to know more. Basically, it pulls the score from the arithmetic mean of the ratings more towards the center of the scale w/ beers w/ less ratings. Or to use your example, a beer that was rated 8.8 (well, RB uses a five point scale) by ten thousand will likely be ranked higher than something rated a 9 by only one thousand (couldn’t say for sure w/o looking into the specifics further).

    3. They’ve got a ratings FAQ here: http://www.ratebeer.com/ratingsqa.asp

      What I took away from their formula is they try as best they can to avoid the biases I mention above. Not necessarily a perfect way to do so, but they’re trying!

  2. Interesting article. I usually tend to avoid Rate Beer and the other “numbers” sites. It’s not my bag. It’s great to see the breakdown on how the numbers are shifting. I’m curious to see them after another year of sours/funk beers gaining popularity.

    1. I tend to avoid rating sites but enjoy perusing them to see how aspects of a beer from my palate match up with others’. The ratings themselves are kind of an extra piece for me.

      I have enjoyed delving into the numbers this time, though, because the trends have been really interesting. Thanks for the kind words!

  3. The math-to-literature ratio of this article is dangerously high. Good work.

    1. Today, math … Wednesday, geography. I guess schoolin’ was good for something!

  4. Your posts make my brain hurt. I think all the smoke set off the fire alarm.

    Really great stuff; I had an inkling the rising popularity of sours would drive the average ABV down, and probably knock some IPAs off the list.

    1. I honestly thought double IPAs and imperial stouts would be very close in volume, but I imagine the seasonality of imperials and their speciality release status make them more “special,” when DIPAs can be found all the time.

      Imperials may also benefit from the fact that the RateBeer audience is very international, so it’s not just American hop heads chanting “USA! USA! USA!” as they shove lupulin down our throats and give perfect scores.

  5. Great post. I’ve often thought that certain rare beers suffer from the motivational/cognitive/selection biases. Why is the Westvleteren 12 considered one of the best beers in the world while St Bernardus’ Abt 12 (a similar and, in my book, delicious beer brewed roughly 20 minutes away by a brewery with historical ties to Westvleteren) is not? Well, one of them is widely available, while the other one is only available for purchase at a monastery, by appointment, where you have to dial a phone number months in advance and give your license plate to get a chance to buy the beer.

  6. Thanks for the very interesting post. Lately I’ve been crunching some numbers on top beers at both the Rate Beer and Beer Advocate websites just for the fun of it. I took a sample of 76 beers that are available where I live (Columbus, OH) plus a few that I’ve tried that are less readily available (Pliny, Supplication, Westy XII). About 90% of them were ranked in the Beer Advocate top 250 beers, although only 4 from the top 20 because those beers are generally not readily available. One difference that immediately jumps out at you is that Beer Advocate members rate beers higher than Rate Beer members, by 0.33 points on average across the beers in my sample. Yet I believe they both use Bayesian statistics to calculate the scores. So I guess the members of BA are simply more apt to put a positive spin on the reviews. I’m pretty sure they have a different geographic distribution. My sense is that BA draws more from the Midwest and east coast of the US, than RB.

    If you divide them up into different broad styles—Belgian-style, German-style, IPAs, Stouts & Porters, Sours & Wild Ales, Barleywines & Strong/Old Ales—you see something else interesting. The biggest difference between the two sites is for German-style beers (Lagers and Hefeweizens) where the BA rating was 0.46 higher than the RB rating, and the smallest difference was for Stouts and Porters where the BA rating was only 0.24 higher. This supports your argument that Rate Beer users are a Stout loving group of beer drinkers. Of the 76 beers there were only two where the RB rating was within 0.1 of the BA rating: Speedway Stout and Oak Aged Yeti.

    1. Well this is just fantastic. Thanks so much for adding all this. Really great stuff.

      I am not a Beer Advocate user, but I do peruse the site. One of the biggest complaints I regularly hear about BA is how restrictive the Alström brothers can be, banning people from the site and general shenanigans. I wonder if that may play into more positive reviews.

      1. Occasionally I’ve seen comments online about that kind of activity and the Alstrom Brothers, but I can’t speak to it personally. Given the tens (hundreds) of thousands of reviews that come into each site it’s hard to believe that they could ban enough people to skew the statistics. However, I think you are onto something by bringing in their influence.

        The BA-RB rating discrepancy was among the largest for the following beers: Weihenstephaner Dunkelweizen (BA = 4.23 vs RB = 3.57), Weihenstephaner Hefeweizen (BA = 4.41 vs RB = 3.81), La Fin du Monde (BA = 4.32 vs RB = 3.82), and Great Lakes Dortmunder Gold (BA = 4.09 vs RB = 3.54). Now consider the Alstrom Brothers’ ratings of those beers: 100, 100, 100 and 99, respectively (I’m not 100% sure how the 0-100 scale translates to the 0-5 scale). In contrast, they rated Speedway Stout only a 90, and that one has a small discrepancy (BA = 4.39 vs RB = 4.33). I remember that for a while they had an unfavorable review of Bell’s Two Hearted Ale, and that one has just about the smallest differential (BA = 4.26 vs RB = 4.07) among the IPAs. Just the fact that they post their opinions probably influences later ratings, particularly for newcomers. I don’t think there is anything sinister to it, just human psychology.

        Thanks for the discussion. I had not thought about the “expert reviewer” influence until you raised the question. I’m planning on writing a few posts on data mining from the ratings sites at some point in the near future. I’ll let you know when I do.

  7. tempestinatankard January 7, 2014 — 1:56 pm

    I was just reading your most recent post on tastes of home, and decided to check out your link to this piece. Well done! Your number crunching is very revealing, and your attempt to make sense of rankings through the lens of bias makes for thought-provoking reading. I found it particularly interesting that ABVs have fallen off rather steeply since 2010, and I think your reasoning accounts convincingly for this trend. An increasing number of breweries translates into more variety as they attempt to distinguish themselves.

    But I think there’s more to it, something we can’t merely reduce to variety driving down the average ABV of “top-ranked” beers. (I may be wrong, but I don’t think we’re going to see a lager inhabiting any top-10 spots on these lists any time soon – unless it’s an Imperial Lager ramped up to appeal to a generalized North American craft beer palate primed for big and intense flavours.) What remains concealed within this ABV decline is the ongoing tendency of the craft beer community (drinking and thinking under the powerful influence of North American beer rating sites, of course) to continually seek out the rarest beer, the most novel beer, or the most extreme beer. You’re right in alluding to this confusion between rarity and quality when you discuss biases and changing palates, but I think we could draw an even more explicit analogy between an infatuation with high ABV and the recent turn to sours and funky beers. Both of these are, I’d argue, markers of a taste for the extreme. ABV may continue to drop, but as a few commenters have intimated, this may have less to do with an embrace of sessionability than it does with the recent “discovery” of sours and saisons in North America.

    It may be quite some time before Ding gets to heaven.

    1. Great points – thanks for adding another layer to this.

      Anecdotally, I’ve noticed an increasing trend of new breweries trying to adhere to the “farmhouse” model and focus on saisons and the like, which I love. It gets back to the idea of differentiating yourself among other breweries that open up with common lineups featuring a pale ale and IPA.

      These businesses wouldn’t be opening if there weren’t something of a demand for them, and by playing to historical styles that don’t require a 10 percent ABV, they will hopefully open up eyes and convert more people to the plethora of flavors available.

      Perhaps therein lies the key – as you note – American palates are keen on extremes, but we’re just learning now that doesn’t mean it has to be a malt or hop bomb.
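
A footnote on the Bayesian weighting raised in the first comment thread: the idea can be sketched in a few lines. This is the generic "shrink toward a prior mean" weighted average, offered as an illustration only. RateBeer's actual formula and parameters are described in its ratings FAQ and are not reproduced here, and the site-wide mean and prior weights below are invented. The two beers are the commenter's hypothetical: a 9.0 from 1,000 raters against an 8.8 from 10,000.

```python
def weighted_rating(mean_score, num_ratings, prior_mean, prior_weight):
    """Pull a beer's raw mean toward prior_mean; fewer ratings means a stronger pull."""
    total = num_ratings + prior_weight
    return (num_ratings * mean_score + prior_weight * prior_mean) / total


# The commenter's hypothetical beers: (raw mean on a 10-point scale, number of ratings).
rare = (9.0, 1_000)
common = (8.8, 10_000)

PRIOR_MEAN = 6.0  # hypothetical site-wide average score, not a RateBeer value

for prior_weight in (50, 500):  # hypothetical prior strengths, in "phantom" ratings
    rare_wr = weighted_rating(*rare, PRIOR_MEAN, prior_weight)
    common_wr = weighted_rating(*common, PRIOR_MEAN, prior_weight)
    leader = "rare" if rare_wr > common_wr else "common"
    print(f"prior weight {prior_weight}: rare={rare_wr:.3f}, common={common_wr:.3f} "
          f"-> {leader} beer ranks higher")
```

With the weaker prior the rare beer keeps its lead; with the stronger one the widely rated beer overtakes it. That is why the outcome for the 9.0-versus-8.8 example can't be called without knowing the specifics of the site's formula.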
