Bing Brings Big Data to NCAA’s Official Analytics Bracket


Big data and a March Madness bracket.

The two concepts seem diametrically opposed, at least at first glance, yet they now invariably come together at this time of year.

Searching these terms online leads to a plethora of results, ranging from gamblers' to mathemagicians' to data scientists' perspectives on predicting the Big Dance. The most common way users find tournament-related information, however, is to visit their trusted sources for sports news, be it ESPN, Yahoo, or the NCAA's website, outlets that each have their own bracket property. These platforms, though, are designed with particular purposes in mind, quite different from the objective queries users place in search engines. Thus, there are vast possibilities to leverage a search engine's system to produce a bracket and everything pertinent to it.

Yet other marquee events are just as open to big data analysis, and such analyses can serve as precursors to March Madness predictions.

Bing is currently the official bracket data partner of the NCAA, which has granted it access to ten seasons' worth of raw data.

As a search engine, though, Bing has been proactive in venturing into different arenas to help users make better-informed decisions. Political elections, the Grammys, the World Cup, and the NFL are some of the other topics that Bing has analyzed and predicted of late. Its track record of fruitful outcomes in these verticals serves as a preliminary step toward applying its big data technology to college basketball. Of note, its Bing Predicts tool boasts a strong success rate: better than 95 percent accuracy in the 2014 U.S. midterm elections, 15 for 15 during the knockout round of the World Cup, and, to the great chagrin of Microsoft's Seattle Seahawks faithful, a correct prediction that the New England Patriots would win the Super Bowl.

There are, however, certain similarities and differences between sports and non-sports events to keep in mind with respect to big data integration.

“For popularity-based contests like American Idol, web and social signals can highly correlate with popularity-based voting patterns. From this, the engine can make very accurate projections on who will be eliminated each week and who the eventual winner will be,” Walter Sun, Bing’s Principal Applied Science Manager, tells SportTechie.

“On the other end of the spectrum, predicting the World Cup, NFL season, and now the NCAA tournament requires the incorporation of player and team stats, tournament trends, game outcomes, location of contests, league trends through multiple seasons, and data from web and social channels. The online web and social data add yet another layer of insight, providing our Bing Predicts model wisdom from the crowd,” continued Sun.

Accordingly, Bing’s partnership with the NCAA serves the interests of both parties. Bing has committed to prediction-based efforts, highlighting its ability to forecast accurately, which segues fittingly into college basketball’s postseason. Bing’s research notes that, by and large, people follow just five schools leading up to tournament time, so filling out a bracket invariably becomes a guessing game. The larger purpose is to bring this new data to fans, enabling them to form a smarter bracket and enriching the experience beyond mere guesswork.

In fact, Keith Martin, NCAA’s Managing Director of Marketing and Broadcast Alliances, believes that this relationship provides a “fun” alternative for fans that’s also “more analytical”, in terms of their bracket selections.

Bing’s model thus takes into account sundry variables that project tournament success. Three of the more popular factors that fans consider are a conference’s strength, wins in neutral and road games, and players’ experience in past postseasons. One of the better indicators that’s often overlooked, though, is defensive efficiency, which matters even more than three-point shooting efficiency. This barometer is less prone to high variance than sporadic shooting sprees. The aggregate of individual talent, with pre-season expectations and the total number of McDonald’s All-Americans or potential pros serving as proxies, also goes somewhat unnoticed. Talent trumps effort when everyone is giving the same output, and heightened stakes magnify the urgency for talented teams to overcome early-season struggles.

Kentucky’s and Virginia’s season-long defensive potency illustrates the point about defense. As for talent, the 2013-14 Kentucky team took time to build chemistry but found its groove at tournament time.

“We have a team of machine learning experts–some of which are also sports enthusiasts–who have much experience extracting meaningful features from large amounts of data. Once the team has extracted a good candidate set of potential features to learn from, the model takes care of the rest,” says Sun, describing Bing’s process for analyzing the information.

“In fact, the direction of the feature doesn’t even have to be known prior for it to work in a model. The features and associated values from past seasons are entered into the model to learn which ones lead to success during the NCAA Tournament. Other tournament statistics are added on top of this to further increase the confidence of each prediction,” added Sun.
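As a rough illustration of Sun's point that a feature's direction need not be known in advance, here is a minimal sketch of a logistic regression fitted by gradient descent. It is not Bing's model: the two features, the toy matchups, and every name in the code are made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights by batch gradient descent; no sign is assumed for any feature."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    m = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias

# Hypothetical toy data (not real games). Each row is team A minus team B:
#   [points allowed per 100 possessions, past tournament appearances]
ROWS = [
    [-5.0, 1.0], [-3.0, 0.0], [-4.0, 2.0], [-1.0, 1.0],
    [4.0, -1.0], [2.0, 0.0], [5.0, -2.0], [1.0, -1.0],
]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 means team A won

weights, bias = train_logistic(ROWS, LABELS)

def win_probability(features):
    """Estimated probability that team A wins, given feature differences."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, features)))

print("learned weights:", weights)
print("P(A wins) for [-2.0, 1.0]:", win_probability([-2.0, 1.0]))
```

Nothing in the code says that allowing fewer points is good or that experience helps. On this toy data the fit ends with a negative weight on points allowed and a positive weight on experience, so the direction is learned from the outcomes.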

Once the algorithms are devised, Bing validates them on data not used in their formation to ensure that they perform well enough.
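Validation on unseen data can be sketched as follows: hold out one or more seasons, fit on the rest, and score only the held-out games. This is a minimal sketch under stated assumptions. The `rating_gap` feature, the deliberately trivial "sign rule" model, and the games themselves are hypothetical stand-ins, not Bing's data or method.

```python
def split_by_season(games, holdout_seasons):
    """Separate games into a training set and a held-out set by season."""
    train = [g for g in games if g["season"] not in holdout_seasons]
    held_out = [g for g in games if g["season"] in holdout_seasons]
    return train, held_out

def fit_sign_rule(train):
    """Toy model: learn whether a positive rating gap favours team A (+1) or not (-1)."""
    agree = sum(1 for g in train if (g["rating_gap"] > 0) == g["a_won"])
    return 1 if agree >= len(train) / 2 else -1

def accuracy(sign, games):
    """Share of games where the rule's pick matches the actual winner."""
    hits = sum(1 for g in games if (sign * g["rating_gap"] > 0) == g["a_won"])
    return hits / len(games)

# Hypothetical games: rating_gap is team A's rating minus team B's.
GAMES = [
    {"season": 2011, "rating_gap": 6.0, "a_won": True},
    {"season": 2011, "rating_gap": -3.0, "a_won": False},
    {"season": 2012, "rating_gap": 2.0, "a_won": False},
    {"season": 2012, "rating_gap": 8.0, "a_won": True},
    {"season": 2013, "rating_gap": -5.0, "a_won": False},
    {"season": 2013, "rating_gap": 1.0, "a_won": True},
    {"season": 2014, "rating_gap": 4.0, "a_won": True},
    {"season": 2014, "rating_gap": -2.0, "a_won": True},
    {"season": 2014, "rating_gap": 7.0, "a_won": True},
    {"season": 2014, "rating_gap": -6.0, "a_won": False},
]

train_games, held_out_games = split_by_season(GAMES, holdout_seasons={2014})
rule = fit_sign_rule(train_games)                 # fitted on 2011-2013 only
held_out_accuracy = accuracy(rule, held_out_games)  # scored on 2014 only
print("held-out accuracy:", held_out_accuracy)
```

Because the 2014 games never influence the fit, the held-out score estimates how the rule would fare on a tournament it has not seen.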

In essence, these NCAA models function in the same way as what Bing does for search.

When a user submits a query to the search engine, Bing distills trillions of URLs on the open web. To provide users with a suitable set of links, it constructs algorithms that take in big data (online documents and signals) and create a ranked output of good results for the end user. The key is that the data harnessed for the model represents the desired output in its current state.

Sun breaks down the scenarios that Bing had to be cognizant of when formatting its models: “For the March Madness predictions, we would not take data earlier than 1985 since the tournament first moved to the now familiar 64-team format that year. Furthermore, the 1986 season did not have a standardized three-point line, so data learned from then would not represent the game today. The shot clock reduction from 45 seconds to 35 seconds happened in 1994, so the pace of games increased a little after that. In the past few years, the pace and style of game has been similar, so using these recent results to learn the model has seemed to provide robust and reliable results.”
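The cutoffs Sun lists amount to a filter on which seasons are allowed into the training data. A minimal sketch, using only the years he mentions: the dictionary keys and function name are hypothetical, and treating 1987 as the first season with a standardized three-point line is an assumption inferred from his remark about 1986.

```python
# Years taken from Sun's quote; the key names are illustrative only.
RULE_CHANGES = {
    "64_team_field": 1985,
    "standard_three_point_line": 1987,  # assumption: first season after 1986
    "35_second_shot_clock": 1994,
}

def usable_seasons(seasons, required_rules):
    """Keep only seasons played under every rule listed in required_rules."""
    cutoff = max(RULE_CHANGES[rule] for rule in required_rules)
    return [s for s in seasons if s >= cutoff]

ALL_SEASONS = list(range(1980, 2015))

modern_format_only = usable_seasons(ALL_SEASONS, ["64_team_field"])
fully_comparable = usable_seasons(ALL_SEASONS, list(RULE_CHANGES))

print(modern_format_only[0], len(modern_format_only))
print(fully_comparable[0], len(fully_comparable))
```

Requiring more rules shrinks the pool of seasons, which is the trade-off Sun describes: older data is plentiful but less representative of today's game.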

The deeper big data dive, moreover, provides context as to why certain key trends transpire.

Anyone who pays close attention to tournament odds is aware that five seeds are liable to be upset by 12 seeds, which happens in 34 percent of such matchups. Automatic qualifiers from smaller conferences tend to be assigned lower seeds because they’re expected to be inferior. Sun mentions, though, that the 11 and 12 seeds are commonly awarded to the best of these automatic qualifiers. These schools may be underrated: they haven’t been tested, but they are the elite of their respective conferences. Thus, from time to time, the second- or third-place school of a power conference turns out to be a more favorable opponent than the ratings may suggest.
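To put the 34 percent figure in bracket terms: each tournament has four 5-versus-12 games, one per region. A short sketch of the arithmetic, assuming (as a simplification not made in the article) that the four games are independent and share the same upset probability:

```python
from math import comb

def upset_count_distribution(p_upset=0.34, matchups=4):
    """Binomial probabilities of seeing k = 0..matchups upsets."""
    return [
        comb(matchups, k) * p_upset**k * (1 - p_upset) ** (matchups - k)
        for k in range(matchups + 1)
    ]

distribution = upset_count_distribution()
p_at_least_one = 1 - distribution[0]
expected_upsets = sum(k * p for k, p in enumerate(distribution))

print(f"P(at least one 12-over-5 upset) = {p_at_least_one:.2f}")
print(f"Expected 12-over-5 upsets per tournament = {expected_upsets:.2f}")
```

Under those assumptions, a bracket with no 12-over-5 upset at all is betting against roughly four-to-one odds, even though any single five seed remains the favorite.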

Given that just one in five people follow more than five schools, fans can gain an advantage by leaning on Bing’s forecasts for all the schools to complement their own choices. The combination of Bing’s Bracket Builder (which uses Bing Predicts as an underlying layer, so users can hover over a matchup to see win probabilities), tournament predictions, and real-time game data gives users useful intel, whether they’re first-timers or veterans. Bing’s methodology simply gives fans an opportunity to fill out a bracket in a new, interactive way.

Fans just have to search for NCAA tournament information on Bing, and the bracket builder will appear at the top of the search results page; they can also navigate directly to Bing.com/NCAA. If a user needs help filling out a bracket, Bing will complete the rest of it automatically. There are chances to win Sweet Sixteen or Final Four tickets. This bracket complements the NCAA’s own; it isn’t instantly available from the original source. It’s its own central hub for predictive knowledge about March Madness.

“Our biggest success indicator will be our predictions outcomes,” says Sun.

“We know it’s nearly impossible to build a 100 percent bracket due to numerous unpredictable factors in single elimination games, but our hope is that with our predictions, people are able to build a smarter, more informed bracket,” Sun believes.

Bing is predicting Kentucky, Arizona, Villanova, and Duke in the Final Four, and is anticipating a 75 percent success rate.

Eighth-seeded NC State stunning Villanova over the weekend illustrates the other 25 percent: the share of outcomes that Bing’s big data approach remains vulnerable to in its official analytics-based bracket.