Soccer data environments can often seem like black boxes: from club websites or American Soccer Analysis’ own State of MLS Analytics survey, we might know that something sporting-data-related is going on at a team’s training ground, but it’s often difficult to pin down what exactly that something is. In their defense, club and team personnel want to protect any competitive advantage they have, so this secrecy is (somewhat) justifiable, but these inner workings are exactly what sickos (er, soccer analytics nerds) and college students interested in the industry really want to know about.

This year’s inaugural American Soccer Insights Summit (ASI), hosted by Rice University in Houston, TX, offered a rare look behind the curtain, bringing together club staff, industry professionals, fans, and students to give tangible shape to a community that has persisted mainly via blog posts (like this one!), Twitter/BlueSky threads, podcasts, WhatsApp chats, and Discord servers. For many attendees, this might have been the first time they’d been able to gather with a large swathe of like-minded people to let their collective hair down and talk shop.

A brief note on the conference structure (which you can find online at https://americansoccerinsights.com): Day 1 featured a collection of industry presentations alongside a number of structured opportunities for attendees to find more friendly faces and learn about work going on across the industry. Day 2 mixed industry presentations with research competition finalists (more on this later), bookended the day with keynotes from club executives, and closed the conference with a live taping of the Expected Own Goals podcast. ASI also put out a great recap video of the conference.

Keynotes

These two presentations on Day 2 might be the closest the public will get to being in the room during the player recruitment process: Pat Onstad and Ethan Creagar (both of Houston Dynamo) established the “what” and “how” of the player recruitment process, while Zayne Thomajan (from Chicago Fire) dug deeper into its “why”.

In the public analysis sphere, the focus in a transaction is naturally the player’s on-field value-add (the “goals added”, if you will), but there are other mechanics that determine why that transaction comes together in the first place. We often talk about the player recruitment process as “risk mitigation”: how can we be as sure as possible that a player we bring in is going to help our team? What unknowns can we make known? Onstad and Creagar walked through the scouting side of this process, which generally includes:

Generating player lists based on well-defined player profiles using data
Watching video to validate the player’s profile and understand their style/team style/team fit
Asking questions about specific situational contexts
Meeting and interviewing the player in person
Making a final decision that everyone’s on-board with

In conversations with other current and former club personnel at ASI, it was clear that Onstad and Creagar’s approach was fairly common across data-savvy organizations -- table stakes, even.

But as Thomajan explained, there’s still room to optimize these kinds of processes via good sporting strategy. Her presentation focused more on transfer and player personnel philosophy: how can clubs mitigate risk on the front and back ends of the process, rather than just in the scouting work? This involves dedicated market analysis, both inside and outside a club’s league: what leagues are reliable providers of talent at positions of need? What markets have other teams seen consistent translation from? At which positions do other teams prioritize discretionary spending and cap space? Strategy questions like these help structure the top of the player search funnel. As prospects work their way through the recruitment process, clubs also have to consider a number of “player personnel factors”:

What kind of support system does the player have in place currently? What do they need to perform their best?
What languages do they speak? How does that mesh with the cultures in the existing locker room?
Has this player already left their home country once before? Do they understand how to adapt to a new culture?
What are other ways we can smooth the transition process for this player? Can we set up bank accounts, apartments, phones, etc?

The most interesting strategy piece Thomajan touched on concerned perception: how does a transfer (or a certain type of transfer) affect the perception of a league and its competitiveness? We know that the flow of perceived player quality defines a league’s market position, but it can also inform strategy at the front end of the recruitment process. The easiest (and most obvious) comparison here is NWSL versus MLS: the former’s dominant market position (near the top, if not at the top outright) opens the door to a wider range of talent at a more diverse range of relative price points than the latter’s more limited funnel, which can often force riskier transactions, financially or otherwise.

Put together, these two presentations provided a holistic view of the player recruitment process, which was especially fun given how the day started: Pat Onstad mentioned a trade in progress at the start of his presentation, and the resident MLS sickos decoded who was involved well before Tom Bogert could get his story and posts out.

Industry Presentations

If the keynote presentations sketched out what the player recruitment process looks like end to end, the remaining industry presentations at ASI fleshed out the rest of the canvas, painting a comprehensive portrait of the impact an analytics outfit can have on an organization. Below are some short summaries and key takeaways from each presentation:

Building an Analytics Operation: Lessons Learned through an Oral History of American Soccer Analysis

Brian Greenwood and Tyler Richardett, American Soccer Analysis

If you’re reading this, you already know what ASA is and you might be tempted to skip to the next paragraph. What you might not know is that ASA’s data arm has existed for over a decade, connecting a group of MLS data jockeys to build a sort of FanGraphs for American soccer. I would bet that ASA’s founding date of 2013 makes it older than most (if not all?) MLS analytics operations, and its data engineering infrastructure (led by Brian and Tyler) has evolved from a collection of spreadsheets and cloud folders into a cloud computing operation that rivals that of many clubs. There have certainly been growing pains (moving from Excel to programming languages comes to mind), especially as technology has evolved, but ASA has stayed grounded in its mission of pushing soccer analytics in the US (and Canada) forward, with a database of public metrics for six domestic leagues (seven, if you count NASL), including support across the board for its on-ball value metric, goals added (g+), all available via API or public dashboards.

But then again: if you’re here, you probably knew all of this!

Building a Data-Oriented Culture on a Budget

Alejandro Dávila, Director of Analysis - Mazatlan FC

How do you run a performance department on the cheap? You listen deeply to your stakeholders and make do with what you have, and sometimes you literally use their own words to power your analysis. Dávila and his team put a new spin on the old analytics adage of “speaking the coaching staff’s language” when building tools to support the staff at Mazatlan: instead of building top-down reports based on data they might have to pay for, they went bottom-up, starting with standardized reports for their staff to fill out and building team and player evaluation mechanisms around key words and concepts driven by the club and its staff. This was an excellent presentation for clubs that want to dip their toes in the analytics pool but are struggling to figure out where to start.

Rice Soccer Analytics: From Insights to Impact

Kushal Gupta, Rice University & src | ftbl

Even teams at the NCAA level want to invest in data to streamline their game preparation, as evidenced by the tight integration between Gupta’s student group and the Rice University women’s soccer coaching staff. Gupta and his team of fewer than 10 students act as the team’s match analyst corps, preparing opposition analysis presentations using Wyscout data to identify key themes and players for the technical staff to design training and tactics around. In 2025, the group will expand into post-match analysis and report directly to Rice head coach Brian Lee, who has been incredibly complimentary of the impact the analytics group has had on his team’s success. Gupta underscored the impact students and interns can have with the resources available to them (free or otherwise): they just need opportunities to do the work AND to develop trust with stakeholders.

Speedrunning an Analytics Department in 50 Days

Arielle Dror, Director of Data & Analytics - Bay FC

Ever built a plane while it’s picking up speed down the runway, then had to replace an engine while it’s airborne? No? Just Arielle at Bay FC? An expansion team might be the startup environment to end all startup environments, and starting an expansion team in the Bay Area might be the most startup to ever startup. Dror walked through the many hats she wore during the first 18 months of Bay FC’s existence, detailing the lessons she learned (including some about video drones) along the way.

Heading into the Olympic break, the club sat near the bottom of the league, but after many tough conversations and a retooling of the roster and team style, Bay made a late push into the playoff field. The secret? Trust: both between players and the technical staff and between the technical staff and the analytics group. But how do you build trust with a staff not used to sporting data or to environments with sporting data? You can’t assume what knowledge they have or don’t have: it’s your responsibility as a practitioner to meet them where they are and grow with them. This sort of stuff is an analytics cliche, but in practice, it’s very difficult to execute. Dror credited her department’s ability to have honest conversations with the technical staff, and the work her team put in to keep data-driven conclusions non-judgmental, as major factors in their success over the summer and in the latter stages of the NWSL regular season.

Building Shiny Apps for Rapid Prototyping

Lydia Jackson, Machine Learning Engineer - Teamworks/Zelus

Have you ever wondered how data gets into an analyst’s hands? When we think about soccer analytics, we rarely consider the systems on top of which we do soccer analytics: how data gets from the provider to the database, how it goes from the database to the models, and how the models get run. Speaking from experience, building these systems is quite a heavy lift, and there’s not much special sauce involved (until, of course, you get to the models themselves). That’s why the talk by Jackson (another ASA alum) was so interesting: soccer software is really just software, and engineers run into many of the same problems in soccer as they do working on, say, email marketing automation. There were two major pieces to her presentation:

Good software gets out of the way of an end user. By the same token, a good software platform gets out of the way of the users developing on top of it. By providing standard tooling to build projects and applications, Jackson and her team ensure that data scientists don’t need to worry about how they’ll ship and configure their software for deployment -- these frameworks abstract away platform-level details and let their customers do their jobs.
As a software process matures, you’ll need logging and observability mechanisms to help proactively discover and solve problems. Reading and searching through logs is easy for software developers who know how things can fail, but what if a software developer isn’t the one who notices something is missing? Can we build systems that let even non-systems-savvy folks solve problems? This second-order line of questioning led Jackson and her team to build their own command center and alerting toolset, unifying logging, alerting, and common solutions so that anyone in the organization can debug and fix problems in data sync logic.

Like I said, soccer software is just software -- to a point. Occasionally, soccer data types will talk a big game about “big data” and the “millions of data points” that go into the player reports they generate. However, compared to email transactional data (think millions of email-opens or clicks per hour), the scale of soccer data is actually quite small. You don’t need a lot of computing power to get things done: in fact, Jackson’s team runs all of their projects off a single AWS EC2 instance.

You might not need a lot of computing power to make a big impact, but what you DO need is good infrastructure. Performance analysts, sport scientists, etc., should be doing their jobs, not data engineering or writing boilerplate application code. Time spent on these auxiliary tasks, building the platforms they need to do their jobs, takes away from time spent actually doing those jobs. Investing in data infrastructure and building standardized, resilient platforms abstracts these tasks away from other staff so they can focus on their core competencies.

SkillCorner Presentations

ASI featured a research competition sponsored by data provider SkillCorner, giving groups access to Wyscout data and SkillCorner’s own metrics derived from tracking data for the 2024 NWSL and 2023/2024 WSL seasons. Representatives from SkillCorner introduced the competition structure and dataset in their own timeblock, while short presentations from the competition finalists were interspersed throughout the Day 2 schedule. Woven throughout these presentations was a desire for more: more time and more data. Each research team had clear next steps they’d have taken with more than the allotted six weeks, and from speaking to others about these types of presentations, there’s a real craving for more complex datasets to work with outside of these competitive, time-bound settings. Below are some short summaries from each presentation:

Tracking Off-Ball Runs: Analyzing a Defender’s Responsibilities and Abilities in Space

Lucas Kimball - Atlético Clube de Portugal

Kimball assigned a primary defender to every off-ball run by identifying the defender closest to the runner, then evaluated that defender on how they defended the run. This made for an interesting analysis of players’ 1v1 off-ball defending, with some nifty radar charts featuring different metrics of importance for different position groups.
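As a rough illustration of the assignment step, the sketch below picks the defender nearest the runner in a single frame. It assumes positions are (x, y) pitch coordinates in meters and that one snapshot per run is enough; the function and variable names are hypothetical, and the talk didn’t detail which frame (or how many frames) Kimball actually used.

```python
import math
from typing import Dict, Tuple

# Hypothetical representation: (x, y) pitch coordinates in meters.
Point = Tuple[float, float]


def assign_primary_defender(runner: Point, defenders: Dict[str, Point]) -> str:
    """Return the id of the defender closest to the runner in one frame.

    Assumption: a single snapshot is used; a real pipeline might instead
    average distances over the run or use the frame when the run starts.
    """
    # math.dist computes the Euclidean distance between two points (Python 3.8+)
    return min(defenders, key=lambda d: math.dist(runner, defenders[d]))


# Made-up example frame: one runner and three defenders
runner = (70.0, 20.0)
defenders = {"D1": (65.0, 22.0), "D2": (72.0, 30.0), "D3": (80.0, 18.0)}
print(assign_primary_defender(runner, defenders))  # prints D1 (about 5.4 m away)
```

Nearest-defender assignment is simple and transparent, but it can flip between defenders mid-run; smoothing over several frames would be one way to stabilize it.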

Evaluating the Wide Attack

Lori He - Second Spectrum

He modeled success in wide attacking areas based on the movement of the ball within a short event window after a completed pass: if the team in possession kept the ball and moved it two meters closer to goal within the window, the wide attacking sequence was considered successful. Part of the analysis relied on the novel concept of “sum of freedom”, the total Voronoi area available to the player in possession, adjusted based on which areas were closer to goal (using a normal line between the player and the goalmouth). Comparing modeling techniques revealed that a gradient boosted model worked better for evaluating attackers’ ability to achieve “success”, while the “paired” defenders for these attackers were better evaluated via a random forest model.
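The success label described above can be sketched as a simple rule. This is a minimal illustration under stated assumptions, not He’s code: the pitch dimensions, the goal location, and the choice to count the ball’s closest approach anywhere in the window (rather than only its final position) are all assumptions, and the window length is left to whoever supplies the samples.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

# Assumption: a 105 x 68 m pitch with the attacked goal centered at (105, 34).
GOAL_CENTER: Point = (105.0, 34.0)


def distance_to_goal(ball: Point) -> float:
    return math.dist(ball, GOAL_CENTER)


def wide_attack_successful(
    ball_path: List[Point],
    team_in_possession: List[str],
    attacking_team: str,
    min_gain_m: float = 2.0,
) -> bool:
    """Apply the success rule to one post-pass window.

    ball_path[0] is the ball position when the pass is completed; later
    entries are samples inside a (hypothetical) fixed-length window.
    """
    # Rule 1: the attacking team must keep the ball for the whole window
    if any(team != attacking_team for team in team_in_possession):
        return False
    # Rule 2: the ball must get at least min_gain_m closer to goal at some point
    start = distance_to_goal(ball_path[0])
    closest = min(distance_to_goal(p) for p in ball_path)
    return start - closest >= min_gain_m


print(wide_attack_successful([(80.0, 5.0), (82.0, 8.0), (84.0, 10.0)], ["A", "A", "A"], "A"))  # prints True
```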

Dual Dependency: Analyzing the Winger and Wingback Relationship

Sebastian Bush, Theo Schmidt, Caleb Heller, Nick Rovelli - Syracuse University

Yet another ASI presentation featuring an ASA contributor (Sebastian!): Bush and his team took a multi-step approach to evaluating the interplay of wingers and fullbacks:

Build playing style clusters for wingers and fullbacks based on off-ball runs
Predict how the style and offensive performance of a wingback affects the offensive performance of their paired winger
Predict how the style and defensive performance of a winger affects the defensive performance of their paired fullback
Evaluate how different pairings of winger and fullback styles affect team performance (measured as team xGD in 15-minute segments) using a joint-effects model

Bottom line: the fit metrics for these modeled relationships are so low (EX: an R^2 of 0.15 between offensive wingback and winger performance) that there’s no clear evidence the relationships truly exist. Further, there’s no clear optimal combination of playing styles: how well the interplaying players perform matters more than their archetypes. Is this a null result in a research project? Yes! Have we still learned something about the game from this result? Also, yes!

Quantifying Midfielders’ Efficiency

Joseph Sosa, James Zhong, Trey Hibbard - Auburn University

This team took a different tack with SkillCorner’s data, choosing instead to focus on the performance of midfielders under pressure based on the number of defenders surrounding them. There were a number of efficiency metrics to be had here:

Ball Control Efficiency
Short-Distance Pass Efficiency
Shot Selection Efficiency
Long-Distance Pass Efficiency

For now, Sosa et al. focused on players’ Ball Control Efficiency rating, measured as the mean amount of time a player’s ball possession probability exceeded a given threshold, adding that working out the other metrics would increase the validity of a more holistic efficiency metric downstream.
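One reading of that definition, sketched below, counts the tracking frames in each possession where the control probability sits above the threshold, converts frames to seconds, and averages across possessions. The frame rate, threshold, and per-possession averaging are all assumptions for illustration, and the probabilities themselves would come from some possession-probability model not shown here.

```python
from statistics import mean
from typing import List


def seconds_above_threshold(probs: List[float], threshold: float, fps: float) -> float:
    """Time (s) within one possession where control probability exceeds threshold."""
    frames_above = sum(1 for p in probs if p > threshold)
    return frames_above / fps


def ball_control_efficiency(
    possessions: List[List[float]], threshold: float = 0.5, fps: float = 10.0
) -> float:
    """Mean time above threshold across a player's possessions.

    Assumptions: one probability per tracking frame, a 10 Hz frame rate,
    and a 0.5 threshold; none of these values come from the talk.
    """
    return mean(seconds_above_threshold(p, threshold, fps) for p in possessions)


# Two made-up possessions for one player
player_possessions = [[0.9, 0.8, 0.4, 0.7], [0.6, 0.3]]
print(ball_control_efficiency(player_possessions))  # roughly 0.2 seconds
```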

Hold the Line: Strategic Insights into High Line Defense in Professional Women’s Soccer

Ben Thorpe - Yale University, Adam Cohen - Duke University

Thorpe and Cohen analyzed the effectiveness of playing a high line with centerbacks against three types of attacks they identified in the data (counterattacks, quick build-up play, and heavy possession build-up play), evaluating defenders’ success via the average expected threat (xT) of completed passes allowed per possession rather than tackles or interceptions made. The duo’s work reinforced the idea that counterattacks are the best weapon against a high defensive line, generating higher xT than the other two attack styles and producing a shot on more than one in five sequences. Thorpe and Cohen then tackled optimal defensive structures for a high line, noting differences in league style and intraleague sample size: the NWSL preferred a 4-2-4 structure, while the WSL favored a 4-4-2. But while the 4-4-2 was both the most common and the most optimal structure (in terms of least xT produced) in the WSL (and the most optimal structure overall), the 4-3-3 was actually the most optimal defensive structure in the NWSL.

There were team and player level takeaways as well (which you should definitely watch the talk for), but the duo noted that there was no team strength adjustment for xT in the analysis (something that they thought any future work should include), which might explain the stark contrast in optimal defensive structures between WSL (traditional league structure with a dominant top 4) and NWSL (which featured extremely high parity in 2023, based on work done by our friends at Expected Own Goals).

1v1 Defending Beyond Tackles and Interceptions

Rob Oakley - Temple University

Oakley focused on modeling players’ 1v1 defending ability based on the “EPV prevented” by their movement in a defensive sequence, calculated by taking the expected EPV of a two-pass sequence and finding the difference between it and the actual EPV from the defender. Oakley defined 1v1 sequences as those where the closest defender was within four meters of the player in possession with the second-closest defender more than eight meters away. There were some fascinating results:

The list of “good” 1v1 defenders by “EPV prevented” (EPVp) in the NWSL in 2024 was vastly different from the list of league leaders in g+ Interrupting. Oakley argued that this meant “EPV prevented” was measuring a completely different skill from the traditional Interrupting components of tackles and interceptions. Digging a little deeper, it seems that EPVp rewards CMs and DMs more for their possession denial skills than traditional on-ball value metrics like g+, which skew defensive value toward central and wide defenders (just by the nature of where they’re positioned: closer to goal).
The range of values for NWSL players was clearly different from that of WSL players, indicating differences in team styles across the two leagues.
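The 1v1 filter and the “EPV prevented” arithmetic from Oakley’s definitions can be sketched as follows. This is a hypothetical illustration: the EPV inputs are placeholders that would come from an EPV model (none is implemented here), the function names are made up, and treating a lone nearby defender with no second defender as a valid 1v1 is an assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def is_one_v_one(carrier: Point, defenders: List[Point], near_m: float = 4.0, far_m: float = 8.0) -> bool:
    """Closest defender within near_m and second-closest more than far_m away."""
    dists = sorted(math.dist(carrier, d) for d in defenders)
    if not dists:
        return False
    # Assumption: if there is no second defender, treat them as infinitely far
    second = dists[1] if len(dists) > 1 else math.inf
    return dists[0] <= near_m and second > far_m


def epv_prevented(expected_epv: float, actual_epv: float) -> float:
    """Positive when the sequence ended up less valuable than expected."""
    return expected_epv - actual_epv


carrier = (60.0, 30.0)
print(is_one_v_one(carrier, [(63.0, 30.0), (60.0, 40.0)]))  # True: defenders 3 m and 10 m away
print(is_one_v_one(carrier, [(63.0, 30.0), (66.0, 30.0)]))  # False: second defender only 6 m away
print(epv_prevented(0.08, 0.05))
```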

This was an interesting project and I really liked the methodology, although I’d like to see a comparison of EPVp with a traditional on-ball value method of evaluating 1v1s (EX: StatsBomb OBV on Dribble events) AND a comparison of the latter with g+ Interrupting, just to validate that we are truly measuring 1v1 events AND that we understand the base rate of success on those events.

The Role of Athleticism in Defensive Transition Match-ups

Alvaro Botto Barilli, Jonathan Hamil - University of Texas

Barilli and Hamil put together arguably the most creative project of the finalists, integrating physical performance data (runs, sprints, etc.) into their analysis of transition moments in the NWSL. Their random effects model blended the “athletic scores and categories” of both teams, the formations of both teams, and a random-effects variable representing the team to predict the convex hull area created by the defending team. Athletic scores and team athletic categories were built off a combination of high-speed runs, sprints, high accelerations, and the count of explosive accelerations into sprints. Barilli and Hamil found that formations ended up being the key predictor of defensive area, concluding that defensive structure and matchups mattered more than raw athletic ability.
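The outcome variable here, the area of the convex hull spanned by the defending team, is straightforward to compute from tracking coordinates. Below is a minimal sketch using Andrew’s monotone chain algorithm and the shoelace formula, assuming (x, y) positions in meters for a single frame; which players are included (e.g., outfield players only) is an assumption, and the random effects model itself is not reproduced.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop each chain's last point: it is the first point of the other chain
    return lower[:-1] + upper[:-1]


def hull_area(points: List[Point]) -> float:
    """Shoelace formula over the convex hull of the given player positions."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


# Made-up positions (meters) for a defending team in one frame
defending_team = [(0.0, 0.0), (10.0, 0.0), (10.0, 20.0), (0.0, 20.0), (5.0, 10.0)]
print(hull_area(defending_team))  # prints 200.0 (the interior player adds no area)
```

The same area measure works as a compactness proxy in other contexts; a smaller hull means a more compact block.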

Research Presentations

ASI’s SkillCorner presentations complemented a set of talks on cutting-edge research from university groups.

Soccer Analytics Using Event Data and Tracking Data

Tianyu Guan, Assistant Professor of Statistics - York University

Guan featured three different projects built off a fully unified event and tracking dataset from the 2019 season of the Chinese Super League (CSL). While the CSL isn’t great in terms of quality (by Guan’s own admission), its dataset was broad and deep enough to allow Guan’s teams to pick out some key trends in their projects.

“Should You Park the Bus?” (paper): was Jose Mourinho right? Should you park the bus when leading? Guan et al. found that a team’s commitment to compact defending (defined by the area of the convex hull formed by the defenders over various intervals) while leading late produced poorer outcomes (as measured by the number of shots allowed). Via an additional three-way ANOVA, Guan et al. reinforced traditional thinking about game state: weaker teams (based on pre-match betting odds) leading late tend to be the most cautious (again, based on the area of their convex hull), and teams tied into the final five minutes of regulation often backed off and settled for draws.
“Comparison of Individual Playing Styles in Football” (paper): Guan and Swartz built a variety of metrics that evaluated player attacking and defensive performance in the defensive, middle, and attacking thirds in a single match, then compared the Kullback-Leibler (KL) divergence of various players’ distributions of per-match values to determine similar players.
“Acceleration and age in soccer” (paper): Guan and Swartz evaluated age effects on performance via average maximum acceleration, finding that the smoothed curve for acceleration versus age tracks closely with the curve for performance versus age (derived in a previous analysis by Swartz et al.). With only a single season of CSL data available, the pair generated more data for their models by taking these player “functional snippets” and applying a linear smoothing method rather than using the full smoothed curve across the arc of each player’s career.
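The player-similarity comparison above rests on the Kullback-Leibler divergence, D_KL(P || Q) = sum over bins of p_i * log(p_i / q_i). A minimal sketch, assuming each player’s per-match values for one metric are binned into a shared histogram with a small smoothing constant (so no bin has zero probability); the bin edges, smoothing constant, and example values are invented for illustration, and Guan and Swartz may well estimate their distributions differently (e.g., with density estimation rather than histograms).

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Smoothed, normalized histogram over shared bin edges (last bin is closed)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    # Add eps to every bin so later log ratios never divide by zero
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) in nats; asymmetric, so D_KL(P || Q) != D_KL(Q || P) in general."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Made-up per-match values for one metric, binned on a shared 0-1 scale
edges = [0.0, 0.25, 0.5, 0.75, 1.0]
player_a = histogram([0.10, 0.20, 0.25, 0.30], edges)
player_b = histogram([0.12, 0.22, 0.27, 0.31], edges)
player_c = histogram([0.70, 0.80, 0.85, 0.90], edges)
print(kl_divergence(player_a, player_b))  # prints 0.0: identical binned distributions
print(kl_divergence(player_a, player_c) > 0)  # prints True
```

Because KL divergence is asymmetric, a similarity search would need to fix a direction (or symmetrize, e.g., by summing both directions); which choice the paper made isn’t stated in the talk summary.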

Large Language Models in Sports Analytics

Weining Shen, Associate Professor of Statistics - UC Irvine

[Video not available]

Shen provided an overview of the use of large language models (LLMs) in sport, focusing on the potential soccer applications of these models’ image/video recognition in evaluating or conducting refereeing, especially in controversial situations. There’s certainly some thrust behind this work, especially given the variability in refereeing performance (see: fan sentiment after every match-week in the Premier League) and the encroachment of corrective technology in officiating (see: VAR, semi-automated offside, goal-line technology, etc.). Shen’s presentation reminded me of a similar application of LLMs from the Sloan research paper competition last year. I see more value in LLMs for scenario planning and tactical analysis: producing novel situations and evaluating counterfactuals seems like a really compelling use of this kind of technology.

Papers: https://arxiv.org/abs/2410.08474, https://arxiv.org/abs/2406.12252, https://www.mdpi.com/2079-9292/14/3/461

Beyond the Expected: Tailoring Analytics for Women’s Soccer

Sachin Narayanan, Doctoral Candidate - Florida State University

At ASI, Narayanan summarized two of his recent papers:

While tracking the evolution of the women’s game via the development of national team style is certainly interesting, the more compelling part of Narayanan’s work (to me, anyway) was from the latter study: his construction, evaluation, and cross-validation of expected goals (xG) and post-shot expected goals (PSxG) models for both the men’s and women’s games. You should read the paper and look at the fancy charts, but here are the key points from the cross-validation analysis (emphasis my own):

“[R]esults suggest that the men’s model estimates higher xG values for women’s shots taken closer to the goal compared to the women’s model, with an average difference of approximately 8%. However, right around the 6-yard box, this relationship was briefly reversed as the women’s model predicted a higher xG.”
“When the goalkeeper was much closer to the goal, both models had similar xG values, but in situations where the keepers were further off their line (likely rushing out to meet a shot-taker), the women’s model predicted a higher likelihood of goal-scoring with xG values that were typically 5%–10% higher than those predicted by the men’s model.”
“At very close distances (five yards or less), the men’s model predicted higher PSxG values than the women’s model for the women’s test data. Between six and 12 yards, the trend reversed, and the women’s model predicted higher PSxG.”
“The impact of goalkeeping was also found to vary. Closer shots, higher velocity shots, and shots placed nearer to the posts were all necessary to generate greater, positive changes in men’s PSxG. Conversely, women’s PSxG was more positively affected as the height of an on-target shot increased, with the models also suggesting that ground shots typically produce higher PSxG estimates. Combined with the greater effectiveness of the lob shot and a higher risk of conceding when far off their line, it seems that female goalkeepers exhibit a different style of goalkeeping in comparison to their male counterparts, a style which may be further identified through attributes (e.g., ball distribution) not analyzed in this study (Riley, 2023).”

Comparing the SHAP values of the models revealed a few more key differences, namely that the women’s game featured “marginally higher” defensive intensity on shots and on-shooter pressure AND that a higher ball speed at the time of reception generally nerfed the xG of shots in the women’s game.

Narayanan argues that his findings go past just the adjustment of metrics and models for the women’s game: we have to rethink performance environments from the ground up. The first step is better and wider data collection: developing equivalent ground truth datasets for the women’s game can only serve to refine these types of analyses.

Philosophy Presentations

Mixed into the more technical presentations were two talks that were broader in scope, considering the nature and impact of analytics work and its downstream effects.

Soccer Analytics: Science or Alchemy?

Stefan Szymanski, Professor of Sport Management - University of Michigan

At ASI, Dr. Szymanski put a different spin on the analysis from his Soccernomics books: how much can the market value of a squad (as defined by Transfermarkt) explain its match results? Taking that a step further: if market value explains so much of match results, what’s the point in trusting online picks or models to beat the books, and what value are they actually providing? These companies might say on their websites that they throw everything plus the kitchen sink into their models, but if they don’t reveal what’s actually going into those models AND their results barely beat the simplest possible model, what’s the point of the entire exercise? And how can you trust what you can’t recreate? In short, Dr. Szymanski encourages a return to the age of the soccer blogger and the public posting of data and methodology to generate, discuss, and refine new ideas.

Numbers Talk — Now Make Them Tell a Story

John Muller, Journalist - The Athletic, FiveThirtyEight, etc

There’s a point on the soccer data Dunning-Kruger scale where you think “huh, why show anything else other than this number that says this player is good? Why hasn’t my club bought them yet solely off this? He’s clearly better on this metric -- nothing else really matters.” Unfortunately for you, that’s right at the top of the Mountain of Stupid.

Relying on “number good so player good” is reductive. If a league season is a book, a player is a fully-fleshed out character with their own motivations and their own struggles -- their performance metrics have to come with the context in which they were created. Consider this holistic approach in the context of the scouting approach as described by the Dynamo and Fire: yes, it’s possible for a sporting director to ask “bottom line: is this player good?”, and sure, maybe the logical answer is just a simple yes/no. However, realistically, that conversation is going to sound a lot like a “yes, BUT” with a subsequent discussion of player strengths, weaknesses, and confounding factors (playing time, support system, manager fit, etc).

People

Of course, it’s the attendees that make the event, and here ASI excelled: it was the event du jour in the online soccer analytics community and a fantastic setting to meet so many of those virtual friends in real life for the first time. Those who came hoping to make those kinds of connections found a variety of opportunities to do so: roundtable discussions and a Women in Soccer Analytics networking reception on Day 1 set the stage for casual conversation and deeper discussions during breaks on Day 2. Events like the Expected Own Goals live podcast and ASA’s own meetup at Pitch 25 helped color in the lines of these online connections.

Where else can you say you heard people digging through the methodology of a research project using tracking data a few minutes prior to eavesdropping on a conversation about the arcane nature of MLS’s roster rules? Social media has always been the backbone of the soccer analytics community, with seemingly half of the giants of the field having a lengthy Twitter and blog posting history behind them. The ability for events like ASI to take the deep thinking about the game and its structure that masquerades as banter out of social media posts and into real life really strengthens those relationships and connections.

Conclusion

On the afternoon of Day 2, as Dr. Szymanski walked through the value of proper analytics work versus gambling snake oil, he keyed in on this idea of methodological secrecy: betting houses often sell their models on the basis that their internal work produces some “special insight” that can surely be trusted to win bets. From the outside looking in, working at a soccer organization is a similar proposition: surely there’s some locked-up secret competitive advantage they’ve found in the data, right? Certainly, there might be: well-run clubs want to figure out how to win more games and they’ll fund and produce novel analysis to help them do so, but solely for their own gain and under lock and key. The public gains little insight into how or what the club is doing, only the vague assurance that the club is doing something. There’s no learning involved: just a hole in our collective soccer corpus that we might not even know exists.

There’s a second-layer problem to this as well: resourcing. While public soccer data exists (thanks to Hudl, StatsBomb, and SkillCorner), it’s limited in scope and complexity. Larger sets of event data and the golden goose of tracking data are behind such high paywalls that the only logical customers are clubs and large soccer organizations, not a single fan working off their laptop at a coffee shop. Whatever the collective desire to do independent public analysis, these barriers to entry mean that the only groups that can drive our body of soccer knowledge forward are clubs. If they hold the keys to the car and the garage, how can the public even learn to drive?

Organizations like ASA and events like ASI become force multipliers in these resource-limited environments. Together, ASA and ASI help address three critical needs in public soccer analytics discourse:

Larger, more diverse datasets (unified events + tracking, etc) need to be available (especially for the women’s game).
The public needs more opportunities to work with diverse datasets (research competitions, hackathons, etc).
The public needs places to discuss its work (more blogs, more conferences, more meetups, etc.). 280 characters of social media almost never leave enough room for nuance.

Soccer nerds, like most humans, want to communicate -- they want to share, they want to learn, they want to grow, and they want to understand more about the game itself. Soccer data doesn’t exist in a vacuum, and numbers can lie: by building spaces where analysts (professional or hobbyist) can discuss ideas in earnest and refine how they understand the game by connecting with other experienced professionals, everyone grows together. This, not the actual downstream number-crunching, is the “good work” of public soccer analytics: fostering community and providing rich resources for collaboration.

On paper, it sounds outrageous to say that a first-year event is now a load-bearing pillar of American soccer data, but every single person I spoke to this weekend raved about the level of care put into this event. Scott Powers and his co-organizers Zefa Tullis-Thompson, Rose Graves, and Seth Davidson, along with the help of industry centerpieces Sarah Rudd, Ravi Ramieni, and Sam Gregory and support from Tom Stallings at Rice and Guanyu Hu from UTHealth-Houston, all built a large tent for soccer nerds of all stripes to mesh and grow together for a weekend. Who knows how big that tent could get in years to come?