Tutorial: Match Sheet Creation

In this tutorial we create match sheets from StatsBomb's openly published event data, using the interface and objects provided by floodlight. Our goal is to load a match from the dataset, extract information about the goals scored, and use this information to create a match sheet.

Setup

First we need some data to work with. The open StatsBomb dataset contains, among others, data from UEFA Euro 2020, including (partial) information about the player positions at the time of each event, which we can use for our purpose. We load a single match from this dataset and also get the corresponding pitch information.

from floodlight.io.datasets import StatsBombOpenDataset

# load a match from the UEFA Euro 2020
dataset = StatsBombOpenDataset()
events_objects, teamsheets = dataset.get("UEFA Euro", "2020", "Croatia vs. Spain")
pitch = dataset.get_pitch()

# unpack the queried data
home_ht1 = events_objects["HT1"]["Home"]
home_ht2 = events_objects["HT2"]["Home"]
away_ht1 = events_objects["HT1"]["Away"]
away_ht2 = events_objects["HT2"]["Away"]

The variables home_ht1, home_ht2, away_ht1, and away_ht2 are Events objects containing the events of the teams during the first and second half. These will be used to create the match sheets. The pitch variable is a Pitch object that contains information regarding the pitch specification and coordinate system our data live in.

Data Preparation

To create match sheets from the event data we want to select certain (important) events to look at. To keep it short and simple we stick to goals. We use the select method of the Events objects (defined in the floodlight.core.events submodule) to find all shots with a positive outcome (1).

home_goals_ht1 = home_ht1.select(
    conditions=[("event_name", "Shot"), ("outcome", 1)]
)
home_goals_ht2 = home_ht2.select(
    conditions=[("event_name", "Shot"), ("outcome", 1)]
)
away_goals_ht1 = away_ht1.select(
    conditions=[("event_name", "Shot"), ("outcome", 1)]
)
away_goals_ht2 = away_ht2.select(
    conditions=[("event_name", "Shot"), ("outcome", 1)]
)
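
Each condition is a (column, value) pair, and the selection above keeps only events that are shots and have a positive outcome, i.e. events that satisfy both pairs. The pure-Python sketch below illustrates that idea on hand-written toy records. The matches helper and the records are hypothetical, and this is not how floodlight implements select.

```python
# Toy event records; the values are made up for illustration only.
toy_events = [
    {"event_name": "Shot", "outcome": 1},
    {"event_name": "Shot", "outcome": 0},
    {"event_name": "Pass", "outcome": 1},
]


def matches(record, conditions):
    """Return True if the record satisfies every (column, value) pair."""
    return all(record.get(column) == value for column, value in conditions)


goal_conditions = [("event_name", "Shot"), ("outcome", 1)]
toy_goals = [event for event in toy_events if matches(event, goal_conditions)]
print(toy_goals)
```

Only the first toy record passes, because the second fails the outcome condition and the third fails the event_name condition.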

Similarly, we must not forget about own goals! These are recorded as separate "Own Goal For" events rather than as shots, so we select them as well.

home_owngoals_ht1 = home_ht1.select(
    conditions=[("event_name", "Own Goal For")]
)
home_owngoals_ht2 = home_ht2.select(
    conditions=[("event_name", "Own Goal For")]
)
away_owngoals_ht1 = away_ht1.select(
    conditions=[("event_name", "Own Goal For")]
)
away_owngoals_ht2 = away_ht2.select(
    conditions=[("event_name", "Own Goal For")]
)

Finally, we collect all goals into a single pandas DataFrame.

import pandas as pd

all_goals = (
    pd.concat(
        (
            home_goals_ht1,
            home_goals_ht2,
            home_owngoals_ht1,
            home_owngoals_ht2,
            away_goals_ht1,
            away_goals_ht2,
            away_owngoals_ht1,
            away_owngoals_ht2,
        )
    )
    .sort_values("gameclock")
    # the concatenated frames may share index labels; make them unique
    .reset_index(drop=True)
)

Here’s the (formatted) DataFrame you should get:

eID  gameclock  pID    tID  mID      outcome  timestamp    minute  second  at_x   at_y  to_x   to_y  event_name    player_name              team_name  qualifier
25   1172.344   nan    785  3794686  nan      0:19:32.433  19      32      68.3   62.1  nan    nan   Own Goal For  None                     Croatia
16   2248.398   6720   772  3794686  1        0:37:28.398  37      28      109.0  43.3  120.0  42.6  Shot          Pablo Sarabia Garcia     Spain
16   3366.771   3957   772  3794686  1        0:11:06.771  56      6       115.3  42.4  120.0  41.0  Shot          Cesar Azpilicueta Tanco  Spain
16   4562.056   6748   772  3794686  1        0:31:02.056  76      2       112.1  51.2  120.0  39.5  Shot          Ferran Torres Garcia     Spain
16   5056.385   16527  772  3794686  1        0:39:16.385  84      16      119.0  40.9  120.0  42.5  Shot          Mislav Orsic             Croatia
16   5511.058   11603  772  3794686  1        0:46:51.058  91      51      114.2  37.2  120.0  41.9  Shot          Mario Pasalic            Croatia
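
In the rows above, the gameclock value matches the minute and second columns when read as seconds since kick-off (for example, 1172.344 corresponds to minute 19, second 32). Assuming that interpretation holds, a small helper can turn the value into a compact label, for instance for plot titles. The name format_gameclock is hypothetical and not part of floodlight.

```python
def format_gameclock(gameclock):
    """Format a gameclock value given in seconds as 'MM:SS' (fractions dropped)."""
    minutes, seconds = divmod(int(gameclock), 60)
    return f"{minutes:02d}:{seconds:02d}"


# gameclock values taken from the first and third row of the table
print(format_gameclock(1172.344))
print(format_gameclock(3366.771))
```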

Data Extraction

Alright, now let’s extract the relevant information from the above DataFrame. First we want some meta information about each goal: the scoring team, the scoring player, and the expected goals (xG) value of the shot. For later use we write a function get_goal_info(goal) for this purpose.

import ast

def get_goal_info(goal):
    scoring_team = goal["team_name"]
    if goal["event_name"] == "Shot":
        scoring_player = goal["player_name"]
        xG = ast.literal_eval(goal["qualifier"])["shot"]["statsbomb_xg"]
    else:
        scoring_player = "Own Goal"
        xG = None
    return scoring_team, scoring_player, xG
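
The function relies on the qualifier entry being a string that holds a Python-style dictionary, which ast.literal_eval turns back into a real dict without executing arbitrary code. The sketch below shows that step in isolation. The qualifier string is a hypothetical, heavily shortened stand-in, and real entries contain many more keys.

```python
import ast

# Hypothetical, shortened qualifier string; the xG value is made up.
toy_qualifier = "{'shot': {'statsbomb_xg': 0.0734}}"

parsed = ast.literal_eval(toy_qualifier)  # str -> dict
xg = parsed["shot"]["statsbomb_xg"]
print(type(parsed).__name__, round(xg, 2))
```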

Next, we deal with the previously mentioned StatsBomb360 position data. The appropriate floodlight object for position data is an XY object. To create XY objects that describe a single frame of the match, we have to bring the player positions into an array of shape (1, N), where N is twice the number of players (one x and one y column per player). For this we define the function get_xy_data(goal).

import numpy as np
from floodlight import XY

def get_xy_data(goal):
    # read positions at event
    qualifier = ast.literal_eval(goal["qualifier"])
    freeze_frame = None
    if "360_freeze_frame" in qualifier:
        freeze_frame = qualifier["360_freeze_frame"]
    # set "to-location" to goal center if not available
    at_x, at_y, to_x, to_y = goal["at_x"], goal["at_y"], goal["to_x"], goal["to_y"]
    if np.isnan(goal["to_x"]):
        to_x = 120
    if np.isnan(goal["to_y"]):
        to_y = 40
    xy_ball = np.array([[at_x, at_y], [to_x, to_y]])
    xy_off, xy_def = None, None
    if freeze_frame is not None:
        # create arrays
        xy_off = np.array(
            [player["location"] for player in freeze_frame if player["teammate"]]
        )
        xy_def = np.array(
            [player["location"] for player in freeze_frame if not player["teammate"]]
        )
        # reshape arrays to represent a single frame of shape (1, N)
        xy_off = xy_off.reshape((1, -1))
        xy_def = xy_def.reshape((1, -1))
    # return XY objects
    return XY(xy=xy_ball), XY(xy=xy_off), XY(xy=xy_def)
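
The reshaping step arranges the player coordinates of one frame as a single row that alternates x and y per player. The sketch below reproduces that layout with plain lists so it can be inspected without NumPy. The helper flatten_locations and the coordinates are hypothetical.

```python
def flatten_locations(locations):
    """Turn [[x0, y0], [x1, y1], ...] into one frame [[x0, y0, x1, y1, ...]]."""
    return [[coordinate for location in locations for coordinate in location]]


# Three made-up player locations, as they might appear in a freeze frame
toy_locations = [[100.0, 40.0], [110.5, 35.2], [95.0, 50.0]]
frame = flatten_locations(toy_locations)
print(len(frame), len(frame[0]))  # one frame, two columns per player
```

Three players therefore yield one row with six columns, which corresponds to the shape (1, N) mentioned above.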

Plotting

Now we can use the functions defined above to plot a single goal (e.g. the last one) with the plotting functionality of the XY and Pitch objects.

import matplotlib.pyplot as plt

# pick the last goal by position, independent of the index labels
goal = all_goals.iloc[-1]
fig, ax = plt.subplots()
scoring_team, scoring_player, xG = get_goal_info(goal)
ax.set_title(
    f"Goal for {scoring_team} by {str(scoring_player)} "
    f"|| xG: {round(xG, 2) if xG is not None else 'NA'}",
    fontdict={"size": 9},
)
pitch.plot(ax=ax)
xy_ball, xy_off, xy_def = get_xy_data(goal)
xy_ball.plot(
    t=(0, 2),
    plot_type="trajectories",
    ball=True,
    color="k",
    linewidth=2,
    linestyle="--",
    marker="X",
    markevery=[0],
    ax=ax,
)
if xy_off.xy is not None and xy_def.xy is not None:
    xy_off.plot(t=0, ax=ax, color="red")
    xy_def.plot(t=0, ax=ax, color="white")

[Figure: ../_images/tutorial_matchsheets_singlegoal.png]

This is a neat start! However, our goal is to summarize the whole match into a single match sheet that displays all the goals.

Therefore, we set up a grid of subplots (in this case a 2x3 grid for the six goals). We also add a legend with our designated colors for the two teams.

import matplotlib.pyplot as plt
import matplotlib.patches as mpatches

rows = min(len(all_goals), 2)
cols = int(np.ceil(len(all_goals) / 2))
# squeeze=False keeps ax two-dimensional even if the grid has a single row
fig, ax = plt.subplots(rows, cols, figsize=(14, 7), squeeze=False)
plt.suptitle("Match Sheet: Croatia vs. Spain (EURO 2020)")
plt.legend(
    handles=[
        mpatches.Patch(label="Croatia (left to right)", color="white"),
        mpatches.Patch(label="Spain (right to left)", color="red"),
    ]
)

[Figure: ../_images/tutorial_matchsheets_grid.png]
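
The grid dimensions follow from the number of goals: at most two rows, and enough columns to hold all goals. The sketch below restates that calculation with the standard library and maps a goal's position in the list to a subplot cell, filling the grid row by row. The helper names grid_shape and grid_cell are hypothetical.

```python
import math


def grid_shape(n_goals, max_rows=2):
    """Return (rows, cols) for a grid with at most max_rows rows."""
    rows = min(n_goals, max_rows)
    cols = math.ceil(n_goals / max_rows)
    return rows, cols


def grid_cell(position, cols):
    """Map the i-th goal (0-based) to its (row, col) cell, filling row by row."""
    return divmod(position, cols)


rows, cols = grid_shape(6)
cells = [grid_cell(i, cols) for i in range(6)]
print(rows, cols, cells)
```

For an odd number of goals the last cell of the grid simply stays empty, for example five goals still give a 2x3 grid.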

Now we create the match sheet by iterating over all goals and updating the respective subplots. For better visibility we want to display the goals for Spain on the left side of the pitch. Therefore we use the rotate and translate methods of the floodlight XY object: rotating by 180 degrees and then shifting by the upper pitch limits mirrors the positions onto the opposite half.

row, col, home_score, away_score = 0, 0, 0, 0
colors = {"Croatia": "white", "Spain": "red"}
for _, goal in all_goals.iterrows():
    # display meta information
    scoring_team, scoring_player, xG = get_goal_info(goal)
    if scoring_team == "Croatia":
        conceding_team = "Spain"
        home_score += 1
    else:  # score by Spain
        conceding_team = "Croatia"
        away_score += 1
    ax[row, col].set_title(
        f"{home_score}:{away_score} for {str(scoring_team)} by {str(scoring_player)} "
        f"|| xG: {round(xG, 2) if xG is not None else 'NA'}",
        fontdict={"size": 10},
    )
    # get position data
    xy_ball, xy_off, xy_def = get_xy_data(goal)
    # rotate position data towards left goal for Spain; the ball trajectory
    # is mirrored even if no player positions are available
    if scoring_team == "Spain":
        shift = (pitch.xlim[1], pitch.ylim[1])
        for xy in (xy_ball, xy_off, xy_def):
            if xy.xy is not None:
                xy.rotate(180)
                xy.translate(shift)
    # plot pitch and position data
    pitch.plot(ax=ax[row, col])
    xy_ball.plot(
        t=(0, 2),
        plot_type="trajectories",
        ball=True,
        color="k",
        linewidth=2,
        linestyle="--",
        marker="X",
        markevery=[0],
        ax=ax[row, col],
    )
    if xy_off.xy is not None and xy_def.xy is not None:
        xy_off.plot(t=0, ax=ax[row, col], color=colors[scoring_team])
        xy_def.plot(t=0, ax=ax[row, col], color=colors[conceding_team])
    # update grid position
    col += 1
    if col == cols:
        col = 0
        row += 1
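
Rotating a point by 180 degrees about the origin and then translating it by the upper pitch limits maps (x, y) to (x_max - x, y_max - y). The sketch below checks this on single points, assuming a pitch that spans 0 to 120 in x and 0 to 80 in y. These limits are an assumption about the coordinate system used here, and in practice they should be read from pitch.xlim and pitch.ylim. The helper mirror_point is hypothetical.

```python
def mirror_point(x, y, x_max=120.0, y_max=80.0):
    """Rotate (x, y) by 180 degrees about the origin, then shift by (x_max, y_max)."""
    rotated_x, rotated_y = -x, -y
    return rotated_x + x_max, rotated_y + y_max


print(mirror_point(120.0, 40.0))  # a point on the right goal line
print(mirror_point(60.0, 40.0))   # the pitch center under the assumed limits
```

Under these assumptions a point on the right goal line lands on the left goal line at the same height, and the pitch center stays where it is.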

The result should look similar to the image below. However, due to an update in the StatsBomb dataset, the freeze frame for the own goal is no longer available, so that plot will only show the trajectory of the ball. Also keep in mind that the StatsBomb360 data only contains the positions of some players at each event (those visible from the camera angle). That’s why you cannot see the player responsible for the own goal in the first plot.

[Figure: ../_images/tutorial_matchsheets_allgoals.png]

Feel free to try out this code with other matches from the StatsBomb dataset (dataset.available_matches) and also to experiment with other event types, plotting styles and your own ideas!