Fantasy Premier League with Machine Learning

Fantasy Premier League (FPL) is an online football game in which managers pick a squad of real Premier League players. Points are earned based on each player’s actual performance and contribution on match day.

As a football buff and an analytics enthusiast, I find that FPL provides an awesome opportunity to combine the best of both worlds.

FPL is growing rapidly: it has reached 7 million users this season, so the competition is getting quite fierce.


FPL lets fans manage real Premier League players drawn from all 20 teams. Each player has a price tag, so fans must work within a limited budget to assemble the best possible team.

The performance of the players (forwards, midfielders, defenders, and goalkeepers) is measured primarily by their contribution in:

  • goals
  • assists
  • clean-sheets

However, points are also deducted when players concede goals or get booked.

At the start, each FPL manager has a budget of £100 million to select 15 players (3 FWD, 5 MID, 5 DEF, 2 GK) out of 626 players, with a maximum of three players from any one club.
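To make these squad rules concrete, here is a minimal sketch of a validity check. The dictionary keys (position, club, cost) and the helper name is_valid_squad are my own assumptions for illustration, not part of any official FPL API.

```python
from collections import Counter

# Squad rules as described above.
BUDGET = 100.0                                    # in £ millions
QUOTA = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}   # 15 players in total
MAX_PER_CLUB = 3


def is_valid_squad(squad):
    """Return True if a list of player dicts satisfies the squad rules.

    Each player is a dict with the hypothetical keys
    'position', 'club' and 'cost' (cost in £ millions).
    """
    positions = Counter(p["position"] for p in squad)
    clubs = Counter(p["club"] for p in squad)
    total_cost = sum(p["cost"] for p in squad)
    return (
        dict(positions) == QUOTA
        and max(clubs.values(), default=0) <= MAX_PER_CLUB
        and total_cost <= BUDGET
    )
```

A check like this is handy for sanity-testing any squad, whether it was picked by hand or by an algorithm.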

Managers get only one free transfer every Gameweek (GW), which swaps one player in the team for a replacement.

The trickiest part of all is the captain selection, as the captain’s score gets doubled.

So, one has to select a playing 11 for each Gameweek. The total points these players obtain is your score, with the captain’s points doubled.
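The scoring rule above can be sketched in a few lines. The function name gameweek_score and the player labels and points below are made up for illustration.

```python
def gameweek_score(points_by_player, captain):
    """Sum the playing 11's points, counting the captain's points twice."""
    if captain not in points_by_player:
        raise ValueError("captain must be one of the starting players")
    return sum(points_by_player.values()) + points_by_player[captain]


# Made-up points for a playing 11 in one Gameweek.
eleven = {
    "P1": 6, "P2": 2, "P3": 1, "P4": 8, "P5": 2, "P6": 3,
    "P7": 12, "P8": 5, "P9": 2, "P10": 9, "P11": 4,
}
print(gameweek_score(eleven, captain="P7"))  # 54 + 12 = 66
```

Captaining the top scorer here adds 12 points, which shows why the captain choice matters so much.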

The aim of the project is to:

  • predict how many points each player will score in the next Gameweek,
  • select the best possible 15-player squad based on the predictions and other constraints (budget, etc.), and
  • finish in the top 1% of FPL.

Exploratory Data Analysis

The data have various variables such as player, total points, opponent, home or away fixture, goals, assists, minutes played, player form and many more.

Currently, the analysis is done using the data until GW26.

Let’s check the top 10 players with the most points.

Player       Position    Cost    Total points
De Bruyne    MID         108     161

However, the most valuable player is the one who combines high points with a low cost, i.e. the player with the most points per cost (or points per million).
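As a rough sketch, points per million is just total points divided by cost. The three players and their numbers below are invented for illustration, and cost is assumed to be expressed in £ millions.

```python
def points_per_million(total_points, cost_millions):
    """Value metric: points earned for every £1m of cost."""
    return total_points / cost_millions


# Invented example players (cost in £ millions).
players = [
    {"name": "Player A", "cost": 12.5, "total_points": 150},
    {"name": "Player B", "cost": 6.0, "total_points": 120},
    {"name": "Player C", "cost": 4.5, "total_points": 81},
]

# Rank from most to least valuable.
ranked = sorted(
    players,
    key=lambda p: points_per_million(p["total_points"], p["cost"]),
    reverse=True,
)
for p in ranked:
    value = points_per_million(p["total_points"], p["cost"])
    print(f'{p["name"]}: {value:.1f} points per million')
```

Note that Player A has the most points overall but ranks last on value, which is exactly the trade-off this metric is meant to expose.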

Let’s draw a scatter plot of players’ costs versus total points to find the most valuable players visually. Hover over the points to see the player names and other details.

The above graph highlights players who cost less than £8 million yet still earned more than 120 points (up to GW26). There are some clear winners with high points totals (Salah, Aubameyang, Vardy, Mane, etc.), but their cost is also quite high. So we can use this graph to find differential players. (A differential is a player who has been selected by only a few other FPL managers. Such players are risky choices, which is why they are not among the popular picks.)

Machine Learning Model

Let’s predict points scored by each player in the coming Gameweek.

Data Preparation

Currently, I am using the data from the GitHub repository by Vaastav Anand, which contains FPL data for the last three seasons.

The input data comprises the week-by-week stats of each player, such as:

  • name
  • gameweek
  • home/away
  • fixture_difficulty
  • form (Fantasy’s own ICT index)
  • selected percentage
  • transfers_in
  • transfers_out
  • value
  • chance_of_playing_next_round
  • total_points
  • and many more.

The dependent variable (or y) will be total_points (actual Fantasy points) scored in each round.

Model Training and Prediction

First, we will create a baseline model using multivariate linear regression. The model is trained on each player’s data up to the current gameweek. Once the linear regression is fitted, we will predict points (predicted_points) for all the eligible Premier League players for the coming gameweek.

If a player’s chance_of_playing_next_round is 0 (due to injury or suspension), he will be left out of prediction.
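Here is a minimal sketch of such a baseline, assuming ordinary least squares via NumPy rather than whatever library the final model uses. The three features (home, fixture_difficulty, form) are a subset of the inputs listed earlier, and every number is made up; the targets are generated from a known linear rule so the fit can be checked.

```python
import numpy as np


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept term; returns coefficients."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef


def predict(coef, X):
    """Apply fitted coefficients (intercept first) to new feature rows."""
    return np.column_stack([np.ones(len(X)), X]) @ coef


# Made-up training rows: [home (1/0), fixture_difficulty, form]
X_train = np.array([
    [1, 2, 5.0],
    [0, 4, 3.0],
    [1, 3, 6.5],
    [0, 2, 1.0],
    [1, 5, 4.0],
    [0, 3, 7.0],
])
# Made-up targets following a known linear rule (not real FPL points).
y_train = 1.0 + 0.5 * X_train[:, 0] - 0.4 * X_train[:, 1] + 0.8 * X_train[:, 2]

coef = fit_linear_regression(X_train, y_train)

# Candidates for the coming gameweek and their chance of playing.
X_next = np.array([[1, 2, 6.0], [0, 3, 8.0], [1, 4, 2.5]])
chance_of_playing_next_round = np.array([100, 0, 75])

# Leave out players whose chance of playing is 0 (injured or suspended).
eligible = chance_of_playing_next_round != 0
predicted_points = predict(coef, X_next[eligible])
print(predicted_points)
```

Real data will be far noisier than this toy rule, so the fitted coefficients should be treated as a baseline to beat rather than a finished model.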

Linear regression is only a starting point. Many other models could be tried for the prediction, such as neural networks or time-series models.

Team Optimization

Now that we have predicted_points for all players, we need to select the best possible team combination subject to certain constraints, i.e. construct the team that maximizes total predicted_points.

The official Fantasy rules are set as constraints, including the £100m budget to select 15 players, a certain number of players in each position (3 FWD, 5 MID, 5 DEF, 2 GK), with a maximum of three players from one club.

Even with a perfect prediction of each player’s points, selecting the optimal team under the above-mentioned constraints is not easy. I think almost everyone has faced this complication: we decide on a bunch of players, but then have to shuffle them in and out of the team because of the budget, too many players from the same club, formation constraints, and other reasons.

For this, we will use Linear optimization (or Linear Programming), which is a method to achieve the best outcome (in our case, maximum points) in a mathematical model whose requirements are represented by linear relationships. Linear programming is a special case of mathematical programming (mathematical optimization).

There are various libraries in Python that we can use for linear optimization, such as PuLP, SciPy, etc.
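Below is a minimal sketch of the squad selection using SciPy's milp solver (available in SciPy 1.9 and later). Since each player is either picked or not, the decision variables are binary, which makes this an integer linear program. The player pool, clubs, costs and predicted points are all synthetic and exist only to make the example run; this is one possible formulation, not necessarily the one behind the published teams.

```python
import numpy as np
from scipy.optimize import Bounds, LinearConstraint, milp

# Synthetic pool of 30 players (all values are made up for illustration).
n = 30
idx = np.arange(n)
position = np.array(["GK"] * 4 + ["DEF"] * 10 + ["MID"] * 10 + ["FWD"] * 6)
club = idx % 6                                    # six hypothetical clubs
cost = 4.0 + 0.5 * ((idx * 7) % 11)               # £4.0m to £9.0m
predicted_points = 0.6 * cost + 0.3 * ((idx * 5) % 7)

quota = {"GK": 2, "DEF": 5, "MID": 5, "FWD": 3}   # 15 players in total

# One row per position: picks in that position must equal the quota.
A_pos = np.array([(position == p).astype(float) for p in quota])
b_pos = np.array(list(quota.values()), dtype=float)

# One row per club: at most three picks from any club.
A_club = np.array([(club == c).astype(float) for c in np.unique(club)])

constraints = [
    LinearConstraint(cost.reshape(1, -1), -np.inf, 100.0),  # £100m budget
    LinearConstraint(A_pos, b_pos, b_pos),                  # position quotas
    LinearConstraint(A_club, -np.inf, 3.0),                 # club limit
]

# milp minimizes, so negate the objective to maximize predicted points.
result = milp(
    c=-predicted_points,
    constraints=constraints,
    integrality=np.ones(n),   # every variable is an integer...
    bounds=Bounds(0, 1),      # ...between 0 and 1: pick or don't pick
)

picked = np.flatnonzero(np.round(result.x) == 1)
print("squad cost:", cost[picked].sum())
print("total predicted points:", predicted_points[picked].sum())
```

The same model can be written in PuLP with one binary variable per player; the constraints carry over unchanged.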

Once we get the optimized team, the player with the highest predicted points is set as captain, and the one with the second highest as vice-captain.
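This captaincy rule is simple to express in code. The helper name pick_captains and the sample predictions below are hypothetical.

```python
def pick_captains(predicted_points):
    """Return (captain, vice_captain): the two highest predicted scorers."""
    if len(predicted_points) < 2:
        raise ValueError("need at least two players")
    ranked = sorted(predicted_points, key=predicted_points.get, reverse=True)
    return ranked[0], ranked[1]


# Made-up predictions for three players in the selected team.
captain, vice_captain = pick_captains(
    {"Player A": 5.1, "Player B": 7.3, "Player C": 6.0}
)
print(captain, vice_captain)  # Player B Player C
```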

Predictions in Action

For the coming gameweeks, I will be sharing the optimum team selected by the algorithm and predicted points of each player.

The first set of predictions for Fantasy Premier League Gameweek 28 using Machine Learning is out. Check out the selected team and let me know in the comments.
