The Arcane Network

How to use network science and Python to design the popular show

Milan Janosov
Towards Data Science

The second season of Arcane, a recent blockbuster series on Netflix, is based on the universe of one of the most popular online video games of all time, League of Legends. It is set in a fantasy world with a strong steampunk design, complete with amazing graphics and a record-breaking budget. As a network and data scientist with a particular interest in turning pop culture elements into data visualizations, finishing the final season was all I needed to uncover the hidden connections and transform the plot of Arcane into a network visualization, using Python. By the end of this tutorial, you will have practical skills for creating and visualizing the network behind Arcane.

However, these skills and methods are by no means specific to this story. In fact, they illustrate the general approach network science offers to map, design, visualize, and interpret networks of any complex system. These systems range from transportation networks and COVID-19 spreading patterns to brain networks and various social networks, such as that of the Arcane series.

All images were created by the author.

Since we are mapping the connections between all the characters, we first need a list of those characters. For this, the Arcane Fan Wiki is an excellent source of free-to-use information (CC BY-SA 3.0) that we can easily access using simple web scraping techniques. Namely, we will use urllib to download pages and BeautifulSoup to extract the names and fan wiki profile URLs of all characters listed on the main character page.

First, download the HTML code of the character list page:

import bs4 as bs
from urllib.request import urlopen

# the fan wiki page listing all characters
url_char = 'https://arcane.fandom.com/wiki/Category:Characters'

# download and parse the raw HTML of the character list page
sauce = urlopen(url_char).read()
soup = bs.BeautifulSoup(sauce, 'lxml')

Then I extracted all potentially relevant names. One can easily find out which tags to feed to the parsed HTML stored in the soup variable by right-clicking on a desired element (in this case, a character profile) and selecting the “Inspect Element” option in any browser.

From this I learned that a character’s name and URL are stored in a line that contains “title=” but does not contain “:” (which corresponds to category pages). Additionally, I created a still_character flag that helped me decide which subpages of the character list still belong to legitimate characters in the story.

import re

chars = soup.find_all('li')
still_character = True
names_urls = {}

for char in chars:

    # keep only entries that link to character profiles (not categories)
    if '" title="' in str(char) and ':' not in char.text and still_character:

        char_name = char.text.strip()

        # everything listed after the 'Arcane' entry is no longer a character
        if char_name == 'Arcane':
            still_character = False

        char_url = 'https://arcane.fandom.com' + re.search(r'href="([^"]+)"', str(char)).group(1)

        if still_character:
            names_urls[char_name] = char_url

The previous block of code creates a dictionary (“names_urls”) that stores the name and URL of each character as key-value pairs. Now let’s take a quick look at what we have and print out the name-URL dictionary and its total length:

for name, url in names_urls.items():
    print(name, url)

An example of the output of this code block, where we can test any link pointing to a character’s bio profile:

print(len(names_urls))

This code cell returns the result 67, which is the total number of named characters we have to deal with. This means we are already done with the first task: we have a comprehensive list of characters, as well as easy access to their full-text profiles on their fan wiki pages.

To map the connections between characters, we need a way to quantify the relationship between any two of them. To capture this, I rely on how frequently the two characters’ biographies refer to each other. From a technical point of view, this means collecting the complete biographies we just obtained the links to. We will do this again using simple web scraping techniques, saving each page’s source locally in a separate file as follows.

# output folder for the profile htmls
import os
folderout = 'fandom_profiles'
if not os.path.exists(folderout):
    os.makedirs(folderout)

# crawl and save the profile htmls
for ind, (name, url) in enumerate(names_urls.items()):
    if not os.path.exists(folderout + '/' + name + '.html'):
        fout = open(folderout + '/' + name + '.html', "w")
        fout.write(str(urlopen(url).read()))
        fout.close()

At the end of this section, our fandom_profiles folder should contain the fanwiki profiles of each Arcane character – ready for processing as we prepare to build the Arcane network.
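As a quick sanity check before moving on (my own addition, not part of the original pipeline), we can confirm that a profile file was saved for every character we collected:

# sanity check: compare the saved files against the character list
saved = [fn for fn in os.listdir(folderout) if fn.endswith('.html')]
print(len(saved), 'of', len(names_urls), 'profiles saved')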

To build the network between characters, we assume that the intensity of interaction between two characters is signaled by how often each character’s profile mentions the other. Therefore, the nodes of this network are the characters, connected by links of varying strength depending on how often each character’s wiki page source refers to another character’s wiki page.

Building the network

In the following block of code, we create the edge list – the list of connections that contains both the source and destination nodes (characters) of each connection, as well as the weight (co-reference frequency) between the two characters. To perform the profile search efficiently, I also create a “names_ids” dictionary that contains only the specific identifier of each character, without the rest of the web address.

# extract the name mentions from the html sources
# and build the list of edges in a dictionary
edges = {}
names_ids = {n : u.split('/')[-1] for n, u in names_urls.items()}

for fn in [fn for fn in os.listdir(folderout) if '.html' in fn]:

    name = fn.split('.html')[0]

    with open(folderout + '/' + fn) as myfile:
        text = myfile.read()
        soup = bs.BeautifulSoup(text, 'lxml')
        # keep only the biography paragraphs of the profile page
        text = ' '.join([str(a) for a in soup.find_all('p')[2:]])
        soup = bs.BeautifulSoup(text, 'lxml')

    for n, i in names_ids.items():

        # count how often this profile links to the other character
        w = text.split('Image Gallery')[0].count('/' + i)
        if w > 0:
            edge = '\t'.join(sorted([name, n]))
            if edge not in edges:
                edges[edge] = w
            else:
                edges[edge] += w

len(edges)

As this block of code runs, it should return about 180 edges.
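Before turning this into a graph, it can be instructive to peek at the heaviest edges, i.e., the character pairs whose profiles mention each other the most (an optional check of my own, not in the original walkthrough):

# show the ten strongest co-reference pairs
for edge, weight in sorted(edges.items(), key=lambda x: -x[1])[:10]:
    print(edge.replace('\t', ' -- '), weight)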

Next, we use the NetworkX graph analysis library to convert the edge list into a graph object and output the number of nodes and edges in the graph:

# create the networkx graph from the dict of edges
import networkx as nx
G = nx.Graph()
for e, w in edges.items():
    if w > 0:
        e1, e2 = e.split('\t')
        G.add_edge(e1, e2, weight=w)

G.remove_edges_from(nx.selfloop_edges(G))

print('Number of nodes: ', G.number_of_nodes())
print('Number of edges: ', G.number_of_edges())

The output of this code block:

This output shows that although we started with 67 characters, 16 of them ended up not being connected to anyone in the network, which is why the created graph has fewer nodes.
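If you are curious which characters were dropped, a quick comparison of the two name sets reveals them (again, an optional check of my own):

# characters from the scraped list that have no edges in the graph
isolated = sorted(set(names_urls) - set(G.nodes()))
print(len(isolated), 'unconnected characters:', isolated)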

Visualization of the network

Once we have the network, we can visualize it! First, let’s create a simple draft visualization of the network using Matplotlib and NetworkX’s built-in tools.

# take a very brief look at the network
import matplotlib.pyplot as plt
f, ax = plt.subplots(1,1,figsize=(15,15))
nx.draw(G, ax=ax, with_labels=True)
plt.savefig('test.png')

The output image of this cell:
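Even this quick draft can be made more readable before we move to a dedicated tool, for instance by scaling node sizes with degree and edge widths with the co-reference weight. This is one possible styling sketch (my addition, not part of the original code):

# styling sketch: node size ~ degree, edge width ~ co-reference weight
f, ax = plt.subplots(1, 1, figsize=(15, 15))
node_sizes = [100 * G.degree(n) for n in G.nodes()]
edge_widths = [0.3 * G[u][v]['weight'] for u, v in G.edges()]
nx.draw(G, ax=ax, with_labels=True, node_size=node_sizes, width=edge_widths)
plt.savefig('test_styled.png')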

While this network already gives some clues about the main structure and features of the show, we can design a much more detailed visualization using the open-source network visualization software Gephi. To do this, we first need to export the network to a .gexf graph data file, as follows.

nx.write_gexf(G, 'arcane_network.gexf')

Now, on to the tutorial for visualizing this network with Gephi:

Video tutorial on YouTube: https://www.youtube.com/watch?v=utm91FhZalQ

Extras

Here comes the extra part that I refer to in the video. After exporting the node table, including the network community indices, I read that table with pandas and assigned individual colors to each community. I got the colors (and their hex codes) from ChatGPT, asking it to match the main color themes of the show. This block of code then exports the colors, which I in turn used in Gephi to color the final graph.

import pandas as pd
nodes = pd.read_csv('nodes.csv')

pink = '#FF4081'
blue = '#00FFFF'
gold = '#FFD700'
silver = '#C0C0C0'
green = '#39FF14'

# map each community index to a theme color
cmap = {0 : green,
        1 : pink,
        2 : gold,
        3 : blue,
        }

nodes['color'] = nodes.modularity_class.map(cmap)
nodes.set_index('Id')[['color']].to_csv('arcane_colors.csv')

When we color the network based on the communities we found (communities meaning highly interconnected subgraphs of the original network), we discover four major groups, each corresponding to a specific set of characters in the plot. Not surprisingly, the algorithm grouped the main protagonist family of Jinx, Vi, and Vander together (pink). We also see the group of underground figures from Zaun (blue), such as Silco, while the elite of Piltover (gold) and the militaristic force (green) are also well grouped.
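If you would rather stay in Python than switch to Gephi for this step, a comparable partition can be computed directly with NetworkX’s built-in greedy modularity method. This is a sketch of one alternative; Gephi’s Louvain algorithm may yield a slightly different split:

from networkx.algorithms.community import greedy_modularity_communities

# modularity-based communities, using co-reference counts as edge weights
communities = greedy_modularity_communities(G, weight='weight')
for i, members in enumerate(communities):
    print('Community', i, ':', sorted(members))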

The beauty and utility of such community structures is that, while explanations like these make them very easy to contextualize, it would normally be very difficult to draw a similar map based on intuition alone. The methodology presented here shows how we can use network science to extract the hidden connections of virtual (or real) social systems, be they the partners of a law firm, the employees of an accounting firm, or the human resources department of a major oil company.
