Scott Smitelli

The Lesser 2024 Olympics Rankings

It has long bothered me that the overall rankings at the Olympic Games simply take each country’s total medal count as the single metric of success. Across all media outlets, whoever won the most medals overall is on top, and everybody else falls in line below. The main exception occurs when a country is doing extremely well; then particular focus is paid to the number of gold medals won instead. By changing what is measured, the rankings start to change pretty significantly.

The 2024 Summer Olympic Games in Paris saw participation from 206 countries which collectively sent 10,795 competitors (how did I get such a specific number? Oh, just you wait), for an average of about 52 athletes per country. But averages are just averages: the United States and France each sent over ten times that number, while 22 other countries combined sent that many.

This disparity started to bug me, and it got me wondering how different the rankings would look if the medal counts were normalized against the number of people that participated. Did the fact that the U.S. sent a veritable armada of athletes hide any interesting achievements by any smaller countries?

Where’s all the data?

Type some variant of “Olympics results” into, well, anything and you’ll get a deluge of results. But you’ll find pretty quickly that it’s all the same: country name, gold/silver/bronze medal counts, and typically a fourth column for the total medal count. No mention of the number of athletes. Even the official site doesn’t aggregate the data into the required form.

The first website I found that did was surprisingly (and unsurprisingly) Wikipedia. Their page about the event shows the list of all countries that participated, along with the number of participants sent by each. In fact, the linked category page reveals that there is a dedicated article page for each country that participated. The page for Canada, as an example, starts with an “info box” that concisely shows that 315 people competed and the gold-silver-bronze medals earned were 9-7-11.

So this is how it’s gonna go down.

Forgive me Wikipedia, for I have scraped

The category page contains a link to each country page. The default pagination limits the number of links to 200, so it’s necessary to iteratively “walk” the pages by following “next page” links. I lean heavily on the Beautiful Soup library to parse and navigate the HTML structure:

import urllib.parse
from bs4 import BeautifulSoup

ignore_prefixes = (
    '/wiki/Category:',
    '/wiki/Wikipedia:',
    '/w/index.php?title=Category:',
    '/w/index.php?title=Wikipedia:')

cat_url = 'https://en.wikipedia.org/wiki/Category:Nations_at_the_2024_Summer_Olympics'
links = []
while True:
    html = get(cat_url)
    soup = BeautifulSoup(html, features='html5lib')
    mw_pages = soup.find(id='mw-pages')

    has_more = False
    for link in mw_pages.find_all('a'):
        href = link.get('href')
        abs_href = urllib.parse.urljoin(cat_url, href)

        if link.text == 'next page':
            has_more = True
            cat_url = abs_href
        elif not href.startswith(ignore_prefixes):
            links.append(abs_href)

    if not has_more:
        break

# `links` contains one absolute Wikipedia URL for each participating country

Using Python, find all the Wikipedia pages for countries at the 2024 Olympic Games.

The get() function is a small wrapper around requests.get(...).content that handles things like raising exceptions for error responses and caching results so I don’t repeatedly redownload the same URLs while doing iterative testing.
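The wrapper itself isn't shown, so here is a minimal sketch of what such a get() could look like. The cache directory, the fetch_bytes() helper, and the fetch/cache_dir parameters are all assumptions made for illustration, not the code actually used here.

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path('.cache')  # hypothetical location for cached responses


def fetch_bytes(url):
    """Download a URL, raising an exception on HTTP error responses."""
    import requests  # third-party; imported lazily so the cache logic stands alone
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.content


def get(url, fetch=fetch_bytes, cache_dir=CACHE_DIR):
    """Return the body of `url`, reading from a disk cache when possible."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so any URL maps to a safe, fixed-length file name
    cache_file = cache_dir / hashlib.sha256(url.encode('utf-8')).hexdigest()
    if cache_file.exists():
        return cache_file.read_bytes()
    body = fetch(url)
    cache_file.write_bytes(body)
    return body
```

Keeping the download step in its own function means the caching can be exercised with a stand-in fetcher, without touching the network.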

The main thing to be mindful of here is that some of the links inside the #mw-pages element are for navigation or meta pages that we don’t want to include. The business of urljoin() makes sure that we re-add https://en.wikipedia.org to any URLs that came back in a relative form.

With a list of country URLs in hand, now we have to start working on the per-country pages. The HTML that comes from Wikipedia is a little bit… uh… nope, don’t wanna parse that. Luckily the HTML you see is not what the editors actually write. Wikipedia is written in wikitext, and pages frequently use infoboxes to consistently lay out page information. Each infobox is based on a shared template (in this case Template:Infobox country at games) that pulls its data from rigidly-named parameters. All we have to do is get the wikitext for the page and it becomes relatively easy to extract raw infobox source values.

(A short aside: What the heck is the purpose of the Wikidata project? I would have assumed that it would be a rigorously annotated knowledge base of all the facts and figures needed to put this analysis together and it’s… not. Maybe I’m dumber than dirt, but I couldn’t find a single usable kernel of knowledge there that helped me do this.)

To do that, it’s necessary to pull data from Wikipedia’s API, which I didn’t know anything about before I started this endeavor. I still don’t know anything about it, but at least I managed to bang some rocks together sharply enough to make it do what I wanted it to do. Briefly, the shape of this is:

import json

country_page = 'Canada_at_the_2024_Summer_Olympics'
base = 'https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvslots=*&rvprop=content&format=json&titles='

data = json.loads(get(base + country_page))
content = list(data['query']['pages'].values())[0]['revisions'][0]['slots']['main']['*']

# `content` is the wikitext of the page as a string

Get the wikitext for Canada’s 2024 Olympics article. I’m quite certain the assignment to content is abject crap.
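For what it's worth, the same lookup reads a little less like crap when it is unpacked one step at a time. This is only a sketch: extract_wikitext() is a hypothetical helper name, and it assumes the response has the same shape the one-liner above relies on (a dict of pages keyed by page ID, with the text under the '*' key of the main slot).

```python
def extract_wikitext(data):
    """Pull the page source out of a decoded revisions query response."""
    pages = data['query']['pages']       # dict keyed by page ID
    page = next(iter(pages.values()))    # only one title was requested
    revision = page['revisions'][0]      # the single revision returned
    return revision['slots']['main']['*']
```

With the earlier snippet this would be called as extract_wikitext(json.loads(get(base + country_page))).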

From there, a (relatively) tame regular expression capturing group like competitors\s*=\s*(\d+) can extract the number of competitors that Canada sent. Similar extractions can be done with gold, silver, and bronze. It was during this process that I discovered that Ghana’s page doesn’t include any medal counts in the infobox, while every other country does. Any Wikipedia editors wanna jump on that?
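A rough sketch of that extraction, applied to all four fields at once. The parse_infobox() name and the sample wikitext layout are assumptions (real pages carry many more parameters); the numbers are Canada's from the info box mentioned earlier. A missing field comes back as None, which is how a gap like Ghana's would show up.

```python
import re

FIELDS = ('competitors', 'gold', 'silver', 'bronze')


def parse_infobox(wikitext):
    """Return the numeric infobox parameters found in a page's wikitext."""
    values = {}
    for field in FIELDS:
        # Anchor on the leading pipe so only template parameters match
        match = re.search(rf'\|\s*{field}\s*=\s*(\d+)', wikitext)
        values[field] = int(match.group(1)) if match else None
    return values


# Hypothetical, heavily trimmed infobox source for illustration
sample = """{{Infobox country at games
| NOC = CAN
| competitors = 315
| gold = 9
| silver = 7
| bronze = 11
}}"""

print(parse_infobox(sample))
```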

Armed with all that, and 208 HTTP requests later, we have pretty much what everybody else has… a list of countries, their medal counts, and the various sums that can be found by combining them. This data agrees with the overall stats from the official sources, so we know that Wikipedia got it right and it was scraped accurately.

And now we have a new dimension to play with: Competitors. All 10,795 of them.

Most and least efficient teams

Some uncommon names start to bubble up when we divide the total medal count by the number of competitors a country sent. This table shows the top and bottom ends of this list, omitting the 114(!) countries at the event that did not win a single medal:

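| Rank | Country | Competitors | Gold | Silver | Bronze | Total | Total/Competitors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Saint Lucia | 4 | 1 | 1 | 0 | 2 | 50.00% |
| 2 | Kyrgyzstan | 16 | 0 | 2 | 4 | 6 | 37.50% |
| | North Korea | 16 | 0 | 2 | 4 | 6 | 37.50% |
| 3 | Grenada | 6 | 0 | 0 | 2 | 2 | 33.33% |
| 4 | Bahrain | 13 | 2 | 1 | 1 | 4 | 30.77% |
| 86 | Morocco | 60 | 1 | 0 | 1 | 2 | 3.33% |
| 87 | Mongolia | 32 | 0 | 1 | 0 | 1 | 3.13% |
| 88 | Fiji | 33 | 0 | 1 | 0 | 1 | 3.03% |
| 89 | Refugee Olympic Team | 37 | 0 | 0 | 1 | 1 | 2.70% |
| 90 | Argentina | 136 | 1 | 1 | 1 | 3 | 2.21% |
| 91 | Egypt | 148 | 1 | 1 | 1 | 3 | 2.03% |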

Counting only gold medals instead, a few new names pop up:

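| Rank | Country | Competitors | Gold | Silver | Bronze | Total | Gold/Competitors |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Saint Lucia | 4 | 1 | 1 | 0 | 2 | 25.00% |
| | Dominica | 4 | 1 | 0 | 0 | 1 | 25.00% |
| 2 | Bahrain | 13 | 2 | 1 | 1 | 4 | 15.38% |
| 3 | Pakistan | 7 | 1 | 0 | 0 | 1 | 14.29% |
| 4 | Georgia | 28 | 3 | 3 | 1 | 7 | 10.71% |
| 59 | Switzerland | 127 | 1 | 2 | 5 | 8 | 0.79% |
| 60 | Argentina | 136 | 1 | 1 | 1 | 3 | 0.74% |
| 61 | Egypt | 148 | 1 | 1 | 1 | 3 | 0.68% |
| 62 | South Africa | 149 | 1 | 3 | 2 | 6 | 0.67% |
| 63 | Poland | 218 | 1 | 4 | 5 | 10 | 0.46% |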

As before, this list omits the 142(!!) countries that did not win a single gold medal.

The efficiency of sending a small but powerful delegation to the Olympics is pretty clear. Half of the athletes sent by Saint Lucia won medals, and a quarter of their team won gold. If one were to hold the United States team to this standard, they would have taken 296 medals, 148 of them gold.
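The division and ranking behind these tables can be sketched in a few lines. The rows are the top five from the first table; rank_by_efficiency() is a hypothetical name, and the tie handling (equal ratios share a rank and the next rank is not skipped) is inferred from how the table numbers Kyrgyzstan, North Korea, and Grenada.

```python
# (country, competitors, gold, silver, bronze), copied from the table above
rows = [
    ('Grenada', 6, 0, 0, 2),
    ('Saint Lucia', 4, 1, 1, 0),
    ('Bahrain', 13, 2, 1, 1),
    ('Kyrgyzstan', 16, 0, 2, 4),
    ('North Korea', 16, 0, 2, 4),
]


def rank_by_efficiency(rows):
    """Rank countries by total medals per competitor, sharing ranks on ties."""
    scored = [(name, (g + s + b) / comp) for name, comp, g, s, b in rows]
    scored.sort(key=lambda item: item[1], reverse=True)
    ranked = []
    rank = 0
    previous = None
    for name, ratio in scored:
        if ratio != previous:  # only advance the rank when the ratio changes
            rank += 1
            previous = ratio
        ranked.append((rank, name, f'{ratio:.2%}'))
    return ranked


for entry in rank_by_efficiency(rows):
    print(entry)
```

Swapping the numerator for the gold count alone gives the second table's metric.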

Throwing population into it

There are something like 8.1 billion people on Earth today, and about 7.8 billion of those people live in countries that participated in the Olympics. If we were to distribute the population of all these countries evenly into the same 10,795 athlete slots of the 2024 Games, about one in 720,016 people globally would participate. Following this distribution, China and India would have each sent over 1,950 competitors.

On the other end of the spectrum, Tuvalu and Nauru would have sent 1.5% of one person each, roughly the mass of a human’s forearm. (And as a result of this research, now I’m probably on more government watchlists than I was before.) So it’s pretty clear that the populations of countries are where the real fun lies. If we know how many people live in each country, we can distort the data in all kinds of amusing ways.
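The arithmetic here is a single division, sketched below. The one-in-720,016 figure is the one quoted above, the two populations are the ones used in the tables later in this article, and proportional_share() is a hypothetical helper name.

```python
PEOPLE_PER_SLOT = 720_016  # "one in 720,016 people", as quoted above


def proportional_share(population, people_per_slot=PEOPLE_PER_SLOT):
    """Athlete slots a country would get if slots followed population."""
    return population / people_per_slot


india = proportional_share(1_404_910_000)
tuvalu = proportional_share(10_679)

print(f'India: {india:.0f} competitors')
print(f'Tuvalu: {tuvalu:.1%} of one competitor')
```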

Once again, I defer to Wikipedia’s List of countries and dependencies by population as the source of truth. That page contains a table with the apparent best guess of each country’s population as of right now-ish. Although I likely could’ve done alright trying to parse this as HTML, I chose to use the tried-and-true approach of trawling regular expressions through the wikitext looking for matches.

(An earlier/buggier version of this function taught me that an entire country name, Dominica, is a strict prefix of another, the Dominican Republic. Any Jeopardy! writers wanna jump on that?)

The meat of this is:

import re


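def get_population(country_name):
    wikitext = '...from List_of_countries_and_dependencies_by_population...'
    # Doubled braces are literal in an f-string, so this builds
    # '{{flag|<name>}}'. The closing braces keep 'Dominica' from
    # matching '{{flag|Dominican Republic}}'.
    flag = f'{{{{flag|{country_name}}}}}'
    lines = iter(wikitext.splitlines())
    for line in lines:
        if flag not in line:
            continue
        # The population sits in an {{n+p|...}} template on the next line
        line = next(lines)
        pop_str = re.search(r'n\+p\|[^\d\|]*([\d,]+)', line).group(1)
        pop = int(pop_str.replace(',', ''))
        return pop
    return None  # no row found for this country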


# Now something like get_population('New Zealand') returns 5338900

Using a wikitext string fetched from List of countries and dependencies by population, extract the population count for a single named country.

This relies on a couple of Wikipediaisms around template usage. Each country row in this table has a column populated by the {{flag|...}} template, which renders as a graphical flag and text link to the passed country name. On the following line (at least, that’s how it was when I did this) is the {{n+p|...|{{worldpop}}}} template that is responsible for rendering “number and percent” of the passed number divided by the current {{worldpop}} value. Using a regular expression that became uglier as I iterated on it, I was eventually able to extract the populations of all the relevant countries. My final table came out sorting the same as theirs, so I’ll assume the extraction was correct. (We’re scraping, after all, not building space stations.)
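To see both templates and the Dominica problem in action, here is a self-contained sketch that takes the wikitext as a parameter. The extract_population() name and the sample rows are assumptions: the real table has more columns per row, and the Dominican Republic figure is a dummy value rather than real data.

```python
import re


def extract_population(wikitext, country_name):
    """Find a country's {{flag|...}} row and read the next line's {{n+p|...}}."""
    marker = '{{flag|' + country_name + '}}'  # closing braces matter, see below
    lines = iter(wikitext.splitlines())
    for line in lines:
        if marker not in line:
            continue
        following = next(lines, '')
        match = re.search(r'n\+p\|[^\d\|]*([\d,]+)', following)
        if match:
            return int(match.group(1).replace(',', ''))
    return None


# Hypothetical, simplified rows in the shape described above
sample = '\n'.join([
    '| {{flag|Dominican Republic}}',
    '| {{n+p|1,234,567|{{worldpop}}}}',  # dummy number, not real data
    '| {{flag|Dominica}}',
    '| {{n+p|67,408|{{worldpop}}}}',
    '| {{flag|New Zealand}}',
    '| {{n+p|5,338,900|{{worldpop}}}}',
])

print(extract_population(sample, 'Dominica'))
print(extract_population(sample, 'New Zealand'))
```

Dropping the closing braces from the marker would make 'Dominica' match the Dominican Republic row first, which is the prefix bug in a nutshell.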

This also required a bit of reverse-diplomacy to map the Olympic Committee names to the country names the Western Wikipedia World uses:

| From | To |
| --- | --- |
| Chinese Taipei | Taiwan |
| Federated States of Micronesia | Micronesia |
| Great Britain | (just return 65,685,738) |
| The Gambia | Gambia |
| Virgin Islands | U.S. Virgin Islands |

The business with Great Britain/United Kingdom is because their population is handled by a template that I have no intention of ever understanding.
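In code, that mapping is little more than a dictionary plus one special case. This is a sketch only: the constant names and population_for() are hypothetical, and the lookup argument stands in for whatever function resolves a Wikipedia country name to a population.

```python
# Olympic Committee name -> name used in the Wikipedia population table
NAME_OVERRIDES = {
    'Chinese Taipei': 'Taiwan',
    'Federated States of Micronesia': 'Micronesia',
    'The Gambia': 'Gambia',
    'Virgin Islands': 'U.S. Virgin Islands',
}

# Countries whose population is hardcoded rather than scraped
FIXED_POPULATIONS = {'Great Britain': 65_685_738}


def population_for(noc_name, lookup):
    """Resolve an Olympic team name to a population via `lookup`."""
    if noc_name in FIXED_POPULATIONS:
        return FIXED_POPULATIONS[noc_name]
    # Names without an override are passed through unchanged
    return lookup(NAME_OVERRIDES.get(noc_name, noc_name))
```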

They’ve got spirit, yes they do

Before even considering medal counts, which countries sent the highest and lowest percentage of their own population to the Olympics?

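| Rank | Country | Population | Competitors | Gold | Silver | Bronze | Total | Participation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Tuvalu | 10,679 | 2 | 0 | 0 | 0 | 0 | 1 per 5,340 |
| 2 | Palau | 16,733 | 3 | 0 | 0 | 0 | 0 | 1 per 5,578 |
| 3 | Monaco | 38,367 | 6 | 0 | 0 | 0 | 0 | 1 per 6,395 |
| 4 | San Marino | 33,950 | 5 | 0 | 0 | 0 | 0 | 1 per 6,790 |
| 5 | Cook Islands | 15,040 | 2 | 0 | 0 | 0 | 0 | 1 per 7,520 |
| 200 | Democratic Republic of the Congo | 98,370,000 | 6 | 0 | 0 | 0 | 0 | 1 per 16,395,000 |
| 201 | Somalia | 18,143,379 | 1 | 0 | 0 | 0 | 0 | 1 per 18,143,379 |
| 202 | Myanmar | 56,712,559 | 2 | 0 | 0 | 0 | 0 | 1 per 28,356,280 |
| 203 | Bangladesh | 169,828,911 | 5 | 0 | 0 | 0 | 0 | 1 per 33,965,782 |
| 204 | Pakistan | 241,499,431 | 7 | 1 | 0 | 0 | 1 | 1 per 34,499,919 |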

Note: This list omits the Individual Neutral Athletes and Refugee Olympic Team, since they are not “countries” with populations in the same sense as everybody else.

The median participation rate lies somewhere between Japan and North Macedonia, roughly 1 per 306,000. Applying that uniformly to all participating countries, the U.S. would’ve sent 1,100 athletes and the Cook Islands would have sent a human leg (without foot). (Dear FBI, I promise I’m a pretty cool dude once you get to know me.)
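The “1 per N” figures in the participation table are the population divided by the head count, rounded to a whole person. A small sketch, with one_per() as a hypothetical name and the assumption that halves round up (Monaco's 6,394.5 only lands on 6,395 that way; Python's built-in round() would give 6,394):

```python
def one_per(population, count):
    """Format population / count as '1 per N', rounding halves up."""
    if count == 0:
        return None  # nothing to divide by, so no ratio to show
    # Integer arithmetic avoids float error and round-half-to-even
    n = (2 * population + count) // (2 * count)
    return f'1 per {n:,}'


print(one_per(10_679, 2))        # Tuvalu
print(one_per(38_367, 6))        # Monaco
print(one_per(241_499_431, 7))   # Pakistan
```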

Making it everybody’s business

Now we can do something profoundly stupid: We’ll simply ignore the number of competitors and just use the medal counts divided by the populations of entire countries. Given the way Olympic events are scored and ranked, this should be a completely fair and unbiased measurement of the athletic prowess of all nations, right? No. Wrong.

Well, no matter. Here are the highest and lowest medal totals divided by everybody in the country:

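| Rank | Country | Population | Gold | Silver | Bronze | Total | Total/Population |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Grenada | 112,579 | 0 | 0 | 2 | 2 | 1 per 56,290 |
| 2 | Dominica | 67,408 | 1 | 0 | 0 | 1 | 1 per 67,408 |
| 3 | Saint Lucia | 184,100 | 1 | 1 | 0 | 2 | 1 per 92,050 |
| 4 | New Zealand | 5,338,900 | 10 | 7 | 3 | 20 | 1 per 266,945 |
| 5 | Bahrain | 1,577,059 | 2 | 1 | 1 | 4 | 1 per 394,265 |
| 86 | Peru | 34,038,457 | 0 | 0 | 1 | 1 | 1 per 34,038,457 |
| 87 | Egypt | 105,914,499 | 1 | 1 | 1 | 3 | 1 per 35,304,833 |
| 88 | Indonesia | 281,603,800 | 2 | 0 | 1 | 3 | 1 per 93,867,933 |
| 89 | India | 1,404,910,000 | 0 | 1 | 5 | 6 | 1 per 234,151,667 |
| 90 | Pakistan | 241,499,431 | 1 | 0 | 0 | 1 | 1 per 241,499,431 |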

Note: Excludes Individual Neutral Athletes, Refugee Olympic Team, and every country that earned no medals.

And the same, using gold medals instead of overall totals:

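| Rank | Country | Population | Gold | Silver | Bronze | Total | Gold/Population |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Dominica | 67,408 | 1 | 0 | 0 | 1 | 1 per 67,408 |
| 2 | Saint Lucia | 184,100 | 1 | 1 | 0 | 2 | 1 per 184,100 |
| 3 | New Zealand | 5,338,900 | 10 | 7 | 3 | 20 | 1 per 533,890 |
| 4 | Bahrain | 1,577,059 | 2 | 1 | 1 | 4 | 1 per 788,530 |
| 5 | Slovenia | 2,123,949 | 2 | 1 | 0 | 3 | 1 per 1,061,975 |
| 59 | Brazil | 203,080,756 | 3 | 7 | 10 | 20 | 1 per 67,693,585 |
| 60 | Egypt | 105,914,499 | 1 | 1 | 1 | 3 | 1 per 105,914,499 |
| 61 | Ethiopia | 109,499,000 | 1 | 3 | 0 | 4 | 1 per 109,499,000 |
| 62 | Indonesia | 281,603,800 | 2 | 0 | 1 | 3 | 1 per 140,801,900 |
| 63 | Pakistan | 241,499,431 | 1 | 0 | 0 | 1 | 1 per 241,499,431 |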

Note: Excludes Individual Neutral Athletes and every country that earned no gold medals.

So there they are, your new 2024 Olympic champions: Dominica! For the U.S. team to have won at a rate comparable to this Caribbean powerhouse, we would’ve needed to take home nearly 5,000 gold medals.

Last I checked, we did not.

Do what thou wilt

I’ll end this with a table of all the data, assuming you have working JavaScript. If not, the JSON source is here. The column headers are each clickable for your sorting pleasure:

[the table would've gone here, if the script had worked]

The column descriptions are:

Country: The name of the National Olympic Committee the country competed as.
Population: The total population of the country at the time of writing, shortly after the closing ceremony.
Competitors: The number of athletes sent to the 2024 Olympic Games by that country.
Participation: The competitors divided by the population expressed as a fraction.
Gold, Silver, Bronze: The number of gold/silver/bronze medals won by the country during the 2024 Olympic Games.
Gold/Pop: The gold divided by population expressed as a fraction.
Gold/Comp: The gold divided by competitors expressed as a fraction.
Total: The combined sum of gold, silver, and bronze medals won.
Tot/Pop: The total divided by population expressed as a fraction.
Tot/Comp: The total divided by competitors expressed as a fraction.

I’m not entirely sure that there’s a real conclusion I’m angling for here, although I’m not the first person to have this kind of thought. There have been some legitimate papers written by actual scholars on the topic, whereas I’m just some guy who converts “I wonder if…” into “Now why did I…?”
