# PostgreSQL window functions (part 2)

For this chapter, let's go back to our polling application data.

Below is the data set we're using in this chapter. You can also see this in an interactive database fiddle[1].

polls

id title owner_username
1 Flask vs. Django jose
2 Python vs. Java rolf
3 Windows vs. Mac bob

options

id option_text poll_id
1 Flask 1
2 Django 1
3 It Depends 1
4 Python 2
5 Java 2
6 Windows 3
7 Mac 3

votes (long one!)

username option_id
jose 1
charlie 1
ammar 1
rolf 2
bob 2
anne 4
eric 4
jose 4
charlie 4
ammar 4
rolf 4
bob 4
anne 5
eric 5
bob 4
anne 6
eric 7

# Why we need a window function in our vote percentages query

In our query earlier, we had something like this:

SELECT
    options.id,
    options.option_text,
    COUNT(votes.option_id) as vote_count,
    COUNT(votes.option_id) / SUM(COUNT(votes.option_id)) OVER() * 100.0 as vote_percentage
FROM options
LEFT JOIN votes on options.id = votes.option_id
WHERE options.poll_id = 1
GROUP BY options.id;

Aggregate functions like COUNT and SUM get calculated while the GROUP BY is being evaluated. For example, in COUNT(votes.option_id) as vote_count the value increases as the query processes each group, until it finishes.

That's why you can't have nested aggregate functions: you can't SUM something that is still being calculated.

However, window functions run after the GROUP BY and aggregate functions are evaluated. When the SUM() OVER() part of SUM(COUNT(votes.option_id)) OVER() runs, the COUNT() has already been calculated.

Let's take this step by step, and look at what the result set evaluates to before running the window function.

We've got a FROM options, a LEFT JOIN with votes, a WHERE clause, and a GROUP BY. For the purposes of these diagrams, I'm going to ignore any SELECT columns since those are processed at the very end.

Let's first of all create a diagram of what the rows and groups look like:

Rows and related groups of data

Then, aggregate functions run before window functions. With SUM(COUNT(votes.option_id)) OVER(), it means that COUNT(votes.option_id) runs first, so the window function 'sees' and can operate on this:

Result after running aggregate function

And then SUM() runs on that, giving us the total number of votes.

So to recap, the important part here is that SUM(COUNT()) look nested in the code, but they aren't really. The are evaluated at different points, so both functions are just operating on a column of data.

We can further break down the data that the window function operates on by placing stuff inside the brackets of OVER(). Let's take a look at an example.


  1. Interactive database fiddle for this chapter (opens new window) ↩ī¸Ž