# The SQL DISTINCT clause

In this chapter we'll learn about the DISTINCT clause, which we can use to SELECT unique results from a result set.

As an example for this chapter, we will try to get the most voted options across all polls. That means that for each poll, we want to get the most voted option.

Since we're interested in the most voted option, we'll begin by finding the vote count on each option. We can reuse part of the query we wrote when calculating the vote percentages.

SELECT
    options.id,
    options.option_text,
    options.poll_id,
    COUNT(votes.option_id) as vote_count
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id;

Note that here we're not limiting to one particular poll, so we're getting back all the options in the database: their id, option_text, poll_id, and count of votes.

Now we want to only get the option with the highest votes in each poll. We want to make sure that each options.poll_id only appears once.

Let's say we have three options in one poll:

id option_text poll_id
1 Flask 1
2 Django 1
3 It depends 1

If we ask PostgreSQL to only give us one poll_id for this, naturally we will lose two of the rows.

So which row will we keep?

The answer is: the first one. Therefore, we need to order the results so that we will keep the first row that we're interested in.

To order results, we'll ORDER BY vote_count DESC:

SELECT
    options.id,
    options.option_text,
    options.poll_id,
    COUNT(votes.option_id) as vote_count
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id
ORDER BY vote_count DESC;

And to get distinct poll_id, we'll use the DISTINCT ON clause. To do this, it must move to the top:

SELECT
    DISTINCT ON (options.poll_id) poll_id,
    options.id,
    options.option_text,
    COUNT(votes.option_id) as vote_count
FROM options
LEFT JOIN votes on options.id = votes.option_id
GROUP BY options.id
ORDER BY poll_id, vote_count DESC;

# The DISTINCT ON clause

DISTINCT ON does exactly what the example suggests: gets the results back as a whole, and then keeps the first row for each value in the column we've marked as distinct.

Note that for consistency in the order of rows returned, the column on which we apply DISTINCT ON must be used as the first colum in the ORDER BY clause.

The poll_id after the DISTINCT ON () clause is a column alias: just a name we use to refer to it in the ORDER BY clause later. It is optional.

# The DISTINCT clause

Together with DISTINCT ON, there is also an (arguably more popular) DISTINCT clause on its own.

The DISTINCT clause keeps one row for each group of duplicates, and it can be used on one or more columns of a table. So for example, if we had this table:

id option_text poll_id
1 Flask 1
2 Django 1
3 It depends 1
4 Bob 2
5 Rolf 2
6 Flask 3
7 Django 3
8 Circuits 3

Where we have 3 polls and 8 options. Note that Flask and Django appear twice.

We could use this option to get the unique values from column option_text:

SELECT
    DISTINCT option_text
FROM
    options;

That would give us this:

option_text
Flask
Django
It depends
Bob
Rolf
Circuits

Notice that Flask and Django only appear once.

If we apply the DISTINCT clause on two columns, then it would check duplication on both column values:

SELECT
    DISTINCT option_text, poll_id
FROM
    options;

Now we would get all the options, because there are no rows with the same option_text and poll_id as any other rows:

option_text poll_id
Flask 1
Django 1
It depends 1
Bob 2
Rolf 2
Flask 3
Django 3
Circuits 3

# Wrapping Up

Alright, now that we know about the DISTINCT and DISTINCT ON clauses, we can use that to get the most voted option for each poll!