# WHERE: filter through data

TIP

Run through and play with the interactive database[1] for this lecture to learn about this better!

We can optionally extend the SELECT command to give it some filtering information. If we do that, we'll only get back those rows that match our filter.

The filtering syntax goes like this:

SELECT * FROM users WHERE first_name = 'John';

The first part is not new: SELECT * FROM users. The second part is our filter that uses WHERE.

This query will return those rows whose column first_name has the value 'John'. Note that the match must be exact, so something like 'Johnathan' will not be returned, even though it starts with 'John'.

Other examples of query might be these:

... WHERE age > 18;
... WHERE salary < 35000;
... WHERE surname != 'Smith';

TIP

In the examples above, age and salary are fictitious INTEGER type columns. We must only use quotation marks around the values, such as in 'Smith', when the type is TEXT. Otherwise the quotation marks should not be used.

These are the available comparison operators:

  • > and < for "greater than" or "less than".
  • >= and <= for "greater than or equal to" or "less than or equal to".
  • = and != for "exactly equal to" or "not equal to".

# Multiple comparisons

We can use keywords AND and OR to chain comparisons logically. For example:

... WHERE years_experience > 10 AND salary < 35000;

This query could be used to find employees that should be considered for promotion.

The AND keyword is used to chain two comparisons, such that rows will only be included in the result if they match both comparisons. So in this case, we would get people with over 10 years experience, and also a salary less than 35000.

Another example, using OR:

... WHERE age <= 18 OR age >= 65;

This query could be used to find those people who might not be in working age.

The OR keyword is used to chain two comparisons and return rows that match either comparison. Here we would get people who are 18 years old or less, and also those people who are 65 years old or more.

# More than one chaining keyword

You can use more than one AND and OR in a query:

... WHERE age <= 18 OR age >= 65 OR salary = 0;

This would return rows where at least one of the comparisons match:

  • If the person is 18 or under.
  • If the person is 65 or over.
  • If the person's salary is exactly 0.

You can use both AND and OR in one query, although it can get a bit confusing:

... WHERE age <= 18 OR age >= 65 AND salary > 0;

This could be used to get those people who are not in normal working age, but are currently working.

But this is confusing because there are two ways to think about this filter:

  • If the person's age is 18 or under, return this row.
  • If the person's age is 65 or over and their salary is over 0, return this row.

Or, you could think of it this way:

  • If the person's age is 18 or under and their salary is over 0, return this row.
  • If the person's age is 65 or over and their salary is over 0, return this row.

At the moment, the query as we wrote it would be equivalent to the first set of bullet points. It might not be what you expect!

That's why when you're working with multiple chained operators, it's a good idea to use brackets (a.k.a. parentheses) to tell the database the order of evaluation of the conditionals.

If you want to match the second set of bullet points:

... WHERE (age <= 18 OR age >= 65) AND salary > 0;

This means the age comparison will evaluate first, and the database will determine whether the age is either 18 or less, or 65 or more. If it is, then it will check the salary and see whether it's over 0. If it is, then it will include the row in the results.

Similarly, if you wanted to match the first set of bullet points you could keep the query without any brackets, but it is a bit confusing. Adding brackets to make it more obvious is always a good idea:

... WHERE age <= 18 OR (age >= 65 AND salary > 0);

This is one of those situations where explaining how the logic is evaluated using plain English becomes more difficult than with code!

# Long queries

Sometimes you'll find you are writing a long query because you want many columns and a lot of filtering. Something like this:

SELECT first_name, surname, age, salary FROM users WHERE (age <= 18 OR age >= 65) AND salary > 0;

It can get difficult to read and understand quickly. To make it easier for a reader of the code (yourself or someone else), it can be a good idea to split it into multiple lines.

You can do this without worrying, because SQL doesn't care about new lines in a query. It only cares about the final semicolon that marks the end of the query:

SELECT first_name, surname, age, salary
FROM users
WHERE (age <= 18 OR age >= 65) AND salary > 0;

  1. Interactive database for this lecture (opens new window) ↩ī¸Ž