# Stage 3: last improvements

Stage 2 has a few problems that we've discussed before:

  • Marking a movie as "watched" removes it from the movies table and thus needs to be re-added so other users can watch it.
  • There is no way of distinguishing between two users with the same username.
  • We lose release date information about movies when somebody watches them.

These problems can be fixed by one important principle of database design:

Try to model your tables after entities that your program deals in.

What does that mean?

Well, let's think about what real-world entities or things our program uses.

  1. Movies
  2. Users

The "watching" isn't a real-world entity--it's not a thing.

This suggests (not always correct, but often gives good guidance) that we should have at least two tables: one for movie information and one for user information.

Watching is something users do with movies. One user can watch many movies. Similarly, one movie can be watched by many users.

That means that a movie being watched is not an attribute of either a movie or a user, so it should not be in either of those two tables.

This is called a many-to-many relationship. To implement this effectively, we need a third table that acts as a link between users and movies.

# Our new table structure

Our movies table will look very much like what we already know. However, we're going to add a unique identifier to the table so that even movies with the same name can be identified uniquely:

id INTEGER PRIMARY KEY title TEXT release_timestamp REAL

Every time we add a new movie we will give it a unique id value. An easy way to do this is to add 1 to it every time we add a new movie.

The users table will look like this. In here we're only storing the username, but we could add more columns if we wanted (e.g. user's date of birth):

username TEXT PRIMARY KEY

Applying PRIMARY KEY to a TEXT column automatically forces it to be unique. So we won't be able to add multiple users with the same username in our application.

And finally there'll be a third table, watched, that stores a new row every time a user watches a movie.

user_username TEXT movie_id INTEGER

In this table, both columns would be FOREIGN KEY, referencing the appropriate column:

  • user_username references users.username
  • movie_id references movies.id

Let's say we add the following data to this table:

user_username TEXT movie_id INTEGER
Rolf 2
Jen 2
Rolf 3
Bob 1

This means that:

  • "User 'Rolf' watched movie 2"
  • "User 'Jen' watched movie 2"
  • "User 'Rolf' watched movie 3"
  • "User 'Bob' watched movie 1"

If we wanted to know more about the user (and our users table had more than a single column), we'd grab the username and go into the users table and find their there.

Similarly if we wanted to know the movie's name or release date, we'd grab the movie_id and go into the movies table with it.

You can see that with this table, we can easily say that one user watched multiple movies (user 'Rolf'). Also, one movie was watched by multiple users (movie 2).