# Stage 3: last improvements
Stage 2 has a few problems that we've discussed before:
- Marking a movie as "watched" removes it from the
movies
table and thus needs to be re-added so other users can watch it. - There is no way of distinguishing between two users with the same username.
- We lose release date information about movies when somebody watches them.
These problems can be fixed by one important principle of database design:
Try to model your tables after entities that your program deals in.
What does that mean?
Well, let's think about what real-world entities or things our program uses.
- Movies
- Users
The "watching" isn't a real-world entity--it's not a thing.
This suggests (not always correct, but often gives good guidance) that we should have at least two tables: one for movie information and one for user information.
Watching is something users do with movies. One user can watch many movies. Similarly, one movie can be watched by many users.
That means that a movie being watched is not an attribute of either a movie or a user, so it should not be in either of those two tables.
This is called a many-to-many relationship. To implement this effectively, we need a third table that acts as a link between users and movies.
# Our new table structure
Our movies
table will look very much like what we already know. However, we're going to add a unique identifier to the table so that even movies with the same name can be identified uniquely:
id INTEGER PRIMARY KEY | title TEXT | release_timestamp REAL |
---|
Every time we add a new movie we will give it a unique id
value. An easy way to do this is to add 1
to it every time we add a new movie.
The users
table will look like this. In here we're only storing the username, but we could add more columns if we wanted (e.g. user's date of birth):
username TEXT PRIMARY KEY |
---|
Applying PRIMARY KEY
to a TEXT
column automatically forces it to be unique. So we won't be able to add multiple users with the same username
in our application.
And finally there'll be a third table, watched
, that stores a new row every time a user watches a movie.
user_username TEXT | movie_id INTEGER |
---|
In this table, both columns would be FOREIGN KEY
, referencing the appropriate column:
user_username
referencesusers.username
movie_id
referencesmovies.id
Let's say we add the following data to this table:
user_username TEXT | movie_id INTEGER |
---|---|
Rolf | 2 |
Jen | 2 |
Rolf | 3 |
Bob | 1 |
This means that:
- "User 'Rolf' watched movie 2"
- "User 'Jen' watched movie 2"
- "User 'Rolf' watched movie 3"
- "User 'Bob' watched movie 1"
If we wanted to know more about the user (and our users
table had more than a single column), we'd grab the username
and go into the users
table and find their there.
Similarly if we wanted to know the movie's name or release date, we'd grab the movie_id
and go into the movies
table with it.
You can see that with this table, we can easily say that one user watched multiple movies (user 'Rolf'
). Also, one movie was watched by multiple users (movie 2).