WEBVTT

00:00.060 --> 00:01.130
OK we'll come back.

00:01.260 --> 00:06.390
So we're finally here and this is a bit of a turning point in the course really important section.

00:06.400 --> 00:13.460
That's scary too much but really fundamental to how sequel and mines are used in the real world.

00:14.240 --> 00:19.880
So this is a section all about relationships and also something called joints which we'll get to in

00:19.880 --> 00:21.860
the second part of this section.

00:22.040 --> 00:24.400
So let's focus on the relationships part first.

00:24.740 --> 00:31.580
So far in this course we've been working with very simple data things like books or users where we only

00:31.580 --> 00:36.530
have a couple of columns per table and that's been by design.

00:36.530 --> 00:42.800
We need that simplicity in order to focus on what's important to isolate the different features that

00:42.800 --> 00:46.820
we're talking about whether it's functions or logical operators.

00:46.820 --> 00:51.320
It's much easier to teach those when we don't have 20 different tables to worry about and we can just

00:51.920 --> 00:56.290
basically drill down into one table and focus on that.

00:56.300 --> 01:03.710
But that's typically not how the real world works where we might have a users table but there's no site

01:03.740 --> 01:08.600
that really can just get by on having users unless it's featureless.

01:08.600 --> 01:15.020
There is plenty of other things that need to go along with users whether it's comments or likes or tweets

01:15.080 --> 01:19.280
or posts or photos or any sort of I don't know Order transaction.

01:19.310 --> 01:22.430
There are all these different things that we could have long side users.

01:22.430 --> 01:27.280
So we've been working with simple data but is about to change.

01:29.390 --> 01:34.980
Real world data unlike what we've been working with is really messy typically and interrelated.

01:35.000 --> 01:37.170
There's a lot of connections between data.

01:37.400 --> 01:42.810
So some of those things I talked about with let's say users if we're working with just a simple blog

01:42.810 --> 01:43.100
site.

01:43.100 --> 01:47.300
So nothing like a Facebook or a complex social network just a simple blog.

01:47.300 --> 01:48.920
We still need to have user data.

01:49.190 --> 01:52.790
We need to have information about posts blog posts.

01:52.790 --> 01:59.090
We also need to somehow keep track of comments and then tags and that can get pretty complicated already.

01:59.090 --> 02:05.030
Not to mention things like advertising and tracking not only ads on their own but how ads are related

02:05.030 --> 02:08.020
to users and what users are clicking on and not clicking on.

02:08.030 --> 02:13.760
There's so much data that just from a simple blog site something that may not seem that complex there's

02:13.760 --> 02:16.120
so much stuff that we could store.

02:16.430 --> 02:22.820
So we're going to focus on how we work with interconnected data in the next couple of sections so we're

02:22.820 --> 02:24.680
no longer focusing on one table.

02:24.800 --> 02:29.750
We're going to see how do we work with two tables or three or four that are related so not just tables

02:29.750 --> 02:34.850
that exist independently but tables that reference one another and are interconnected.

02:35.060 --> 02:35.960
So let's get going.

02:36.110 --> 02:39.760
And the first thing that I want to do is go back to our books data.

02:40.250 --> 02:42.140
So remember our books table.

02:42.140 --> 02:43.190
Don't worry we're done with it.

02:43.190 --> 02:45.620
We're not going to be typing any books code.

02:45.630 --> 02:47.260
But on its own it's really simple.

02:47.270 --> 02:54.350
We had a Books table with author first name and last name a book title released year page count in a

02:54.350 --> 02:57.640
stock inventory I believe a stock quantity.

02:57.640 --> 02:59.330
And I think that was it.

02:59.330 --> 03:02.230
So again on its own that just represents a single book.

03:02.510 --> 03:04.100
But you can't really do much with that.

03:04.190 --> 03:09.620
But if we are running a sort of books site where we sold books at a minimum we would need to keep track

03:09.620 --> 03:11.820
of things like versions.

03:11.840 --> 03:13.120
So we didn't even talk about this.

03:13.130 --> 03:15.990
But there's often multiple versions of a single book.

03:16.040 --> 03:21.240
If you go on Amazon and you look at a book let's say Harry Potter the second book.

03:21.320 --> 03:22.350
Chamber of Secrets.

03:22.550 --> 03:27.620
Well there's going to be the initial British release the UK version as a hardcover and then there's

03:27.620 --> 03:31.210
a paperback and then there's the American version and then there's a paperback.

03:31.320 --> 03:36.060
There is a full color illustrated version that came out relatively relatively recently.

03:36.080 --> 03:39.240
Looks pretty cool that has these giant illustrations.

03:39.260 --> 03:40.400
It's the same title.

03:41.300 --> 03:44.190
How do you keep track of that is at a different table.

03:44.510 --> 03:46.920
Or do you just make a separate book for every one of those.

03:47.090 --> 03:49.860
But if you do that then how do you make sure that they're connected.

03:49.910 --> 03:55.670
Because when I go on Amazon and I look at the hardcover it also will show me below you know some people

03:55.670 --> 04:03.380
also buy paperback version or it's also available and I don't know 25 40 different languages.

04:03.470 --> 04:07.100
So there's all sorts of things to worry about just with versions.

04:07.820 --> 04:11.290
But then we have authors Well that might seem straightforward.

04:11.360 --> 04:13.860
We have an author first name and last name.

04:14.120 --> 04:15.750
Two columns on our books table.

04:15.890 --> 04:17.350
But that's not sufficient.

04:17.570 --> 04:22.430
When we have multiple authors so some books a lot of books have multiple authors.

04:22.430 --> 04:28.520
What about if it's a research paper that we're selling or a journal that has dozens of authors potentially.

04:28.940 --> 04:32.720
So there's a lot of things to keep track of there that we can't really do right now.

04:32.720 --> 04:35.580
We only have room for one author in our columns.

04:36.750 --> 04:39.450
Then a whole big thing is customers.

04:39.740 --> 04:43.520
If we're doing a book shop on line books on their own it's great.

04:43.560 --> 04:46.390
But we also need ways to keep track of users or customers.

04:46.390 --> 04:48.390
This is what I'm calling them in this case.

04:48.390 --> 04:55.590
So customers on their own also can't do that much but we might keep track of their e-mail log in information

04:56.290 --> 05:00.630
of course who wouldn't just or their password directly but for simplicity's sake let's say for keeping

05:00.630 --> 05:04.070
track of password and e-mail so that they can log in.

05:04.260 --> 05:07.360
And then there's a whole bunch of other things orders.

05:07.410 --> 05:11.090
So if customers want to order books how do we store that.

05:11.310 --> 05:17.340
We need to create a new order and it needs to be associated with the customer who created it needs to

05:17.340 --> 05:21.750
have address information maybe and needs to have the content of the order.

05:21.960 --> 05:24.840
Are there books one book how do we support that.

05:24.840 --> 05:26.370
How do we handle that.

05:26.370 --> 05:33.630
What about if we're working with dates and times and we need to keep track of when a return period expires.

05:33.630 --> 05:39.480
So we need to have a you know an order date and then keep track of 30 days or 60 days after that or

05:39.480 --> 05:40.630
whatever it is.

05:40.710 --> 05:47.550
Then we also need to store or somehow keep track of the transaction information addresses for billing

05:47.550 --> 05:54.270
address not only shipping address the type of shipping that billing information so much store and then

05:54.270 --> 05:56.410
a whole other thing is revues.

05:56.460 --> 06:02.670
So if we go on Amazon or pretty much any book site any book that you look at will have dozens or hundreds

06:02.670 --> 06:04.490
sometimes thousands of reviews.

06:04.800 --> 06:09.090
So we've got reviews their ratings that you know the number of stars.

06:09.270 --> 06:15.990
But then also a title for the review and then a caption and then the actual content of the review which

06:15.990 --> 06:19.600
may be 10 characters I hate this book.

06:19.650 --> 06:24.120
However many characters that is or it could be a giant paragraph someone's love letter that they're

06:24.120 --> 06:25.070
writing to a book.

06:25.080 --> 06:26.750
So how do we account for all of that.

06:26.880 --> 06:29.700
And then the last thing that we'll talk about.

06:30.210 --> 06:35.220
So you may think that we could just get away with storing this in a book which we absolutely could we

06:35.220 --> 06:39.990
could just have one genre but many books don't cleanly fit into one genre.

06:40.410 --> 06:46.560
You might have a vampire novel that's also historical fiction because it's set in Victoria period England

06:47.220 --> 06:54.540
or we might have you know a sci fi book that has elements of fantasy or who knows you can have all sorts

06:54.540 --> 06:59.460
of you know cross-pollinated genres so only assigning one genre might not be that useful.

06:59.460 --> 07:01.120
We may want to have a bunch more.

07:01.220 --> 07:06.180
And in fact on Amazon you can see there are books that will have two or three different genres or different

07:06.180 --> 07:08.170
tags associated with them.

07:08.190 --> 07:12.540
So what I'm trying to show you here is that we were working with books on their own but in a typical

07:12.540 --> 07:19.350
app or typical Web site that has to do with books and bookstore in our case there's a lot more that

07:19.350 --> 07:23.320
goes along with books to make anything functional to store any important information.

07:23.370 --> 07:28.230
And this is I don't know maybe half of the tables that you would need to get away with the candidate

07:28.320 --> 07:30.320
the MVP or the bare minimum.

07:30.320 --> 07:35.670
There are still many other things something like orders for instance might actually need to be split

07:35.670 --> 07:36.840
into a couple of tables.

07:36.960 --> 07:43.530
You might need to have an address section or an address table which I need to then have a separate billing

07:43.770 --> 07:44.430
table.

07:44.670 --> 07:46.940
We might need to have shipping.

07:46.950 --> 07:49.550
There could be all different tables that we could split it up into.

07:49.740 --> 07:54.720
So this isn't to say this is the end all be all this is just that there's a couple of other tables that

07:54.720 --> 07:56.690
we need to consider at the bare minimum.

07:56.910 --> 07:57.330
OK.

07:57.480 --> 07:59.590
So we'll stop here in the next video.

07:59.640 --> 08:04.670
We're going to identify some of the different types of relationships between data at a high level.

08:04.800 --> 08:07.560
What are the different types of associations.

08:07.560 --> 08:08.660
And I'm back.

08:08.820 --> 08:12.720
I totally forgot to show you this even though I wrote a note to myself.

08:12.810 --> 08:13.770
I totally skipped this.

08:13.810 --> 08:15.950
So I'm back after recording this.

08:16.110 --> 08:22.590
Adding this to the end I want to show this schema diagram that I really think highlights how messy data

08:22.590 --> 08:23.970
can be.

08:23.970 --> 08:29.520
So here is I think I showed this early on in the course this is just a potential schema for Facebook

08:29.520 --> 08:31.820
and it's definitely incomplete.

08:31.840 --> 08:37.590
It's maybe a tenth of the number of tables that Facebook has but you can see just to store basic things

08:37.590 --> 08:40.110
like events and users.

08:40.110 --> 08:47.940
We have event Facebook event we have event membership which is related to RSVP States which is also

08:47.940 --> 08:53.670
related to venues which is also related to groups and to workplaces and to Facebook profiles.

08:53.680 --> 08:55.860
And it just goes on and on and on.

08:55.860 --> 09:01.920
So what I'm trying to show you is that data can get messy and in fact here's another super intimidating

09:01.920 --> 09:02.740
schema.

09:02.820 --> 09:08.730
This one is a little bit crazy because it has to do with storing protein database.

09:08.730 --> 09:12.610
I think it was I saw some article on IBM and I believe MIT.

09:12.840 --> 09:16.110
Unfortunately it's very low quality image that you can actually see the tables.

09:16.170 --> 09:22.830
But each one of these little rectangles has a separate table and every line is a connection between

09:22.830 --> 09:23.910
the tables.

09:23.910 --> 09:25.950
Now this is the worst I've ever seen the craziest.

09:25.950 --> 09:29.910
And I think that was the point is that they're trying to just show you you know they're never going

09:29.910 --> 09:35.190
to actually look at this to try and understand anything but they're trying to show you how complex protein

09:35.190 --> 09:35.880
information is.

09:35.880 --> 09:40.620
When you have all these different combinations and permutations for a given protein without going into

09:40.620 --> 09:46.560
too much detail here this isn't a bio course but the way that a protein is folded the same structure

09:46.860 --> 09:52.980
basically the same amino acids depending on how it's folded can drastically change the features if you

09:52.980 --> 09:53.490
will.

09:53.490 --> 09:55.780
Basically what that protein does how it behaves.

09:55.950 --> 10:05.190
And so this is a way of documenting using sequel in some way the structure of a protein using a database.

10:05.190 --> 10:05.570
OK.

10:05.720 --> 10:11.040
So that's a scary there I just think it's kind of a powerful image although grainy it is powerful to

10:11.040 --> 10:11.770
see.

10:11.940 --> 10:18.600
Basically some really messy ugly gross data that can be supported using tables.

10:18.600 --> 10:20.210
It just doesn't look pretty.

10:20.400 --> 10:24.510
But that's the important part in kind of the point of this video is I wanted to get you prepared mentally

10:25.080 --> 10:26.940
for working with some more complex data.

10:26.970 --> 10:28.100
Although nothing that looks like this.
