WEBVTT

00:01.240 --> 00:04.580
Say hey welcome to the next section.

00:04.790 --> 00:08.290
So this section as you can probably gather we're still working with our books data.

00:08.290 --> 00:10.150
I hope you're not tired of it yet.

00:10.360 --> 00:12.250
If you are almost done.

00:12.290 --> 00:17.060
But it's important that we keep working with one dataset for a little bit at least so that you can get

00:17.060 --> 00:18.090
familiar with it.

00:18.140 --> 00:22.940
And the reason that that matters is that as we do some of the more advanced things we want you to be

00:22.940 --> 00:24.260
able to check your work.

00:24.260 --> 00:29.850
So in this section we're going to learn a lot about different ways of performing analysis on data.

00:29.900 --> 00:35.450
So things like finding averages or something a bunch of data together grouping things by authors and

00:35.450 --> 00:39.710
calculating average quantities or page numbers per author.

00:39.710 --> 00:45.650
These are operations that if we had 10000 books or 10000 something else it's really hard to know if

00:45.650 --> 00:46.900
you're doing it right or wrong.

00:47.030 --> 00:50.690
You get an answer just a number of let's say sixty seven point five.

00:50.780 --> 00:52.650
And how could you know if that's right or wrong.

00:52.940 --> 00:58.020
But if we're working with books and we have 20 of them and you're familiar with the page counts and

00:58.030 --> 00:59.690
the authors you'll know.

00:59.690 --> 00:59.980
OK.

00:59.990 --> 01:05.750
This author has three books that seems right versus OK that author only has two books why are we saying

01:05.750 --> 01:06.080
that.

01:06.140 --> 01:08.540
You know we have three or something like that.

01:08.570 --> 01:11.020
Terrible example but the same idea is true.

01:11.330 --> 01:12.980
We know our data at this point.

01:13.130 --> 01:17.450
We're going to at the end of this course and along the way you know we're going to keep upgrading our

01:17.450 --> 01:22.360
data to more complex structures more tables more rose complex stuff.

01:22.730 --> 01:28.580
But at the very end kind of the capstone case study we'll be working with Instagram esque data fake

01:28.580 --> 01:33.950
data for Instagram and we'll have thousands and thousands of rows and you won't actually know if you're

01:33.950 --> 01:38.600
doing things right or wrong based off the number you're getting unless you've manually checked your

01:38.600 --> 01:41.970
work by you know doing a thousand additions or something.

01:42.300 --> 01:45.910
So all this to say stick with the books if you can.

01:46.190 --> 01:51.230
And in this section we're focusing a lot on these new aggregate functions.

01:51.260 --> 01:58.620
So those are things like finding averages counting summing things together based off of grouping data.

01:58.820 --> 02:02.170
So it's a bit hard to explain and a headshot that's showing you code.

02:02.270 --> 02:04.000
So I'll let the code do that.

02:04.010 --> 02:05.420
And just a few videos from now.

02:05.630 --> 02:10.690
But the rough idea is that we take our data and there's all sorts of insights we can gain.

02:10.850 --> 02:17.000
So rather than just working with an individual row or a group of rows we can combine things into like

02:17.120 --> 02:18.040
mega rows.

02:18.170 --> 02:24.380
I can combine all of our authors and group books based off of who wrote them or group books based off

02:24.380 --> 02:28.570
of what year they're written in and then perform operations on those groups.

02:28.700 --> 02:35.930
So it allows them to do things like find the average sales that we've had per year or we could do things

02:35.930 --> 02:40.450
like find the average page number for books per general.

02:40.670 --> 02:43.990
And then we could expand that obviously to more complex stuff.

02:44.000 --> 02:48.680
If you're working with advertising data or let's say our Instagram data we'll be able to at the end

02:48.680 --> 02:55.270
of the course do things like find out which one of our users is a power user influencers what they're

02:55.280 --> 03:01.820
called I guess meaning that they have the most comments the most likes on each one of their posts on

03:01.820 --> 03:02.450
average.

03:02.540 --> 03:06.980
So who in our database is getting on average the most likes and comments.

03:06.980 --> 03:12.050
Or we could do things like which hash tag generates the most traction and to do that we would need to

03:12.050 --> 03:17.270
analyze all of our hashtags and then take all of our photos that have those hash tags group them together

03:17.540 --> 03:21.280
figure out which one of those hashtags generates the most likes.

03:21.470 --> 03:24.690
So there's a lot of stuff that we can do with these aggregate functions.

03:24.740 --> 03:29.820
They form the backbone of a lot of the questions and analysis that we'll do throughout the course.

03:29.960 --> 03:30.650
All right.

03:30.650 --> 03:32.640
I'm rambling now I'm going to go away.

03:32.690 --> 03:34.520
Hopefully you enjoy this section.

03:34.520 --> 03:37.590
It's important to say that a lot but it really is.

03:37.760 --> 03:42.800
And of course we'll have a bunch of exercises throughout the course or throughout the section and especially

03:42.800 --> 03:43.560
at the end.

03:43.760 --> 03:45.200
And I'm trying to keep it interesting.

03:45.200 --> 03:46.260
Look forward to those.

03:46.400 --> 03:50.120
If you don't just remember it's a database course.

03:50.120 --> 03:51.580
I'm trying my best.

03:51.740 --> 03:53.500
It's you know it's databases.

03:53.780 --> 03:54.620
All right.

03:54.620 --> 03:55.200
I'm done.