WEBVTT

00:00.240 --> 00:01.420
Get here we go.

00:01.740 --> 00:05.240
So let's first start off by just making a new file to work in.

00:05.430 --> 00:07.040
So we have some cleaning.

00:07.470 --> 00:11.060
We'll just call it exercises as well.

00:11.140 --> 00:13.160
So we have a nice place to put the answers.

00:13.530 --> 00:14.030
OK.

00:14.400 --> 00:17.870
So our first one here find the earliest date a user joined.

00:18.330 --> 00:21.630
And we're not selecting anything except the earliest date.

00:21.750 --> 00:29.160
So all we need to do is take advantage of created that and use the men operator them in function.

00:29.430 --> 00:37.120
So it's just a matter of select men created at from users.

00:37.410 --> 00:38.250
And we'll see if that

00:41.680 --> 00:47.940
copy and paste it and the one that I get here looks like this.

00:47.940 --> 00:56.150
So 2016 5 one notice says earliest date so that's the easier part to fix.

00:56.190 --> 00:58.030
And then notice the format.

00:58.440 --> 01:03.260
So let's do as earliest date first.

01:03.330 --> 01:08.370
That will make it cleaner then we need to format the date.

01:08.500 --> 01:15.160
So I'm going to put this on a new line and the way that we format remember is we use date underscore

01:15.340 --> 01:16.180
format.

01:16.180 --> 01:18.310
Think back to our date section.

01:18.700 --> 01:22.780
So we're formatting the result of the minimum created which is this.

01:23.310 --> 01:27.280
And we have a second argument which is the date string how we want to format it.

01:27.490 --> 01:36.070
And the one I gave you had the month name which I believe is capital M and then that day with the suffix

01:36.340 --> 01:41.720
which is I believe capital D and the year four letter for numbers capital Y.

01:41.850 --> 01:50.860
So you need percent capital D percent capital no backwards capital M percent capital D percent capital

01:50.860 --> 01:51.600
Y.

01:52.060 --> 01:56.910
Let's see oh I have to copy this pasted in.

01:57.170 --> 01:57.810
And there we go.

01:57.810 --> 01:59.710
May 1st 2016.

01:59.780 --> 02:00.360
OK.

02:00.570 --> 02:06.150
Look we're only a day off from the first dataset that I added when I made these exercises I had a different

02:06.150 --> 02:11.970
set of data and you just watch me rerun it delete it or create 500 new ones so these will be slightly

02:11.970 --> 02:14.640
different than my answers to.

02:14.760 --> 02:17.950
Next find the email of that user.

02:18.180 --> 02:21.900
So the same one that we just found this is the earliest date.

02:21.930 --> 02:24.180
How do we find that e-mail.

02:24.600 --> 02:29.520
So let me label this challenge one and this one.

02:29.550 --> 02:35.700
We have to use a subquery because basically what we're trying to do is well first we can clean this

02:35.700 --> 02:36.570
up a bit.

02:36.690 --> 02:43.120
We don't need to format this one but basically what we're trying to do is take this earliest date which

02:43.120 --> 02:45.340
looks like this.

02:45.880 --> 02:57.150
And then we want to do a select well do e-mail from users where created at is equal to that date.

02:57.250 --> 02:59.250
How do we find that user.

03:00.160 --> 03:02.500
So let's just do it manually.

03:02.950 --> 03:10.600
It looks like our answer is Cole Bailey at yahoo dot com so we can use a so Kery to help us where all

03:10.600 --> 03:16.530
we have to do we can get rid of this and say select email Commack created at Or we could just say select

03:16.590 --> 03:27.410
star to make it shorter from users where created it equals and then we have to do our several query

03:28.550 --> 03:35.650
select men created out from users.

03:36.170 --> 03:40.800
So those parentheses indicate that this is a query that needs to happen.

03:40.820 --> 03:46.490
So this is evaluated what is men created that need my parentheses to work there go.

03:46.670 --> 03:47.750
What is the miscreated.

03:47.780 --> 03:50.680
Well it's going to be this.

03:50.720 --> 03:57.150
So then basically this whole thing is replaced by that and then we're finding users start from users

03:57.180 --> 04:00.230
where creativity is equal to that.

04:00.240 --> 04:02.930
So let's try it.

04:02.960 --> 04:04.360
There we go.

04:04.370 --> 04:05.090
Looks good.

04:05.100 --> 04:09.920
We've got e-mail and created at of course it looks different but this is the correct answer.

04:09.950 --> 04:16.040
Call Baly and we saw that because earlier we just manually selected it based off of that date and it

04:16.040 --> 04:18.800
worked perfect.

04:18.860 --> 04:19.830
Moving on.

04:19.940 --> 04:25.940
So we're taking every user taking they're created at date and then we're extracting the month name from

04:25.940 --> 04:28.880
that date whatever the month is combining them.

04:28.930 --> 04:31.900
We're grouping them and counting how many exist.

04:31.940 --> 04:34.120
So how many users signed up in November.

04:34.130 --> 04:37.540
In our case November was the highest and then we're.

04:37.560 --> 04:38.600
Us well.

04:38.630 --> 04:41.640
So let's start by just extracting the month name.

04:42.140 --> 04:46.980
So if we were to take a date like where's that.

04:47.200 --> 04:48.640
This one right here.

04:48.640 --> 04:50.830
How do we get the month name from that.

04:51.220 --> 05:03.040
Because if we just do select the month of that it gives us a number so that we can do instead is month

05:03.550 --> 05:04.360
name.

05:04.600 --> 05:07.030
And that gives us may.

05:07.030 --> 05:08.070
So we'll start with that.

05:08.230 --> 05:15.830
Well do select month name of created at from users

05:19.430 --> 05:22.520
and we get 500 different month names.

05:22.520 --> 05:26.330
Well there's only 12 different ones but 500 rows.

05:26.690 --> 05:36.440
So then what we want to do is a simple group by and let's just give it a name like as month and grouped

05:36.440 --> 05:38.310
by month instead.

05:38.540 --> 05:40.960
So rather than retyping that whole thing.

05:41.470 --> 05:48.920
OK so now we have 12 assuming that we have a date in every month of the year which we probably do if

05:48.920 --> 05:52.580
we have 500 but we don't see how many there are.

05:52.700 --> 06:07.980
So then we need our second field and our second field is count we just do count start just like that.

06:08.090 --> 06:11.790
And here we go almost there all that we need to do.

06:11.990 --> 06:21.660
I believe I gave this an alias of count and then we also want to order by count and we want it to be

06:21.660 --> 06:23.000
descending.

06:23.610 --> 06:25.040
Let's see if it works.

06:26.480 --> 06:31.850
Riego November the highest count as well with 52.

06:32.200 --> 06:33.520
Although it diverges.

06:33.520 --> 06:35.680
January was first or was second.

06:35.800 --> 06:38.450
In my case may and then April.

06:38.470 --> 06:39.310
All right.

06:39.310 --> 06:44.930
So we use this month name and we group them based off the third month.

06:44.950 --> 06:46.620
You could also just replace it with this.

06:46.630 --> 06:51.750
But it's easier just to type a month and then grouping them counting them and ordering them.

06:52.340 --> 06:55.440
I mean just put a note that this is challenge 3.

06:56.110 --> 06:57.140
Moving on.

06:57.160 --> 07:00.980
Challenge for count the number of users with Yahoo emails.

07:01.360 --> 07:06.850
So for this one it's a bit of a throwback and we need to do is use like to figure out you know which

07:07.150 --> 07:09.410
emails have Yahoo in them.

07:09.460 --> 07:19.090
So we'll do a challenge for let's do a select star from users where we just need to say we're created

07:19.090 --> 07:21.870
at like.

07:22.030 --> 07:23.820
And we don't just want to say Yahoo.

07:23.830 --> 07:29.320
Remember that's only going to find in one of my saying created an email like Yahoo.

07:29.500 --> 07:33.400
We don't want to do that because that's only where email is exactly Yahoo.

07:33.790 --> 07:40.600
So we need these placeholders but we could improve this a bit because this would match anything that

07:40.600 --> 07:45.130
had Yahoo in the string anywhere and we could have someone's email be you know Yahoo.

07:45.130 --> 07:48.760
Tim at gmail dot com.

07:48.910 --> 07:52.950
Or maybe someones last name is Yahoo or something Yahoo's.

07:53.440 --> 07:55.030
So we don't want that.

07:55.030 --> 08:03.670
So all we want to change is we'll do at Yahoo dot com and this will guarantee that the string ends at

08:03.670 --> 08:05.960
yahoo dot com.

08:06.010 --> 08:07.730
So now let's take a look.

08:07.750 --> 08:13.450
Here we go these are all yahoo dot com emails and then all we wanted to do is count how many.

08:13.450 --> 08:19.110
So rather than selecting star who would you select count Starr as Yahoo users.

08:19.110 --> 08:24.610
I think it's what we call it from users where email lists like Yahoo.

08:24.850 --> 08:33.330
And this time there is 170 and finally challenge and refine calculate the total number of users for

08:33.420 --> 08:34.710
each email host.

08:34.920 --> 08:39.930
So this one involves using like along with a case statement it's a little bit complicated.

08:39.960 --> 08:46.420
So we're doing the same thing we're recounting number of users but we're doing it dependent on the provider.

08:46.620 --> 08:52.950
So we're going to have a case statement so we'll just start right off with select we'll do case and

08:52.950 --> 08:55.210
I like to do my end as well.

08:55.410 --> 08:56.700
I think we called it provider.

08:56.730 --> 08:57.600
Yep.

08:57.690 --> 09:04.530
So we'll start with that and will do from users and what is our case.

09:04.980 --> 09:07.570
Well we can start with this Yahoo.

09:07.580 --> 09:15.870
One will say when email is like Yahoo then provider is Yahoo.

09:15.900 --> 09:16.910
Right.

09:17.010 --> 09:25.310
So we'll will do the same thing just basically copy this and say when email is like at gmail dot com

09:25.740 --> 09:34.520
and in Gmail then gmail one thing I should note there is this kind of nicer way of doing this if you

09:34.520 --> 09:38.660
are familiar with regular expressions but thats out of the scope of this course.

09:39.050 --> 09:41.540
But if you do know it you can get by with that.

09:41.540 --> 09:42.700
So we've also got Hotmail

09:46.380 --> 09:48.410
than Hotmail.

09:48.420 --> 09:49.410
There we go.

09:49.980 --> 09:50.900
OK.

09:51.240 --> 09:54.650
Then finally we'll have or else which will just be other.

09:55.020 --> 10:01.530
So we have this provider field now a column that will have gmail yahoo hotmail or other.

10:01.600 --> 10:09.340
So let's do e-mail Khama case so that we can see the email side by side against what we pulled out of

10:09.340 --> 10:09.580
it.

10:09.590 --> 10:10.530
We got 500.

10:10.530 --> 10:13.050
Take this one re-adapt more one at gmail.

10:13.170 --> 10:14.280
We get gmail.

10:14.290 --> 10:16.040
Ruben Medhurst at Yahoo.

10:16.070 --> 10:17.340
We get Yahoo.

10:17.350 --> 10:18.060
Perfect.

10:18.280 --> 10:21.740
So now just a matter of grouping them together and counting.

10:22.060 --> 10:23.100
So we can get rid of e-mail.

10:23.110 --> 10:32.430
We don't need to display it then we're just going to group by provider which is what we just created.

10:32.470 --> 10:33.600
So let's start with that.

10:36.710 --> 10:37.330
Cool.

10:37.400 --> 10:38.460
So we're grouping them together.

10:38.480 --> 10:41.040
Looks like we don't have any others in this case.

10:41.120 --> 10:41.990
We're going to group them.

10:42.050 --> 10:49.070
And then finally we're going to count after our case statement ends we added comma count star

10:52.130 --> 10:54.340
copy paste.

10:54.590 --> 10:58.560
And there's just a small discrepancy which is this is called total users.

10:58.580 --> 11:00.280
Also this is sorted.

11:00.650 --> 11:02.650
Very easy to change.

11:02.990 --> 11:09.800
Call it has total users like to capitalize that actually.

11:09.980 --> 11:14.530
And then finally well let's just make sure that that works to start.

11:14.900 --> 11:15.650
Great.

11:15.660 --> 11:25.070
Now we need to sort of order by total users and if we do it this way you'll see that it's ascending.

11:25.080 --> 11:28.300
We want to do the sending.

11:28.390 --> 11:33.000
Right now we've got all the providers on the left and their total number of users.

11:33.340 --> 11:34.500
Perfect.

11:34.500 --> 11:34.890
All right.

11:34.900 --> 11:41.380
So hopefully enjoyed that chance of working with more data though isn't hard coded in some of its dynamic

11:41.530 --> 11:44.320
all of its namak 500 plus users.

11:44.320 --> 11:49.870
Next up in this next section we're finally going to connect the database we just created which is filled

11:49.870 --> 11:54.370
with all that see data and we're going to connect it with a web application and figure out how to get

11:54.370 --> 11:55.140
those to talk.
