WEBVTT

1
00:00:00.000 --> 00:00:01.230
In this lesson,

2
00:00:01.230 --> 00:00:04.140
we will learn about metadata analysis.

3
00:00:04.140 --> 00:00:08.220
Metadata analysis is the examination of the underlying data

4
00:00:08.220 --> 00:00:11.850
about files, media, or communications.

5
00:00:11.850 --> 00:00:14.040
Metadata analysis can be used

6
00:00:14.040 --> 00:00:17.490
to uncover information about the origin,

7
00:00:17.490 --> 00:00:22.020
manipulation, or potential malicious intent of files,

8
00:00:22.020 --> 00:00:24.600
media, or communications.

9
00:00:24.600 --> 00:00:28.620
Analysis concepts include files and file systems,

10
00:00:28.620 --> 00:00:33.150
images, audio and video, and email header analysis.

11
00:00:33.150 --> 00:00:37.110
Within files and file systems, metadata analysis

12
00:00:37.110 --> 00:00:39.690
can reveal file creation dates,

13
00:00:39.690 --> 00:00:43.560
modification times, and user permissions.

14
00:00:43.560 --> 00:00:48.030
Next, metadata from images, audio, and video

15
00:00:48.030 --> 00:00:51.450
can include information about device models,

16
00:00:51.450 --> 00:00:54.780
GPS coordinates, or editing history.

17
00:00:54.780 --> 00:00:58.680
Finally, email header metadata provides details

18
00:00:58.680 --> 00:01:02.670
about the sender, recipient, and transmission path.

19
00:01:02.670 --> 00:01:05.640
Let's learn more about metadata analysis

20
00:01:05.640 --> 00:01:09.660
in files and file systems, images, audio and video,

21
00:01:09.660 --> 00:01:12.300
and email header analysis.

22
00:01:12.300 --> 00:01:16.620
First, we have metadata analysis in files and file systems.

23
00:01:16.620 --> 00:01:20.010
Metadata analysis in files and file systems

24
00:01:20.010 --> 00:01:24.450
involves examining the underlying information about a file,

25
00:01:24.450 --> 00:01:26.340
such as its creation date,

26
00:01:26.340 --> 00:01:29.970
modification history, and user permissions.

27
00:01:29.970 --> 00:01:31.500
Tools such as

28
00:01:31.500 --> 00:01:34.980
the stat command in Linux or the file Properties dialog in Windows

29
00:01:34.980 --> 00:01:36.870
can reveal this metadata,

30
00:01:36.870 --> 00:01:40.020
allowing analysts to track changes to files

31
00:01:40.020 --> 00:01:43.140
and detect unauthorized modifications.

32
00:01:43.140 --> 00:01:48.140
For example, the command stat /path/to/file.txt

33
00:01:51.420 --> 00:01:55.230
would return detailed metadata for file.txt,

34
00:01:55.230 --> 00:01:59.310
including the file's access permissions, ownership,

35
00:01:59.310 --> 00:02:03.810
timestamps (last modified, accessed, and changed), and more.
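
NOTE
A minimal Python sketch of the kind of information stat reports, using only the standard library (os.stat and stat.filemode). It is an illustration, not a replacement for stat or a forensic tool; the function name file_metadata is hypothetical.
```python
import os
import stat
from datetime import datetime, timezone
def file_metadata(path):
    """Return stat-style metadata for path as a dict (a sketch, not a forensic tool)."""
    st = os.stat(path)
    def to_utc(ts):
        # Convert epoch seconds to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(ts, tz=timezone.utc)
    return {
        "mode": stat.filemode(st.st_mode),  # permission string, as ls -l shows it
        "uid": st.st_uid,                   # owner user ID
        "gid": st.st_gid,                   # owner group ID
        "size": st.st_size,                 # size in bytes
        "modified": to_utc(st.st_mtime),    # last content modification
        "accessed": to_utc(st.st_atime),    # last access (mount options such as noatime can stop updates)
        "changed": to_utc(st.st_ctime),     # metadata change on Linux; creation time on Windows
    }
```
For example, file_metadata("/path/to/file.txt")["modified"] holds the same last-modified instant that stat prints, expressed in UTC.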

36
00:02:03.810 --> 00:02:08.220
So if a file was modified outside normal working hours

37
00:02:08.220 --> 00:02:11.400
or showed unauthorized access patterns,

38
00:02:11.400 --> 00:02:13.860
this metadata could help pinpoint

39
00:02:13.860 --> 00:02:17.100
when and possibly how a file was altered,

40
00:02:17.100 --> 00:02:19.800
aiding in security investigations.
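
NOTE
A sketch of the off-hours check described above, assuming a hypothetical Monday-to-Friday, 08:00 to 18:00 policy and timestamps already converted to the organization's local time. File timestamps can themselves be altered, so a hit is a lead to investigate, not proof.
```python
from datetime import datetime, time
# Hypothetical policy: Monday to Friday, 08:00 to 18:00 local time.
WORK_START = time(8, 0)
WORK_END = time(18, 0)
def outside_working_hours(ts, start=WORK_START, end=WORK_END):
    """True if datetime ts falls on a weekend or outside [start, end)."""
    if ts.weekday() >= 5:  # weekday(): Monday is 0, Saturday 5, Sunday 6
        return True
    return not (start <= ts.time() < end)
```
For example, outside_working_hours(datetime(2024, 3, 5, 2, 30)) returns True, because 02:30 on a Tuesday falls outside the window.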

41
00:02:19.800 --> 00:02:23.250
File system metadata can also include details

42
00:02:23.250 --> 00:02:26.490
about file ownership and access control,

43
00:02:26.490 --> 00:02:30.210
helping analysts detect abnormal access patterns.

44
00:02:30.210 --> 00:02:35.210
For instance, forensic tools like Autopsy or FTK Imager

45
00:02:35.430 --> 00:02:38.430
can extract metadata from a disk image

46
00:02:38.430 --> 00:02:42.750
to investigate who accessed or modified certain files.

47
00:02:42.750 --> 00:02:45.690
By analyzing this type of metadata,

48
00:02:45.690 --> 00:02:48.840
investigators can reconstruct the sequence of events

49
00:02:48.840 --> 00:02:51.360
leading to a security incident,

50
00:02:51.360 --> 00:02:55.650
such as tracing how malware was planted in a system,

51
00:02:55.650 --> 00:02:59.790
or uncovering hidden files with modified attributes

52
00:02:59.790 --> 00:03:02.250
designed to evade detection.
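
NOTE
A sketch of two steps mentioned above: ordering extracted events into a timeline, and flagging paths that use the Unix dot-prefix convention for hidden files. The event tuples and paths are hypothetical. On Windows the hidden flag is a file attribute (exposed in Python as os.stat(path).st_file_attributes with stat.FILE_ATTRIBUTE_HIDDEN), which this sketch does not cover; tools like Autopsy build far richer timelines.
```python
from datetime import datetime
def build_timeline(events):
    """Sort (timestamp, path, action) tuples into chronological order."""
    return sorted(events, key=lambda event: event[0])
def hidden_by_name(paths):
    """Paths with any component starting with '.', the Unix hidden-file convention."""
    def is_hidden(path):
        # Ignore empty parts and the special "." and ".." entries.
        parts = [part for part in path.split("/") if part not in ("", ".", "..")]
        return any(part.startswith(".") for part in parts)
    return [p for p in paths if is_hidden(p)]
```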

53
00:03:02.250 --> 00:03:04.980
Second, we have metadata analysis

54
00:03:04.980 --> 00:03:07.800
in images, audio, and video.

55
00:03:07.800 --> 00:03:11.400
Metadata analysis of media files like images,

56
00:03:11.400 --> 00:03:16.020
audio, and video often focuses on extracting information

57
00:03:16.020 --> 00:03:19.560
about the device used to create the file,

58
00:03:19.560 --> 00:03:23.160
the GPS coordinates, and editing history.

59
00:03:23.160 --> 00:03:27.540
Cross-platform tools such as ExifTool allow analysts

60
00:03:27.540 --> 00:03:30.960
to dig into the metadata of media files,

61
00:03:30.960 --> 00:03:34.350
uncovering details such as camera model,

62
00:03:34.350 --> 00:03:36.420
software used for editing,

63
00:03:36.420 --> 00:03:39.690
and even the location where media was captured.

64
00:03:39.690 --> 00:03:44.690
For example, the command exiftool /path/to/image.jpg

65
00:03:48.630 --> 00:03:52.230
would display metadata for image.jpg,

66
00:03:52.230 --> 00:03:56.100
including details like the camera model, creation date,

67
00:03:56.100 --> 00:03:58.590
GPS coordinates if available,

68
00:03:58.590 --> 00:04:01.830
software used for editing, and much more.
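
NOTE
EXIF stores GPS position as three rational values (degrees, minutes, seconds) plus a reference letter (N or S in GPSLatitudeRef, E or W in GPSLongitudeRef). The sketch below converts that form to signed decimal degrees for mapping; the sample coordinates are made up. ExifTool can also print numeric values directly with its -n option.
```python
from fractions import Fraction
def dms_to_decimal(dms, ref):
    """Convert EXIF-style (degrees, minutes, seconds) plus an N/S/E/W ref to signed decimal degrees."""
    deg, minutes, seconds = (Fraction(x) for x in dms)
    value = deg + minutes / 60 + seconds / 3600
    # South latitudes and West longitudes are negative.
    if ref.upper() in ("S", "W"):
        value = -value
    return float(value)
```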

69
00:04:01.830 --> 00:04:04.604
Analysts can use this information to determine

70
00:04:04.604 --> 00:04:08.910
where and when the image was taken, whether it was edited,

71
00:04:08.910 --> 00:04:11.610
or to trace the source of the file.

72
00:04:11.610 --> 00:04:15.270
This type of analysis is particularly useful

73
00:04:15.270 --> 00:04:20.160
in cases of digital forensics or authenticity verification.

74
00:04:20.160 --> 00:04:23.640
For example, an image's metadata might reveal

75
00:04:23.640 --> 00:04:26.430
that it was edited with specific software,

76
00:04:26.430 --> 00:04:30.660
even though it was claimed to be an original photograph,

77
00:04:30.660 --> 00:04:33.330
indicating potential manipulation.
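
NOTE
A sketch of the check described above, assuming the tags have already been extracted into a Python dict (for example by parsing the output of exiftool -json). The watch list of editor names is hypothetical. A missing Software tag proves nothing, since metadata can be stripped or rewritten.
```python
# Hypothetical watch list; a real investigation would tune this.
EDITING_SOFTWARE = ("photoshop", "gimp", "lightroom", "affinity")
def edited_by_software(tags):
    """Return the Software tag value if it names a known editor, else None."""
    software = tags.get("Software", "")
    if any(name in software.lower() for name in EDITING_SOFTWARE):
        return software
    return None
```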

78
00:04:33.330 --> 00:04:37.740
Similarly, video or audio files can include metadata

79
00:04:37.740 --> 00:04:41.910
that provides details about the device used to record them,

80
00:04:41.910 --> 00:04:43.830
or timestamps and location data that indicate

81
00:04:43.830 --> 00:04:47.220
when and where the recording took place.

82
00:04:47.220 --> 00:04:50.580
This can be critical in legal investigations

83
00:04:50.580 --> 00:04:53.670
as it helps establish the chain of custody

84
00:04:53.670 --> 00:04:56.310
or disprove false claims.

85
00:04:56.310 --> 00:04:59.400
For instance, if a video file is presented

86
00:04:59.400 --> 00:05:01.530
as evidence in a legal case,

87
00:05:01.530 --> 00:05:04.770
the metadata might show that the video was edited

88
00:05:04.770 --> 00:05:07.200
after the alleged event took place,

89
00:05:07.200 --> 00:05:09.570
challenging its authenticity.
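
NOTE
A sketch comparing a file's recorded modification time with the alleged event time. It assumes ExifTool-style tag names and the EXIF date format YYYY:MM:DD HH:MM:SS; any timezone suffix is ignored, and which field to check depends on the file type. The function returns None when the field is missing, because absence is inconclusive.
```python
from datetime import datetime
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # e.g. "2024:05:01 14:03:22"
def modified_after(tags, event_time, field="ModifyDate"):
    """True/False if the tagged time is after event_time; None if the field is missing."""
    raw = tags.get(field)
    if raw is None:
        return None  # field missing: inconclusive
    # Keep the first 19 characters, dropping any timezone suffix such as "+02:00".
    return datetime.strptime(raw[:19], EXIF_FORMAT) > event_time
```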

90
00:05:09.570 --> 00:05:14.570
Third and last, we have metadata analysis in email headers.

91
00:05:14.730 --> 00:05:18.630
Email metadata analysis focuses on extracting

92
00:05:18.630 --> 00:05:22.740
and interpreting the details hidden in email headers

93
00:05:22.740 --> 00:05:27.180
to uncover the origin and transmission path of a message.

94
00:05:27.180 --> 00:05:29.550
Tools like MXToolbox

95
00:05:29.550 --> 00:05:32.760
or the header analysis features in email clients,

96
00:05:32.760 --> 00:05:37.080
like Gmail and Outlook, can display this information,

97
00:05:37.080 --> 00:05:40.080
allowing analysts to trace an email's journey

98
00:05:40.080 --> 00:05:41.910
through mail servers.
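
NOTE
Each mail server that relays a message prepends its own Received header, so the bottom Received header is the earliest hop and the top one is the most recent. A minimal sketch using the standard email package to list the hops in travel order; in a real case the header text would come from a view such as Gmail's Show original.
```python
from email import message_from_string
def received_hops(raw_headers):
    """Return Received header values oldest first (servers prepend, so reverse the list)."""
    msg = message_from_string(raw_headers)
    hops = msg.get_all("Received") or []
    return list(reversed(hops))
```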

99
00:05:41.910 --> 00:05:45.480
Key fields such as Received, Message-ID,

100
00:05:45.480 --> 00:05:49.980
and Return-Path can reveal whether an email was spoofed

101
00:05:49.980 --> 00:05:52.470
or if it came from a legitimate source.
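
NOTE
A sketch that pulls the domains out of the From, Return-Path, and Message-ID fields so they can be compared, using the standard library's email package. A mismatch is a signal rather than a verdict: mailing lists and bulk-mail providers legitimately use different Return-Path domains. The sample addresses are hypothetical.
```python
from email import message_from_string
from email.utils import parseaddr
def domain_of(value):
    """Lowercase domain from 'Name <user@host>' or '<id@host>'; None if absent."""
    addr = parseaddr(value or "")[1]
    return addr.rpartition("@")[2].lower() if "@" in addr else None
def key_field_domains(raw_headers):
    """Domains named in the From, Return-Path, and Message-ID fields."""
    msg = message_from_string(raw_headers)
    return {
        "from": domain_of(msg.get("From")),
        "return_path": domain_of(msg.get("Return-Path")),
        "message_id": domain_of(msg.get("Message-ID")),
    }
```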

102
00:05:52.470 --> 00:05:56.176
By analyzing these fields, an investigator might detect

103
00:05:56.176 --> 00:06:00.120
that an email that appears to come from a trusted domain

104
00:06:00.120 --> 00:06:04.500
was actually sent from an unknown or malicious IP address.

105
00:06:04.500 --> 00:06:08.460
For example, in a phishing attack, email headers can show

106
00:06:08.460 --> 00:06:11.259
that the email originated from a server

107
00:06:11.259 --> 00:06:14.310
in a completely different region than expected,

108
00:06:14.310 --> 00:06:17.490
or it may lack proper authentication records,

109
00:06:17.490 --> 00:06:20.190
like DKIM or SPF.
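
NOTE
Receiving servers typically record their SPF, DKIM, and DMARC verdicts in an Authentication-Results header (RFC 8601). A sketch that reads those verdicts; it only reports what the receiving server recorded and does not re-verify anything. The sample header value is hypothetical.
```python
import re
# Matches e.g. "spf=fail", "dkim=none", "dmarc=pass" (case-insensitive).
AUTH_RE = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)
def auth_results(header_value):
    """Map each mechanism (spf, dkim, dmarc) to the first result reported for it."""
    results = {}
    for mech, result in AUTH_RE.findall(header_value):
        results.setdefault(mech.lower(), result.lower())
    return results
```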

110
00:06:20.190 --> 00:06:23.400
This metadata can help security teams

111
00:06:23.400 --> 00:06:27.390
identify phishing attempts, block malicious senders,

112
00:06:27.390 --> 00:06:29.790
and prevent further compromise.

113
00:06:29.790 --> 00:06:32.640
By closely examining email headers,

114
00:06:32.640 --> 00:06:36.000
cybersecurity professionals can also determine

115
00:06:36.000 --> 00:06:39.720
if emails were rerouted through suspicious servers

116
00:06:39.720 --> 00:06:43.410
or if there are inconsistencies in the timestamps

117
00:06:43.410 --> 00:06:45.510
that suggest message tampering.
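
NOTE
A sketch of the timestamp check: each Received header ends with a date after its last semicolon, and when the hops are read oldest first those dates should not go backward. The 60-second tolerance for clock skew between servers is an assumption, and the input is assumed to be Received values already ordered oldest first.
```python
from datetime import timezone
from email.utils import parsedate_to_datetime
def hop_times(received_oldest_first):
    """Parse the date that follows the last ';' in each Received value."""
    times = []
    for value in received_oldest_first:
        dt = parsedate_to_datetime(value.rsplit(";", 1)[1].strip())
        # A "-0000" zone parses as naive; treat it as UTC so comparisons work.
        times.append(dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc))
    return times
def out_of_order(received_oldest_first, tolerance_seconds=60):
    """Indexes of hops stamped earlier than the previous hop by more than the tolerance."""
    times = hop_times(received_oldest_first)
    return [i for i in range(1, len(times))
            if (times[i - 1] - times[i]).total_seconds() > tolerance_seconds]
```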

118
00:06:45.510 --> 00:06:48.990
Now, let's take a look at email header analysis

119
00:06:48.990 --> 00:06:53.220
using the MXToolbox email header analyzer.

120
00:06:53.220 --> 00:06:57.330
On the right-hand side of the screen is an email header.

121
00:06:57.330 --> 00:07:00.510
In Gmail, we can get access to email headers

122
00:07:00.510 --> 00:07:04.850
by selecting More and then Show original.

123
00:07:04.850 --> 00:07:07.740
On the right-hand side, let's take this email header

124
00:07:07.740 --> 00:07:12.479
that I've created and analyze it with MXToolbox.

125
00:07:12.479 --> 00:07:17.479
This is the email header analyzer on mxtoolbox.com.

126
00:07:18.000 --> 00:07:20.610
We'll go ahead and paste our email header in here

127
00:07:20.610 --> 00:07:24.780
and then analyze it.

128
00:07:24.780 --> 00:07:28.800
As you can see, the header is laid out more clearly here,

129
00:07:28.800 --> 00:07:33.565
and we can find some malicious indicators in this email.

130
00:07:33.565 --> 00:07:35.820
Specifically in this email header,

131
00:07:35.820 --> 00:07:39.330
there are four malicious indicators that stand out.

132
00:07:39.330 --> 00:07:43.080
The email was received from suspicious IP addresses.

133
00:07:43.080 --> 00:07:45.300
There was an SPF failure.

134
00:07:45.300 --> 00:07:49.080
There are DKIM and DMARC inconsistencies,

135
00:07:49.080 --> 00:07:52.800
and there's a misaligned Reply-To address.

136
00:07:52.800 --> 00:07:54.840
Let's look at each of these.

137
00:07:54.840 --> 00:07:59.070
First, the email was received from suspicious IP addresses,

138
00:07:59.070 --> 00:08:04.058
specifically 45.227.253.94,

139
00:08:05.748 --> 00:08:10.748
and 178.62.193.182.

140
00:08:11.264 --> 00:08:14.640
I can take either of these IP addresses

141
00:08:14.640 --> 00:08:18.720
and put it into the MXToolbox SuperTool

142
00:08:18.720 --> 00:08:21.570
to check whether or not they're on a blacklist,

143
00:08:21.570 --> 00:08:26.520
and you can see that IP address 178.62.193.182

144
00:08:28.140 --> 00:08:29.940
is on a blacklist.

145
00:08:29.940 --> 00:08:31.890
I can even select detail here

146
00:08:31.890 --> 00:08:35.100
to get more information about why

147
00:08:35.100 --> 00:08:38.571
that particular IP address is blacklisted.
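
NOTE
Blacklist (DNSBL) checks like the one the SuperTool runs work through DNS: the IPv4 octets are reversed, the list's zone is appended, and an A-record answer means the address is listed. A sketch of that lookup; zen.spamhaus.org is only an example zone, MXToolbox queries many lists at once, and each list sets its own usage terms. The sketch treats any resolution failure as not listed.
```python
import ipaddress
import socket
def dnsbl_query_name(ip, zone="zen.spamhaus.org"):
    """Build the DNSBL lookup name: IPv4 octets reversed, then the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone
def is_listed(ip, zone="zen.spamhaus.org"):
    """True if the zone answers with an A record for the IP. Needs network access."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False  # NXDOMAIN (not listed) or a failed lookup
```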

148
00:08:38.571 --> 00:08:41.280
So both of these IP addresses

149
00:08:41.280 --> 00:08:43.560
are linked to known malicious domains

150
00:08:43.560 --> 00:08:47.820
or IP addresses associated with phishing activity.

151
00:08:47.820 --> 00:08:52.820
Next, the email failed an SPF check.

152
00:08:52.920 --> 00:08:56.640
This indicates that the IP address sending the email

153
00:08:56.640 --> 00:08:59.160
is not authorized to send emails

154
00:08:59.160 --> 00:09:03.750
on behalf of the legitimate domain, chase.com.
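
NOTE
SPF works by publishing, in a DNS TXT record, which hosts may send mail for a domain; the receiver checks the connecting IP against it. The sketch below evaluates only ip4: mechanisms and ignores include:, a, mx, redirect=, and the rest of RFC 7208, so it is illustrative only. The record in the example is hypothetical, not chase.com's real record.
```python
import ipaddress
def spf_ip4_networks(record):
    """Collect ip4: mechanisms from an SPF TXT record (other mechanisms are ignored)."""
    nets = []
    for term in record.split():
        if term.lower().startswith(("ip4:", "+ip4:")):
            nets.append(ipaddress.ip_network(term.split(":", 1)[1], strict=False))
    return nets
def ip_authorized(ip, record):
    """True if ip falls inside any ip4: network listed in the record."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in spf_ip4_networks(record))
```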

155
00:09:03.750 --> 00:09:06.570
Next, there is no DKIM signature

156
00:09:06.570 --> 00:09:09.990
and the DMARC check failed.

157
00:09:09.990 --> 00:09:13.110
This further confirms that the email is spoofed

158
00:09:13.110 --> 00:09:16.680
and did not originate from the legitimate domain.
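
NOTE
DMARC passes only when SPF or DKIM passes and the domain it authenticated aligns with the visible From domain. A simplified sketch of that rule: relaxed alignment is approximated here by comparing the last two labels of each domain, which is wrong for suffixes such as co.uk; a real check uses the Public Suffix List.
```python
def org_domain(domain):
    """Naive organizational domain: the last two labels (real checks use the Public Suffix List)."""
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])
def aligned(auth_domain, from_domain, strict=False):
    """Strict alignment needs an exact match; relaxed needs the same organizational domain."""
    if strict:
        return auth_domain.lower() == from_domain.lower()
    return org_domain(auth_domain) == org_domain(from_domain)
def dmarc_pass(from_domain, spf_result, spf_domain, dkim_result, dkim_domain):
    """DMARC passes if at least one of SPF or DKIM passes with an aligned domain."""
    spf_ok = spf_result == "pass" and spf_domain is not None and aligned(spf_domain, from_domain)
    dkim_ok = dkim_result == "pass" and dkim_domain is not None and aligned(dkim_domain, from_domain)
    return spf_ok or dkim_ok
```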

159
00:09:16.680 --> 00:09:19.470
Finally, the Reply-To address

160
00:09:19.470 --> 00:09:24.470
points to the domain secure-chase.login.com,

161
00:09:24.510 --> 00:09:29.510
which is different from the From address domain, chase.com.

162
00:09:29.580 --> 00:09:33.630
We can take this secure-chase.login.com

163
00:09:33.630 --> 00:09:37.560
and go back to the MXToolbox SuperTool,

164
00:09:37.560 --> 00:09:40.650
and look up that domain,

165
00:09:40.650 --> 00:09:43.230
and we will find that there is no DMARC

166
00:09:43.230 --> 00:09:45.780
or DNS record published,

167
00:09:45.780 --> 00:09:49.140
further confirming that this is a fraudulent email

168
00:09:49.140 --> 00:09:51.960
intended to deceive recipients.
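
NOTE
A DMARC policy is published as a TXT record at _dmarc. followed by the domain, as a semicolon-separated list of tags. The sketch builds the lookup name and parses a record value; the DNS query itself is left out because the Python standard library has no TXT lookup (a library such as dnspython can do it). Receivers that find no record at the exact name fall back to the organizational domain (RFC 7489).
```python
def dmarc_record_name(domain):
    """DMARC policies are published as a TXT record at _dmarc.<domain>."""
    return "_dmarc." + domain.lower().rstrip(".")
def parse_dmarc(txt):
    """Split a value like 'v=DMARC1; p=reject' into a tag dict; None if it is not DMARC."""
    tags = {}
    for part in txt.split(";"):
        key, sep, value = part.strip().partition("=")
        if sep:
            tags[key.strip().lower()] = value.strip()
    return tags if tags.get("v") == "DMARC1" else None
```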

169
00:09:51.960 --> 00:09:55.590
Any replies to this email would be sent

170
00:09:55.590 --> 00:10:00.180
to the malicious domain secure-chase.login.com

171
00:10:00.180 --> 00:10:03.060
rather than chase.com.
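
NOTE
A sketch of the Reply-To check: flag a Reply-To whose domain is neither the From domain nor a subdomain of it. Note that secure-chase.login.com is a subdomain of login.com, so it fails this test even though it contains the word chase. The mailbox names in the example are hypothetical, since the transcript does not show them.
```python
from email.utils import parseaddr
def _domain(header_value):
    """Lowercase domain of the address in a header value; None if absent."""
    addr = parseaddr(header_value or "")[1]
    return addr.rpartition("@")[2].lower() if "@" in addr else None
def reply_to_mismatch(from_value, reply_to_value):
    """True if Reply-To points outside the From domain (exact domain or its subdomains)."""
    from_dom, reply_dom = _domain(from_value), _domain(reply_to_value)
    if reply_dom is None or from_dom is None:
        return False  # nothing to compare
    return not (reply_dom == from_dom or reply_dom.endswith("." + from_dom))
```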

172
00:10:03.060 --> 00:10:05.700
So this email header confirms

173
00:10:05.700 --> 00:10:07.770
that the email has been crafted

174
00:10:07.770 --> 00:10:12.540
to trick recipients into believing it is from Chase Bank.

175
00:10:12.540 --> 00:10:15.780
When analyzed with a tool like MXToolbox,

176
00:10:15.780 --> 00:10:19.530
the header clearly reveals failures in authentication

177
00:10:19.530 --> 00:10:22.050
and highlights that the email originated

178
00:10:22.050 --> 00:10:24.030
from a suspicious domain,

179
00:10:24.030 --> 00:10:27.000
raising significant phishing concerns.

180
00:10:27.000 --> 00:10:29.970
This is the end of our demonstration.

181
00:10:29.970 --> 00:10:34.970
So remember, metadata analysis involves examining

182
00:10:36.226 --> 00:10:39.270
the hidden data in files, media, or communications

183
00:10:39.270 --> 00:10:42.600
to uncover important details like origin,

184
00:10:42.600 --> 00:10:45.720
manipulation, or malicious intent.

185
00:10:45.720 --> 00:10:50.400
In files, it can reveal key information such as creation dates,

186
00:10:50.400 --> 00:10:55.050
modifications, user permissions, and access history,

187
00:10:55.050 --> 00:10:57.630
making metadata analysis useful

188
00:10:57.630 --> 00:11:01.260
for detecting tampering or unauthorized changes.

189
00:11:01.260 --> 00:11:04.590
Next, for images, audio, and video,

190
00:11:04.590 --> 00:11:08.940
metadata can show device details, GPS locations,

191
00:11:08.940 --> 00:11:12.420
and editing history, which helps track the source

192
00:11:12.420 --> 00:11:14.610
or verify authenticity.

193
00:11:14.610 --> 00:11:19.500
Finally, email header analysis focuses on revealing the path

194
00:11:19.500 --> 00:11:23.700
and origin of a message, helping to identify spoofing

195
00:11:23.700 --> 00:11:26.730
or phishing attempts by checking details

196
00:11:26.730 --> 00:11:29.880
like IP addresses and server hops.

197
00:11:29.880 --> 00:11:33.720
Overall, metadata analysis is a powerful tool

198
00:11:33.720 --> 00:11:36.150
in cybersecurity investigations,

199
00:11:36.150 --> 00:11:38.670
aiding in uncovering anomalies

200
00:11:38.670 --> 00:11:42.783
and tracing the origins of suspicious activities.