WEBVTT

1
00:00:00.000 --> 00:00:01.290
In this lesson,

2
00:00:01.290 --> 00:00:03.750
we will learn about data management.

3
00:00:03.750 --> 00:00:07.200
Data management involves organizing, protecting,

4
00:00:07.200 --> 00:00:08.850
and analyzing data

5
00:00:08.850 --> 00:00:12.150
to ensure its integrity, confidentiality,

6
00:00:12.150 --> 00:00:15.930
and availability within specialized systems.

7
00:00:15.930 --> 00:00:19.500
Data management concepts include aggregation

8
00:00:19.500 --> 00:00:21.450
and data analytics.

9
00:00:21.450 --> 00:00:24.090
Aggregation refers to the process

10
00:00:24.090 --> 00:00:26.430
of collecting and combining data

11
00:00:26.430 --> 00:00:28.530
from various sources to create

12
00:00:28.530 --> 00:00:31.200
a single comprehensive data set

13
00:00:31.200 --> 00:00:35.460
that can be analyzed for patterns, trends, or anomalies.

14
00:00:35.460 --> 00:00:39.630
Data analytics involves examining aggregated data

15
00:00:39.630 --> 00:00:41.490
to identify insights,

16
00:00:41.490 --> 00:00:43.710
detect potential security threats,

17
00:00:43.710 --> 00:00:46.770
and inform the decision-making process.

18
00:00:46.770 --> 00:00:51.180
Let's learn more about aggregation and data analytics.

19
00:00:51.180 --> 00:00:53.370
First, we have aggregation.

20
00:00:53.370 --> 00:00:57.600
Aggregation is the process of collecting and combining data

21
00:00:57.600 --> 00:01:01.740
from multiple sources into one comprehensive data set.

22
00:01:01.740 --> 00:01:06.090
The purpose of this is to have a single, unified data set

23
00:01:06.090 --> 00:01:08.670
that can be analyzed to find patterns,

24
00:01:08.670 --> 00:01:10.860
trends, or anomalies.

25
00:01:10.860 --> 00:01:14.640
Without aggregation, we would have fragmented bits

26
00:01:14.640 --> 00:01:17.820
of data spread all across different systems,

27
00:01:17.820 --> 00:01:20.610
making it difficult to get a full picture

28
00:01:20.610 --> 00:01:24.300
of what's going on within the enterprise network.

29
00:01:24.300 --> 00:01:27.690
To understand the benefits of aggregation,

30
00:01:27.690 --> 00:01:30.360
let's think about an e-commerce business

31
00:01:30.360 --> 00:01:33.960
that wants to better understand its customers' behavior.

32
00:01:33.960 --> 00:01:37.650
The company collects data from various network sources:

33
00:01:37.650 --> 00:01:39.780
customer purchases from its website,

34
00:01:39.780 --> 00:01:42.030
user activity from its mobile app,

35
00:01:42.030 --> 00:01:45.030
marketing data from social media campaigns,

36
00:01:45.030 --> 00:01:48.630
and feedback from customer-service interactions.

37
00:01:48.630 --> 00:01:52.860
Each of these data sources contains useful information,

38
00:01:52.860 --> 00:01:54.480
but if kept separate,

39
00:01:54.480 --> 00:01:58.230
they only provide isolated and partial snapshots

40
00:01:58.230 --> 00:02:01.260
of the customer and their interactions.

41
00:02:01.260 --> 00:02:05.280
But through data aggregation, all these data points,

42
00:02:05.280 --> 00:02:07.530
purchase history, app usage,

43
00:02:07.530 --> 00:02:09.980
social media engagement, and feedback,

44
00:02:09.980 --> 00:02:13.400
are pulled together into a single data set.

45
00:02:13.400 --> 00:02:16.950
This aggregated data set allows businesses

46
00:02:16.950 --> 00:02:18.870
to see a bigger picture.

47
00:02:18.870 --> 00:02:23.040
Now, they can analyze patterns in the unified data set

48
00:02:23.040 --> 00:02:26.340
to find out that customers

49
00:02:26.340 --> 00:02:28.500
who engage more with the app

50
00:02:28.500 --> 00:02:31.650
also tend to purchase higher-value products

51
00:02:31.650 --> 00:02:35.340
after seeing specific types of social media ads.

52
00:02:35.340 --> 00:02:39.480
This can drive the marketing plan and increase sales.

53
00:02:39.480 --> 00:02:41.400
From a technical standpoint,

54
00:02:41.400 --> 00:02:44.460
aggregation also requires a system that can

55
00:02:44.460 --> 00:02:46.800
collect data in real time

56
00:02:46.800 --> 00:02:50.430
or at regular intervals, normalize the formats,

57
00:02:50.430 --> 00:02:52.500
because data from each source might be

58
00:02:52.500 --> 00:02:54.390
structured a little bit differently,

59
00:02:54.390 --> 00:02:57.600
and then combine it all into a central database

60
00:02:57.600 --> 00:03:01.980
or repository for analysis.
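
NOTE
Supplementary sketch, not part of the narration. It illustrates the collect, normalize, and combine steps just described, using two of the e-commerce sources. The record fields, the shared schema, and every function name are assumptions made for illustration; the lesson does not specify them.
```python
from datetime import datetime, timezone
# Hypothetical raw records: each source names and formats its fields differently.
web_purchases = [{"customer": "c-17", "ts": "2024-05-01T10:15:00+00:00", "amount": 120.0}]
app_events = [{"user_id": "c-17", "epoch": 1714558800, "screen": "checkout"}]
def normalize_web(rec):
    """Map a website purchase onto the shared (assumed) schema."""
    when = datetime.fromisoformat(rec["ts"]).astimezone(timezone.utc)
    return {"customer_id": rec["customer"], "time": when, "source": "web", "detail": rec["amount"]}
def normalize_app(rec):
    """Map a mobile app event onto the same schema."""
    when = datetime.fromtimestamp(rec["epoch"], tz=timezone.utc)
    return {"customer_id": rec["user_id"], "time": when, "source": "app", "detail": rec["screen"]}
def aggregate(web, app):
    """Combine normalized records from both sources into one time-ordered data set."""
    combined = [normalize_web(r) for r in web] + [normalize_app(r) for r in app]
    return sorted(combined, key=lambda r: r["time"])
unified = aggregate(web_purchases, app_events)
for record in unified:
    print(record["time"].isoformat(), record["source"], record["customer_id"], record["detail"])
```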

61
00:03:01.980 --> 00:03:04.480
Second, we have data analytics.

62
00:03:04.480 --> 00:03:07.350
Now that data aggregation has occurred,

63
00:03:07.350 --> 00:03:09.930
we can move on to data analytics.

64
00:03:09.930 --> 00:03:12.240
Data analytics is the process

65
00:03:12.240 --> 00:03:15.600
of analyzing and processing aggregated data

66
00:03:15.600 --> 00:03:19.230
to detect security incidents and potential threats.

67
00:03:19.230 --> 00:03:22.530
This involves using specialized tools that sift

68
00:03:22.530 --> 00:03:24.390
through large amounts of data

69
00:03:24.390 --> 00:03:27.540
to identify patterns or anomalies.

70
00:03:27.540 --> 00:03:29.550
The large amounts of data come

71
00:03:29.550 --> 00:03:31.420
from various network sources,

72
00:03:31.420 --> 00:03:34.230
like endpoint protection software,

73
00:03:34.230 --> 00:03:36.690
identity and access management tools,

74
00:03:36.690 --> 00:03:40.320
network traffic, threat feeds, cloud platforms,

75
00:03:40.320 --> 00:03:44.670
and other applications used throughout an organization.

76
00:03:44.670 --> 00:03:47.340
The primary goal of data analytics

77
00:03:47.340 --> 00:03:50.040
is to identify security issues

78
00:03:50.040 --> 00:03:53.460
and monitor for any suspicious activities.

79
00:03:53.460 --> 00:03:56.310
A common system used for this purpose

80
00:03:56.310 --> 00:03:59.520
is a Security Information and Event Management,

81
00:03:59.520 --> 00:04:01.170
or SIEM, platform.

82
00:04:01.170 --> 00:04:04.710
A SIEM takes aggregated data, normalizes it,

83
00:04:04.710 --> 00:04:07.500
and prepares it for further analysis.

84
00:04:07.500 --> 00:04:11.460
Normalization is required because data from different

85
00:04:11.460 --> 00:04:14.370
systems may be in various formats,

86
00:04:14.370 --> 00:04:17.160
and while data aggregation may have

87
00:04:17.160 --> 00:04:20.130
initiated the process of standardization,

88
00:04:20.130 --> 00:04:23.760
a SIEM ensures the data is fully normalized.

89
00:04:23.760 --> 00:04:27.660
After normalization, the data is in a consistent

90
00:04:27.660 --> 00:04:30.150
format that can be easily searched,

91
00:04:30.150 --> 00:04:34.230
indexed, and analyzed in a centralized repository.

92
00:04:34.230 --> 00:04:36.000
Once the data is collected,

93
00:04:36.000 --> 00:04:40.020
it moves through what is known as a processing pipeline.

94
00:04:40.020 --> 00:04:43.530
This pipeline ensures the data flows correctly

95
00:04:43.530 --> 00:04:47.490
and is processed before reaching the central repository.

96
00:04:47.490 --> 00:04:51.420
In the past, batch processing pipelines were common,

97
00:04:51.420 --> 00:04:54.480
where data was collected over a fixed period,

98
00:04:54.480 --> 00:04:58.050
such as 8 hours, and then processed in bulk.

99
00:04:58.050 --> 00:05:00.690
While effective for certain tasks like

100
00:05:00.690 --> 00:05:02.670
handling logs and reports,

101
00:05:02.670 --> 00:05:05.160
this method introduced delays,

102
00:05:05.160 --> 00:05:08.190
which were problematic for security analysts

103
00:05:08.190 --> 00:05:11.160
who required real-time information.

104
00:05:11.160 --> 00:05:13.110
So, to address this issue,

105
00:05:13.110 --> 00:05:17.220
modern systems now use stream processing pipelines.

106
00:05:17.220 --> 00:05:18.840
With stream processing,

107
00:05:18.840 --> 00:05:22.410
data is collected and analyzed in near real time,

108
00:05:22.410 --> 00:05:25.230
significantly reducing the delay between

109
00:05:25.230 --> 00:05:27.390
collection and analysis.

110
00:05:27.390 --> 00:05:30.600
This allows security teams to detect and

111
00:05:30.600 --> 00:05:33.510
respond to threats more quickly.
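
NOTE
Supplementary sketch, not part of the narration. It models when an alert becomes available under batch and under stream processing; it is not a real pipeline. Events are (seconds, message) pairs, and the "failed login" rule, the sample events, and the function names are assumptions for illustration.
```python
def is_suspicious(message):
    """Toy detection rule (an assumption): flag failed logins."""
    return "failed login" in message
def stream_alerts(events):
    """Stream style: each event is checked on arrival, so the alert time equals the event time."""
    return [(ts, msg) for ts, msg in events if is_suspicious(msg)]
def batch_alerts(events, window_seconds):
    """Batch style: an event waits until its fixed window closes before it is checked."""
    alerts = []
    for ts, msg in events:
        window_close = (ts // window_seconds + 1) * window_seconds
        if is_suspicious(msg):
            alerts.append((window_close, msg))
    return alerts
EIGHT_HOURS = 8 * 60 * 60
events = [(60, "failed login for admin"), (120, "page view"), (30000, "failed login for root")]
print(stream_alerts(events))
print(batch_alerts(events, EIGHT_HOURS))
```
In this model the first failed login is flagged 60 seconds in under stream processing, but only when the eight-hour window closes under batch processing.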

112
00:05:33.510 --> 00:05:36.690
Once the data is in the central repository,

113
00:05:36.690 --> 00:05:38.460
it must be indexed.

114
00:05:38.460 --> 00:05:41.790
Indexing is important because it speeds up searches,

115
00:05:41.790 --> 00:05:46.140
making the data more accessible and ensuring analysts can

116
00:05:46.140 --> 00:05:49.860
find the information they need quickly and efficiently.
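
NOTE
Supplementary sketch, not part of the narration. One common way to speed up searches is an inverted index, which maps each term to the records that contain it. The lesson does not say which indexing technique a given repository uses, so this is an assumed example with hypothetical records and function names.
```python
from collections import defaultdict
def build_index(records):
    """Map each lowercase token to the set of record positions that contain it."""
    index = defaultdict(set)
    for position, text in enumerate(records):
        for token in text.lower().split():
            index[token].add(position)
    return index
def search(index, records, term):
    """Look the term up in the index instead of scanning every record."""
    return [records[i] for i in sorted(index.get(term.lower(), set()))]
records = [
    "Failed login for admin from 10.0.0.5",
    "User alice opened report",
    "Failed login for root from 10.0.0.9",
]
index = build_index(records)
matches = search(index, records, "failed")
print(matches)
```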

117
00:05:49.860 --> 00:05:53.100
Another important step is log curation.

118
00:05:53.100 --> 00:05:56.190
Log curation helps prioritize and select the

119
00:05:56.190 --> 00:05:58.800
most relevant data for collection.

120
00:05:58.800 --> 00:06:01.080
When gathering data across a network,

121
00:06:01.080 --> 00:06:04.860
it's important not to collect everything indiscriminately.

122
00:06:04.860 --> 00:06:06.960
While it might seem beneficial to

123
00:06:06.960 --> 00:06:09.030
collect all available data,

124
00:06:09.030 --> 00:06:13.770
doing so can overwhelm the system and slow down analysis.

125
00:06:13.770 --> 00:06:16.590
Collecting too much data wastes resources

126
00:06:16.590 --> 00:06:18.270
in processing and indexing,

127
00:06:18.270 --> 00:06:21.960
making it harder to focus on the important information.

128
00:06:21.960 --> 00:06:24.240
So, by curating the log data,

129
00:06:24.240 --> 00:06:26.640
we ensure that only the most important data

130
00:06:26.640 --> 00:06:28.710
is collected and stored.
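
NOTE
Supplementary sketch, not part of the narration. It shows log curation as a filter applied before storage. The list of sources worth keeping, the severity scale, and the threshold are all assumed policy choices for illustration; a real organization would set its own.
```python
# Hypothetical curation policy: which sources and which minimum severity to keep.
KEEP_SOURCES = {"auth", "firewall", "database"}
SEVERITY_RANK = {"debug": 0, "info": 1, "warning": 2, "error": 3, "critical": 4}
MIN_SEVERITY = "warning"
def curate(entries):
    """Keep only entries from chosen sources at or above the minimum severity."""
    threshold = SEVERITY_RANK[MIN_SEVERITY]
    return [
        e for e in entries
        if e["source"] in KEEP_SOURCES and SEVERITY_RANK[e["severity"]] >= threshold
    ]
entries = [
    {"source": "auth", "severity": "error", "msg": "failed login for admin"},
    {"source": "auth", "severity": "debug", "msg": "token cache refreshed"},
    {"source": "printer", "severity": "error", "msg": "paper jam"},
    {"source": "firewall", "severity": "warning", "msg": "port scan detected"},
]
kept = curate(entries)
print([e["msg"] for e in kept])
```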

131
00:06:28.710 --> 00:06:32.550
After the data is in storage, it's essential to protect it.

132
00:06:32.550 --> 00:06:36.660
This is where database activity monitoring tools come in.

133
00:06:36.660 --> 00:06:39.840
Database activity monitoring tools monitor

134
00:06:39.840 --> 00:06:41.550
and secure databases

135
00:06:41.550 --> 00:06:46.170
by detecting unauthorized access and suspicious activities.

136
00:06:46.170 --> 00:06:49.740
They also assist in compliance audits by providing

137
00:06:49.740 --> 00:06:53.820
continuous real-time monitoring of the database.

138
00:06:53.820 --> 00:06:57.840
Database activity monitoring tools gather information

139
00:06:57.840 --> 00:07:00.600
from sources such as database logs,

140
00:07:00.600 --> 00:07:04.530
sensors attached to the database, or transaction logs,

141
00:07:04.530 --> 00:07:08.280
offering visibility into database activity and

142
00:07:08.280 --> 00:07:11.580
ensuring data integrity and security.

143
00:07:11.580 --> 00:07:13.797
There are three main methods used

144
00:07:13.797 --> 00:07:16.260
for database activity monitoring.

145
00:07:16.260 --> 00:07:19.320
The first is the interception-based model,

146
00:07:19.320 --> 00:07:22.020
which monitors communication between the

147
00:07:22.020 --> 00:07:23.880
client and the server.

148
00:07:23.880 --> 00:07:26.580
The second is the memory-based model,

149
00:07:26.580 --> 00:07:29.790
where a sensor attached to the database captures

150
00:07:29.790 --> 00:07:33.000
SQL statements as they are executed.

151
00:07:33.000 --> 00:07:35.400
The third is the log-based model,

152
00:07:35.400 --> 00:07:40.320
which monitors transactions based on database server logs.

153
00:07:40.320 --> 00:07:44.010
Each method has its strengths and weaknesses.

154
00:07:44.010 --> 00:07:47.610
For instance, interception-based and log-based methods

155
00:07:47.610 --> 00:07:49.500
may miss some data,

156
00:07:49.500 --> 00:07:52.590
while memory-based models capture more data

157
00:07:52.590 --> 00:07:54.540
but can slow down the system.
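
NOTE
Supplementary sketch, not part of the narration. It illustrates the log-based model only: reading database log entries after the fact and flagging suspicious ones. The log line format, the allowlist, the sensitive tables, and the detection rules are assumptions for illustration and do not reflect any particular product.
```python
import re
# Hypothetical audit-log format: "<user> <client_ip> <SQL statement>".
LINE = re.compile(r"^(?P<user>\S+) (?P<ip>\S+) (?P<sql>.+)$")
AUTHORIZED_USERS = {"app_service", "dba_alice"}  # assumed allowlist
SENSITIVE_TABLES = {"customers", "payments"}  # assumed sensitive tables
def review(log_lines):
    """Return (user, reason) findings for entries that look suspicious."""
    findings = []
    for line in log_lines:
        match = LINE.match(line)
        if match is None:
            continue  # unreadable entries are skipped, one way a log-based monitor can miss data
        user, sql = match["user"], match["sql"].lower()
        if user not in AUTHORIZED_USERS:
            findings.append((user, "unauthorized user"))
        elif any(sql.startswith(f"delete from {table}") for table in SENSITIVE_TABLES):
            findings.append((user, "destructive statement"))
    return findings
log = [
    "app_service 10.0.0.8 SELECT * FROM orders",
    "guest 10.0.0.77 SELECT * FROM payments",
    "dba_alice 10.0.0.3 DELETE FROM customers WHERE 1=1",
]
findings = review(log)
print(findings)
```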

158
00:07:54.540 --> 00:07:58.680
Database activity monitoring tools can also

159
00:07:58.680 --> 00:08:00.720
help with data classification,

160
00:08:00.720 --> 00:08:04.950
data loss prevention, and data integrity.

161
00:08:04.950 --> 00:08:08.280
By using database activity monitoring tools,

162
00:08:08.280 --> 00:08:11.340
organizations can protect their databases

163
00:08:11.340 --> 00:08:15.180
from unauthorized access or malicious behavior.

164
00:08:15.180 --> 00:08:18.330
Database activity monitoring tools combine

165
00:08:18.330 --> 00:08:20.310
methods like network sniffing,

166
00:08:20.310 --> 00:08:23.220
memory scraping, reading system tables,

167
00:08:23.220 --> 00:08:26.430
and analyzing database logs to get a full

168
00:08:26.430 --> 00:08:29.340
picture of who is accessing the system.

169
00:08:29.340 --> 00:08:33.000
This makes database activity monitoring a key

170
00:08:33.000 --> 00:08:36.210
part of an organization's security strategy,

171
00:08:36.210 --> 00:08:38.520
ensuring that potential threats are

172
00:08:38.520 --> 00:08:41.010
detected and addressed quickly.

173
00:08:41.010 --> 00:08:45.900
So, remember, data management is all about organizing,

174
00:08:45.900 --> 00:08:49.350
protecting, and analyzing data to maintain

175
00:08:49.350 --> 00:08:52.020
its integrity and security.

176
00:08:52.020 --> 00:08:56.640
One key process within data management is data aggregation,

177
00:08:56.640 --> 00:09:00.270
where data from different sources is combined into

178
00:09:00.270 --> 00:09:03.450
a single data set for analysis.

179
00:09:03.450 --> 00:09:07.230
After aggregation, data analytics comes into play,

180
00:09:07.230 --> 00:09:09.900
allowing organizations to analyze

181
00:09:09.900 --> 00:09:12.210
the collected data for patterns,

182
00:09:12.210 --> 00:09:14.980
trends, or potential security threats.

183
00:09:14.980 --> 00:09:19.530
Tools like Security Information and Event Management,

184
00:09:19.530 --> 00:09:21.270
or SIEM, platforms

185
00:09:21.270 --> 00:09:24.300
help normalize and process this data

186
00:09:24.300 --> 00:09:27.420
in real time or near real time

187
00:09:27.420 --> 00:09:31.800
to enable faster detection and response to issues.

188
00:09:31.800 --> 00:09:35.580
Finally, database activity monitoring tools protect

189
00:09:35.580 --> 00:09:38.160
and monitor the data in storage,

190
00:09:38.160 --> 00:09:41.523
ensuring it remains secure and compliant.

