The other day, I had a fun discussion with a colleague. We talked about data marts. Just in case you are not familiar with the term, let me take a moment to describe the concept.
A data mart is basically a small database that contains a sub-set of your company’s data. It is a copy. On the surface, it might sound like a waste of time. After all, maintaining two sets of data, can be challenging and it is an opportunity for mistakes. So why bother?
Most IT systems (programs) store their data in some kind of database. Some systems are (primarily) meant for gathering data and some are meant for displaying data. Big companies tend to have very large databases. Getting useful data out of them can be slow. Inserting data into a big DB can slow-down other people who are trying to read data. On a web site, you don’t want “slow”. There are of course, ways around this.
Managing/Dealing-with Large Data
When you think about data, it is best to think of your data like your money. Keeping it is good. The more (useful data) you gather, the more power it can provide to you. You always need some of it at-hand. Once you gather a bunch, you probably want to start putting some away (like a 401k). Your goal will be to accumulate it steadily. The expectation is that it will be more valuable someday. Having a good plan is important, so you can get a good value out of it someday.
Here is something that you probably don’t want to do: carry all of your money around with you all of the time. Why? Because you don’t want someone to steal it, it is probably big and bulky, and you don’t really need it right now. Just put it in the bank and you can get to it when you need it.
Data is like this too. On your web server, it is pretty rare for someone to need all of the data that your company has been accumulating. You usually only need current data, or a few summaries, and maybe sometimes, you need a big chunk, but not all of the time.
Solution: Data Mart
A data mart is basically, like a wallet full of data or maybe a debit card for data or something like that. It is a smaller database that only contains the data that your web server is typically going to use on a day, for one system. You don’t have your debit card connected to your 401 k, right? You also don’t need your web server connected to all of your data. Your data is also probably pretty bulky and slow. Maybe bad people would like to do evil stuff with it, so you should protect it and only carry-around the data that you actually need.
To summarize, a data mart for your web server, benefits you by:
- Isolating traffic – web server demand is isolated from your main DB. So all web traffic doesn’t affect your main DB, and traffic to your main DB doesn’t slow-down your web server. This protects you against a DDOS attack.
- Smaller data = faster – Certainly, it is much faster to query a smaller amount of data. This protects you against normal, healthy traffic and yields quick responses.
- Less data = less exposure – If your web server ever becomes compromised (hacked), the culprits are likely to get everything that is on the server, and might get everything connected to it. If you plan your systems for this possibility, you will see that this defensive posture (of having less data) minimizes the damage which could occur from a data breach.
Bottom line: keep different (isolated) systems for your internal users, and external users. It takes a little more thinking, planning, and equipment, but it is much better than walking around with a big sack of money.