Big data is the way businesses do business today. Your company is using a fair amount of data to get to know your customers, no doubt. You’ve probably heard of a thousand more ways that big data analytics are giving companies new insights. But how do you know if you’re keeping enough data? The right data?
Well, first we need to think about the question: what is big data? It’s important to remember all of the places that big data comes from. You’re familiar with some big data examples, like customer age, or the most frequently viewed pages on your website. You may have thought less about data like blog posts or weather sensor data. But yes, there is a ton of data that we can keep. It can come from customer interactions, internet chatter, industry and economic factors, and even information about climate.
Back to big data for your company. Even though all of this information exists, should you be keeping all of it? Let’s look at some big data basics to help you start thinking through your big data plan.
The Four Vs
You may have heard of the four Vs of big data. They are:
- Volume – the amount of data you keep
- Velocity – how quickly the data comes in, and how quickly it can be processed or used
- Variety – the different kinds of data that we keep
- Veracity – how trustworthy the data is
What’s the point of the four Vs? Well, they are a good starting point to consider what information is best to keep. Is the data something that will be hard to use in the future? Is the data giving you trustworthy information?
Answering these questions will help you prioritize your data needs.
Structured, Unstructured, and Semi-Structured Data
When we’re thinking about how usable data is, it makes sense to start with how the information would be sorted or read. Here is some more big data basics vocabulary to help us categorize:
- Structured data – This information fits neatly into a spreadsheet. For example, how many click-throughs did an email campaign get? The data that we would track is a number, which is easy to use. Other examples of structured data are a customer’s name and demographics from a sign-up form, or the business hours of operation listed on a website.
- Unstructured data – This is everything that can’t be automatically entered into a spreadsheet column, from pictures on social media to the text of blog posts. You might want to keep blog posts that mention your company, but the text doesn’t just slide into a category or tag. It’s harder to compare this data at a glance later.
- Semi-structured data – This is data that has some internal markers that can be tagged easily. A common example is email. The text of an email is similar to the blog post example of unstructured data. But email also has some built-in information that can be sorted easily, like sender, date, time, and maybe keywords. Part of it is structured; part of it is not.
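To make the email example concrete, here is a minimal Python sketch of pulling the structured part out of a semi-structured message. It uses only Python’s standard `email` module. The sample message, the `split_email` helper, and the field names (`sender`, `subject`, `sent_at`, `body`) are made up for illustration; your own mail data and the fields you care about will look different.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message, invented for illustration only.
RAW_EMAIL = """\
From: Pat Example <pat@example.com>
To: support@example.com
Subject: Question about my order
Date: Mon, 06 Jan 2025 09:30:00 +0000

Hi, I ordered last week and the package has not arrived yet.
Can you tell me where it is?
"""


def split_email(raw: str) -> dict:
    """Separate the sortable header fields from the free-text body."""
    msg = message_from_string(raw)
    return {
        # Structured part: these values would fit in spreadsheet columns.
        "sender": msg["From"],
        "subject": msg["Subject"],
        "sent_at": parsedate_to_datetime(msg["Date"]),
        # Unstructured part: free text that still needs a person
        # (or further analysis) to interpret.
        "body": msg.get_payload().strip(),
    }


record = split_email(RAW_EMAIL)
print(record["sender"])
print(record["sent_at"].date())
print(len(record["body"].split()), "words of free text")
```

The sender, subject, and date can be sorted and filtered right away, while the body is still plain text that doesn’t slide into a category. That split is what makes email semi-structured.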
Security
We can’t talk about big data without thinking about security. As data breach after data breach has shown, the information that we collect can get our companies into trouble. Plus, there are rules to follow, such as the PCI DSS requirements for handling credit card information. So when setting your data priorities, make sure you consider the security burden you will have to take on.