So many people today are saying data is revolutionizing the world, and that with this much data, we’ll be able to solve complex problems and acquire a higher form of intelligence and knowledge.
In 2010, Eric Schmidt, then CEO of Google, was reported as saying: “Every two days, as much information is created as has been in existence since the dawn of civilization up until 2003.” Every day, over 2 quintillion bytes of data are created.
There is so much data that even the term data was not enough, so mankind invented a new one: "Big Data."
They say big data is changing the world and helping everyone get better at whatever they might be doing. Big data can answer complex questions. Big data can predict the future. Big data can cure diseases. Seems like big data is the second coming of God. It’s omnipotent!
If that’s true, then we should just sit back, relax, and wait for big data to magically solve all of our problems. However, let’s stop for a second and try to answer a few important questions.
What is data?
Let’s remove “Big” for a moment and talk about “Data.” What is data? According to Wikipedia:
Data is a set of values of qualitative or quantitative variables.
Okay… Most of the words are familiar. But how do we get an idea of what data really is? Not just the definition, but its real meaning.
According to The Economist, “Data is becoming the new raw material of business: an economic input almost on par with capital and labor.”
25 years ago we were only generating 100GB of data per day across the globe. Today, it’s roughly 3.5 Petabytes per day, or 40GB per second. To better understand these units, you can use a free online byte converter.
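If you would rather check the arithmetic yourself than use a converter, here is a minimal sketch. It assumes decimal (SI) units, and the function name is made up for this example.

```python
# Decimal (SI) units are assumed: 1 GB = 10**9 bytes, 1 PB = 10**15 bytes.
GB = 10**9
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_day_to_gb_per_second(bytes_per_day: float) -> float:
    """Convert a daily data volume in bytes to gigabytes per second."""
    return bytes_per_day / SECONDS_PER_DAY / GB


# Figures quoted in the text: 3.5 PB per day today, 100 GB per day 25 years ago.
today_rate = per_day_to_gb_per_second(3.5 * PB)
growth_factor = (3.5 * PB) / (100 * GB)

print(f"{today_rate:.1f} GB per second")
print(f"{growth_factor:,.0f}x more data per day")
```

The first number comes out at roughly 40GB per second, which matches the figure above, and the second shows the daily volume growing by a factor of about 35,000.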
So where is all this data coming from? The source is all the digital photos, videos, posts to social media, purchase transactions, scientific records, navigation data, etc.
A few questions you can ask to determine what data is to you:
- Are prices in the grocery store, data?
- Is distance from Earth to the Moon, data?
- Is current time, data?
- Is Leonardo Da Vinci’s “The Mona Lisa”, data?
- Is graffiti under the bridge, data?
- Is the current position of all atoms in the universe, data?
- Is Wikipedia’s definition of data, data?
Obviously, there are many different types of data. The important question to ask yourself is:
Is it relevant (or interesting) to me?
Data is all the information that contributes to gaining the necessary knowledge in order to answer your questions in business and in life.
However, not all data will give you the right answer. And even though the answer might turn out to be wrong, it will still be data.
How do we study data?
The process of studying data is known as data analysis. To reveal meaningful information, we must process the data using analytics tools and algorithms that fit our situation.
Throughout history, we’ve been analyzing data in an attempt to solve all kinds of problems. This is how our minds work and it’s inherent in our nature. We review existing data to find new insights and solutions.
Let’s say you’ve grown sick and tired of sitting in traffic on the way to work every morning. How can you use data analysis to determine the best time to leave for work to avoid traffic? It’s pretty straightforward.
You can build a simple table of average speeds and travel times, based on historical traffic data from your favorite maps software, and voilà! You have your answer!
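Here is a rough sketch of what that table could look like in code. The trip data is invented for illustration; in practice the rows would come from whatever history your maps software lets you export.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical commute log: (departure time, travel time in minutes).
trips = [
    ("07:00", 28), ("07:00", 31),
    ("07:30", 41), ("07:30", 45),
    ("08:00", 52), ("08:00", 49),
    ("08:30", 38), ("08:30", 36),
]


def average_by_departure(rows):
    """Group trips by departure time and average the travel times."""
    grouped = defaultdict(list)
    for departure, minutes in rows:
        grouped[departure].append(minutes)
    return {departure: mean(times) for departure, times in grouped.items()}


averages = average_by_departure(trips)
# The best departure time is the one with the lowest average travel time.
best_departure = min(averages, key=averages.get)

for departure, minutes in sorted(averages.items()):
    print(f"{departure}  {minutes:.1f} min")
print("Best time to leave:", best_departure)
```

With this made-up data, leaving at 07:00 gives the shortest average trip. A few weeks of real records would give you a more trustworthy answer.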
That’s a pretty basic problem, and you won’t really need any advanced tools to solve it.
Now, let’s say you’re a big online retailer, and you’re trying to understand how your customers make purchasing decisions, and how you can remove operational inefficiencies in the process. This problem is way more complex and solving it requires the use of advanced analytics tools and strategies.
However, even some of the world’s most powerful analytics tools will not give you the fine-grained data required to solve such problems, and some say big data is the answer.
What is big data?
Big data is a phenomenon that is characterized by the rapid growth of raw data. It is a term used to describe data sets that are so large and complex that conventional data analytics tools are unable to deal with them. Big data is also a technology to transform analysis of data-heavy workloads.
We know big data comes from many different sources, which means big data is messy data. This data is moving so fast that it represents both a challenge and an opportunity.
The challenge is related to how this much data is stored and processed.
Big Data is so much data that you can’t process it directly, so you have to build additional solutions to answer questions.
Working with big data usually means having distributed storage (like Hadoop’s HDFS) and implementing distributed algorithms (like MapReduce) to process data. Facebook processes large amounts of data by using thousands of computers with software developed by some of the world’s smartest engineers.
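To give a feel for the map-and-reduce idea, here is a minimal single-machine sketch of a word count. It is not Hadoop code. The function names and the sample documents are assumptions made for this example, and a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single total."""
    return {key: sum(values) for key, values in grouped.items()}


# Each string stands in for a chunk of data held on a different machine.
documents = ["big data is data", "data is messy"]

pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts)
```

Each document can be mapped independently of the others, which is what makes this style of processing easy to spread across thousands of computers.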
The opportunity is related to how this data will enable businesses to broaden the use of existing data, increase analytics performance and efficiency, improve business outcomes, and manage risks.
What’s the magic behind big data?
If you google “Big Data,” you’ll find a long list of products and services available, claiming to change the way we look at data. To some, it may seem like big data is just something you can acquire, apply to your data, and immediately start reaping the rewards.
Not so fast.
There are plenty of big data success stories out there, and you’ll definitely find a bunch of fantastic cases, but you’ll hardly get any specific figures on what exactly was accomplished, and what specific problems were solved.
Something to think about: even a data giant like Google, the biggest of them all, failed at predicting the spread of the flu. Oh, and they still can’t predict the weather accurately.
The truth is, who cares how big your data is? It’s not some magic crystal ball that will give you all the answers. Big data is all about optimal processing power, and to extract meaningful value from it, you’ll need analytics capabilities and skills.
Big Data does not provide any insights out of the box
Does that mean big data doesn’t work? No. It simply means big data alone will not produce any results, and you’ll need to put in some effort.
How do I work with big data?
Since big data is just data, you won’t need to make a whole lot of changes to the way you work with it. We’ve already established that big data cannot think for itself. Ultimately, the value and effectiveness of big data depend on the human operators tasked with analyzing the data and formulating the proper questions to direct big data projects.
Remember ‘The Hitchhiker’s Guide to the Galaxy’?
After 7.5 million years of thinking “...the supercomputer… Deep Thought... points out that the answer seems meaningless because the beings who instructed it never actually knew what the question was.” [1] Vague questions will get you nowhere.
How big should my data be?
Or, do you need a lot of data to solve your problem?
Let’s say you’re an online retailer with a huge customer base, in the millions, and you are planning to introduce a new feature, or product on your website. Your first thought is to survey the entire customer base because you want to hear everyone’s opinion. Question is, do you really need to survey your entire customer base? Or, should you just survey a smaller group of your customers?
Picking a smaller subset of users is known as data sampling. A good sample of data can save you time spent on processing and analysis, while maintaining an acceptable level of accuracy in the results.
Google Analytics uses data sampling in their charts and tables, and some would argue that even though it’s not the most accurate analysis, you get the data within seconds, which enables you to make decisions faster.
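To see the trade-off for yourself, here is a small sketch that compares an answer computed from a full customer base with the same answer estimated from a random sample. The customer base is simulated, and the 30% preference rate, the population size, and the sample size are all assumptions made for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical customer base: True means the customer would like the feature.
# The 30% rate is invented for this example.
population = [random.random() < 0.30 for _ in range(200_000)]


def share_of_true(values):
    """Return the fraction of True values in a collection."""
    return sum(values) / len(values)


# Simple random sample: every customer is equally likely to be picked.
sample = random.sample(population, 2_000)

full_share = share_of_true(population)
sample_share = share_of_true(sample)

print(f"Full base: {full_share:.3f}")
print(f"Sample:    {sample_share:.3f}")
```

The sample here is 1% of the customer base, yet its estimate should land within a few percentage points of the full answer. Whether that level of accuracy is acceptable depends on the decision you are trying to make.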
Of course, data sampling doesn’t work for every situation.
Let’s say your online store’s checkout conversion rates are low, and you’re trying to figure out exactly where and why your customers are dropping out of completing a purchase. In this case, you’ll need a lot more data because it’s better to analyze the entire customer base, so that you account for all varieties of user scenarios and understand your improvement opportunities.
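As a rough illustration of that kind of analysis, the sketch below computes the share of customers lost between each pair of checkout steps. The step names and counts are made up. In a real store they would be counted from the complete set of customer sessions.

```python
# Hypothetical checkout funnel: number of customers who reached each step.
funnel = [
    ("Viewed cart", 10_000),
    ("Entered shipping details", 6_200),
    ("Entered payment details", 5_900),
    ("Completed purchase", 2_100),
]


def drop_off_rates(steps):
    """Return (from_step, to_step, share lost) for each pair of adjacent steps."""
    rates = []
    for (name, count), (next_name, next_count) in zip(steps, steps[1:]):
        rates.append((name, next_name, 1 - next_count / count))
    return rates


rates = drop_off_rates(funnel)
# The step with the largest share lost is the first place to investigate.
worst = max(rates, key=lambda r: r[2])

for name, next_name, lost in rates:
    print(f"{name} -> {next_name}: {lost:.0%} lost")
print("Biggest drop-off:", worst[0], "->", worst[1])
```

In this invented example the largest loss happens between entering payment details and completing the purchase, so that is where you would start looking for the "why".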
Every situation is different, so you’ll need to determine what’s important to your business and the trade-offs you’re willing to deal with.
What tools should I use?
There are tons of data processing solutions on the market today, and it’s entirely up to you. We’re living in the age of inexpensive processing power and software, so your tool selection process shouldn’t be that challenging, and should be driven strictly by your business needs.
Typically you’ll have two choices:
- A customized, all-in-one, universal, complex solution that’s powerful but expensive. You’ll need time and money to set it up and configure, but it will be tailored precisely to your needs, and you won’t have to build anything of your own.
- A small, dedicated analytics service (usually online), which covers one or a few specific data sets/categories. Such tools fill specialized market niches and require less technical involvement.
Either option has its pros and cons. Your requirements are what’s important here, and just as important is how quickly your requirements will be changing. You want a solution that’s scalable and extensible, and one that won’t require you to ‘re-invent the wheel’.
Chances are, someone out there is already solving problems similar to yours, so do your research and don’t jump to conclusions about big data.
Never stop learning.
As you see, big data doesn’t change anything by itself. Owning it isn’t enough. You should constantly improve your data analytics and algorithms, and ask tons of questions. That’s the only way you’ll grow your business.
The age of big data is not really about volumes of data. Most companies don’t need their data to be big, or they simply don’t have large amounts of it. It’s about having access to relevant data, so you can deepen your analysis.
Nowadays, for $50 a month, you can have an on-demand analytics service that will help your marketing team with lead generation analysis. Another $50 and you get a service that shows you how your marketing channels are performing. $50 more and your eCommerce team can get invaluable analysis on your customers’ shopping trends, so you can optimize your product experience.
Data is learning. Learning is growth.
And nowadays data is almost free. It has never been this easy to get data, and big data is just a part of a larger and more important shift happening today.
So, is big data right for you?
Unless you’re struggling with data overload right now, and your data processing power is unable to keep up with business demands, you can probably skip asking this question altogether.
Instead, focus on addressing your business needs by asking the right questions and formulating effective strategies and solutions. There’s a pretty good chance you’ll be able to get the most out of your data, and solve complex business problems using one of the many data analytics tools out on the market today, and you won’t need big data.
Big data analysis is understood as the analysis of data sets that exceed the capabilities of a personal computer and of relational database management systems. In both cases, producing statistics and visualizations becomes difficult, because the work requires coordinating computer programs across tens, hundreds, or even thousands of servers.
Big data analysis can be characterized by the following parameters:
1. Volume, i.e. the amount of data generated. Whether a particular dataset can be considered big data or not depends on this indicator. The data is stored by SQL servers in the cloud.
2. Diversity, i.e. the category to which big data belongs. Knowledge of this affiliation allows analysts to work with information most effectively.
3. Speed, i.e. the speed of generating or processing data in order to achieve the goals.
4. Variability, i.e. instability of data over time.
5. Credibility, i.e. the quality of the collected data, on which the accuracy of the analysis depends.
6. Complexity, i.e. the effort required to correlate data and build relationships between data sets.