Data Science: Is It an Evolution? A Complete Analysis ~ Let we Learn IT

"We have lots of data – now what?"

(How can we unlock valuable insight from our data?)

Data science is deep knowledge discovery through data inference and exploration. This discipline often involves using mathematical and algorithmic techniques to solve some of the most analytically complex business problems, leveraging troves of raw information to uncover the hidden insight that lies beneath the surface. It centers around evidence-based analytical rigor and building robust decision capabilities.

Ultimately, data science matters because it enables companies to operate and strategize more intelligently. It is all about adding substantial enterprise value by learning from data.

The variety of projects that a data scientist may be engaged in is incredibly broad. Here are a few examples:

tactical optimization – improvement of marketing campaigns, business processes, etc.
predictive analytics – anticipating future demand, future events, etc.
nuanced learning – e.g. developing deep understanding of consumer behavior
recommendation engines – e.g. Amazon product recs, Netflix movie recs
automated decision engines – e.g. automated fraud detection, and even self-driving cars

The objectives of these types of initiatives may be clear, but the problems require extensive quantitative expertise to solve. They may require building predictive models, attribution models, segmentation models, heuristics for deep pattern discovery in data, and so on, which demands broad knowledge of machine-learning algorithms and sharp technical ability. As you might guess, these are not the easiest skills to pick up.

What is data science – the requisite skill set

Data science is multidisciplinary; the skill set of a data scientist lies at the intersection of 3 main competencies:

Mathematics Expertise

At the heart of deriving insight from data is the ability to view the data through a quantitative lens. There are textures, patterns, dimensions, and correlations in data that can be expressed numerically, and drawing inferences from data becomes a brain teaser of mathematical techniques. Solutions to many business problems involve building analytic models that are deeply grounded in hard mathematical theory, and understanding how the models work is as important as knowing the process to build them (building without knowing the math is dangerous).

Also, a big misconception is that data science is all about statistics. While statistics is important, it is not the only type of mathematics that should be well understood by a data scientist. First, there are two main branches of statistics – classical statistics and Bayesian statistics. When most people refer to stats they are generally referring to classical stats, but knowledge of both types is very helpful. Furthermore, many inferential techniques and machine learning algorithms lean heavily on knowledge of linear algebra. For example, key data science processes like SVD (used for dimension reduction / latent variable discovery) are grounded in matrix mathematics and have much less to do with classical statistics. Overall, data scientists should have substantial breadth and depth in their knowledge of math.
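
To make the SVD point a little more concrete, here is a minimal sketch in plain Python. It uses power iteration, a simple stand-in technique that recovers only the largest singular value and its two singular vectors rather than a full decomposition; in practice you would more likely call a library routine such as numpy.linalg.svd. The function name top_singular_triplet and the small ratings matrix are made up for illustration.

```python
import math

def top_singular_triplet(matrix, iterations=200):
    """Estimate the largest singular value and its singular vectors
    by power iteration on A^T A (illustrative, not optimized)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols  # arbitrary non-zero starting vector
    for _ in range(iterations):
        # u = A v
        u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        # w = A^T u, which equals (A^T A) v
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("matrix is zero or the start vector was a poor choice")
        v = [x / norm for x in w]
    # With v converged, A v has length sigma and direction u.
    u = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in u))
    u = [x / sigma for x in u]
    return sigma, u, v

# Hypothetical data: three customers rating two products.
# Every row is a multiple of (2, 1), so one latent dimension explains everything.
ratings = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
sigma, u, v = top_singular_triplet(ratings)

# Each customer's coordinate along the single latent dimension.
latent_scores = [sigma * value for value in u]
print(round(sigma, 4))  # approximately 8.3666, the square root of 70
print(latent_scores)
```

The two-column table collapses to one score per customer, which is the dimension reduction idea in miniature.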

Technology and Hacking

First, let's clarify that we are not talking about hacking as in breaking into computers. We're referring to the tech/developer subculture meaning of hacking – i.e., creativity and ingenuity in using technical skills to build things and find clever solutions to problems.
Why is hacking ability important? Because data scientists absolutely need to leverage technology in order to wrangle enormous data sets and work with complex algorithms, which requires tools far more sophisticated than Excel. Examples of such tools are SQL, SAS, and R, all of which require technical/coding ability. With these high-performance tools, a true 'hacker' is a technical ninja, able to use ingenious problem-solving ability to achieve mastery in data exploration – piecing together unstructured information and teasing out golden nuggets of insight.
Another way to define a hacker is as a solid algorithmic thinker – that is, having the ability to break down messy problems and recompose them in ways that are solvable. This is critical for good data science, especially since data scientists work intimately within existing algorithmic frameworks and oftentimes create their own algorithms to solve complex problems. Clarity of thinking within deeply-abstract mental maps of data dimensions and processing capability is how challenging problems get solved.

Strong Business Acumen

It is very important to note that a data scientist is first and foremost a strategy consultant. Data science teams have become invaluable resources within companies because by being able to learn from data in ways no one else can, they are extraordinarily well-positioned to figure out how to add substantial business value. But this means having a keen sense of how to dissect and approach business problems becomes as important as having a keen sense of how to approach algorithmic problems. Ultimately, the value doesn't come from numbers; it comes from strategic thinking based on those numbers.
Additionally, a core competency of data science is using data to cogently tell a story. This means no data-puking; rather, it means presenting a cohesive narrative of problem and solution, using data insights as supporting pillars, that leads to guidance.

Clearly, get all the competencies right (math, technology, and business) and this is an incredibly potent combination. There is a reason why data scientists are well paid and probably will never have to worry about job security. It is not a bad place to be, having the rarefied talents that big companies everywhere are trying to recruit.

What is a data scientist – curiosity and training

The Mindset

A defining personality trait of data scientists is that they are deep thinkers with intense intellectual curiosity. Data science is all about being inquisitive – asking new questions, making new discoveries, and learning new things. Ask true data scientists what drives them in their job, and they will not say "money". The real motivator is being able to use their creativity and ingenuity to solve hard problems and constantly indulge their curiosity. Deriving insight from data is not about getting an answer; it is about uncovering "truth" that lies hidden beneath the surface. Problem solving is not a task, but rather an intellectually stimulating journey to a solution. There is passion for the work, and great satisfaction in taking on a challenge.

Training

While solid math skills are necessary, there is a glaring misconception out there that you need a Ph.D. in Statistics to become a legitimate data scientist. That view completely misses the point that data science is multidisciplinary; years of study in academia may not leave graduates with the right mix of experience and abilities to excel – i.e. a Ph.D. statistician may not have the nimble hacking skills or strategic business intuition to complete the trifecta.

As a matter of fact, data science is such a relatively new and rising discipline that universities have not caught up in developing comprehensive data science degree programs – meaning that no one can really claim to have "done all the schooling" to become a data scientist. Where does much of the training come from? The unyielding intellectual curiosity that data scientists possess drives them to be passionate autodidacts, motivated to learn skills on their own with deep determination (Read: where can you find people like this?).

Analytics and machine learning – how it ties to data science

There is a slew of terms closely related to data science that we hope to add some clarity around.

What is Analytics?

Analytics has risen quickly in popular business lingo over the past several years; the term is used loosely, but is generally meant to describe critical thinking that is quantitative in nature. Technically, analytics is the "science of analysis"; put another way, it is the practice of analyzing information to make decisions.
Is "analytics" the same thing as data science? Depends on context. Sometimes it is synonymous with the definition of data science that we have described, and sometimes it represents something else. A data scientist using raw data to build a predictive behavior model falls into the scope of analytics. At the same time, a general business user interpreting pre-built dashboard reports (e.g. GA) is also in the realm of analytics, but does not cross into the specialized skill needed in data science. Analytics has come to have fairly broad meaning, though at the end of the day, the semantics don't matter much.

What is the difference between an analyst and a data scientist?

"Analyst" is somewhat of an ambiguous term that can represent many different types of roles (marketing analyst, operations analyst, portfolio analyst, financial analyst, etc). Is an analyst the same as a data scientist? We've discussed pretty strict canon around what is a data scientist – as an expert's role with requisite talents in math, technology, and strategy consulting. Let's just say that some analysts are definitely data-scientists-in-training. As represented in this visual, there is a place in the middle where the distinction can blur a bit.

Here are examples of growth from analyst to veritable data scientist:

An analyst who has previously only mastered Excel learns how to dive into raw warehouse data using SQL and R
An analyst who previously only knew enough stats to report the results of an A/B test gains the expertise to build a predictive model with latent variable analysis and cross-validation
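
As a rough illustration of the cross-validation part of that second example, the sketch below runs k-fold cross-validation by hand on a one-variable linear model. The helper names fit_line and k_fold_mse and the toy numbers are hypothetical; a real project would more likely lean on an established library, and latent variable analysis is not shown here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def k_fold_mse(xs, ys, k=3):
    """Average held-out mean squared error across k folds."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        # Every k-th point, offset by the fold index, is held out for testing.
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / k

# Made-up observations that roughly follow y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6]
ys = [3.1, 4.9, 7.2, 9.0, 10.8, 13.1]
print(k_fold_mse(xs, ys, k=3))
```

The point of holding data out is that the error is measured on records the model never saw, which is a more honest estimate than the error on the training data.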

The overall point is that moving in the direction of "data scientist" requires motivation to learn many new skills. Many companies have actually found success cultivating their own home-grown data scientists by giving their analysts the resources and training to take their abilities to the next level.

What is Machine Learning?

Machine learning is a term that is closely tied to data science. Simply put, it means being able to train systems or algorithms to derive insight from a data set. The actual types of machine learning are varied, ranging from regression models to support vector machines to neural nets, but it all centers around 'teaching' a computer to become very good at pattern recognition. Examples of machine learning include:

predictive models that can anticipate user behavior
clustering algorithms that mine for natural similarities between different customers
classification models that can recognize and filter out spam
recommendation engines that 'learn' about preferences at an individual level
neural nets that can recognize what image patterns look like
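
To give one of these a concrete shape, here is a small sketch of the clustering example using k-means (Lloyd's algorithm) in plain Python. The customer figures, the choice of two clusters, and the starting centroids are all assumptions for illustration; this is one clustering technique among many, not a recommendation of a specific method.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 2-D points with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        centroids = [
            (sum(p[0] for p in members) / len(members),
             sum(p[1] for p in members) / len(members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers as (orders per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (11, 82)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[-1]])
for center, members in zip(centroids, clusters):
    print(center, members)
```

Nothing tells the algorithm which customers belong together; the grouping emerges from the distances alone, which is what "mining for natural similarities" amounts to here.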

Data scientists work intimately with machine learning techniques to build algorithms that automate elements of their problem-solving. It is a requisite part of the data science toolset, needed to tackle some of the most complex data-driven projects.

What is Data Munging?

Raw data can be unstructured and messy, with information from disparate data sources and mismatched records. Data munging is a term describing the important process of cleaning up data so that it is ready for analysis and for use in machine learning algorithms. This requires good pattern-recognition ability and clever hacking skills in order to merge and transform masses of raw information. Dirty data can obfuscate the 'truth' hidden in the data and completely mislead an analysis; thus, any data scientist must be skillful and nimble at data munging in order to have accurate data for deriving insight.
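
A tiny sketch of what that clean-up can look like, assuming two hypothetical sources (a CRM export and a billing export) whose records should match on email address but differ in casing, whitespace, and missing fields. The function names, field names, and sample rows are invented for illustration.

```python
def normalize_email(raw):
    """Trim whitespace and lowercase so the same address compares equal."""
    return raw.strip().lower()

def merge_sources(crm_rows, billing_rows):
    """Merge two lists of dicts into one record per cleaned email address."""
    merged = {}
    for row in crm_rows + billing_rows:
        key = normalize_email(row["email"])
        record = merged.setdefault(key, {"email": key})
        for field, value in row.items():
            if field == "email":
                continue
            if isinstance(value, str):
                value = value.strip()
            # Keep the first non-empty value seen for each field.
            if value not in ("", None) and field not in record:
                record[field] = value
    return list(merged.values())

crm = [
    {"email": " Ann@Example.com ", "name": "Ann Lee"},
    {"email": "bob@example.com", "name": ""},
]
billing = [
    {"email": "ann@example.com", "total_spend": "120.50"},
    {"email": "BOB@example.com", "name": "Bob Ray", "total_spend": "80"},
]

for record in merge_sources(crm, billing):
    print(record)
```

Without the normalization step, the four rows would look like four different customers, which is exactly the kind of mismatch that misleads an analysis.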

Final word

In any organization that wants to leverage big data to gain value, data science is the secret sauce. But it is incredibly difficult to find experts who embody all the necessary talents – so if you manage to hire a data scientist, nurture them, keep them engaged, and give them autonomy to be their own architects in figuring out how to add value to the business. At the end of the day, data science is a capability that turns information into gold, and data scientists are uniquely positioned to be transformative figures within a company.

A Superb article from DataJobs.com
