Frequently Asked Questions:

What is Data Science? 

In 1994, BusinessWeek published an article titled "Database Marketing" about advances in data availability and the monumental task of deciding what to do with it all: "Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so... Many companies were too overwhelmed by the sheer quantity of data to do anything useful with the information." As computational capabilities grew through the 1990s and into the new millennium, so did the methods for combining statistics and computer science to tackle this problem. In 2001, William S. Cleveland wrote in his article "Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics" that "knowledge among computer scientists about how to think of and approach the analysis of data is limited, just as the knowledge of computing environments by statisticians is limited. A merger of knowledge bases would produce a powerful force for innovation." This marked the beginning of today's data scientist. Forbes provides a thorough history of the discipline in its article "A Very Short History Of Data Science."


Who are today's data scientists?

A data scientist is a jack of all trades who combines statistical and programming knowledge with a client's overall business and technological goals. Their backgrounds vary greatly and may include physics, business management, engineering, biostatistics, computer science, econometrics, or applied mathematics. Regardless of background, a successful data scientist must have statistical intuition, creativity, grit, a drive to continually learn and expand their toolbox, and strong communication skills to link all of these together.

What types of problems do data scientists solve?

Companies such as Facebook and General Motors look for data scientists with expertise in predictive analytics, machine learning, statistical analysis, and data mining. These data scientists work with engineering, research, and marketing teams to design and implement analyses that solve problems and identify opportunities. Data scientists are problem solvers who, because of their broad background, can communicate with the IT department and the management team alike.


According to SAS, one of the largest statistical software companies, "Big data is a term that describes the large volume of data - both structured and unstructured - that inundates a business on a day-to-day basis. But it's not the amount of data that's important. It's what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves."

What is the benefit of hiring a data scientist who is also a statistician?

A data scientist with a statistical background is able to quantify the uncertainty that accompanies data science insights. This gives clients a competitive edge by helping them understand the risk associated with both technical and business decisions. A background in statistics also gives a data scientist the critical eye needed to identify fatal flaws in sampling plans and experimental designs, as well as biases introduced during data management or model development. They will also have a larger tool belt and can determine when a more traditional statistical method is a better fit than a modern approach.
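
As one illustration of what quantifying uncertainty can look like in practice, the sketch below reports an estimate together with an interval rather than as a single number. It uses a percentile bootstrap, which is only one of many possible techniques and is chosen here for simplicity. The function name, its parameters, and the sales figures are all hypothetical and made up for this example.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Hypothetical helper written for illustration; not from any library.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(values)
    # Resample the data with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    alpha = 1.0 - confidence
    lower_index = round((alpha / 2) * n_resamples)
    upper_index = round((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Made-up weekly sales figures, for illustration only.
weekly_sales = [120, 135, 128, 150, 110, 142, 138, 125, 160, 131]

point_estimate = statistics.fmean(weekly_sales)
lower, upper = bootstrap_mean_ci(weekly_sales)
print(f"mean = {point_estimate:.1f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

The width of the interval gives a rough sense of how far the average might plausibly move, which is the kind of risk information a single point estimate hides. With so few observations, a statistician would also check whether the bootstrap is appropriate at all.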

What technology and tools do data scientists use?

What is the difference between a data architect or engineer and a data scientist?

Data engineers, otherwise known as data architects, are responsible for the overall architecture of a database system. This includes developing and managing the database, collecting and transforming data, and warehousing. They may also be responsible for integrating new data management technologies into the existing structure. A data scientist may be involved in parts of these processes because they understand how the data will be used downstream. Generally, a data engineer supplies the data that a data scientist needs. Analytics Vidhya has published a clear job comparison between data engineers, data scientists, and statisticians.