We Are Oakleaf
What is Data Science?
Data science is a field of study that uses scientific methods, processes, and algorithms to extract knowledge and insights from different types of data. Organisations can then use this knowledge to inform their decisions.
In the modern age, data science is becoming an essential part of practices for a vast range of industries and the organisations within them. This guide explains what data science is used for, who oversees its functioning within an organisation, and gives examples of data science applications.
The stages involved in data science
There are several stages involved in carrying out data science. Together, these are often known as the “data science lifecycle”. Experts differ on exactly how many stages there are, but most suggest between four and seven distinct stages.
A typical 5-stage lifecycle might look like this:
- Capture: Data Collection or Acquisition, Data Entry, Signal Reception, Data Extraction. This is the stage that involves gathering raw structured and unstructured data.
- Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture. This covers taking raw data and putting it in a form that can be used.
- Process: Data Mining, Clustering/Classification, Data Modelling, Data Summarisation. Data scientists take the prepared data and examine its patterns, ranges, and biases to determine how useful it will be in predictive analysis.
- Analyse: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis. This involves performing various analyses of the data.
- Communicate: Data Reporting, Data Visualisation, Business Intelligence (BI), Decision Making. In this stage, analysts prepare the analyses in an easily readable form, such as a chart, a graph, or a report.
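As a rough illustration, the five stages above can be sketched as a tiny Python pipeline. The monthly sales figures, field names, and function names are all hypothetical, and a real project would use far larger data sets and dedicated tools at each stage.

```python
from statistics import mean

# Capture: hypothetical raw records, some missing or malformed.
raw_rows = [
    {"month": 1, "sales": "100"},
    {"month": 2, "sales": "120"},
    {"month": 3, "sales": ""},     # missing value
    {"month": 4, "sales": "160"},
    {"month": 5, "sales": "n/a"},  # malformed value
    {"month": 6, "sales": "200"},
]

def maintain(rows):
    """Maintain: keep only rows whose sales value parses as a number."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append((row["month"], float(row["sales"])))
        except ValueError:
            continue  # drop values that cannot be used
    return cleaned

def process(points):
    """Process: summarise the prepared data (count, range, mean)."""
    values = [y for _, y in points]
    return {"count": len(values), "min": min(values),
            "max": max(values), "mean": mean(values)}

def analyse(points):
    """Analyse: fit a least-squares line, sales = slope * month + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def communicate(summary, slope, intercept, next_month):
    """Communicate: turn the numbers into a short readable report."""
    forecast = slope * next_month + intercept
    return (f"{summary['count']} valid months, mean sales {summary['mean']:.1f}; "
            f"trend {slope:+.1f} per month; "
            f"forecast for month {next_month}: {forecast:.1f}")

points = maintain(raw_rows)
summary = process(points)
slope, intercept = analyse(points)
report = communicate(summary, slope, intercept, next_month=7)
print(report)
```

Each function maps onto one stage, so the hand-off between stages is explicit: cleaned data goes in, a summary or a fitted trend comes out, and only the final step produces something aimed at a reader.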
What is data science used for?
In essence, data science is used to find patterns. These patterns help organisations to understand behaviours and processes. Data science can also be used to write algorithms that process large amounts of information quickly and efficiently, to increase the security and privacy of sensitive data, and to guide data-driven business decisions.
Examples of data science applications
Data science is applied across an extremely wide range of industries and problem areas, including:
- Healthcare
- Gaming
- Logistics
- Fraud detection
- Internet search
- Speech recognition
- Targeted advertising
- Airline route planning
Examples of use cases within industries
Entertainment
Data science allows streaming services to follow and evaluate what their consumers are watching, helping the creation of new TV and film content. Data-driven algorithms are also used to provide audiences with tailored suggestions based on their viewing history.
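To give a feel for how tailored suggestions can be derived from viewing history, here is a minimal sketch that scores unseen titles by how similar other viewers are to the target user. The user names, titles, and the choice of Jaccard similarity are assumptions made for illustration; the source does not say which method any streaming service uses.

```python
# Hypothetical viewing histories: user -> set of titles watched.
histories = {
    "ana":   {"Drama A", "Thriller B", "Comedy C"},
    "ben":   {"Drama A", "Thriller B", "Documentary D"},
    "chloe": {"Comedy C", "Cartoon E"},
}

def jaccard(a, b):
    """Overlap between two sets: shared titles / all titles seen by either."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(user, histories, top_n=3):
    """Score each unseen title by the similarity of the users who watched it."""
    seen = histories[user]
    scores = {}
    for other, titles in histories.items():
        if other == user:
            continue
        weight = jaccard(seen, titles)
        for title in titles - seen:
            scores[title] = scores.get(title, 0.0) + weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]

print(recommend("ana", histories))
```

Here "ben" shares more titles with "ana" than "chloe" does, so the title only "ben" has watched is ranked first.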
Finance
Banks and credit card firms collect and analyse data in order to detect fraudulent activity, manage financial risks on loans and credit lines, and assess client portfolios.
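One simple way to picture fraud detection is as outlier detection: flag transactions that sit far outside a customer's usual spending. The sketch below uses a modified z-score based on the median, with made-up transaction amounts and a commonly used threshold of 3.5. It is an illustration only; production fraud systems draw on many more signals than the amount alone.

```python
from statistics import median

def flag_unusual(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a single huge value does not distort the baseline.
    """
    centre = median(amounts)
    mad = median([abs(a - centre) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against
    return [a for a in amounts if 0.6745 * abs(a - centre) / mad > threshold]

# Hypothetical card transactions for one customer.
amounts = [42.0, 38.5, 51.0, 45.2, 39.9, 47.3, 980.0, 44.1]
print(flag_unusual(amounts))
```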
Healthcare
Machine learning models and other components of data science are used by hospitals and other healthcare providers in order to automate X-ray analysis. They also assist doctors in diagnosing illnesses and planning treatments based on previous outcomes.
Manufacturing
Data science applications in manufacturing may relate to supply chain management and distribution optimisation. Data science may also be used for predictive maintenance, which anticipates probable faults in facilities and equipment before they happen.
Data science vs business intelligence
The terms “data science” and “business intelligence” are both related to an organisation’s data and analysis of that data. However, the two differ in terms of focus.
BI is typically used as an umbrella term for the technology that enables data preparation, data mining, data management, and data visualisation. Tools and processes built for BI allow end users to identify actionable information from raw data. This facilitates data-driven decision-making within organisations across various industries. Data science tools overlap with much of this, but BI tools focus more on data collected in the past, and the insights they produce are more descriptive in nature. BI uses data to understand what has already happened, which helps to inform a particular course of action.
BI is also designed for static data (data which doesn’t change), which is usually structured. Data science also uses descriptive data, but normally does so to determine predictive variables. These can then be used to make forecasts or to categorise data.
Data science and business intelligence are not mutually exclusive. Organisations wishing to stay up-to-date in their practices will utilise both in order to fully understand and extract value from the data they collect.
Data science tools
The field of data science utilises a range of tools and programming languages for a range of different purposes:
- SAS, Jupyter, RStudio, MATLAB, Excel, or RapidMiner for data analysis
- Informatica, Talend, or AWS Redshift for data warehousing
- Jupyter, Tableau, Cognos, or RAW for data visualisation
- Spark MLlib, Mahout, or Azure ML Studio for machine learning
Prerequisites for data science
There are several technical concepts that a person should be familiar with before learning more about data science, or considering embarking on a career in data science:
Databases
A database is an organised set of data stored and accessed electronically. A data scientist needs to understand how databases work, how they should be managed, and how data can be extracted from them.
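As a small example of storing and extracting data, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table and its contents are invented for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("acme", 120.0), ("acme", 80.0), ("globex", 250.0), ("initech", 50.0)],
)
conn.commit()

# Extract a per-customer summary, largest total first.
rows = conn.execute(
    "SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()

for customer, n_orders, total in rows:
    print(customer, n_orders, total)

conn.close()
```

The query does the grouping and summing inside the database, which is usually preferable to pulling every row into application code first.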
Machine learning
This is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn. This helps to gradually improve software applications’ accuracy at predicting outcomes without being explicitly programmed to do so.
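The idea of predicting outcomes without hand-written rules can be shown with a very small k-nearest-neighbours classifier. Nothing in the code states what makes a customer "casual" or "loyal"; the label for a new point comes entirely from the labelled examples. The features, labels, and data here are hypothetical.

```python
from collections import Counter
from math import dist

# Hypothetical labelled examples:
# (hours active per week, purchases per month) -> customer type.
training = [
    ((1.0, 0.0), "casual"),
    ((2.0, 1.0), "casual"),
    ((1.5, 0.5), "casual"),
    ((8.0, 5.0), "loyal"),
    ((9.0, 6.0), "loyal"),
    ((7.5, 4.0), "loyal"),
]

def predict(point, examples, k=3):
    """Label a new point by majority vote among its k nearest examples."""
    nearest = sorted(examples, key=lambda ex: dist(point, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(predict((2.0, 0.5), training))
print(predict((8.5, 5.5), training))
```

Adding more labelled examples changes the predictions without any change to the code, which is the sense in which the software "learns" from data.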
Modelling
This is a part of machine learning and involves identifying which algorithm is the most suitable for solving a given problem.
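One common way to judge which algorithm suits a problem is to fit each candidate on part of the data and compare their errors on data held back for testing. The sketch below compares a simple average against a least-squares line. The data points and the two candidate models are assumptions chosen to keep the example short.

```python
from statistics import mean

# Hypothetical data: (input, observed outcome), roughly linear.
data = [(1, 3.1), (2, 4.9), (3, 7.2), (4, 8.8),
        (5, 11.1), (6, 13.0), (7, 14.8), (8, 17.1)]
train, test = data[:6], data[6:]  # hold back the last two points

def fit_mean(points):
    """Candidate 1: always predict the average training outcome."""
    avg = mean(y for _, y in points)
    return lambda x: avg

def fit_line(points):
    """Candidate 2: least-squares straight line through the training data."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept

def mean_abs_error(model, points):
    """Average absolute gap between predictions and observed outcomes."""
    return mean(abs(model(x) - y) for x, y in points)

candidates = {"mean baseline": fit_mean, "linear regression": fit_line}
scores = {name: mean_abs_error(fit(train), test)
          for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

Scoring on held-back data matters because a model can look accurate on the data it was fitted to and still predict new cases poorly.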
Programming
A certain level of programming skill is required in order to successfully execute a data science project. The most common programming languages for this are Python and R. The former is especially popular because it is considered easy to learn and has many libraries for data science and machine learning.
Statistics
Statistics are at the heart of data science. As such, a data scientist with a firm grasp of utilising statistics can help you to extract more intelligence and obtain more meaningful results.
Who oversees the data science process?
There are several job titles and positions involved in overseeing the data science process:
Business managers
Business managers are in charge of overseeing the data science training method. Their primary responsibility is to collaborate with the data science team to define a problem and establish an analytical method for developing and implementing a solution. A business manager may oversee the marketing, finance, or sales department, and report to the executive in charge of that department. The goal of the business manager is to make sure projects are completed on time by working closely with data scientists and IT managers.
Data science managers
Data science managers primarily track and supervise the working procedures of all the members of the team. They also help to manage and oversee day-to-day activities. These managers are expected to be team-builders who are capable of blending project planning and monitoring with team growth.
IT managers
IT managers are primarily responsible for developing the infrastructure and architecture required to enable data science activities. They will constantly monitor and resource teams accordingly to ensure they are operating efficiently and safely. These managers may also be in charge of creating and maintaining IT environments for the data science team or teams within their organisation.
What is a data scientist?
A data scientist is a practitioner within the field of data science. They are an analytical data professional with the technical ability to handle complicated issues while also investigating solutions to questions that need to be answered. They are not necessarily responsible for all of the processes involved in the data science lifecycle, but they may be able to make recommendations about what kind of data will be useful or needed for any given circumstance.
In order to ensure work is carried out as expected, data scientists may partner with other professionals in similar positions to their own. They commonly work with data analysts, particularly on exploratory data analysis and data visualisation. However, a data scientist’s skillset is generally considered to be broader than that of the average data analyst. A data scientist will typically use programming languages, such as R and Python, to conduct more advanced statistical inference and data visualisation.
Performing these tasks requires computer science and pure science skills beyond those usually expected of business analysts or data analysts. A data scientist will also need to be able to understand the specifics of the organisation they are working for, whether that organisation is based in manufacturing, commerce, or healthcare.
Responsibilities of a data scientist
In order to successfully carry out their job, a data scientist must be able to:
- Know enough about the organisation to ensure they are able to ask the right questions and identify business pain points
- Apply statistics and computer science, as well as business acumen, to data analysis
- Use a wide range of tools and techniques for preparing and extracting data
- Extract insights from big data using predictive analysis and AI, including machine learning models, natural language processing, and deep learning
- Write programs that automate data processing and calculations
- Tell and illustrate stories that clearly and straightforwardly convey the meaning of results to management, decision-makers, and stakeholders at every level of understanding
- Explain how the results will help to solve business problems
- Collaborate with other members of the data science team
Why become a data scientist?
Data science is a fast-growing field and the demand for skilled individuals working within it is only expected to increase. Forbes reported on a study from the US Bureau of Labor Statistics which projected a growth rate of nearly 28% in the number of jobs requiring data science skills by 2026. These statistics are only for the United States, but they suggest the kind of increase that may be expected elsewhere in the world as well.
These numbers indicate that the profession is likely to be secure, and as the profession is in high demand it is likely that organisations will also be willing to offer competitive salaries.
Other careers related to data science
The field of data science is broad. You will always have the opportunity to specialise in one aspect of it, which in turn means that you might decide to become any of the following:
Business intelligence developer
Also known as “BI developers”, a business intelligence developer is an individual responsible for designing strategies that allow organisations to find the information needed to make decisions quickly and efficiently.
Data analyst
Data analysts are responsible for visualising, transforming, and manipulating the data collected. They may also sometimes be responsible for web analytics tracking and A/B testing analysis.
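For A/B testing analysis, a typical question is whether the difference in conversion rate between two page variants is larger than chance alone would explain. The sketch below applies a standard two-proportion z-test using only the Python standard library. The visitor and conversion counts are made up for illustration.

```python
from math import erf, sqrt

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare two conversion rates; return (z statistic, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the assumption that both variants perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Standard normal CDF written in terms of the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical experiment: variant A converts 100 of 1000 visitors,
# variant B converts 150 of 1000.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance, although the threshold for acting on it is a business decision rather than a purely statistical one.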
Data architect
Data architects help to ensure that data is well-formatted and accessible for data scientists and analysts, while also improving the data pipelines’ performance. They also help to design and create new database systems.
Data engineer
Data engineers are responsible for designing, building, and maintaining data pipelines. They also test ecosystems and prepare them for the data scientists to run their algorithms.
Data storyteller
A data storyteller finds the narrative which best describes the data and develops ways to express that narrative.
Database administrator
A database administrator is responsible for monitoring the database and for keeping track of the data flow, creating backups and recoveries, and overseeing security by granting relevant permissions to employees.
Machine learning engineer
As well as designing and building machine learning systems, machine learning engineers will need to run tests (for example, A/B tests) while monitoring different systems’ performance and ability to function.
Machine learning scientist
A machine learning scientist researches new approaches to data manipulation, in order to design new algorithms. They are often part of the research and development (R&D) department and their work normally leads to the publication of research papers.
Submit an Application
If you are interested in making a move into the field of data science, or if your firm or organisation is in need of top talent to help fill a vacant role in data science, contact us or send us your CV today. Our specialist team will be ready and waiting to provide the advice and assistance you need. You will then be matched with the talent or the job role which meets all the criteria you have set out with us.