January 7, 2025 by @KareimGazer

Becoming A Data Analyst

An Introduction & Career Outlook - Know Your Path

"Businesses today recognize the untapped value in data and data analytics as a crucial factor for business competitiveness. To drive their data and analytics initiatives, companies are hiring and upskilling people. They're expanding their teams and creating centers of excellence to set up a multipronged data and analytics practice in their organizations." - The Power Of Data To Transform Your Business, a Forrester report


Business leaders have realized that data holds the key to competitive advantage, and organizations now use data to uncover opportunities and differentiate themselves from their competitors. For example:

  • Identifying patterns in financial data to detect fraud, predict sales, and gain market insights.
  • Mining social media posts to discover what customers really like
  • Analyzing customer behavior for personalizing offers (recommendation systems)

Data analytics is a highly sought-after and well-paid profession. You can choose between mastering data analytics as a career path (associate, senior, lead) or using it as a stepping stone to branch out into other data professions such as:

  • Data Science
  • Data Engineering
  • Business Analytics (BA)
  • Business Intelligence (BI)

Branching out might be the better choice these days, given that AI agents can now handle many of the basic tasks traditionally done by data analysts. This does not mean analysts will be replaced. Instead, these new capabilities mean stronger competition and a greater need for dedication and continuous skill development.

"The constant increase in data processing speeds and bandwidth, the nonstop invention of new tools for creating, sharing, and consuming data, and the steady addition of new data creators and consumers around the world, ensure that data growth continues unabated. Data begets more data in a constant virtuous cycle." - Forbes 2020 Report

Modern Data Ecosystem

Data integrated from disparate sources

Data is available in a variety of structured and unstructured forms, residing in:

  • text
  • images
  • videos
  • click streams
  • user conversations
  • social media platforms
  • real-time event streams, such as sensor data from IoT devices or user metrics from large streaming platforms like YouTube and Twitch
  • legacy databases in large enterprises
  • data sourced from professional data providers and agencies, for example through ready-made APIs that serve data on demand from trusted sources

As you can see, the possibilities are endless: the sources have never been so diverse and dynamic. But first, let's differentiate between data (the raw material) and information (insights). You can consider all of the above as raw data sources, but with raw data alone we can't arrive at conclusions. This is where the fundamentals of data analysis come in: data gathering, wrangling, mining, analysis, and visualization help us gain insights (information) from our data.

Reliability, security, and integrity of the data being acquired are some of the challenges you work through at this stage. Once the raw data is in a common place, it needs to get organized, cleaned up, and optimized for access by end users. The data will also need to conform to the compliance requirements and standards enforced in the organization. For example, conforming to guidelines that regulate the storage and use of personal data, such as health, biometrics, or household data in the case of IoT devices. Adhering to master data tables within the organization, to ensure standardization of master data across all of its applications and systems, is another example. The key challenges at this stage could involve data management and working with data repositories that provide high availability, flexibility, accessibility, and security.

Needs Different types of analysis and skills to generate insights

When you're working with so many different sources of data, the first step is to pull a copy of the data from the original sources into a data repository. At this stage, you're only looking at acquiring the data you need.

This means working with the data formats, sources, and interfaces through which the data can be pulled in.

Has Active stakeholders to collaborate and act on insights generated

  • applications
  • programmers
  • analysts
  • data scientists

All of these pull data from the enterprise data repository (warehouse). The key challenge at this stage is to provide the appropriate interface for each stakeholder. For example, data analysts may need the raw data to work with. Business stakeholders may need reports and dashboards. Applications may need custom APIs to pull this data.

Has Many Tools, applications, and infrastructure

to store, process, and disseminate data as required. Note the influence of the new technologies that are shaping today's data ecosystem, like cloud computing, machine learning, and big data. With cloud technologies, companies of almost any size can get access to virtually limitless storage, high-performance computing, open source technologies, machine learning, and the latest tools and libraries. Data scientists create predictive models by training machine learning algorithms on huge amounts of data. Today, we're dealing with datasets so massive and so varied that traditional tools and analysis methods are no longer adequate, paving the way for new tools and techniques like data lakes and data warehouses.

Career Paths

Data engineers

are people who develop and maintain data architectures and make data available for business analysts. Data engineers extract, integrate, and organize data from different sources; clean, transform, and prepare data; and design, store, and manage data in data repositories (warehouses).

They make data accessible in formats and systems that the various business applications, as well as stakeholders like data analysts and data scientists, can use. A data engineer must have strong knowledge of programming and cloud technologies, and an in-depth understanding of relational and non-relational databases.

Data analyst

Translates data and numbers into plain language (insights) so organizations can make decisions. Data analysts inspect and clean data to derive insights, identify correlations, and find patterns. They apply statistical methods to analyze and visualize data, then interpret and present the findings.

Analysts are the people who answer questions such as:

  • Are users' experiences with the search functionality on our site generally good or bad?
  • What is the popular perception of our rebranding initiatives?
  • Is there a correlation between the sales of one product and another?

Data analysts require good knowledge of spreadsheets, writing SQL queries, and using statistical tools to create charts and dashboards.

Modern data analysts also need to have programming skills and strong analytical & storytelling skills.

Data scientist

Analyzes data for actionable insights and builds machine learning or deep learning models that learn from past data to make predictions.

Data scientists are people who answer questions such as: How many new social media followers am I likely to get next month? What percentage of my customers am I likely to lose to the competition in the next quarter? Is this financial transaction unusual for this customer?

Data scientists need knowledge of mathematics, statistics, Python, databases, and building ML models, along with domain knowledge.

Business analyst

leverages the work of data analysts and data scientists to look at possible implications for their business and the actions they need to take or recommend.

BI analyst

does the same, except that the focus is on the market forces and external influences that shape the business. BI analysts provide business intelligence solutions by organizing and monitoring data on different business functions and exploring that data to extract insights and actionable recommendations that improve business performance.

To summarize, in simple terms: a data engineer converts raw data into usable data. A data analyst uses this data to generate insights. A data scientist uses data analytics and data engineering to build machine learning models that predict the future using data from the past. Business analysts and business intelligence analysts use these insights and predictions to drive decisions that benefit and grow their business.

It's not uncommon for data professionals to start their career in one of these roles and transition to another by supplementing their skills.

So, What is Data Analysis?

Data analysis is about collecting information from the world around you in order to make a decision, just like checking the weather report (reporting results, like a data analyst) and deciding what to wear (making a decision, like a stakeholder). Today the practice has a technical name and is used everywhere. In fact, the weather data itself was collected and analyzed by data analysts to generate the weather reports that are given to the broadcasters and to you (the stakeholders) so you can make your decision.

Data Analysis is the process of gathering, cleaning, analyzing and mining (getting value) data, interpreting results (getting insights), and reporting the findings (making reports).

With data analysis we find patterns within the data and correlations between different data points.

It is through these patterns and correlations that insights are generated, and conclusions are drawn.

Data analysis helps businesses understand their past performance and informs their decision-makers (stakeholders) about future actions.

Data analysis also helps businesses validate a course of action before committing to it, saving valuable time and resources and improving the chances of success.

There are four primary types of data analysis, each with a different goal and place in the data analysis process:

  1. Descriptive Analytics helps answer questions about what happened over a given period of time by summarizing past data and presenting the findings to stakeholders. It helps provide essential insights into past events. For example, tracking past performance based on the organization's key performance indicators or cash flow analysis.

  2. Diagnostic Analytics helps answer the question, Why did it happen? It takes the insights from descriptive analytics and digs deeper to find the cause of the outcome. For example, investigating a sudden change in traffic to a website without an obvious cause, or an increase in sales in a region where there has been no change in marketing.

  3. Predictive Analytics helps answer the question, What will happen next? Historical data and trends are used to predict future outcomes. Some of the areas in which businesses apply predictive analysis are risk assessment and sales forecasts. It's important to note that the purpose of predictive analytics is not to say what will happen in the future; its objective is to forecast what might happen. All predictions are probabilistic in nature.

  4. Prescriptive Analytics helps answer the question, What should be done about it? By analyzing past decisions and events, the likelihood of different outcomes is estimated, and a course of action is decided on that basis. Self-driving cars are a good example of prescriptive analytics: they analyze the environment to make decisions regarding speed, changing lanes, which route to take, and so on. Another example is airlines automatically adjusting ticket prices based on customer demand, gas prices, the weather, or traffic on connecting routes.
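To make the difference between descriptive and predictive analytics more concrete, here is a minimal Python sketch. The sales figures and the names `quarterly_sales` and `fit_line` are made up for illustration, and a straight-line trend is only one of many possible ways to forecast.

```python
from statistics import mean

# Hypothetical quarterly unit sales, oldest first (made-up numbers).
quarterly_sales = [100, 110, 120, 130]


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Descriptive: what happened? Summarize the past.
average_sales = mean(quarterly_sales)

# Predictive: what might happen next? Extend the fitted trend by one quarter.
slope, intercept = fit_line(quarterly_sales)
next_quarter_forecast = slope * len(quarterly_sales) + intercept

print(f"Average past sales: {average_sales}")
print(f"Forecast for next quarter: {next_quarter_forecast:.1f}")
```

The forecast is an estimate based on the assumption that the past trend continues, which is exactly why predictions should be read as probabilistic rather than certain.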

The Process

Now let's look at some of the key steps in any data analysis process:

Understanding the problem and desired result: Data analysis begins with understanding the problem that needs to be solved and the desired outcome that needs to be achieved. Where you are and where you want to be need to be clearly defined before the analysis process can begin.

Setting a clear metric: This stage of the process includes deciding what will be measured (for example, the number of units of product X sold in a region) and how it will be measured (for example, in a quarter or during a festival season).

Gathering data: Once you know what you're going to measure and how you're going to measure it, you identify the data you require, the data sources you need to pull this data from, and the best tools for the job.

Cleaning data: Having gathered the data, the next step is to fix quality issues in the data that could affect the accuracy of the analysis. This is a critical step because the accuracy of the analysis can only be ensured if the data is clean. You will clean the data for missing or incomplete values and outliers. For example, in a customer demographics dataset, an age field with a value of 150 is an outlier. You will also standardize the data coming in from multiple sources.
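As a rough illustration of the cleaning step, the sketch below handles a missing value, an implausible age like the 150 mentioned above, and inconsistent labels coming from different sources. The records, the age limits, and the `COUNTRY_ALIASES` mapping are all assumptions made up for this example; real projects usually rely on a library such as pandas and on rules agreed with the business.

```python
# Hypothetical customer records merged from two sources (made-up data).
raw_records = [
    {"name": "Ana", "age": 34, "country": "us"},
    {"name": "Ben", "age": None, "country": "USA"},   # missing age
    {"name": "Cara", "age": 150, "country": "U.S."},  # implausible age
    {"name": "Dev", "age": 41, "country": "United States"},
]

# Assumed plausible range for a customer's age; adjust to the domain.
MIN_AGE, MAX_AGE = 0, 120

# Assumed mapping used to standardize country labels across sources.
COUNTRY_ALIASES = {"us": "US", "usa": "US", "u.s.": "US", "united states": "US"}


def clean(records):
    """Return (clean_rows, rejected_rows) after basic quality checks."""
    clean_rows, rejected_rows = [], []
    for record in records:
        age = record["age"]
        if age is None or not MIN_AGE <= age <= MAX_AGE:
            # Set aside for review instead of silently dropping the row.
            rejected_rows.append(record)
            continue
        key = record["country"].strip().lower()
        country = COUNTRY_ALIASES.get(key, record["country"])
        clean_rows.append({**record, "country": country})
    return clean_rows, rejected_rows


clean_rows, rejected_rows = clean(raw_records)
print("kept:", [row["name"] for row in clean_rows])
print("needs review:", [row["name"] for row in rejected_rows])
```

Keeping the rejected rows in a separate list, rather than deleting them, makes it possible to investigate whether they are data entry errors or genuine edge cases.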

Analyzing and mining data: Once the data is clean, you will extract and analyze the data from different perspectives. You may need to manipulate your data in several different ways to understand the trends, identify correlations and find patterns and variations.
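One common way to check for a correlation between two series is the Pearson correlation coefficient. The sketch below computes it by hand so the formula is visible; the function name `pearson_correlation` and the weekly sales figures are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - x_bar) ** 2 for x in xs))
    spread_y = sqrt(sum((y - y_bar) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical weekly unit sales of two products (made-up numbers).
laptop_sales = [10, 12, 15, 18, 20]
cooling_pad_sales = [4, 5, 6, 8, 9]

r = pearson_correlation(laptop_sales, cooling_pad_sales)
print(f"correlation: {r:.3f}")
```

A value close to 1 means the two series tend to rise and fall together, a value close to -1 means they move in opposite directions, and a value near 0 means there is little linear relationship. A strong correlation alone does not show that one product's sales cause the other's.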

Interpreting results: After analyzing your data and possibly conducting further research, which can be an iterative loop, it's time to interpret your results. As you interpret your results, you need to evaluate whether your analysis is defensible against objections, and whether there are any limitations or circumstances under which your analysis may not hold true.

Presenting your findings: Ultimately, the goal of any analysis is to impact decision making. The ability to communicate and present your findings in clear and impactful ways is as important a part of the data analysis process as is the analysis itself. Reports, dashboards, charts, graphs, maps, case studies are just some of the ways in which you can present your data.

So, in summary, there are four primary types of data analysis:

  • Descriptive Analytics: that helps decode “What happened?”

  • Diagnostic Analytics: that helps us understand “Why did it happen?”

  • Predictive Analytics: that analyzes historical data and trends to suggest “What will happen next?”

  • Prescriptive Analytics: that prescribes “What should be done next?”

The Data Analysis process involves:

  • Developing an understanding of the problem and the desired outcome.

  • Setting a clear metric for evaluating outcomes.

  • Gathering, cleaning, analyzing, and mining data to interpret results.

  • Communicating the findings in ways that impact decision-making.

Analyzing Data

Whenever we collect data from a sample, there are two different types of statistics we can run: descriptive statistics, to summarize information about the sample, and inferential statistics, to make inferences or generalizations about the broader population from which the sample was drawn.

Descriptive Statistics

Enables you to present data in a meaningful way allowing simpler interpretation of the data. Data is described using summary charts, tables, and graphs without any attempts to draw conclusions about the population from which the sample is taken. The objective is to make it easier to understand and visualize raw data without making conclusions regarding any hypotheses that were made.

For example, describing the English test scores in a specific class of 25 students. We record the test scores of all students, calculate the summary statistics, and produce a graph. Some of the common measures of Descriptive Statistical Analysis include Central Tendency, Dispersion, and Skewness:

  • Central Tendency (locating the center of a data sample): Some of the common measures of central tendency include mean, median, and mode. Looking at your dataset through these values can help you get a clearer understanding of it.

  • Dispersion is the measure of variability in a dataset. Common measures of statistical dispersion are Variance, Standard Deviation, and Range.
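The measures above can be computed with Python's built-in `statistics` module. This is a small sketch using made-up test scores for a class smaller than the 25 students in the example, just to keep the numbers easy to follow.

```python
import statistics

# Hypothetical English test scores (percent) for a small class.
scores = [72, 85, 72, 90, 64, 78, 72, 88, 95, 60]

central_tendency = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}

dispersion = {
    "variance": statistics.variance(scores),  # sample variance
    "std_dev": statistics.stdev(scores),      # sample standard deviation
    "range": max(scores) - min(scores),
}

for name, value in {**central_tendency, **dispersion}.items():
    print(f"{name}: {value:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` treat the scores as a sample. If the 10 scores were the entire population of interest, `statistics.pvariance` and `statistics.pstdev` would be the matching choices.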

Inferential Statistics

Inferential statistics takes data from a sample to make inferences about the larger population from which the sample was drawn. Using methods of inferential statistics you can draw generalizations that apply the results of the sample to the population as a whole.

Some common methodologies of Inferential Statistics include:

  • Hypothesis Testing: for example, when studying the effectiveness of a vaccine by comparing outcomes in a vaccinated group against a control group, hypothesis tests can tell you whether the efficacy observed in the sample is likely to exist in the population as well.

  • Confidence Intervals incorporate the uncertainty and sampling error to create a range of values the actual population value is likely to fall within.

  • Regression Analysis incorporates hypothesis tests that help determine whether the relationships observed in the sample data actually exist in the population rather than just the sample.
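As an illustration of a confidence interval, the sketch below estimates a range for a population mean from a small sample. It uses the normal approximation from Python's `statistics.NormalDist`; the function name `mean_confidence_interval` and the order values are assumptions made up for this example.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    center = mean(sample)
    return center - z * standard_error, center + z * standard_error


# Hypothetical sample of order values drawn from a larger customer base.
order_values = [52, 48, 50, 47, 53, 49, 51, 50, 46, 54]

low, high = mean_confidence_interval(order_values)
print(f"95% confidence interval for the mean: {low:.2f} to {high:.2f}")
```

For a sample this small, a t-distribution would normally give a slightly wider and more appropriate interval. The normal approximation is used here only because it is available in the standard library.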

Statistics forms the core of data mining (covered in the next section) by providing the measures and methodologies necessary for data mining, and by identifying patterns that help distinguish random noise from significant findings.

*Note: Machine learning algorithms work by fitting this kind of statistical model through iterating over a training set, a process called statistical learning. Using the fitted model, you can generalize (predict) to new examples that represent new samples from the same population.

Data Mining

The process of extracting knowledge from data. It has several techniques:

  • Classification is a technique that classifies attributes into target categories. For example, classifying customers into low, medium, or high spenders based on how much they earn.

  • Clustering is similar to classification, but involves grouping data into clusters so they can be treated as groups. For example, clustering customers based on geographic region.

  • Anomaly or outlier detection is a technique that helps find patterns in data that are abnormal or unexpected. For example, spikes in the usage of a credit card that can flag possible misuse.

  • Association rule mining is a technique that helps establish a relationship between two data events. For example, the purchase of a laptop being frequently accompanied by the purchase of a cooling pad.

  • Sequential patterns is a technique that traces a series of events that take place in a sequence. For example, tracing a customer's shopping trail from the time they log into an online store to the time they log out.

  • Affinity grouping is a technique used to discover co-occurrence in relationships. This technique is widely used in online stores for cross-selling and up-selling their products by recommending products to people based on the purchase history of other people who purchased the same item.

  • Decision trees help build classification models in the form of a tree structure with multiple branches, where each branch represents a probable occurrence. This technique helps to build a clear understanding of the relationship between input and output.

  • Regression is a technique that helps identify the nature of the relationship between two variables, which could be causal or correlational. For example, based on factors such as location and covered area, a regression model could be used to predict the value of a house.
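To give a feel for association rule mining, the sketch below measures how strongly "laptop" implies "cooling pad" in a handful of shopping baskets, using the two standard measures of support and confidence. The baskets are made up, and real implementations search over many candidate rules rather than checking a single one.

```python
# Hypothetical shopping baskets (made-up data).
baskets = [
    {"laptop", "cooling pad", "mouse"},
    {"laptop", "cooling pad"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "cooling pad", "bag"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets with `antecedent`, the fraction that also have `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


rule_support = support({"laptop", "cooling pad"}, baskets)
rule_confidence = confidence({"laptop"}, {"cooling pad"}, baskets)

print(f"support: {rule_support:.2f}")
print(f"confidence: {rule_confidence:.2f}")
```

Support says how often the two items appear together across all baskets, while confidence says how often a laptop purchase comes with a cooling pad. A rule is usually considered interesting only when both are reasonably high.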

Data mining essentially helps separate the noise from the real information and helps businesses focus their energies on only what is relevant.
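As a small example of separating noise from a real signal, here is a sketch of anomaly detection on credit card spending. It uses a modified z-score based on the median and the median absolute deviation, which is one common robust rule of thumb and not a method prescribed by this article. The 3.5 cutoff is a widely used convention, and the spending figures are made up.

```python
from statistics import median


def find_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`."""
    center = median(values)
    # Median absolute deviation: a measure of spread that outliers barely affect.
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no measurable spread, so nothing can be scored
    return [v for v in values if 0.6745 * abs(v - center) / mad > threshold]


# Hypothetical daily spend on one credit card (made-up numbers).
daily_card_spend = [40, 55, 38, 60, 45, 52, 48, 900]

anomalies = find_anomalies(daily_card_spend)
print("flagged for review:", anomalies)
```

The median is used instead of the mean because a single extreme value can pull the mean and the standard deviation so far that the outlier no longer looks unusual.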

Glossary

Mode is the value that occurs most frequently in a set of observations. For example, if the most common score in this group of 25 students is 72%, then that is the mode for this dataset.

Dispersion is the measure of variability in a dataset. Common measures of statistical dispersion are Variance, Standard Deviation, and Range.

Variance defines how far away the data points fall from the center, that is, the distribution of values. When a distribution has lower variability, the values in a dataset are more consistent. However, when the variability is higher, the data points are more dissimilar, and extreme values become more likely. Understanding variability can help you grasp the likelihood of an event happening.

Standard deviation tells you how tightly your data is clustered around the mean.

Skewness is the measure of whether the distribution of values is symmetrical around a central value or skewed left or right. Skewed data can affect which types of analyses are valid to perform.
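Skewness can also be computed directly. The sketch below uses the moment-based definition (the third central moment divided by the second central moment raised to the power 1.5), which is one of several definitions in use. The function name and the data are made up for illustration.

```python
from statistics import mean


def skewness(values):
    """Moment-based skewness; assumes the values are not all identical."""
    center = mean(values)
    n = len(values)
    second_moment = sum((v - center) ** 2 for v in values) / n
    third_moment = sum((v - center) ** 3 for v in values) / n
    return third_moment / second_moment ** 1.5


# Hypothetical datasets (made-up numbers).
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]  # one large value stretches the right tail

print(f"symmetric: {skewness(symmetric):.2f}")
print(f"right-skewed: {skewness(right_skewed):.2f}")
```

A value near zero suggests a roughly symmetrical distribution, a positive value a longer right tail, and a negative value a longer left tail.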

Career Advice

Responsibilities of a Data Analyst

  • Acquiring data from data sources
  • Creating queries to extract required data from databases
  • Filtering, cleaning, standardizing, and reorganizing data in preparation for data analysis
  • Using statistical tools to interpret data sets
  • Using statistical techniques to identify patterns and correlations in data
  • Analyzing patterns in complex data sets and interpreting trends
  • Preparing reports and charts that effectively communicate trends and patterns
  • Creating appropriate documentation to define and demonstrate the steps of the data analysis process.
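To show what "creating queries to extract required data" can look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `orders` table, its columns, and its rows are hypothetical; a real analyst would connect to the organization's own database instead.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table (made-up rows).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE orders (region TEXT, product TEXT, units INTEGER)")
connection.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("North", "X", 12),
        ("North", "X", 8),
        ("South", "X", 5),
        ("South", "Y", 7),
    ],
)

# Extract only what the analysis needs: units of product X sold per region.
units_by_region = connection.execute(
    """
    SELECT region, SUM(units) AS total_units
    FROM orders
    WHERE product = ?
    GROUP BY region
    ORDER BY region
    """,
    ("X",),
).fetchall()
connection.close()

for region, total_units in units_by_region:
    print(region, total_units)
```

Passing the product name as a parameter (the `?` placeholder) rather than pasting it into the query string is a good habit, since it avoids SQL injection problems when values come from users.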

Skills that are valuable for a Data Analyst

  • Expertise in using spreadsheets such as Microsoft Excel or Google Sheets

  • Proficiency in statistical analysis and visualization tools and software such as Microsoft Power BI, SAS, and Tableau

  • Proficiency in at least one programming language such as R or Python, and in some cases C++, Java, or MATLAB

  • Good knowledge of SQL, and ability to work with data in relational and NoSQL databases

  • The ability to access and extract data from data repositories such as data warehouses, data lakes, and data pipelines

  • Familiarity with Big Data processing tools such as Hadoop, Hive, and Spark.

  • Proficiency in Statistics to help you analyze your data, validate your analysis, and identify fallacies and logical errors

  • Analytical skills that help you research and interpret data, theorize, and make forecasts.

  • Problem-solving skills, because ultimately, the end-goal of all data analysis is to solve problems.

  • Probing skills that are essential for the discovery process, that is, for understanding a problem from the perspective of varied stakeholders and users—because the data analysis process really begins with a clear articulation of the problem statement and desired outcome.

  • Data Visualization skills that help you decide on the techniques and tools that present your findings effectively based on your audience, type of data, context, and end-goal of your analysis.

  • Project Management skills to manage the process, people, dependencies, and timelines of the initiative.

  • Ability to work collaboratively with business and cross-functional teams

  • Ability to communicate effectively to report and present your findings

  • Ability to tell a compelling and convincing story

  • Ability to gather support and buy-in for your work

  • Above all, curiosity, which is at the heart of data analysis.

You will also hear data analysis practitioners talk about intuition as a must-have quality. It’s worth noting that intuition, in this context, is the ability to have a sense of the future based on pattern recognition and past experiences.

What Employers look for in a Data Analyst

  • Analysts with integrity: people who would rather get the right answer than merely meet the deadline. Remember that conclusions drive actions, and those actions need to be correct because they have consequences.
  • Clear communication: if you do the most brilliant analysis in the world but can't communicate it to external stakeholders, then it's really not worth anything.
  • Fluency with numbers and the ability to understand complex analyses
  • Troubleshooters and problem-solvers who think outside the box
  • Ability to understand A/B tests
  • Strong programming skills, including Python, R, and SQL, and the ability to pick up technical skills quickly
  • Growth mindset and willingness to learn
  • Ability to work with data in different formats; being dynamic and adaptable
  • Detail-oriented
  • Over-achievers
  • An understanding of what kind of data will solve the given problem

Generative AI

Generative models generate entirely new data points, opening up a realm of possibilities for data analytics.

Generative AI Apps for Data Analytics

  • Generating synthetic datasets for testing and development, which addresses the problem of data availability

  • Enhancing data visualizations by transforming data from one form to another, such as text to images

  • Filling in missing data points

  • Automating and enhancing data cleaning, normalization, and transformation, streamlining the path from raw data to actionable insights

  • Assisting with database querying: formulating complex queries, optimizing database interactions, and adapting to evolving data structures

  • Powering Q&A models that let users interact with data naturally, ask questions in plain language, and receive meaningful insights in return

  • Simplifying and accelerating dashboard creation, with dynamic layouts, insightful widgets, and personalized user experiences for more effective data communication

  • Enhancing storytelling in data analytics by generating narrative elements, highlighting key insights, and providing a cohesive structure that turns raw data into compelling narratives

  • Automating exploratory data analysis