All about data analysis, and which programming language to choose for it

What is data analysis?

Data analysis is the process of exploring, cleansing, transforming and modelling data in order to derive useful insights and support decision-making.


Tools available for it

There are two kinds of tools used in order to carry out data analysis:

1) Auto managed closed tools:

These are tools whose source code is not available; that is, they are not open source. Most of them have to be paid for, although some offer free versions. Because they are closed, learning them usually means relying on the vendor's own documentation site.

Pros & Cons:

  • Closed Source
  • Expensive
  • Limited to the features the vendor provides
  • Easy to learn

Examples: Tableau, QlikView, Excel (paid version), Power BI (paid version), Zoho Analytics, SAS

2) Programming Languages:

Then there are programming languages, which can produce the same results as the auto managed closed tools.

Pros & Cons:

  • These are open source
  • Learning curve is steep
  • Free 
  • Huge Community Support

Examples: Python, R (commonly used through the RStudio IDE), Julia, SQL, etc.

Always remember that the choice of programming language for data analysis depends on the task we want to do, the type of data we have, and the type of analysis we want to perform.

For example, Python is a very powerful tool both for general programming and for data analysis, because it has a huge collection of powerful libraries. Other reasons to use it:
  • Very simple and intuitive to learn.
  • Has a huge community of supporters.
  • It's free and open source.

But Python is not always the answer. Suppose your work is mainly statistical and you need specialized statistical methods; in that case you can use R (typically through RStudio), because it is very useful for statistical analysis.

Different Processes In Data Analysis

There are five processes used while performing data analysis.

  1. Data Extraction
  2. Data Cleaning
  3. Data Wrangling
  4. Data Analysis
  5. Action

1. Data Extraction

Data extraction is also known as connecting to the data source, or getting the data from the data source. These data sources can be:

  • SQL Databases
  • Centralized Databases
  • Distributed Databases
  • Data from web scraping
  • File formats like CSV (Comma Separated Values), JSON files, XML files
  • Querying APIs
  • Buying Data
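
As a rough illustration of the sources listed above, here is a minimal sketch that reads the same small table from CSV text, JSON text and a SQL database, using only Python's standard library. The sample data, the `sales` table and the helper names are all hypothetical; in practice many analysts would use a library such as pandas for this step.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for real files or API responses.
CSV_TEXT = "city,sales\nPune,120\nDelhi,95\n"
JSON_TEXT = '[{"city": "Pune", "sales": 120}, {"city": "Delhi", "sales": 95}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts (all values arrive as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_sql(connection, query):
    """Run a query and return each result row as a dict."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


# An in-memory SQLite database plays the role of a real SQL server here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (city TEXT, sales INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [("Pune", 120), ("Delhi", 95)])

csv_rows = rows_from_csv(CSV_TEXT)
json_rows = rows_from_json(JSON_TEXT)
sql_rows = rows_from_sql(conn, "SELECT city, sales FROM sales ORDER BY sales DESC")
conn.close()
```

Note that the CSV reader returns every value as text, so converting `sales` to a number is left to the cleaning step.
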
2. Data Cleaning 

Data cleaning refers to the process of detecting, correcting, replacing, modifying or removing messy data from a record set, table, or database. It involves operations like:

  • Adding missing values
  • Filling empty values
  • Eliminating duplicate values
  • Removing irrelevant data or parts of the data
  • Statistical sanitization
  • Correcting incorrect values
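
A few of these operations can be sketched in plain Python. The records below are made up for illustration, and filling a missing age with the mean of the known ages is only one possible strategy, chosen here as an assumption.

```python
from statistics import mean

# Hypothetical messy records: stray whitespace, inconsistent capitalisation,
# a duplicated row and a missing age.
raw = [
    {"name": " Asha ", "age": 31},
    {"name": "Ravi", "age": None},
    {"name": "Ravi", "age": None},
    {"name": "meera", "age": 27},
]


def clean(records):
    # Correct inconsistent text values.
    normalized = [
        {"name": r["name"].strip().title(), "age": r["age"]} for r in records
    ]

    # Eliminate duplicate rows while keeping the original order.
    seen = set()
    unique = []
    for r in normalized:
        key = (r["name"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Fill empty ages with the mean of the known ages (an assumed strategy).
    known = [r["age"] for r in unique if r["age"] is not None]
    fill = mean(known)
    return [
        {**r, "age": r["age"] if r["age"] is not None else fill} for r in unique
    ]


cleaned = clean(raw)
```
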
3. Data Wrangling

In this process we rearrange and reshape the data for better analysis: transforming fields, merging tables, and combining data from multiple sources.

It involves:
  • Handling hierarchical data
  • Handling categorical data
  • Reshaping and transforming structure
  • Merging, combining and joining data
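
To make the wrangling operations concrete, here is a small sketch in plain Python that joins two tables, reshapes the result from long to wide, and encodes a categorical column. The `customers` and `orders` tables and all their field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables coming from two different sources.
customers = [
    {"customer_id": 1, "name": "Asha", "segment": "retail"},
    {"customer_id": 2, "name": "Ravi", "segment": "wholesale"},
]
orders = [
    {"customer_id": 1, "month": "Jan", "amount": 100},
    {"customer_id": 1, "month": "Feb", "amount": 150},
    {"customer_id": 2, "month": "Jan", "amount": 300},
]

# Joining: an inner join of orders with customers on customer_id.
by_id = {c["customer_id"]: c for c in customers}
joined = [
    {**by_id[o["customer_id"]], **o} for o in orders if o["customer_id"] in by_id
]

# Reshaping: from one row per order (long) to one entry per customer (wide).
wide = defaultdict(dict)
for row in joined:
    wide[row["name"]][row["month"]] = row["amount"]
wide = dict(wide)

# Categorical data: one-hot encode the segment column.
segments = sorted({c["segment"] for c in customers})
encoded = [
    {"name": c["name"], **{f"segment_{s}": int(c["segment"] == s) for s in segments}}
    for c in customers
]
```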

4. Data Analysis

The process of analysis involves extracting patterns from the data, which is now clean. We also perform statistical analysis in this step.

It involves:
  • Exploration
  • Building statistical models
  • Visualization & Representation
  • Correlation
  • Hypothesis Testing 
  • Statistical analysis
  • Reporting
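
As a small example of exploration and correlation, the sketch below computes summary statistics and a Pearson correlation coefficient by hand. The `ad_spend` and `sales` figures are invented for illustration, and the `pearson` helper is written out here rather than taken from a statistics library.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical observations: advertising spend and resulting sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [24, 47, 63, 88, 103]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Exploration: basic descriptive statistics of the sales column.
summary = {"mean": mean(sales), "median": median(sales), "stdev": stdev(sales)}

# Correlation: a value close to 1 suggests a strong positive linear relationship.
r = pearson(ad_spend, sales)
```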

5. Action

Now that our data is ready to use, we can do the following:
  • Building Machine Learning Models
  • Moving ML into production
  • Building ETL Pipelines
  • Live Dashboard and Reporting
  • Decision making and real-life testing
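
To give a feel for what a very small ETL (extract, transform, load) pipeline looks like, here is a sketch using only the standard library. The source data, the `city_sales` table and the rule of dropping rows with missing sales are all assumptions made for the example; real pipelines are usually larger and run on a schedule.

```python
import csv
import io
import sqlite3

# Hypothetical source data; the second row has a missing sales value.
SOURCE = "city,sales\nPune,120\nDelhi,\nPune,80\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing sales and total the rest per city."""
    totals = {}
    for row in rows:
        if row["sales"] == "":
            continue
        totals[row["city"]] = totals.get(row["city"], 0) + int(row["sales"])
    return sorted(totals.items())


def load(pairs, connection):
    """Load: write the totals into a destination table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS city_sales (city TEXT PRIMARY KEY, total INTEGER)"
    )
    connection.executemany("INSERT OR REPLACE INTO city_sales VALUES (?, ?)", pairs)
    connection.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(SOURCE)), conn)
loaded = conn.execute("SELECT city, total FROM city_sales ORDER BY city").fetchall()
conn.close()
```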



Data Analysis vs Data Science

  • Data scientists have a good grip on programming and mathematics, and they use these skills to build ETL pipelines and ML models.
  • Analysts, on the other hand, have good communication skills, create better reports and have strong storytelling abilities.


How Python Data Analysts Think

If you are coming from a traditional data analysis background, using tools like Excel and Tableau, you are probably used to having a constant visual reference of your data. All these tools are point and click.

This works great for a small amount of data, but it becomes less useful as the number of records grows. It's impossible for humans to visually reference too much data, and processing gets incredibly slow. In contrast, when we work with Python, we don't have a constant visual reference of the data we are working with. We just know that it is there. We know what it looks like and we know its main statistical properties, but we are not constantly looking at it. This allows us to work with millions of records incredibly fast. It also means you can move your data analysis process from one computer to another, for example to the cloud, without much overhead.
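
The idea of knowing the data through its properties, instead of looking at every row, can be sketched as follows. The one million values are generated artificially for the example, and the `describe` helper is a hypothetical name, not a function from any library.

```python
from statistics import fmean


def describe(values, preview=3):
    """Summarise a column without displaying every value."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": fmean(values),
        "head": values[:preview],  # a short peek at the first few records
    }


# One million artificial records; far too many to inspect by eye.
records = [(i * 37) % 1000 for i in range(1_000_000)]
summary = describe(records)
print(summary)
```

A handful of numbers and a short preview are enough to confirm that the data looks the way we expect, and the same script runs unchanged on a laptop or on a cloud machine.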



