All about data analysis, and which programming language to choose for it

What is data analysis?

Data analysis is the process of exploring, cleansing, transforming and modelling data in order to derive useful insights and support decision-making.


Tools available for it

There are two kinds of tools used in order to carry out data analysis:

1) Auto managed closed tools:

These are tools whose source code is not available; that is, they are not open source. You usually have to pay to use them, although some have free versions. Because they are closed, the main way to learn them is through the vendor's own documentation site.

Pros & Cons:

  • Closed source
  • Expensive
  • Limited to the features the vendor provides
  • Easy to learn

Examples: Tableau, QlikView, Excel (paid version), Power BI (paid version), Zoho Analytics, SAS

2) Programming Languages:

Then there are programming languages, which can produce the same results as auto managed closed tools.

Pros & Cons:

  • Open source
  • Steep learning curve
  • Free
  • Huge community support

Examples: Python, R, Julia, SQL, etc.

Always remember that the choice of programming language depends on the task we want to do, the type of data we have and the type of analysis we want to perform.

For example, Python is a very powerful tool for both general programming and data analysis because it has a huge collection of powerful libraries. Other reasons to use it:
  • Very simple and intuitive to learn.
  • Has a huge community of supporters.
  • It's free and open source.

But Python is not always the answer. Suppose you have statistical data and want to apply statistical methods to it; in that case you can use R (often through the RStudio IDE), because it is very useful for statistical analysis.

Different Processes In Data Analysis

There are five processes used while performing data analysis.

  1. Data Extraction
  2. Data Cleaning
  3. Data Wrangling
  4. Data Analysis
  5. Action

1. Data Extraction

Data extraction means connecting to a data source and getting the data from it. Data sources can be:

  • SQL Databases
  • Centralized Databases
  • Distributed Databases
  • Data from web scraping
  • File formats like CSV (Comma Separated Values), JSON and XML
  • Querying APIs
  • Buying Data
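
As a minimal sketch of extraction from three of the sources listed above (a CSV file, a JSON file and a SQL database), the example below uses only Python's standard library. The sample data, the `sales` table and the helper names `rows_from_csv`, `rows_from_json` and `rows_from_sql` are assumptions made for illustration; in practice many analysts would use a library such as pandas for this step.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for real files and databases.
CSV_TEXT = "name,city,amount\nAsha,Pune,120\nRavi,Delhi,80\n"
JSON_TEXT = '[{"name": "Meera", "city": "Mumbai", "amount": 200}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_sql(connection, query):
    """Run a query and return each row as a dict keyed by column name."""
    cursor = connection.execute(query)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# An in-memory SQLite database plays the role of a SQL data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (name TEXT, city TEXT, amount INTEGER)")
conn.execute("INSERT INTO sales VALUES ('Kiran', 'Chennai', 150)")

csv_rows = rows_from_csv(CSV_TEXT)
json_rows = rows_from_json(JSON_TEXT)
sql_rows = rows_from_sql(conn, "SELECT name, city, amount FROM sales")
conn.close()  # the rows were already fetched, so the connection can go
```

Note that the CSV reader returns every value as a string, while JSON and SQL keep numbers as numbers. Differences like this are one reason a cleaning step follows extraction.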

2. Data Cleaning

Data cleaning refers to the process of detecting, correcting, replacing, modifying or removing messy data from a record set, table, or database. It involves operations like:

  • Adding missing values
  • Filling empty values
  • Eliminating duplicate values
  • Removing irrelevant data or parts of the data
  • Statistical sanitization
  • Correcting incorrect values
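
Some of the operations above can be sketched in a few lines of plain Python. The records, the `clean` function and the choice of the median as the fill value are all assumptions for illustration; the right fill strategy depends on the data.

```python
import statistics

# Hypothetical raw records: one duplicate, one missing amount, messy city names.
raw = [
    {"name": "Asha", "city": " pune ", "amount": "120"},
    {"name": "Ravi", "city": "Delhi", "amount": ""},
    {"name": "Asha", "city": " pune ", "amount": "120"},
    {"name": "Meera", "city": "MUMBAI", "amount": "200"},
]


def clean(records):
    """Deduplicate, normalise text, and fill missing amounts with the median."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:  # eliminate exact duplicates
            seen.add(key)
            unique.append(dict(rec))

    for rec in unique:
        rec["city"] = rec["city"].strip().title()  # correct inconsistent text
        rec["amount"] = float(rec["amount"]) if rec["amount"] else None

    # Assumption: the median of the known amounts is a sensible fill value.
    # statistics.median raises StatisticsError if no amount is known at all.
    known = [rec["amount"] for rec in unique if rec["amount"] is not None]
    fill_value = statistics.median(known)
    for rec in unique:
        if rec["amount"] is None:
            rec["amount"] = fill_value
    return unique


cleaned = clean(raw)
```

Here `cleaned` holds three records: the duplicate row is gone, the city names are consistently capitalised, and the missing amount has been filled in.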

3. Data Wrangling

In this process we rearrange and reshape the data for better analysis: transforming fields, merging tables and combining data from multiple sources.

It involves:
  • Handling hierarchical data
  • Handling categorical data
  • Reshaping and transforming structure
  • Merging, combining and joining data
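
A rough sketch of three of these operations (merging, encoding categorical data and reshaping) in plain Python follows. The `customers` and `orders` tables and all field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical tables: orders reference customers by customer_id.
customers = [
    {"customer_id": 1, "name": "Asha", "segment": "retail"},
    {"customer_id": 2, "name": "Ravi", "segment": "wholesale"},
]
orders = [
    {"customer_id": 1, "month": "Jan", "amount": 120},
    {"customer_id": 1, "month": "Feb", "amount": 90},
    {"customer_id": 2, "month": "Jan", "amount": 300},
]

# Merge: an inner join of orders with customers on customer_id.
by_id = {c["customer_id"]: c for c in customers}
merged = [
    {**by_id[o["customer_id"]], **o}
    for o in orders
    if o["customer_id"] in by_id
]

# Categorical data: map each segment label to an integer code.
labels = sorted({c["segment"] for c in customers})
segment_codes = {label: code for code, label in enumerate(labels)}
for row in merged:
    row["segment_code"] = segment_codes[row["segment"]]

# Reshape: pivot from one row per order to one entry per customer,
# with months as keys.
pivot = defaultdict(dict)
for row in merged:
    pivot[row["name"]][row["month"]] = row["amount"]
pivot = dict(pivot)
```

The pivoted result has one entry per customer, which is often an easier shape to compare across months than the original one-row-per-order table.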

4. Data Analysis

Analysis involves extracting patterns from the data that is now clean. We also perform statistical analysis in this step.

It involves:
  • Exploration
  • Building statistical models
  • Visualization & Representation
  • Correlation
  • Hypothesis Testing 
  • Statistical analysis
  • Reporting
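
As a small illustration of exploration and correlation, the sketch below computes summary statistics and a Pearson correlation coefficient with the standard library. The `ad_spend` and `sales` figures are made up, and `pearson` is a helper written for this example.

```python
import math
import statistics

# Hypothetical cleaned data: advertising spend and sales for six months.
ad_spend = [10, 12, 15, 18, 20, 25]
sales = [100, 110, 128, 150, 160, 195]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Exploration: the main statistical properties of one column.
summary = {
    "mean_sales": statistics.fmean(sales),
    "median_sales": statistics.median(sales),
    "stdev_sales": statistics.stdev(sales),
}

# Correlation: how strongly the two columns move together.
r = pearson(ad_spend, sales)

print(summary)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A coefficient close to 1 suggests the two columns rise together, but it does not by itself show that one causes the other; that is where hypothesis testing and modelling come in.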

5. Action

Now that our data is ready to use, we can do the following:
  • Building machine learning models
  • Moving ML into production
  • Building ETL pipelines
  • Live dashboards and reporting
  • Decision making and real-life testing



Data Analysis vs Data Science

  • Data scientists have a good grip on programming and mathematics, and they use these skills to build ETL pipelines and ML models.
  • Analysts, on the other hand, have good communication skills, create better reports and have good storytelling abilities.


How Python Data Analysts Think

If you are coming from a traditional data analysis background using tools like Excel and Tableau, you are probably used to having a constant visual reference of your data. All these tools are point and click.

This works great for a small amount of data, but it is less useful when the number of records grows. It is impossible for humans to visually reference that much data, and processing gets incredibly slow. In contrast, when we work with Python, we don't have a constant visual reference of the data we are working with. We just know that it is there. We know what it looks like and we know its main statistical properties, but we are not constantly looking at it. This allows us to work with millions of records incredibly fast. It also means you can move your data analysis process from one computer to another, for example to the cloud, without much overhead.
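
To make this way of working concrete, here is a minimal sketch that profiles one million records without ever displaying them all. The dataset is randomly generated for illustration, and the `profile` dictionary is just one possible way to summarise it.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical dataset: one million order amounts, far too many to eyeball.
amounts = [random.uniform(5, 500) for _ in range(1_000_000)]

# Instead of scrolling through the records, ask for their size and
# main statistical properties, plus a small peek at the first few values.
profile = {
    "count": len(amounts),
    "min": min(amounts),
    "max": max(amounts),
    "mean": statistics.fmean(amounts),
}

print(profile)
print("first values:", [round(a, 2) for a in amounts[:3]])
```

The script never shows more than a handful of values, yet it tells us how many records there are and roughly how they are distributed. The same file can run unchanged on a laptop or on a cloud machine.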



