What is data analysis?
Data analysis is the process of exploring, cleansing, transforming and modelling data in order to derive useful insights that support decision-making.
Tools available for it
Two kinds of tools are used to carry out data analysis:
1) Auto managed closed tools:
These are tools whose source code is not available; that is, they are not open source. Most of them must be paid for, though some offer free versions. Because they are closed source, the main way to learn them is through the vendor's own documentation site.
Pros & Cons:
- Closed Source
- Expensive
- They are limited
- Easy to learn
Examples: Tableau, QlikView, Excel (Paid Version), Power BI (Paid Version), Zoho Analytics, SAS
2) Programming Languages:
Programming languages suited to data analysis can produce the same results as auto managed closed tools.
Pros & Cons:
- These are open source
- Learning curve is steep
- Free
- Huge Community Support
- Very simple and intuitive to learn.
- It has a huge community of supporters.
- It's free and open source.
The data analysis process consists of these steps:
- Data Extraction
- Data Cleaning
- Data Wrangling
- Data Analysis
- Action
Data extraction is also known as connecting to the data source, or getting the data from the data source. These data sources can be:
- SQL Databases
- Centralized Databases
- Distributed Databases
- Data from web scraping
- File formats like CSV (Comma Separated Values), JSON Files, XML Files
- Querying APIs
- Buying Data
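As a rough illustration of what connecting to a few of these sources can look like, here is a minimal sketch that reads rows from CSV text, a JSON document, and a SQL database. It uses only Python's standard library; the sample data, the `sales` table, and the helper function names are all made up for the example, and an in-memory SQLite database stands in for a real database server.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for real files.
CSV_TEXT = "name,age\nAsha,31\nRavi,27\n"
JSON_TEXT = '[{"name": "Asha", "city": "Pune"}, {"name": "Ravi", "city": "Delhi"}]'


def read_csv_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_rows(text):
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


def make_demo_database():
    """Create an in-memory SQLite database with a small demo table."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE sales (name TEXT, amount REAL)")
    connection.executemany(
        "INSERT INTO sales VALUES (?, ?)", [("Asha", 120.0), ("Ravi", 80.5)]
    )
    return connection


def read_sql_rows(connection, query):
    """Run a query and return each result row as a dictionary."""
    connection.row_factory = sqlite3.Row
    return [dict(row) for row in connection.execute(query)]


csv_rows = read_csv_rows(CSV_TEXT)
json_rows = read_json_rows(JSON_TEXT)
sql_rows = read_sql_rows(
    make_demo_database(), "SELECT name, amount FROM sales ORDER BY name"
)
```

Note that the CSV reader returns every value as a string (`"31"`, not `31`), which is one reason a cleaning step usually follows extraction.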
Data cleaning refers to the process of detecting, correcting, replacing, modifying or removing messy data from a record set, table, or database. It involves operations such as:
- Adding missing values
- Filling empty values
- Eliminating duplicate values
- Removing irrelevant data, or parts of the data
- Statistical sanitization
- Correcting incorrect values
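A few of the cleaning operations above can be sketched in plain Python. This is only one possible approach under stated assumptions: the records are invented, missing ages are filled with the mean of the known ages (other strategies, such as the median or dropping the row, are equally valid), and `clean_records` is a hypothetical helper name.

```python
from statistics import mean

# Hypothetical raw records: one missing age, one duplicate, one messy city name.
raw_records = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Ravi", "age": None, "city": "delhi "},
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Meena", "age": 45, "city": "Delhi"},
]


def clean_records(records):
    """Return a cleaned copy: normalized text, no duplicates, no missing ages."""
    # Correct inconsistent values: trim whitespace and normalize capitalization.
    normalized = [{**r, "city": r["city"].strip().title()} for r in records]

    # Eliminate duplicates while keeping the first occurrence of each record.
    seen = set()
    unique = []
    for r in normalized:
        key = (r["name"], r["age"], r["city"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # Fill missing ages with the mean of the known ages.
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = mean(known_ages)
    return [
        {**r, "age": fill_value if r["age"] is None else r["age"]} for r in unique
    ]


cleaned = clean_records(raw_records)
```

The order of the steps matters here: normalizing text first means that records differing only in whitespace or capitalization are also caught as duplicates.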
Data wrangling covers:
- Handling hierarchical data
- Handling categorical data
- Reshaping and transforming structure
- Merging, combining & Joining data
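Flattening hierarchical data, encoding a categorical column, and joining two data sets might look roughly like the sketch below. The orders, the discount lookup table, and the helper names `flatten` and `one_hot` are assumptions made for illustration; one-hot encoding is just one common way to handle categorical data.

```python
# Hypothetical nested (hierarchical) records, e.g. as returned by an API.
orders = [
    {"id": 1, "customer": {"name": "Asha", "tier": "gold"}, "amount": 120.0},
    {"id": 2, "customer": {"name": "Ravi", "tier": "silver"}, "amount": 80.5},
]
# Hypothetical lookup table to join against.
discounts = {"gold": 0.10, "silver": 0.05}


def flatten(record, parent_key="", sep="_"):
    """Flatten nested dictionaries into a single level with joined keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({row[column] for row in rows})
    encoded_rows = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded_rows.append(new_row)
    return encoded_rows


# Reshape: turn each nested order into a flat row.
flat_orders = [flatten(order) for order in orders]
# Join: attach the discount that matches each order's customer tier.
joined = [{**row, "discount": discounts[row["customer_tier"]]} for row in flat_orders]
# Encode the categorical tier column as indicator columns.
encoded = one_hot(joined, "customer_tier")
```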
Data analysis covers:
- Exploration
- Building statistical model
- Visualization & Representation
- Correlation
- Hypothesis Testing
- Statistical analysis
- Reporting
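To make the exploration and correlation steps a little more concrete, here is a small sketch that summarizes a data set and computes the Pearson correlation coefficient between two variables. The numbers are invented, and `pearson` is a hand-written helper kept deliberately simple; statistical libraries offer more complete versions.

```python
from math import sqrt
from statistics import mean, median

# Hypothetical sample: advertising spend and the resulting sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [25, 44, 68, 83, 105]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mean_x, mean_y = mean(xs), mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Exploration: basic descriptive statistics.
summary = {
    "mean": mean(sales),
    "median": median(sales),
    "min": min(sales),
    "max": max(sales),
}
# Correlation: how strongly do spend and sales move together?
r = pearson(ad_spend, sales)
```

A value of `r` close to 1 indicates a strong positive linear relationship. On its own it does not show that one variable causes the other, which is where hypothesis testing comes in.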
Action covers:
- Building Machine Learning Models
- Moving ML into production
- Building ETL Pipelines
- Live Dashboard and Reporting
- Decision making and real life testing
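An ETL (extract, transform, load) pipeline, one of the items above, can be sketched in its simplest form as three small functions chained together. Everything here is an assumption for illustration: the CSV input, the `sales` table, and the rule of dropping rows without an amount. Real pipelines add scheduling, logging, and error handling on top.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with one incomplete row.
RAW_CSV = "name,amount\nAsha,120.0\nRavi,\nMeena,45.5\n"


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount and convert amounts to floats."""
    return [(row["name"], float(row["amount"])) for row in rows if row["amount"]]


def load(rows, connection):
    """Load: write the transformed rows into a database table."""
    connection.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    connection.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    connection.commit()


def run_pipeline(text, connection):
    """Run the three stages in order."""
    load(transform(extract(text)), connection)


connection = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, connection)
row_count = connection.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
total = connection.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Keeping each stage as a separate function makes it easier to test the stages independently and to swap one of them out, for example replacing the CSV source with an API call.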
- Data scientists have a good grip on programming and mathematics, and they use these skills to build ETL pipelines and ML models.
- Data analysts, on the other hand, have good communication skills, create better reports, and have good storytelling abilities.