
An Overview of Data Science

Before we get into what data science is, let us first understand what data actually is and why it matters for business, e-commerce, security, personal identity, scientific research and much more.

Data is nothing but a piece of information. The information we collect could be anything: your date of birth, your body weight, your eye or hair colour, your meal list, what you search for on your mobile or computer, the places you visit. In short, almost anything connected to you or around you can be data.

A novice might ask how all these things can be data. The answer is that data is everywhere; what makes all the difference is knowing which data we need and which we do not.

Let's understand this clearly through an example. Suppose you want to do some shopping on Amazon and decide to buy a new mobile phone. You fix a budget and then the features you want in the phone: battery performance, camera quality, style. Unfortunately you do not find a phone that matches what you have in mind, so you decide to search again another day. When you come back, you find Amazon recommending phones based on your past search activity, and hopefully you find the type of phone you are looking for. So how did Amazon know what type of phone you were looking for? When you searched for the phone and its specifications, Amazon stored your search data, processed it, and then recommended phones with similar specifications. Here the data is the text you typed and what you were looking for.
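To make the store, process and recommend idea concrete, here is a minimal Python sketch. It is only an illustration and not how Amazon actually works: the catalogue, the phone names, the specification fields and the `recommend` function are all invented for this example.

```python
# Hypothetical catalogue: every phone name and specification here is invented.
CATALOGUE = [
    {"name": "Phone A", "price": 15000, "battery_mah": 5000, "camera_mp": 48},
    {"name": "Phone B", "price": 32000, "battery_mah": 4500, "camera_mp": 108},
    {"name": "Phone C", "price": 18000, "battery_mah": 6000, "camera_mp": 50},
    {"name": "Phone D", "price": 12000, "battery_mah": 4000, "camera_mp": 13},
]

# Stored search data: what the shopper asked for on an earlier visit (assumed).
past_search = {"max_price": 20000, "min_battery_mah": 5000, "min_camera_mp": 48}


def recommend(search, catalogue):
    """Return the names of phones that satisfy every stored requirement."""
    return [
        phone["name"]
        for phone in catalogue
        if phone["price"] <= search["max_price"]
        and phone["battery_mah"] >= search["min_battery_mah"]
        and phone["camera_mp"] >= search["min_camera_mp"]
    ]


recommendations = recommend(past_search, CATALOGUE)
print(recommendations)  # ['Phone A', 'Phone C']
```

A real recommender would rank products by similarity and learn from many shoppers, but the flow is the same: stored data goes in, suggestions come out.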



What is Data Science?

Basically, data science is the science of data. In more detail, data science is a multidisciplinary subject that combines statistics, computer science, ML (machine learning) and domain expertise to extract knowledge or insights from data. Although it is a multidisciplinary subject, the end goal of data science is to develop a data product.

If you are wondering what a data product is, let me explain with the same example. When you came back to search for a phone, Amazon used your previous search data to recommend phones of the type you were looking for. In other words, the e-commerce company turned your previous search data into a data product that helped you find your phone and helped the company sell it. So we can say that a data product is nothing but a program used to solve a problem.

If you are looking for a definition, here it is: "A data product is what you get when a company's data is turned into a product that solves a problem."

Since we have seen that data science is a combination of subjects, let's now take a closer look at why these subjects are so important in the field of data science and what role they play.

Statistics

In data science, statistics is very important because you are dealing with large amounts of data, or Big Data. Statistics lets you compute summary measures such as the mean, median and mode, and with these it helps data scientists analyze and understand the data. Three types of statistics are used in data science:

  • Descriptive Statistics: This type of statistics helps data scientists consolidate or summarize the data for further analysis.
  • Inferential Statistics: This helps draw conclusions about a whole population from the samples of data collected.
  • Regression Analysis: This helps find the relationship between a dependent variable and one or more other variables.
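The three kinds of statistics can be sketched in a few lines of Python using only the standard library. The heights and weights below are invented sample data, and `fit_line` is a helper name made up for this example. It fits a straight line with ordinary least squares, which is just one simple form of regression.

```python
import math
import statistics

# Hypothetical sample: body weights (kg) and heights (cm), invented for illustration.
weights = [58, 62, 62, 70, 75, 81]
heights = [150, 155, 160, 170, 175, 180]

# Descriptive statistics: summarize the sample.
mean_weight = statistics.mean(weights)      # 68
median_weight = statistics.median(weights)  # 66.0
mode_weight = statistics.mode(weights)      # 62

# Inferential statistics: the standard error is a rough estimate of how far
# the sample mean may sit from the mean of the whole population.
standard_error = statistics.stdev(weights) / math.sqrt(len(weights))


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Regression analysis: how does weight change with height in this sample?
slope, intercept = fit_line(heights, weights)
print(mean_weight, median_weight, mode_weight)
print(f"weight is roughly {slope:.2f} * height + {intercept:.2f}")
```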


Machine Learning
ML helps data scientists handle complex calculations, develop algorithms that learn patterns from data, and predict a variable.
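As a small illustration of "predicting a variable", here is a sketch of a nearest-neighbour classifier in plain Python. Nearest neighbour is just one simple ML technique, chosen here for brevity, and the training examples, labels and the `predict` function are all invented for this example.

```python
import math

# Hypothetical labelled examples:
# (price in thousands, camera megapixels) -> market segment.
training_data = [
    ((12, 13), "budget"),
    ((15, 48), "budget"),
    ((32, 108), "premium"),
    ((45, 200), "premium"),
]


def predict(features, examples):
    """Return the label of the training example closest to `features`."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


print(predict((14, 50), training_data))   # budget
print(predict((40, 150), training_data))  # premium
```

The program is never told a rule such as "cheap phones are budget phones". It predicts the label of a new phone from the labelled examples it has already seen.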

Domain Expertise
Domain expertise is nothing but knowledge of the field the data set comes from. If the data set is related to healthcare, the domain is healthcare; if it is related to science, the domain is science; if it is related to business, the domain is business; and so on.

Data visualization is also a part of data science and is closely associated with the field of data analytics. In it, you represent the data in graphical form to get insights from it.
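Even a very simple picture can make a pattern easier to see. The sketch below draws a text bar chart with the Python standard library. The brand names are invented and `text_bar_chart` is a helper made up for this example. In practice you would normally use a plotting library instead.

```python
from collections import Counter

# Hypothetical data: phone brands a shopper searched for, invented for illustration.
searches = ["Brand X", "Brand Y", "Brand X", "Brand Z", "Brand X", "Brand Y"]


def text_bar_chart(values):
    """Return one line per distinct value, with a bar as long as its count."""
    counts = Counter(values)
    width = max(len(label) for label in counts)
    return [
        f"{label.ljust(width)} | {'#' * count} ({count})"
        for label, count in counts.most_common()
    ]


chart = text_bar_chart(searches)
print("\n".join(chart))
# Brand X | ### (3)
# Brand Y | ## (2)
# Brand Z | # (1)
```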


