Data Science: we hear the term everywhere, and most of us think of it as AI. Looked at in depth, though, it is a field mostly concerned with extracting and exploiting insights from nothing but DATA.
The data itself can take a whole range of forms and formats, such as text, CSV files, Excel sheets, images, and video, and the workflows around it tend to follow a common pattern.
To get this Data Science party rolling, the following might give you an idea of where to start!
Learning About Programming Fundamentals that covers:
Python: common data structures (data types, lists, dictionaries, sets, and tuples), writing functions, logic, control flow, searching and sorting algorithms, object-oriented programming, and interacting with external libraries.
SQL scripting: Using joins, aggregations, and subqueries to query databases.
Using Git for version control and GitHub for collaboration
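To make the SQL item above a little more concrete, here is a minimal sketch that runs a join, an aggregation, and a subquery from Python using the standard-library sqlite3 module. The customers and orders tables and their contents are made up purely for illustration.

```python
import sqlite3

# Hypothetical in-memory schema and rows, invented for this example.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 30.0), (3, 2, 100.0);
""")

# Join + aggregation: total spent per customer.
# LEFT JOIN keeps customers that have no orders at all.
totals = conn.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

# Subquery: customers whose total spend exceeds the average order amount.
big_spenders = conn.execute("""
    SELECT c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY c.id
""").fetchall()

print(totals)
print(big_spenders)
```

SQLite is used here only because it ships with Python; the same query shapes carry over to other relational databases, with small dialect differences.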
Learning About Data Collection and Cleaning that covers:
Finding the right data to help you solve your problem is an important part of data science work. Data can be gathered from a variety of legitimate sources, including web scraping (if the website permits it), APIs, databases, and publicly accessible repositories.
Once they have data in hand, analysts frequently find themselves cleaning data frames, working with multi-dimensional arrays, performing descriptive and scientific computations, and manipulating data frames to aggregate data.
Real-world data is rarely clean and ready to use. Two libraries commonly used to take data from dirty to ready-to-analyze are pandas and NumPy.
Learning how to use them is a good starting point.
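As a rough illustration of what "dirty to ready-to-analyze" can look like, the sketch below cleans a small, made-up data frame with pandas and then aggregates it. The column names and values are assumptions invented for the example, not data from any real source.

```python
import pandas as pd

# Hypothetical messy data: inconsistent spacing and casing,
# a non-numeric placeholder, and a missing city.
raw = pd.DataFrame({
    "city": [" Paris", "paris", "Berlin", "Berlin", None],
    "temp_c": ["21.5", "22.5", "n/a", "18.0", "19.0"],
})

clean = raw.copy()
# Normalize the text column: trim whitespace, unify capitalization.
clean["city"] = clean["city"].str.strip().str.title()
# Convert strings to numbers; anything unparseable becomes NaN.
clean["temp_c"] = pd.to_numeric(clean["temp_c"], errors="coerce")
# Drop rows that are missing either field.
clean = clean.dropna(subset=["city", "temp_c"])

# Aggregate: mean temperature per city.
summary = clean.groupby("city")["temp_c"].mean()
print(summary)
```

Whether to drop incomplete rows, as done here, or fill them in some other way depends on the problem; dropping is just the simplest choice for a sketch.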
Learning About Dynamic Data Analysis and Storytelling:
Data analysis and storytelling are the next areas to master. A data analyst's main role is to extract insights from data and then communicate them to management in simple terms and with clear visualizations.
The storytelling side requires data visualization expertise as well as strong communication skills.
Learning About Data Engineering that covers:
At large data-driven companies, data engineering supports R&D teams by making clean data available to research engineers and scientists. It is a separate field in its own right, so if you only want to focus on the statistical algorithm side of the problems, feel free to skip this section!
Building an efficient data architecture, simplifying data processing, and sustaining large-scale data systems are all responsibilities of a data engineer.
Engineers use Shell (CLI), SQL, and Python/Scala to develop ETL pipelines, automate file system tasks, and optimize database processes for high performance.
Another important skill is the ability to deploy these data systems, which requires knowledge of cloud service providers such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure.
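For a feel of what an ETL (extract, transform, load) pipeline does, here is a deliberately tiny sketch using only Python's standard library. The CSV contents, the users table, and the extract/transform/load function names are all hypothetical; a production pipeline would read from real sources and typically run on a scheduler.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file,
# an API, or an upstream database.
RAW_CSV = """user_id,signup_date,plan
1,2023-01-05,free
2,2023-01-07,PRO
,2023-01-09,free
3,2023-01-09, pro
"""

def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: skip rows without an id, normalize the plan name."""
    cleaned = []
    for row in rows:
        if not row["user_id"]:
            continue
        plan = row["plan"].strip().lower()
        cleaned.append((int(row["user_id"]), row["signup_date"], plan))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(user_id INTEGER PRIMARY KEY, signup_date TEXT, plan TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO users VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

plan_counts = dict(conn.execute("SELECT plan, COUNT(*) FROM users GROUP BY plan"))
print(plan_counts)
```

Keeping the three stages as separate functions makes each one easy to test and swap out on its own, which is the main design idea the sketch is meant to show.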
Learning About Applied Statistics and Mathematics that covers:
Data science relies heavily on statistical methods. Descriptive and inferential statistics also come up frequently in data science interviews.
People frequently begin coding machine learning algorithms without first gaining a thorough understanding of the statistical and mathematical principles that explain how the algorithms function. Of course, this isn’t the most efficient method.
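To show the difference between the descriptive and inferential sides, the sketch below first summarizes a small sample and then estimates a range for the underlying population mean. The measurements are made up, and the confidence interval uses a normal approximation for simplicity; for a sample this small, a t-distribution would usually be the more appropriate choice.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]

# Descriptive statistics: summarize the data we actually have.
mean = statistics.mean(sample)
median = statistics.median(sample)
stdev = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)

# Inferential statistics: say something about the wider population.
# Approximate 95% confidence interval for the population mean,
# using the normal approximation (an assumption made for this sketch).
standard_error = stdev / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = mean - z * standard_error
ci_high = mean + z * standard_error

print(f"mean={mean:.2f} median={median:.2f} stdev={stdev:.3f}")
print(f"approx. 95% CI for the mean: ({ci_low:.2f}, {ci_high:.2f})")
```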
Learning About Machine Learning and AI that covers:
After grilling and roasting yourself on all of the principles above, you should now be ready to get started with the fancy ML algorithms.
Machine learning is commonly divided into three categories:
Supervised Learning: covers regression and classification problems. Simple linear regression, multiple regression, polynomial regression, naive Bayes, logistic regression, KNN, tree models, and ensemble models are all worth studying. Learn about the different evaluation metrics as well.
Unsupervised Learning: most commonly used for clustering and dimensionality reduction. Learn about PCA, K-means clustering, hierarchical clustering, and Gaussian mixture models.
Reinforcement Learning: trains agents that learn by trial and error to maximize a reward signal. Learn how to use the TF-Agents library to train agents, build Deep Q-Networks, and more.
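To contrast the first two categories, here is a small NumPy-only sketch: a supervised fit (simple linear regression by ordinary least squares, where labels are given) next to an unsupervised one (K-means, where they are not). The synthetic data, the fixed seed, and the choice of starting centers are all assumptions made for illustration; in practice a library such as scikit-learn would normally be used, and reinforcement learning is left out because it needs an environment to interact with.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Supervised: learn y = w * x + b from labeled examples ---
x = np.linspace(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=x.size)  # noisy labels
A = np.column_stack([x, np.ones_like(x)])  # columns: x and a constant term
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)  # ordinary least squares

# --- Unsupervised: group unlabeled points with K-means (Lloyd's algorithm) ---
points = np.vstack([
    rng.normal(loc=(0, 0), scale=0.3, size=(30, 2)),
    rng.normal(loc=(5, 5), scale=0.3, size=(30, 2)),
])
# Start from the first and last point (a simplification for this sketch).
centers = points[[0, -1]].copy()
for _ in range(10):
    # Assignment step: each point joins its nearest center.
    distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    labels = distances.argmin(axis=1)
    # Update step: each center moves to the mean of its assigned points.
    centers = np.array([points[labels == k].mean(axis=0) for k in range(2)])

print(f"fitted slope={w:.2f}, intercept={b:.2f}")
print("cluster centers:")
print(centers)
```

The regression is told the right answers (y) and recovers the relationship behind them, while K-means only ever sees the points and has to discover the two groups on its own.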
To conclude, this article is intended only to provide a high-level overview of the vast field of data science. You can take a deep dive into each of these subjects and build a more detailed, concept-by-concept study plan for each category, using resources such as the TensorFlow documentation as a starting point. Please keep in mind that this article is based on a variety of sources and studies.