Where to find Data for Machine Learning

High-quality data is key to building useful machine learning models. Models learn their behaviour from data, so finding the right data is a big part of the work of building machine learning into your products.

This article gives a concise explanation of how to find the right data for your models.

https://towardsdatascience.com/where-to-find-data-for-machine-learning-e375e2a515c8
18 Best Data Science Podcasts
SQL Free Resources
Looking to learn SQL for free? Here is a curated list of websites you can use to upgrade your SQL skills or practice writing queries. Remember, SQL is a necessary skill to have in your toolkit as a data professional.

1. W3 Schools

https://w3schools.com/sql

2. SQL Zoo

http://sqlzoo.net

3. SQLBolt

http://sqlbolt.com

4. Khan Academy

https://khanacademy.org/computing/computer-programming/sql

5. freeCodeCamp

https://youtu.be/HXV3zeQKqGY

To practice what you have learned and build your skills at the same time, you can use these:

6. Hacker Rank

https://hackerrank.com/domains/sql

7. SQL Murder Mystery Game

https://mystery.knightlab.com

#datascience #SQL


Join @datascience_bds for more cool data science materials.
*This channel belongs to @bigdataspecialist group
Machine Learning with Python: Zero to GBMs

This is a practical and beginner-friendly introduction to supervised machine learning, decision trees, and gradient boosting using Python. This is a self-paced course where you can:

👌Watch hands-on coding-focused video tutorials
👌Practice coding with cloud Jupyter notebooks
👌Build an end-to-end real-world course project
👌Earn a verified certificate of accomplishment
👌Interact with a global community of learners
👌You will solve 2 coding assignments & build a course project where you'll train ML models using large real-world datasets

Link: https://jovian.ai/learn/machine-learning-with-python-zero-to-gbms
Text Classification with TensorFlow

This is an intermediate-level Python course taught by MIT grad student Kylie Ying. You can code along at home in your browser.

You'll use TensorFlow to train Neural Networks, visualize a diabetes dataset, and perform Text Classification on wine reviews. (2 hour YouTube course)

Link: https://www.freecodecamp.org/news/text-classification-tensorflow/
Introduction to Machine Learning, IIT Kharagpur

🆓 Free Online Course
💻 44 Lecture Videos
🏃‍♂️ Self paced
Teacher 👨‍🏫 : Prof. S. Sarkar

🔗 https://nptel.ac.in/courses/106105152
The Scikit-Learn Guide

Looking to improve your knowledge of machine learning algorithms? There's no better place to start than the sklearn documentation.

There is a lot of interesting information you can gain there.

https://scikit-learn.org/stable/
Want to make sure your Spark applications reach the best performance?

We invite you to our Dynamic Talks #90 | Spark performance mastery!
Date and time: July 20, 6:30 pm (CET)

The speaker is Iñigo San Aniceto Orbegozo, Staff Big Data Engineer at Grid Dynamics.

💻 Participation is free but registration is required: https://forms.gle/UVvfWG5LeZAXTuNQ6

More about the event: https://fb.me/e/1U9Vq4epw
Just wanted to share this 👆 here as well in case somebody is interested.
**A List Of Free Data Science Tutorials**

🔘Python for Data Science - Great Learning
Rating ⭐️: 4.2 out of 5
Duration : 1 hour 55 mins on-demand video
Students 👨‍🏫: 25,605
Created by: Bharani Akella
🔗 Course link

🔘A - Z Python crash course for Data Science 2021
Rating ⭐️: 4.4 out of 5
Duration : 2 hours on-demand video
Students 👨‍🏫: 7,012
Created by: Abb Selec
🔗 Course link

🔘An Athlete’s Guide To Data Science
Rating ⭐️: 3.0 out of 5
Duration : 1 hour 1 min on-demand video
Students 👨‍🏫: 1,975
Created by: Jon pierre Jones
🔗 Course link

🔘NumPy for Data Science Beginners: 2021
Rating ⭐️: 4.0 out of 5
Duration : 1 hour 51 mins on-demand video
Students 👨‍🏫: 11,535
Created by: Abb Selec
🔗 Course link

🔘Learn Data Science With R Part 1 of 10
Rating ⭐️: 4.1 out of 5
Duration : 8 hours 42 mins on-demand video
Students 👨‍🏫: 32,824
Created by: Ram Reddy
🔗 Course link

🔘Data Science with Analogies, Algorithms and Solved Problems
Rating ⭐️: 4.1 out of 5
Duration : 1 hour 19 mins on-demand video
Students 👨‍🏫: 15,706
Created by: Ajay Dhruv, Neha Mayekar, Shreya Pattewar, Shubham Patil
🔗 Course link

🔘Data Science, Machine Learning, Data Analysis, Python & R
Rating ⭐️: 3.8 out of 5
Duration : 8 hours 7 mins on-demand video
Students 👨‍🏫: 89,564
Created by: DATAhill Solutions Srinivas Reddy
🔗 Course link

🔘Intro to Data for Data Science
Rating ⭐️: 4.6 out of 5
Duration : 1 hour 1 min on-demand video
Students 👨‍🏫: 9,727
Created by: Matthew Renze
🔗 Course link

🔘Learn NumPy Fundamentals (Python Library for Data Science)
Rating ⭐️: 4.3 out of 5
Duration : 1 hour 49 mins on-demand video
Students 👨‍🏫: 27,038
Created by: Derrick Sherrill
🔗 Course link

#datascience #datanalysis #python #numpy #pandas #machinelearning

Tools Regularly Used By Data Scientists
Fundamentals of Data Visualization

A primer on making informative and compelling figures
Author: Claus O. Wilke
Book Link: Read Me!
A Guide to Understanding Different Types of Data

Hey There😃!!
Do you know the different formats your data can be in and how to identify them?😌
Here's a guide that can help you😉

Structured Data: It is in a standardized format, has a well-defined structure, complies with a data model, follows a persistent order, and is easily accessed by humans and programs. This data type is generally stored in a database, normally in a table or a number of tables.
Examples: Data from surveys, different sensors, point-of-sale details, and financial information

Unstructured Data: It does not conform to any data model and has no easily identifiable structure. There is no predefined organization to it, so it does not fit neatly into a traditional row-and-column database, follows no fixed rules or format, and cannot be easily used by programs.
Examples: raw videos from surveillance cameras, reports, file shares with corporate documents, images, and memos.

Semi-Structured Data: It is not in a relational database and does not conform to a strict data model, but it has some elements of structure. It does not fit neatly into the rows and columns of a table. This data contains metadata and tags which help it to be grouped appropriately and describe the way it is stored. Semi-structured data is organized hierarchically, although the entities within a group may not have the same properties or attributes. It is harder to automate, manage, and query than structured data.
Examples: Wikipedia pages with links, collection of scientific papers in JSON format with authors, emails, zipped files, web files, and binary executables.
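
To make the difference concrete, here is a small sketch in Python. The two "paper" records are made up for illustration: they are semi-structured (JSON with tags, but not the same fields in each record), and the code flattens them into structured rows where every row has the same columns.

```python
import json

# Hypothetical semi-structured records: both are papers, but their fields differ.
raw = """
[
  {"title": "Paper A", "authors": ["Ann", "Bob"], "year": 2020},
  {"title": "Paper B", "authors": ["Cy"], "keywords": ["ml"]}
]
"""

papers = json.loads(raw)

# Flatten into a structured table: every row gets the same columns,
# and a value that is missing in the source becomes None.
rows = [
    {
        "title": p.get("title"),
        "first_author": p.get("authors", [None])[0],
        "year": p.get("year"),
    }
    for p in papers
]

for row in rows:
    print(row)
```

Notice that some information (the keywords, the second author) is dropped on the way. That is the usual trade-off when forcing semi-structured data into a fixed table.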
Different Data Sources and How They Are Collected

1) Company Data Sources:
Web Events, Survey Data,
Customer Data,
Logistics Data and Financial Transactions.

2) Open Data Sources:
Public Data APIs,
Public Records
APIs let you request data over the internet. Interesting APIs include:
Twitter, Wikipedia, Yahoo Finance, Google Maps, etc.
Public records are collected and published by international organisations such as the World Bank, the UN, and the WTO.

3) National Statistical Offices:
Censuses
Surveys

4) Government Agencies:
Weather Data
Environment Data
Population Data
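
As a rough sketch of what "requesting data over the internet" from a public API looks like, the snippet below builds a request to Wikipedia's REST page-summary endpoint using only the Python standard library. The helper names and the User-Agent string are made up for this example, and fetch_summary needs a network connection, so only the URL is printed here.

```python
import json
import urllib.parse
import urllib.request

# Wikipedia's REST endpoint for page summaries.
BASE = "https://en.wikipedia.org/api/rest_v1/page/summary/"


def summary_url(title):
    # Wikipedia titles use underscores for spaces; quote() escapes the rest.
    return BASE + urllib.parse.quote(title.replace(" ", "_"))


def fetch_summary(title):
    # Performs a real network call; the API answers with a JSON document.
    request = urllib.request.Request(
        summary_url(title),
        headers={"User-Agent": "data-sources-example/0.1"},  # illustrative value
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


print(summary_url("Machine learning"))
```

Most public APIs work the same way: build a URL, send the request, and parse the JSON that comes back. Check each API's terms and rate limits before collecting data in bulk.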
Model Evaluation Metrics
Interesting Terminologies to Understand in Machine Learning

Bag of words: A technique used to extract features from text. It counts how many times each word appears in each document of a corpus, and then transforms those counts into a dataset.
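
A minimal bag-of-words sketch in plain Python, using two made-up sentences as the corpus. Real projects usually rely on a library for this, but the idea is the same: one row per document, one column per word.

```python
from collections import Counter

docs = ["the cat sat on the mat", "the dog sat"]

# Vocabulary: every distinct word across the corpus, in sorted order.
vocab = sorted({word for doc in docs for word in doc.split()})

# One row per document, one column per vocabulary word.
matrix = []
for doc in docs:
    counts = Counter(doc.split())
    matrix.append([counts[word] for word in vocab])

print(vocab)
print(matrix)
```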

A categorical label has a discrete set of possible values, such as "is a cat" and "is not a cat."

Clustering: An unsupervised learning task that helps to determine if there are any naturally occurring groupings in the data.
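
Here is a tiny sketch of clustering using k-means on one-dimensional data. The points and starting centers are made up, and k-means is just one of many clustering algorithms, chosen here because it is short to write.

```python
def kmeans_1d(points, centers, iterations=10):
    groups = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: put each point in the group of its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


centers, groups = kmeans_1d([1, 2, 3, 10, 11, 12], centers=[0.0, 5.0])
print(centers)
print(groups)
```

No labels were given to the algorithm. It found the two groups from the data alone, which is what makes this an unsupervised task.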

CNN: Convolutional Neural Networks (CNN) represent nested filters over grid-organized data. They are by far the most commonly used type of model when processing images.

A continuous (regression) label does not have a discrete set of possible values, which means it can take a potentially unlimited number of values.

Data vectorization: A process that converts non-numeric data into a numerical format so that it can be used by a machine learning model.
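
One common way to vectorize categorical data is one-hot encoding. The sketch below is a hand-written version with a made-up list of colours. It is only one of several vectorization techniques.

```python
def one_hot(values):
    # Each distinct category gets its own position in the vector.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    vectors = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        vectors.append(vector)
    return categories, vectors


categories, vectors = one_hot(["red", "green", "red", "blue"])
print(categories)
print(vectors)
```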

Discrete: A term taken from statistics referring to an outcome taking on only a finite number of values (such as days of the week).

FFNN: The most straightforward way of structuring a neural network, the Feed Forward Neural Network (FFNN) structures neurons in a series of layers, with each neuron in a layer containing weights to all neurons in the previous layer.

Hyperparameters are settings on the model which are not changed during training but can affect how quickly or how reliably the model trains, such as the number of clusters the model should identify.

Log loss is used to calculate how uncertain your model is about the predictions it is generating.
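
A short sketch of binary log loss, written by hand with made-up predictions. The eps clipping is an added safeguard so that a predicted probability of exactly 0 or 1 does not cause log(0).

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability away from 0 and 1 to keep log() finite.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Both models predict the right class, but the first is more confident.
confident = log_loss([1, 0], [0.9, 0.1])
unsure = log_loss([1, 0], [0.6, 0.4])
print(confident, unsure)
```

The less confident model gets the higher loss even though both would score the same on accuracy.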

Hyperplane: A mathematical term for a flat surface with one dimension fewer than the space it sits in. It generalizes a line (in 2D) and a plane (in 3D) to higher dimensions.

Impute is a common term referring to different statistical tools which can be used to calculate missing values from your dataset.
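
The simplest imputation strategy is to fill each missing value with the mean of the known ones. A minimal sketch, assuming missing values are represented as None and using a made-up list of ages:

```python
from statistics import mean


def impute_mean(values):
    # Compute the mean from the values that are present...
    known = [v for v in values if v is not None]
    fill = mean(known)
    # ...and use it wherever a value is missing.
    return [fill if v is None else v for v in values]


ages = [20, None, 40, 30, None]
print(impute_mean(ages))
```

Median or model-based imputation follows the same pattern with a different way of choosing the fill value.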

Label refers to data that already contains the solution.

Loss function is used to codify the model's distance from its goal.

Machine learning, or ML, is a modern software development technique that enables computers to solve problems by using examples of real-world data.

Model accuracy is the fraction of predictions a model gets right.

Continuous: Floating-point values with an infinite range of possible values. The opposite of categorical or discrete values, which take on a limited number of possible values.
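
Model accuracy, as defined above, is easy to compute by hand. A small sketch with made-up labels and predictions:

```python
def accuracy(y_true, y_pred):
    # Count the predictions that match the true label, then divide by the total.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


acc = accuracy(["cat", "dog", "cat", "cat"], ["cat", "dog", "dog", "cat"])
print(acc)
```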

Model inference is when the trained model is used to generate predictions.

Model is an extremely generic program, made specific by the data used to train it.

Model parameters are settings or configurations the training algorithm can update to change how the model behaves.

Model training algorithms work through an iterative process where the current model iteration is analyzed to determine what changes can be made to get closer to the goal. Those changes are made and the iteration continues until the model is evaluated to meet the goals.
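
The training loop described above can be sketched with gradient descent on a one-feature linear model. The data, learning rate, and step count are assumptions chosen for this example, and the loop runs for a fixed number of steps instead of checking a goal.

```python
def train(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0  # model parameters, updated by the training algorithm
    n = len(xs)
    for _ in range(steps):
        # Analyze the current model: gradient of the mean squared error.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Make a small change that moves the model closer to the goal.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# The data follows y = 2x + 1, so training should recover w near 2 and b near 1.
w, b = train([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w, 3), round(b, 3))
```

Here lr and steps are hyperparameters (set before training), while w and b are model parameters (learned during training).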

Neural networks: a collection of very simple models connected together. These simple models are called neurons. The connections between these models are trainable model parameters called weights.

Outliers are data points that are significantly different from others in the same sample.
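
One common rule of thumb flags a point as an outlier when it lies more than 1.5 interquartile ranges outside the middle half of the data. A sketch with made-up numbers (statistics.quantiles needs Python 3.8 or newer):

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    # Quartiles split the sorted data into four equal parts.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data))
```

The factor 1.5 is a convention, not a law. Whether a flagged point is an error or a genuine rare value still needs a human look.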

Plane: A mathematical term for a flat surface (like a piece of paper) on which two points can be joined by a straight line.

Regression: A common task in supervised machine learning in which the model predicts a continuous numeric value.

In reinforcement learning, the algorithm figures out which actions to take in a situation to maximize a reward (in the form of a number) on the way to reaching a specific goal.

RNN/LSTM: Recurrent Neural Networks (RNN) and the related Long Short-Term Memory (LSTM) model types are structured to effectively represent for loops in traditional computing, collecting state while iterating over some object. They can be used for processing sequences of data.
2024/10/04 13:22:46