Data Science Riddle
A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
54% Add indexes
17% Use aliases
16% Add DISTINCT
12% Increase RAM
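The effect of the "Add indexes" option can be checked with a query plan. Below is a minimal sketch using Python's built-in sqlite3 module; the table, column, and index names are hypothetical, and other databases word their plans differently.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.5) for i in range(10_000)],
)

query = "SELECT * FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


plan_before = plan(query)  # no index yet, so SQLite has to scan the table

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = plan(query)  # the planner can now search via the index

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan should report a scan of the whole table, while the second should name the new index, which is why an index targets the scan itself rather than the hardware around it.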
Data Science Riddle
You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
54% Box plot
30% Heatmap
9% Line chart
7% Area plot
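A box plot makes extreme values stand out because points beyond the whiskers are drawn individually. The sketch below reproduces the common 1.5 x IQR whisker rule with the standard library only; the sample values are made up, and plotting libraries may compute quartiles slightly differently.

```python
import statistics

# Made-up sample with two obvious extremes, for illustration only.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 21, 55, -30]

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# The usual whisker limits: 1.5 * IQR beyond the box edges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("points beyond the whiskers:", outliers)
```

Here the two planted extremes, -30 and 55, fall outside the fences and would appear as separate markers on the plot.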
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining of social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
Data Science Riddle
You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
30% Checkpoints
19% Contracts
40% Indexes
10% Sharding
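For readers unfamiliar with the "Contracts" option: a data contract states the schema that every environment must satisfy, so mismatches are caught before data is used. A minimal sketch follows, assuming a contract is just a mapping from field name to Python type; the field names and the `violations` helper are hypothetical, not part of any specific tool.

```python
# Hypothetical contract: field name -> required Python type.
CONTRACT = {"user_id": int, "email": str, "signup_ts": str}


def violations(record, contract=CONTRACT):
    """List the ways a record breaks the contract (empty list means it conforms)."""
    problems = []
    for field, expected_type in contract.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Fields the contract does not know about are reported as well.
    extra = set(record) - set(contract)
    problems.extend(f"unexpected field: {name}" for name in sorted(extra))
    return problems


good = {"user_id": 1, "email": "a@example.com", "signup_ts": "2024-01-01T00:00:00Z"}
bad = {"user_id": "1", "email": "a@example.com", "extra_col": True}

print(violations(good))
print(violations(bad))
```

Running the same check in development, staging, and production is what keeps the environments from drifting apart silently.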
Running Code in Jupyter Notebooks
Jupyter Notebooks let you write & run code interactively.
Here's a quick guide to make your workflow smoother:
Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.
Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place
Kernel Management
- Interrupt the kernel if code hangs.
- Restart the kernel to reset memory & variables.
Output Handling
- Results & errors appear directly under the cell.
- Output from long-running code appears as it is generated.
- Large outputs can be scrolled or collapsed for clarity.
Pro Tip:
Always run "Restart & Run All" before sharing or saving a notebook.
This confirms the notebook runs top to bottom from a clean state, which keeps results reproducible.
Data Science Riddle
You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
23% Distributed FS
10% Cold storage
21% Object Storage
46% Local SSD
Data Science Riddle
A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
27% Encode it differently
21% Scale it
12% Drop the feature
40% Check interaction effects
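The "Check interaction effects" option matters because a feature can look useless on its own and still be essential in combination with another. The sketch below uses synthetic data in which the target is exactly the product of two features; the data and the `pearson` helper are illustrative assumptions.

```python
def pearson(xs, ys):
    """Plain Pearson correlation, written out to stay dependency-free."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic, balanced data: the target depends only on the product a * b.
a = [-1, -1, 1, 1] * 500
b = [-1, 1, -1, 1] * 500
target = [x * y for x, y in zip(a, b)]

corr_a_alone = pearson(a, target)
corr_interaction = pearson([x * y for x, y in zip(a, b)], target)

print("feature a alone:", corr_a_alone)
print("interaction a*b:", corr_interaction)
```

Feature `a` shows no linear relationship with the target by itself, yet the interaction term explains it completely, which is the situation the domain experts may be pointing at.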
Advanced Data Science on Spark.pdf
1.8 MB
From Stanford University: covers Spark for ML, graph processing (GraphFrames), and integration with Hadoop.
Data Science Riddle
Your estimate has high variance. Best fix?
Anonymous Quiz
55% Increase sample size
27% Change confidence level
10% Reduce bin count
8% Switch to bootstrap
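Why a larger sample helps can be seen in a small simulation: the spread of the sample mean shrinks roughly with the square root of the sample size. This is a sketch on simulated standard-normal data; the sample sizes, repeat count, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable


def spread_of_sample_means(sample_size, repeats=500):
    """Standard deviation of the sample mean across repeated draws."""
    means = [
        statistics.fmean(random.gauss(0, 1) for _ in range(sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)


spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

print(f"n=25:  spread of the estimate = {spread_small:.3f}")
print(f"n=400: spread of the estimate = {spread_large:.3f}")
```

Going from 25 to 400 observations multiplies the sample size by 16, so the spread of the estimate should drop by a factor of about 4.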
The Difference Between Model Accuracy and Business Accuracy
A model can be 95% accurate...
yet deliver 0% business value.
Why?
Because data science metrics ≠ business metrics.
Examples:
- A fraud model catches small frauds but misses the large ones
- A churn model predicts only the churners who were already obvious
- A recommendation model boosts clicks but reduces revenue
Always align ML metrics with business KPIs.
Otherwise, your "great model" is just a great illusion.
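The fraud example can be made concrete with a toy calculation. The transactions below are invented, and "value recall" (share of fraudulent dollars caught) is one possible business-side metric, not a standard name.

```python
# Each tuple: (is_fraud, flagged_by_model, amount_in_dollars). All values are invented.
transactions = (
    [(False, False, 50)] * 95      # legitimate, correctly left alone
    + [(True, True, 20)] * 4       # small frauds, caught
    + [(True, False, 10_000)] * 1  # one large fraud, missed
)

# Classic accuracy: share of transactions labelled correctly.
correct = sum(is_fraud == flagged for is_fraud, flagged, _ in transactions)
accuracy = correct / len(transactions)

# Business view: share of fraudulent money the model actually stopped.
fraud_total = sum(amount for is_fraud, _, amount in transactions if is_fraud)
fraud_caught = sum(
    amount for is_fraud, flagged, amount in transactions if is_fraud and flagged
)
value_recall = fraud_caught / fraud_total

print(f"accuracy:     {accuracy:.2%}")
print(f"value recall: {value_recall:.2%}")
```

The model labels 99 of 100 transactions correctly, yet it stops less than 1% of the money lost to fraud.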
Data Science Riddle
Your model's loss fluctuates but doesn't decrease overall. What's the most likely issue?
Anonymous Quiz
28% Gradient exploding
38% Weak regularization
22% Small batch size
12% Slow optimizer
Complete AI (Artificial Intelligence) Roadmap
1. Basics of AI
- What is AI?
- Types: Narrow AI vs General AI
- AI vs ML vs DL
- Real-world applications
2. Python for AI
- Python syntax & libraries
- NumPy, Pandas for data handling
- Matplotlib, Seaborn for visualization
3. Math Foundation
- Linear Algebra: Vectors, Matrices
- Probability & Statistics
- Calculus basics
- Optimization techniques
4. Machine Learning (ML)
- Supervised vs Unsupervised
- Regression, Classification, Clustering
- Scikit-learn for ML
- Model evaluation metrics
5. Deep Learning (DL)
- Neural Networks basics
- Activation functions, backpropagation
- TensorFlow / PyTorch
- CNNs, RNNs, LSTMs
6. NLP (Natural Language Processing)
- Text cleaning & tokenization
- Word embeddings (Word2Vec, GloVe)
- Transformers & BERT
- Chatbots & summarization
7. Computer Vision
- Image processing basics
- OpenCV for CV tasks
- Object detection, image classification
- CNN architectures (ResNet, YOLO)
8. Model Deployment
- Streamlit / Flask APIs
- Docker for containerization
- Deploy on cloud: Render, Hugging Face, AWS
9. Tools & Ecosystem
- Git & GitHub
- Jupyter Notebooks
- DVC, MLflow (for tracking models)
10. Build AI Projects
- Chatbot, Face recognition
- Spam classifier, Stock prediction
- Language translator, Object detector
Data Science Riddle - CNN Kernels
Which convolution increases channel depth but not spatial size?
Anonymous Quiz
6% 1x1 convolution
31% 3x3 convolution
47% Depthwise convolution
16% Transposed convolution
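A 1x1 (pointwise) convolution mixes channels at each pixel independently, so with stride 1 it can change channel depth while leaving height and width untouched. The sketch below applies the standard output-size formula for a 2D convolution; the `conv_output_shape` helper and the 32 x 32 input are illustrative assumptions, not a framework API.

```python
def conv_output_shape(height, width, out_channels, kernel, stride=1, padding=0):
    """Output (channels, height, width) of a standard 2D convolution."""
    out_h = (height + 2 * padding - kernel) // stride + 1
    out_w = (width + 2 * padding - kernel) // stride + 1
    return out_channels, out_h, out_w


# Hypothetical input feature map: 32 x 32 pixels, expanded to 256 output channels.
pointwise = conv_output_shape(32, 32, out_channels=256, kernel=1)
unpadded_3x3 = conv_output_shape(32, 32, out_channels=256, kernel=3)

print("1x1 conv:", pointwise)
print("3x3 conv, no padding:", unpadded_3x3)
```

A 3x3 convolution only keeps the spatial size when padding is added, and a depthwise convolution by design filters each channel separately rather than expanding the channel count.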
Normalization vs Standardization: Why They're Not the Same
People treat these two as interchangeable. They're not.
Normalization (Min-Max scaling):
Rescales values to the range 0–1.
Useful when bounded magnitudes matter (pixel values, distances).
Standardization (Z-score):
Centers data at mean 0 and scales it to standard deviation 1.
Useful when a method is sensitive to each feature's mean and variance (regularized linear/logistic regression, PCA).
Key idea:
Normalization preserves relative spacing inside a fixed range.
Standardization expresses each value as a number of standard deviations from the mean.
Pick the wrong one, and your model's geometry becomes distorted.
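Both transforms fit in a few lines. This is a minimal standard-library sketch on made-up numbers; in practice the scaling parameters would be computed on training data only and then reused on new data.

```python
import statistics

values = [2.0, 4.0, 6.0, 8.0, 30.0]  # made-up data with one large value

# Min-max normalization: map the smallest value to 0 and the largest to 1.
lo, hi = min(values), max(values)
normalized = [(v - lo) / (hi - lo) for v in values]

# Z-score standardization: subtract the mean, divide by the standard deviation.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

print("normalized:  ", [round(v, 3) for v in normalized])
print("standardized:", [round(v, 3) for v in standardized])
```

Note how the single large value squeezes the other normalized values toward 0, while the standardized values are not confined to any fixed range.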
Hey everyone
Tomorrow we are kicking off a new short & free series called:
Data Importing Series
We'll go through all the real ways to pull data into Python:
- CSV, Excel, JSON and more
- SQL and other databases
- APIs, Google Sheets, even PDFs & web scraping
Short lessons, ready-to-copy code, zero boring theory.
First part drops tomorrow.
Turn on notifications so you don't miss it
Who's excited? React with a 🔥 if you are.
Loading a CSV file in Python
CSV stands for Comma-Separated Values, the most common format for tabular data.
With pandas, turning a CSV into a powerful, queryable DataFrame takes just a few clear lines.
# Import the pandas library
import pandas as pd
# Specify the path to your CSV file
filename = "data.csv"
# Read the CSV file into a DataFrame
df = pd.read_csv(filename)
# Preview the first five rows
df.head()
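Real files often need a few extra arguments. The sketch below shows three commonly used `pd.read_csv` parameters (`sep`, `parse_dates`, `dtype`); it reads from an in-memory string instead of a file so it runs as-is, and the column names are made up.

```python
import io

import pandas as pd

# In-memory stand-in for a file on disk; the column names are made up.
raw = io.StringIO(
    "order_id;order_date;amount\n"
    "1;2024-01-05;19.99\n"
    "2;2024-01-06;5.00\n"
)

df = pd.read_csv(
    raw,
    sep=";",                      # delimiter other than a comma
    parse_dates=["order_date"],   # convert this column to datetime
    dtype={"order_id": "int64"},  # pin the integer type explicitly
)

print(df.dtypes)
```

To use it on a real file, pass the file path in place of `raw`.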
Next up: loading an Excel file in Python
Join @datascience_bds for more
Part of the @bigdataspecialist family
