Data Science Riddle
A numeric feature has many repeated exact values with occasional jumps. What type of variable is this?
Anonymous Quiz
- Discrete: 30%
- Ordinal: 22%
- Continuous: 16%
- Interval: 32%
Machine Learning Notes.pdf
226.8 KB
Stanford CS lecture notes covering supervised and unsupervised algorithms, neural networks, and SVMs, with math proofs and Python pseudocode.
Data Science Riddle
Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
- Loss curves: 6%
- Batch shapes: 12%
- Random seeds: 59%
- Metric choice: 23%
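Unseeded randomness is the most-voted answer here. A minimal sketch of the effect, assuming the notebook shuffles rows before splitting; `shuffled_train_split` and the 80/20 cut are hypothetical, not taken from any specific notebook:

```python
import random

def shuffled_train_split(n_rows, seed=None):
    """Shuffle row indices and return the first 80% as the training split.

    Hypothetical helper for illustration. With seed=None the generator is
    seeded from OS entropy, so every run (and every teammate) can get a
    different split.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    cut = int(0.8 * n_rows)
    return indices[:cut]

# Fixing the seed makes the split repeatable across runs.
split_a = shuffled_train_split(100, seed=42)
split_b = shuffled_train_split(100, seed=42)
print(split_a == split_b)  # True
```

Libraries such as NumPy, scikit-learn, and PyTorch keep their own random state, so each one in use needs its own seed.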
Data Science Riddle
A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
- Add indexes: 54%
- Use aliases: 17%
- Add DISTINCT: 16%
- Increase RAM: 13%
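To see what an index changes, here is a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are made up for illustration; other databases expose query plans through their own commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

def query_plan(connection, sql):
    """Return the human-readable detail strings from EXPLAIN QUERY PLAN."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]  # 4th column is the plan description

sql = "SELECT total FROM orders WHERE customer_id = 42"

plan_before = query_plan(conn, sql)  # full table scan expected
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql)   # index search expected

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan of `orders` and the second should mention `idx_orders_customer`.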
Data Science Riddle
You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
- Box plot: 53%
- Heatmap: 30%
- Line chart: 9%
- Area plot: 8%
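A box plot typically draws points beyond 1.5 × IQR from the quartiles as individual outliers. A rough sketch of that rule in plain Python, with made-up data; `iqr_outliers` is a hypothetical helper, and plotting libraries may compute quartiles slightly differently:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], the usual whisker rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 12, 95]  # illustrative values only
print(iqr_outliers(data))  # [95]
```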
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining of social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
Data Science Riddle
You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
- Checkpoints: 32%
- Contracts: 20%
- Indexes: 38%
- Sharding: 10%
Running Code in Jupyter Notebooks
Jupyter Notebooks let you write & run code interactively.
Here's a quick guide to make your workflow smoother:
Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.
Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place
Kernel Management
- Interrupt the kernel if code hangs.
- Restart the kernel to reset memory & variables.
Output Handling
- Results & errors appear directly under the cell.
- Output from long-running code appears as it's generated.
- Large outputs can be scrolled or collapsed for clarity.
Pro Tip:
Always "Restart & Run All" before sharing or saving a notebook.
This confirms the notebook runs top to bottom from a clean state, which supports reproducibility and clean results.
Data Science Riddle
You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
- Distributed FS: 23%
- Cold storage: 8%
- Object storage: 20%
- Local SSD: 48%
Data Science Riddle
A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
- Encode it differently: 27%
- Scale it: 21%
- Drop the feature: 13%
- Check interaction effects: 39%
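One way a feature can look unimportant on its own yet still matter is through an interaction. The toy example below uses synthetic data where the target depends only on the product of two features; the names `feature_a`, `feature_b`, and `target` are invented for the sketch, and real data will rarely be this clean.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)
n = 1000
feature_a = [rng.choice([-1, 1]) for _ in range(n)]
feature_b = [rng.choice([-1, 1]) for _ in range(n)]

# Synthetic target driven purely by the interaction a * b.
target = [a * b for a, b in zip(feature_a, feature_b)]
interaction = [a * b for a, b in zip(feature_a, feature_b)]

alone = pearson(feature_a, target)       # close to zero: looks useless alone
combined = pearson(interaction, target)  # essentially 1: the interaction explains it

print(round(alone, 3), round(combined, 3))
```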
Advanced Data Science on Spark.pdf
1.8 MB
From Stanford University: covers Spark for ML, graph processing (GraphFrames), and integration with Hadoop.
Data Science Riddle
Your estimate has high variance. Best fix?
Anonymous Quiz
- Increase sample size: 58%
- Change confidence level: 27%
- Reduce bin count: 9%
- Switch to bootstrap: 6%
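A quick simulation shows why a larger sample helps: the spread of a sample mean shrinks roughly as 1/√n. This is a sketch with standard normal draws; `spread_of_sample_means` and the trial counts are arbitrary choices for illustration.

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_trials=500, seed=0):
    """Estimate the standard deviation of the sample mean by repeated sampling."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(sample_size))
        for _ in range(n_trials)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(10)    # theory: about 1/sqrt(10)  ~ 0.32
large = spread_of_sample_means(400)   # theory: about 1/sqrt(400) = 0.05

print(round(small, 3), round(large, 3))
```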
The Difference Between Model Accuracy and Business Accuracy
A model can be 95% accurate…
yet deliver 0% business value.
Why?
Because data science metrics ≠ business metrics.
Examples:
- A fraud model catches tiny fraud but misses large ones
- A churn model predicts already obvious churners
- A recommendation model boosts clicks but reduces revenue
Always align ML metrics with business KPIs.
Otherwise, your "great model" is just a great illusion.
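The fraud example can be made concrete with a toy calculation. All transactions and dollar amounts below are invented for illustration; `fraud_value_captured` stands in for whatever KPI the business actually tracks.

```python
# Each tuple: (is_fraud, model_flagged, amount_in_dollars). Made-up data.
transactions = (
    [(False, False, 50.0)] * 91       # legitimate, correctly passed
    + [(True, True, 20.0)] * 4        # small fraud, caught
    + [(True, False, 10_000.0)] * 5   # large fraud, missed
)

def accuracy(rows):
    """Share of transactions where the model's flag matches the true label."""
    correct = sum(1 for is_fraud, flagged, _ in rows if is_fraud == flagged)
    return correct / len(rows)

def fraud_value_captured(rows):
    """Share of fraudulent dollars that the model actually flagged."""
    total = sum(amount for is_fraud, _, amount in rows if is_fraud)
    caught = sum(
        amount for is_fraud, flagged, amount in rows if is_fraud and flagged
    )
    return caught / total

print(accuracy(transactions))              # 0.95
print(fraud_value_captured(transactions))  # well under 1% of fraud dollars
```

Here the model is 95% accurate while stopping only $80 of $50,080 in fraud, which is the gap between the ML metric and the business metric.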
Data Science Riddle
Your model's loss fluctuates but doesn't decrease overall. What's the most likely issue?
Anonymous Quiz
- Gradient exploding: 25%
- Weak regularization: 39%
- Small batch size: 25%
- Slow optimizer: 11%
Complete AI (Artificial Intelligence) Roadmap
1. Basics of AI
- What is AI?
- Types: Narrow AI vs General AI
- AI vs ML vs DL
- Real-world applications
2. Python for AI
- Python syntax & libraries
- NumPy, Pandas for data handling
- Matplotlib, Seaborn for visualization
3. Math Foundation
- Linear Algebra: Vectors, Matrices
- Probability & Statistics
- Calculus basics
- Optimization techniques
4. Machine Learning (ML)
- Supervised vs Unsupervised
- Regression, Classification, Clustering
- Scikit-learn for ML
- Model evaluation metrics
5. Deep Learning (DL)
- Neural Networks basics
- Activation functions, backpropagation
- TensorFlow / PyTorch
- CNNs, RNNs, LSTMs
6. NLP (Natural Language Processing)
- Text cleaning & tokenization
- Word embeddings (Word2Vec, GloVe)
- Transformers & BERT
- Chatbots & summarization
7. Computer Vision
- Image processing basics
- OpenCV for CV tasks
- Object detection, image classification
- CNN architectures (ResNet, YOLO)
8. Model Deployment
- Streamlit / Flask APIs
- Docker for containerization
- Deploy on cloud: Render, Hugging Face, AWS
9. Tools & Ecosystem
- Git & GitHub
- Jupyter Notebooks
- DVC, MLflow (for tracking models)
10. Build AI Projects
- Chatbot, Face recognition
- Spam classifier, Stock prediction
- Language translator, Object detector
