Eigenvalues & Eigenvectors — Why PCA Actually Works

You’ve heard of PCA. But what’s really happening underneath?

PCA finds the directions (vectors) where your data varies the most.

Those directions are the eigenvectors of the covariance matrix, and the eigenvalues tell you how much variance each one captures.

You’re basically rotating your data to find its “natural axes.”

PCA isn’t compression — it’s discovering how your data wants to be seen.
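
Here is a minimal sketch of that idea in NumPy, not a production implementation. The toy dataset, the 30-degree rotation, and the helper name `pca_eig` are assumptions made up for illustration. It builds the covariance matrix, takes its eigenvectors as the "natural axes", and uses the eigenvalues as the variance along each axis.

```python
import numpy as np

def pca_eig(X):
    """Eigen-decompose the covariance matrix of X (rows = samples).

    Returns eigenvalues sorted from largest to smallest and the
    matching eigenvectors as columns.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature-by-feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices, ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    return eigvals[order], eigvecs[:, order]

# Hypothetical toy data: wide along one axis, narrow along the other,
# then rotated by 30 degrees so the "natural axes" are hidden.
rng = np.random.default_rng(0)
raw = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = raw @ R.T

eigvals, eigvecs = pca_eig(X)
explained = eigvals / eigvals.sum()   # share of total variance per direction
Z = (X - X.mean(axis=0)) @ eigvecs    # data expressed in its natural axes

print("explained variance ratio:", explained)
print("first principal direction:", eigvecs[:, 0])
```

In the rotated coordinates `Z`, the covariance matrix is diagonal: the features are uncorrelated and the diagonal entries are the eigenvalues. The first eigenvector should point close to the 30-degree direction the data was stretched along (up to sign).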
📚 Data Science Riddle

Your Spark job fails due to executor memory pressure. What is the most effective optimization?
Anonymous Quiz
14%
Broadcast variables
27%
Larger cluster
41%
More shuffle partitions
18%
Persist fewer objects
BigDataAnalytics-Lecture.pdf
10.2 MB
Notes on HDFS, MapReduce, YARN, Hadoop vs. traditional systems and much more... from Columbia University.
📚 Data Science Riddle

You fit a forecasting model and residuals show increasing variance. What is needed?
Anonymous Quiz
20%
Differencing
48%
Smoothing
25%
Decomposition
7%
Box-Cox
4 Pillars of Data Science
AI vs Machine Learning vs Deep Learning vs Generative AI
📚 Data Science Riddle

A numeric feature has many repeated exact values with occasional jumps. What type of variable is this?
Anonymous Quiz
30%
Discrete
23%
Ordinal
16%
Continuous
31%
Interval
Machine Learning Notes.pdf
226.8 KB
Stanford CS lecture notes covering supervised and unsupervised algorithms, neural networks, and SVMs, with math proofs and Python pseudocode.
Kafka 101
📚 Data Science Riddle

Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
7%
Loss Curves
13%
Batch shapes
57%
Random seeds
24%
Metric choice
The Simplest Machine Learning Cheatsheet
📚 Data Science Riddle

A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
54%
Add indexes
17%
Use aliases
15%
Add DISTINCT
13%
Increase RAM
Everything You Need to Know About Databricks
📚 Data Science Riddle

You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
53%
Box plot
30%
Heatmap
9%
Line chart
8%
Area plot
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
If you want to become a Data Scientist, this is the path to follow.
📚 Data Science Riddle

You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
32%
Checkpoints
20%
Contracts
38%
Indexes
10%
Sharding
🛠️ Running Code in Jupyter Notebooks

Jupyter Notebooks let you write & run code interactively.
Here’s a quick guide to make your workflow smoother:

▶️ Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.

⌨️ Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place

🔄 Kernel Management
- Interrupt the kernel if code hangs.
- Restart kernel to reset memory & variables.

🖥️ Output Handling
- Results & errors appear directly under the cell.
- Long-running code outputs appear as they’re generated.
- Large outputs can be scrolled or collapsed for clarity.

💡 Pro Tip:
Always “Restart & Run All” before sharing or saving a notebook.
This ensures reproducibility and clean results.

📚 Data Science Riddle

You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
24%
Distributed FS
11%
Cold storage
17%
Object Storage
48%
Local SSD
6 Must-Know Data Engineering Tools For Beginners
2025/12/05 03:41:10