📚 Data Science Riddle

Your Spark job fails due to executor memory pressure. What is the most effective optimization?
Anonymous Quiz
14%
Broadcast variables
27%
Larger cluster
41%
More shuffle partitions
18%
Persist fewer objects
BigDataAnalytics-Lecture.pdf
10.2 MB
Notes on HDFS, MapReduce, YARN, Hadoop vs. traditional systems and much more... from Columbia University.
❤7
📚 Data Science Riddle

You fit a forecasting model and residuals show increasing variance. What is needed?
Anonymous Quiz
20%
Differencing
48%
Smoothing
25%
Decomposition
7%
Box-Cox
👍3❤1
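Of the options in this poll, Box-Cox is the one usually described as a variance-stabilizing transform. A minimal sketch in plain Python, assuming a strictly positive series; the sample numbers and the `spread` helper are made up for illustration:

```python
import math

def box_cox(values, lam):
    """Box-Cox transform; lam == 0 is the log transform."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]

def spread(xs):
    """Range of a window, used here as a crude stand-in for variance."""
    return max(xs) - min(xs)

# Hypothetical series whose swings grow with its level.
series = [10, 12, 9, 100, 120, 90, 1000, 1200, 900]
logged = box_cox(series, 0)

raw_spreads = [spread(series[i:i + 3]) for i in (0, 3, 6)]
log_spreads = [spread(logged[i:i + 3]) for i in (0, 3, 6)]
print(raw_spreads)  # spread grows with the level
print(log_spreads)  # spread is roughly constant after the transform
```

In practice `lam` is estimated from the data rather than fixed at 0; that step is left out here.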
4 Pillars of Data Science
🔥4
AI vs Machine Learning vs Deep Learning vs Generative AI
❤4
📚 Data Science Riddle

A numeric feature has many repeated exact values with occasional jumps. What type of variable is this?
Anonymous Quiz
30%
Discrete
22%
Ordinal
16%
Continuous
32%
Interval
❤4
Machine Learning Notes.pdf
226.8 KB
Stanford CS lecture notes covering supervised and unsupervised algorithms, neural networks and SVMs, with math proofs and Python pseudocode.
❤6
Kafka 101
❤5
📚 Data Science Riddle

Two team members run the same notebook but get different results. What's the culprit?
Anonymous Quiz
7%
Loss Curves
13%
Batch shapes
57%
Random seeds
23%
Metric choice
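To see how the "Random seeds" option plays out, here is a small sketch using only the standard library. The `split_rows` helper and the 80/20 cut are assumptions for illustration, not something from the quiz:

```python
import random

def split_rows(n_rows, seed=None):
    """Shuffle row ids and cut them 80/20 into train and test."""
    rng = random.Random(seed)  # private generator, does not touch global state
    rows = list(range(n_rows))
    rng.shuffle(rows)
    cut = int(0.8 * n_rows)
    return rows[:cut], rows[cut:]

# Same seed: both "team members" get the identical split.
first = split_rows(10, seed=42)
second = split_rows(10, seed=42)
print(first == second)

# No seed: the split depends on OS entropy and usually differs between runs.
unseeded = split_rows(10)
```

Libraries such as NumPy or PyTorch keep their own generators, so each one needs its own seed in a real notebook.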
The Simplest Machine Learning Cheatsheet
❤5👍1
📚 Data Science Riddle

A query runs slowly due to large table scans. What's the most targeted fix?
Anonymous Quiz
55%
Add indexes
17%
Use aliases
15%
Add DISTINCT
13%
Increase RAM
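A quick way to check whether an index removes a table scan is to compare query plans before and after creating it. The sketch below uses Python's built-in `sqlite3`; the `orders` table, its columns and the index name are hypothetical, and the exact plan wording varies between SQLite versions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT total FROM orders WHERE customer_id = 7"

before = plan(query)  # expected to report a full scan of orders
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # expected to report a search using the new index

print(before)
print(after)
```

Other databases expose the same idea through their own `EXPLAIN` variants.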
Everything You Need to Know About Databricks
❤3
📚 Data Science Riddle

You want to detect extreme values visually in one plot. Which one is best?
Anonymous Quiz
52%
Box plot
29%
Heatmap
10%
Line chart
8%
Area plot
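A box plot flags extreme values as points beyond its whiskers. The sketch below reproduces that rule numerically, assuming the common 1.5 × IQR whisker convention; the sample data is invented:

```python
import statistics

def box_plot_outliers(values):
    """Return the values a 1.5 * IQR box plot would draw as outlier points."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 95]
outliers = box_plot_outliers(data)
print(outliers)  # [95]
```

Plotting libraries may compute quartiles slightly differently, so borderline points can move in or out of the whiskers.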
Mining of Massive Datasets (Leskovec, Stanford).pdf
2.9 MB
The Big Data bible from Stanford: MapReduce, Spark, recommendation systems, PageRank, locality-sensitive hashing, large-scale machine learning, and mining social networks and streams, all explained clearly with real algorithms you can code today. 500 pages of pure gold.
❤3
If you want to become a Data Scientist, this is the path to follow.
👍5
📚 Data Science Riddle

You want to prevent inconsistent data across environments. What helps most?
Anonymous Quiz
34%
Checkpoints
20%
Contracts
37%
Indexes
9%
Sharding
🛠️ Running Code in Jupyter Notebooks

Jupyter Notebooks let you write & run code interactively.
Here’s a quick guide to make your workflow smoother:

▶️ Kernel & Code Cells
- Each notebook is tied to a single kernel (e.g. IPython).
- Code cells are where you write and execute code.

⌨️ Useful Shortcuts
- Shift + Enter → run current cell, move to next
- Alt + Enter → run current cell, insert new one below
- Ctrl + Enter → run current cell, stay in place

🔄 Kernel Management
- Interrupt the kernel if code hangs.
- Restart kernel to reset memory & variables.

🖥️ Output Handling
- Results & errors appear directly under the cell.
- Long-running code outputs appear as they’re generated.
- Large outputs can be scrolled or collapsed for clarity.

💡 Pro Tip:
Always “Restart & Run All” before sharing or saving a notebook.
This ensures reproducibility and clean results.

❤2
📚 Data Science Riddle

You need fast reads of small files. Which storage option fits best?
Anonymous Quiz
24%
Distributed FS
10%
Cold storage
18%
Object Storage
49%
Local SSD
❤4
6 Must-Know Data Engineering Tools For Beginners
❤3
📚 Data Science Riddle

A feature has low importance but domain experts insist it matters. What do you do?
Anonymous Quiz
25%
Encode it differently
18%
Scale it
13%
Drop the feature
44%
Check interaction effects
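A feature can look useless on its own and still matter through an interaction. The toy sketch below shows this with a hand-written Pearson correlation; the ±1 encoded features `x1`, `x2` and the target `y` are invented for illustration:

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(
        sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys)
    )
    return num / den

# Hypothetical +/-1 encoded features; the target depends only on their product.
x1 = [-1, -1, 1, 1]
x2 = [-1, 1, -1, 1]
y = [a * b for a, b in zip(x1, x2)]

alone = pearson(x1, y)                                      # x1 by itself
interaction = pearson([a * b for a, b in zip(x1, x2)], y)   # x1 * x2 term
print(alone, interaction)
```

Here `x1` alone carries no linear signal, while the product term explains the target completely, which is the kind of pattern an importance score based on single features can miss.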
2025/12/06 07:44:16