Data Science Riddle
Which Metric is best for imbalanced classification?
Anonymous quiz results:
- Accuracy: 20%
- Precision: 17%
- Recall: 19%
- F1-Score: 43%
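Here is a minimal sketch of why the choice matters. It uses made-up labels (95 negatives, 5 positives) and a hypothetical model that always predicts the majority class. The helper names `confusion_counts` and `metrics` are illustrative. Precision and recall are set to 0 when their denominator is 0, which is a common convention and an assumption here.

```python
# Hypothetical toy data: 95 negatives, 5 positives (an imbalanced binary task).
y_true = [0] * 95 + [1] * 5

# A lazy "model" that always predicts the majority class.
y_pred = [0] * 100

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    # Assumed convention: report 0.0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```

Accuracy looks excellent at 95% even though the model never finds a single positive. This is why precision, recall, and F1 are usually preferred for imbalanced data.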
Data Science Riddle
A dataset has 20% missing values in a critical column. What's the most practical choice?
Anonymous quiz results:
- Drop all rows: 5%
- Fill with mean/median: 49%
- Use model-based imputation: 42%
- Ignore missing data: 5%
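Here is a minimal sketch of the "fill with median" option in plain Python. It works on a hypothetical column where missing values are stored as `None`. It also returns a missing-value indicator, which is a common companion to simple imputation but is an assumption here, not part of the quiz. Model-based alternatives also exist, for example scikit-learn's `KNNImputer`.

```python
import statistics

# Hypothetical column with missing entries encoded as None (note the 120 outlier).
ages = [34, None, 29, 41, None, 38, 120, 31]

def impute_median(values):
    """Replace None with the median of the observed values.

    Returns the filled list plus a 0/1 indicator so a model can still
    tell which entries were originally missing.
    """
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    filled = [fill if v is None else v for v in values]
    was_missing = [1 if v is None else 0 for v in values]
    return filled, was_missing

filled, was_missing = impute_median(ages)
print(filled)       # [34, 36.0, 29, 41, 36.0, 38, 120, 31]
print(was_missing)  # [0, 1, 0, 0, 1, 0, 0, 0]
```

The median fill here is 36.0. The mean of the observed values would be about 48.8 because of the 120 outlier, which is one reason the median is often the safer simple fill.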
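ML models don't all think alike.
- Naive Bayes = probability (Bayes' rule with an independence assumption between features)
- KNN = proximity (a vote among the nearest neighbors)
- Discriminant Analysis = decision boundaries (linear or quadratic, derived from per-class Gaussian fits)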
Different paths, same goal: accurate classification.
Which one do you reach for first?
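To make "KNN = proximity" concrete, here is a minimal k-nearest-neighbors classifier in plain Python. The 2-D points, labels, and the `knn_predict` helper are hypothetical. It uses Euclidean distance and a majority vote, the textbook form of KNN.

```python
import math
from collections import Counter

# Hypothetical 2-D training points with class labels.
train = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"),
    ((6.5, 7.0), "B"),
    ((7.0, 6.5), "B"),
]

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict(train, (1.8, 1.2)))  # A
print(knn_predict(train, (6.2, 6.4)))  # B
```

There is no training step beyond storing the points. All the work happens at prediction time, which is why KNN slows down as the dataset grows.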
Data Science Riddle
In a medical diagnosis project, what's more important?
Anonymous quiz results:
- High precision: 33%
- High recall: 14%
- High accuracy: 40%
- High F1-score: 13%
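One way to see the precision/recall trade-off is to move the decision threshold. The scores and labels below are invented, and `precision_recall_at` is a hypothetical helper. In screening settings, a missed case (false negative) is often costlier than a false alarm, which is why recall is frequently prioritized there.

```python
# Hypothetical predicted probabilities of disease and true labels (1 = sick).
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    1,    0,    0,    0,    0]

def precision_recall_at(threshold, scores, labels):
    """Predict 'sick' when score >= threshold, then return (precision, recall)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

for t in (0.7, 0.35):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.7: precision=1.00, recall=0.67
# threshold=0.35: precision=0.75, recall=1.00
```

Lowering the threshold catches the sick patient scored 0.40, but it also flags one healthy patient. Recall goes up and precision goes down.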
Important LLM Terms
- Transformer Architecture
- Attention Mechanism
- Pre-training
- Fine-tuning
- Parameters
- Self-Attention
- Embeddings
- Context Window
- Masked Language Modeling (MLM)
- Causal Language Modeling (CLM)
- Multi-Head Attention
- Tokenization
- Zero-Shot Learning
- Few-Shot Learning
- Transfer Learning
- Overfitting
- Inference
- Language Model Decoding
- Hallucination
- Latency
Why is Kafka Called Kafka?
Here's a fun fact that surprises a lot of people.
The "Kafka" you use for real-time data pipelines is… named after the novelist Franz Kafka.
Why? Jay Kreps, one of its creators at LinkedIn, once explained it simply:
- He liked Franz Kafka.
- The name sounded cool for an open-source project.
- And Kafka (the author) was a writer, while the system is optimized for writing.
That last part is key, because Apache Kafka is all about writing: streams of events, logs, and data in motion.
So the name stuck.
Today, engineers around the world talk about "Kafka" every day… and many don't realize they're also invoking a 20th-century novelist.
It's funny how small choices like naming your project can shape how the world remembers it.
Data Science Riddle
Why do CNNs use pooling layers?
Anonymous quiz results:
- Reduce dimensionality: 50%
- Increase non-linearity: 16%
- Normalize activations: 14%
- Improve learning rate: 20%
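Here is a sketch of 2x2 max pooling with stride 2 in plain Python. It shows the dimensionality reduction directly: a 4x4 feature map (16 values) becomes 2x2 (4 values). The function name and input are illustrative assumptions, and real frameworks provide their own pooling layers.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a 2-D list (height and width assumed even)."""
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(0, h, 2):
        row = []
        for j in range(0, w, 2):
            # Keep only the strongest activation in each 2x2 window.
            window = [feature_map[i][j], feature_map[i][j + 1],
                      feature_map[i + 1][j], feature_map[i + 1][j + 1]]
            row.append(max(window))
        pooled.append(row)
    return pooled

# Hypothetical 4x4 feature map.
fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 3, 2],
    [2, 6, 1, 1],
]
print(max_pool_2x2(fmap))  # [[4, 5], [6, 3]]
```

Each output keeps the strongest response in its window. This shrinks later layers' work and makes the features slightly tolerant to small shifts.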
Data Analyst vs Data Engineer: Key Differences
Confused about the roles of a Data Analyst and a Data Engineer? Here's a breakdown:
Data Analyst:
Role: Analyzes, interprets, & visualizes data to extract insights for business decisions.
Best For: Those who enjoy finding patterns, trends, & actionable insights.
Responsibilities:
- Cleaning & organizing data.
- Using tools like Excel, Power BI, Tableau & SQL.
- Creating reports & dashboards.
- Collaborating with business teams.
Skills: Analytical skills, SQL, Excel, reporting tools, statistical analysis, business intelligence.
Outcome: Guides decision-making in business, marketing, finance, etc.
Data Engineer:
Role: Designs, builds, & maintains data infrastructure.
Best For: Those who enjoy technical data management & architecture for large-scale analysis.
Responsibilities:
- Managing databases & data pipelines.
- Developing ETL processes.
- Ensuring data quality & security.
- Working with big data and cloud technologies like Hadoop, Spark, AWS, Azure & Google Cloud.
Skills: Python, Java, Scala, database management, big data tools, data architecture, cloud technologies.
Outcome: Creates the infrastructure & pipelines that move data efficiently to where it is analyzed.
In short: Data Analysts extract insights, while Data Engineers build the systems for data storage, processing, & analysis. Data Analysts focus on business outcomes, while Data Engineers focus on the technical foundation.
Softmax vs Sigmoid Functions
Two of the most common output activation functions… and two of the most misunderstood.
Sigmoid: squashes a single input into a range between 0 and 1. A natural fit for binary classification (yes/no problems). Example: spam or not spam.
Softmax: takes a vector of numbers and turns them into probabilities that sum to 1. A natural fit for multi-class classification where exactly one class is correct (cat vs dog vs horse).
Rule of thumb:
Binary task → use Sigmoid.
Multi-class task → use Softmax.
Simple, but get it wrong and the outputs stop meaning what you think (softmax over a single output, for example, always returns 1.0).
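Here is a minimal plain-Python sketch of both functions. It assumes raw model outputs (logits) as input, and the numbers are made up. The softmax subtracts the largest logit before exponentiating, a standard trick that avoids overflow without changing the result.

```python
import math

def sigmoid(x):
    """Map one real-valued logit to a probability between 0 and 1."""
    # Branch on the sign so math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def softmax(logits):
    """Map a vector of logits to probabilities that sum to 1."""
    # Subtract the max logit first for numerical stability; the result is unchanged.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Binary task (spam vs not spam): one logit -> one probability.
p_spam = sigmoid(2.0)

# Multi-class task (cat vs dog vs horse): one logit per class.
probs = softmax([2.0, 1.0, 0.1])

print(round(p_spam, 3))              # 0.881
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

For two classes the two views agree: `sigmoid(x)` equals the first entry of `softmax([x, 0])`. The rule of thumb is therefore about the output shape (one probability vs. a distribution over classes), not about one function being more accurate.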