📚 Data Science Riddle 
Model Accuracy improves after dropping half the features. Why?
Anonymous Quiz
    12%  Model became smaller
    71%  Overfitting reduced
    12%  Data size shrank
    6%  Training faster
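One plausible reading of this riddle is that the dropped features were mostly noise, so the model stopped fitting to them. The sketch below is a toy illustration of that effect, not a reconstruction of any particular model: the synthetic dataset, the 1-nearest-neighbour classifier, and the names `make_dataset` and `one_nn_accuracy` are all assumptions made for the example, and it drops 20 noise columns rather than exactly half.

```python
import random

def make_dataset(n_rows, n_noise, rng):
    """Return (X, y): column 0 is informative, the remaining columns are pure noise."""
    X, y = [], []
    for _ in range(n_rows):
        label = rng.randint(0, 1)
        signal = rng.gauss(1.0 if label else -1.0, 0.3)
        noise = [rng.gauss(0.0, 3.0) for _ in range(n_noise)]
        X.append([signal] + noise)
        y.append(label)
    return X, y

def one_nn_accuracy(X_train, y_train, X_test, y_test, cols):
    """Accuracy of a 1-nearest-neighbour classifier restricted to the columns in `cols`."""
    correct = 0
    for row, label in zip(X_test, y_test):
        nearest = min(
            range(len(X_train)),
            key=lambda i: sum((X_train[i][c] - row[c]) ** 2 for c in cols),
        )
        correct += y_train[nearest] == label
    return correct / len(X_test)

rng = random.Random(0)  # fixed seed so the run is repeatable
X_train, y_train = make_dataset(200, 20, rng)
X_test, y_test = make_dataset(200, 20, rng)

acc_all = one_nn_accuracy(X_train, y_train, X_test, y_test, cols=range(21))
acc_signal = one_nn_accuracy(X_train, y_train, X_test, y_test, cols=[0])
print(f"all 21 features:  {acc_all:.2f}")
print(f"informative only: {acc_signal:.2f}")
```

On this synthetic data the classifier restricted to the informative column should score far higher on the held-out rows than the one that also sees the noise columns, because the noise dominates the distance computation.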
      
    ❤3
  Understanding the Forecast Statistics and Four Moments (4P).pdf
    181.8 KB
  Statistical Moments (M1, M2) for Data Analysis
Here are 5 curated PDFs diving into the mean (M1), variance (M2), and their applications in crafting research questions and sourcing data.
A channel member requested resources on this topic and we delivered.
If there is a topic you want resources on, let us know and we’ll make it happen!
@datascience_bds
❤8
  📚 Data Science Riddle 
Why do we use Batch Normalization?
Anonymous Quiz
    28%  Speeds up training
    42%  Prevents overfitting
    9%  Adds non-linearity
    20%  Reduces dataset size
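Batch normalization is usually described as a way to stabilize and speed up training by keeping each layer's inputs on a comparable scale; any regularizing effect is generally treated as a side effect. A minimal sketch of the training-mode forward pass follows, in plain Python for readability. The function name `batch_norm` and the scalar `gamma`/`beta` are simplifying assumptions (real layers learn one pair per feature and also track running statistics for inference).

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature (column) over the batch, then scale by gamma and shift by beta."""
    n = len(batch)
    n_features = len(batch[0])
    out = [[0.0] * n_features for _ in range(n)]
    for j in range(n_features):
        col = [row[j] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased (population) variance over the batch
        for i in range(n):
            out[i][j] = gamma * (col[i] - mean) / math.sqrt(var + eps) + beta
    return out

# Two features on very different scales end up on the same scale.
batch = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0], [4.0, 800.0]]
normalized = batch_norm(batch)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization both columns have roughly zero mean and unit variance, which is the property that lets later layers see inputs in a steady range from batch to batch.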
      
    ❤4
  📚 Data Science Riddle 
Your object detection model misses small objects. Easiest fix?
Anonymous Quiz
    20%  Use larger input images
    31%  Add more classes
    35%  Reduce learning rate
    14%  Train longer
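Of the four options, larger input images is the one that directly changes how many pixels a small object occupies once the image reaches the detector. The arithmetic below is a rough sketch of that effect under assumed numbers: the 24 px object, the 1280 px original image, the stride of 16, and the function name `footprint_on_feature_map` are all hypothetical.

```python
def footprint_on_feature_map(object_px, original_side, input_side, stride):
    """Approximate width, in feature-map cells, of an object after resizing and downsampling."""
    scale = input_side / original_side  # resize factor applied before the network
    return object_px * scale / stride   # each feature-map cell covers `stride` input pixels

for side in (320, 640, 1280):
    cells = footprint_on_feature_map(object_px=24, original_side=1280, input_side=side, stride=16)
    print(f"input {side}px -> object spans about {cells:.3f} cells")
```

With these assumed values the object covers less than one feature-map cell at 320 px and 640 px input, and only exceeds one cell at 1280 px, which is one intuition for why small objects get missed at low input resolution.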
      
    🤖 AI that creates AI: ASI-ARCH finds 106 new SOTA architectures
ASI-ARCH is an experimental ASI system that autonomously researches and designs neural nets: it hypothesizes, codes, trains, and tests models.
💡 Scale:
1,773 experiments → 20,000+ GPU-hours.
Stage 1 (20M params, 1B tokens): 1,350 candidates beat DeltaNet.
Stage 2 (340M params): 400 models → 106 SOTA winners.
Top 5 trained on 15B tokens vs Mamba2 & Gated DeltaNet.
📊 Results:
PathGateFusionNet: 48.51 avg (Mamba2: 47.84, Gated DeltaNet: 47.32).
BoolQ: 60.58 vs 60.12 (Gated DeltaNet).
Consistent gains across tasks.
🔍 Insights:
Prefers proven tools (gating, convs), refines them iteratively.
Ideas come from: 51.7% literature, 38.2% self-analysis, 10.1% originality.
Among the SOTA models, the share of ideas from self-analysis rises to 44.8% and the share from literature falls to 48.6%.
@datascience_bds
❤4
  🚀 Databricks Tip: REPLACE vs MERGE  
When updating Delta tables, you’ve got two powerful options:
🔹 REPLACE TABLE … ON
📚 Like throwing away the entire library and rebuilding it.
- Drops the old table & recreates it.
- Schema + data = fully replaced.
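- ⚡ Fast but destructive: the old data is gone from the current table (with CREATE OR REPLACE on a Delta table, earlier versions stay in the table history until vacuumed).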
- ✅ Best for full refreshes or schema changes.
🔹 MERGE
📖 Like updating only the books that changed.
- Matches source rows to target rows on a key condition.
- Updates, inserts, or deletes specific records.
- 🔍 Preserves unchanged data.
- ✅ Best for incremental updates or CDC (Change Data Capture).
⚖️ Key Difference
- REPLACE = Start fresh with a new table.
- MERGE = Surgically update rows without losing the rest.
👉 Rule of thumb:
Use REPLACE for full rebuilds,
Use MERGE for incremental upserts.
#Databricks #DeltaLake
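To make the contrast concrete, here is a small plain-Python sketch of the two semantics using dictionaries keyed by `id`. It is only an analogy for the behavior described above, not Databricks or Delta Lake API code: `replace_table`, `merge_upsert`, and the sample rows are hypothetical names and data invented for the illustration.

```python
def replace_table(_old_rows, new_rows, key="id"):
    """Full refresh: the result contains only the new rows; the old ones are ignored."""
    return {row[key]: dict(row) for row in new_rows}

def merge_upsert(target, source_rows, key="id"):
    """Incremental upsert: update matched keys, insert unmatched ones, keep the rest."""
    merged = {k: dict(v) for k, v in target.items()}  # copy so the input is not mutated
    for row in source_rows:
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return merged

table = {1: {"id": 1, "title": "Dune"}, 2: {"id": 2, "title": "Emma"}}
changes = [{"id": 2, "title": "Emma (2nd ed.)"}, {"id": 3, "title": "Ulysses"}]

replaced = replace_table(table, changes)
merged = merge_upsert(table, changes)
print(sorted(replaced))  # [2, 3]
print(sorted(merged))    # [1, 2, 3]
```

Row 1 survives only in the merged result, which mirrors the rule of thumb: replace for full rebuilds, merge when unchanged rows must be preserved.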
❤4
  📚 Data Science Riddle 
You have messy CSVs arriving daily. What's your first production step?
Anonymous Quiz
    8%  Train model right away
    16%  Manually clean each file
    58%  Automate data validation pipeline
    19%  Combine all into one CSV
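As a rough idea of what an automated validation step could look like, the sketch below checks each incoming CSV against a small schema and separates clean rows from rejected ones. Everything specific in it is an assumption for illustration: the column names `order_id`, `amount`, and `country`, the `SCHEMA` mapping, and the helpers `parse_country` and `validate_csv`.

```python
import csv
import io

def parse_country(value):
    """Accept only 2-letter codes and normalize them to upper case."""
    if len(value) != 2 or not value.isalpha():
        raise ValueError(f"country: expected a 2-letter code, got {value!r}")
    return value.upper()

# Hypothetical schema: column name -> function that converts or rejects the raw string.
SCHEMA = {"order_id": int, "amount": float, "country": parse_country}

def validate_csv(text):
    """Split CSV text into (valid_rows, errors) according to SCHEMA."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(SCHEMA) - set(reader.fieldnames or [])
    if missing:
        return [], [f"missing columns: {sorted(missing)}"]
    valid, errors = [], []
    for line_no, raw in enumerate(reader, start=2):  # line 1 is the header
        try:
            valid.append({col: parse((raw[col] or "").strip()) for col, parse in SCHEMA.items()})
        except ValueError as exc:
            errors.append(f"line {line_no}: {exc}")
    return valid, errors

sample = "order_id,amount,country\n1,19.99,de\n2,abc,US\n3,5.00,Germany\n"
valid, errors = validate_csv(sample)
print(valid)
for message in errors:
    print(message)
```

In a daily pipeline the same function could run on every new file, with the rejected lines logged or quarantined instead of being fixed by hand.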
      
    