Machine learning books and papers pinned «We have opened author positions 3, 4, and 5 on this project for collaborators. The target journal for submission is Finance innovation (IF: 6.5). Anyone who would like to take part can contact me at my ID. @Raminmousa»
Mathematics for Machine Learning

📚 Book

@Machine_learn
Forwarded from Github LLMs
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:

FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant.

FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications.

Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
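
For readers unfamiliar with the two metrics in the abstract, here is a minimal sketch of how CER and relative CER reduction (CERR) are commonly computed. It assumes the usual definitions (character-level Levenshtein distance divided by reference length, and the baseline-relative difference); the function names and the toy sentence pair are illustrative and do not come from the FireRedASR code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative error reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    # Toy pair (made up for illustration): one substituted character out of six.
    print(f"CER  = {cer('今天天气很好', '今天天汽很好'):.3f}")  # CER  = 0.167
    # Averages quoted in the abstract: previous SOTA 3.33%, FireRedASR-LLM 3.05%.
    print(f"CERR = {relative_reduction(3.33, 3.05):.1%}")    # CERR = 8.4%
```

With the same function, the 3.18% average quoted for FireRedASR-AED works out to roughly a 4.5% relative reduction against the 3.33% baseline; that figure is derived here, not stated in the abstract.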

Paper: https://arxiv.org/pdf/2501.14350v1.pdf

Code: https://github.com/fireredteam/fireredasr

Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech

https://www.tg-me.com/deep_learning_proj
One of the useful tools I have been able to develop is Stock Ai. The tool uses 360 indicators. Back-test reports for it are available in the videos below.

Author positions 4 and 5 on this paper are still open.
🔹🔹🔹🔹


May 2024 :

https://youtu.be/aSS99lynMFQ?si=QSk8VVKhLqO_2Qi3

July 2014:

https://youtu.be/ThyZ0mZwsGk?si=FKPK7Hkz-mRx-752&t=209

For this reason, we plan to write a paper on this work. Writing will begin on 20 Esfand.
Anyone who can help in any way can sign up before the paper starts.
@Raminmousa
⭐️ Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

🖥 Github: https://github.com/bcmi/Light-A-Video

📕 Paper: https://arxiv.org/abs/2502.08590v1

🌟 Dataset: https://paperswithcode.com/task/image-relighting

@Machine_learn
Forwarded from Github LLMs
⚡️ LLM4Decompile

git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n 'llm4decompile' python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt


🟡 Github
🟡 Models
🟡 Paper
🟡 Colab
https://www.tg-me.com/deep_learning_proj
LIMO: Less is More for Reasoning

5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu

We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples. Through comprehensive experiments, our proposed model LIMO demonstrates unprecedented performance in mathematical reasoning. With merely 817 curated training samples, LIMO achieves 57.1% accuracy on AIME and 94.8% on MATH, improving from previous SFT-based models' 6.5% and 59.2% respectively, while only using 1% of the training data required by previous approaches. LIMO demonstrates exceptional out-of-distribution generalization, achieving 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data, challenging the notion that SFT leads to memorization rather than generalization. Based on these results, we propose the Less-Is-More Reasoning Hypothesis (LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can emerge through minimal but precisely orchestrated demonstrations of cognitive processes. This hypothesis posits that the elicitation threshold for complex reasoning is determined by two key factors: (1) the completeness of the model's encoded knowledge foundation during pre-training, and (2) the effectiveness of post-training examples as "cognitive templates" that show the model how to utilize its knowledge base to solve complex reasoning tasks. To facilitate reproducibility and future research in data-efficient reasoning...
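
A quick back-of-the-envelope check of the figures quoted in the abstract, as a small sketch. The accuracy numbers and the 817-sample count are taken from the text above; the implied baseline size is only what "1% of the training data" would mean arithmetically, not a number reported by the authors, and the helper names are made up for illustration.

```python
# Figures quoted in the abstract (accuracy in percent).
previous_sft = {"AIME": 6.5, "MATH": 59.2}
limo = {"AIME": 57.1, "MATH": 94.8}
limo_samples = 817


def absolute_gains(before: dict, after: dict) -> dict:
    """Percentage-point gain per benchmark, rounded to one decimal."""
    return {name: round(after[name] - before[name], 1) for name in before}


def implied_baseline_size(samples: int, fraction: float) -> int:
    """Training-set size implied if `samples` is `fraction` of the baseline."""
    return round(samples / fraction)


if __name__ == "__main__":
    print(absolute_gains(previous_sft, limo))         # {'AIME': 50.6, 'MATH': 35.6}
    print(implied_baseline_size(limo_samples, 0.01))  # 81700
```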

Paper: https://arxiv.org/pdf/2502.03387v1.pdf

Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss


@Machine_learn
📄 How natural language processing derived techniques are used on biological data: a systematic review


📎 Study the paper


@Machine_learn
Probabilistic Artificial Intelligence

📄 Link

@Machine_learn
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation


Paper: https://arxiv.org/pdf/2502.05179v1.pdf

Code: https://github.com/foundationvision/flashvideo

@Machine_learn
Forwarded from Papers
Hello. For one of our research projects on wound image classification, we need a third author. In addition to contributing to the work, this person will also need to cover part of the server cost.
Journal: https://www.nature.com/srep/
To coordinate, you can contact me at my ID.

@Raminmousa
Painful intelligence: What AI can tell us about human suffering

📄 Book


@Machine_learn
CapsF: Capsule Fusion for Extracting psychiatric stressors for suicide from Twitter

Mohammad Ali Dadgostarnia, Ramin Mousa, Saba Hesaraki, Mahdi Hemmasian

https://www.sciencedirect.com/science/article/pii/S294971912500010X

@Machine_learn
OmniParser for Pure Vision Based GUI Agent

1 Aug 2024 · Yadong Lu, Jianwei Yang, Yelong Shen, Ahmed Awadallah

The recent success of large vision language models shows great potential in driving the agent system operating on user interfaces. However, we argue that the power of multimodal models like GPT-4V as a general agent on multiple operating systems across different applications is largely underestimated due to the lack of a robust screen parsing technique capable of: 1) reliably identifying interactable icons within the user interface, and 2) understanding the semantics of various elements in a screenshot and accurately associating the intended action with the corresponding region on the screen. To fill these gaps, we introduce OmniParser, a comprehensive method for parsing user interface screenshots into structured elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface. We first curated an interactable icon detection dataset using popular webpages and an icon description dataset. These datasets were utilized to fine-tune specialized models: a detection model to parse interactable regions on the screen and a caption model to extract the functional semantics of the detected elements. OmniParser significantly improves GPT-4V's performance on the ScreenSpot benchmark. On the Mind2Web and AITW benchmarks, OmniParser with screenshot-only input outperforms the GPT-4V baselines that require additional information outside of the screenshot.
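
To make "structured elements" and "grounded actions" concrete, here is a minimal sketch of what a parsed screen and a grounded click could look like. Everything in it is an assumption for illustration: the `UIElement` schema, the keyword lookup, and the sample screen are hypothetical and are not OmniParser's actual output format or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """One parsed screen element (hypothetical schema, not OmniParser's)."""
    bbox: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels
    caption: str                     # functional description of the element
    interactable: bool               # whether a detector flagged it as clickable


def find_element(elements: List[UIElement], keyword: str) -> Optional[UIElement]:
    """Return the first interactable element whose caption mentions `keyword`."""
    kw = keyword.lower()
    for el in elements:
        if el.interactable and kw in el.caption.lower():
            return el
    return None


def click_point(el: UIElement) -> Tuple[int, int]:
    """Ground an action to the centre of the element's bounding box."""
    left, top, right, bottom = el.bbox
    return ((left + right) // 2, (top + bottom) // 2)


if __name__ == "__main__":
    # Made-up parse result standing in for a detector plus captioner output.
    screen = [
        UIElement((10, 10, 110, 40), "search text field", True),
        UIElement((120, 10, 160, 40), "settings gear icon", True),
        UIElement((0, 50, 400, 300), "article body text", False),
    ]
    target = find_element(screen, "settings")
    if target is not None:
        print(click_point(target))  # (140, 25)
```

In this sketch the model only has to choose among captioned elements, and the pixel coordinates follow from the chosen box, which is the kind of grounding the abstract describes.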

Paper: https://arxiv.org/pdf/2408.00203v1.pdf

Code: https://github.com/microsoft/omniparser

Dataset: ScreenSpot


@Machine_learn
Competitive Programming with Large Reasoning Models
OpenAI


link

@Machine_learn
2025/02/22 05:41:52