LLM4Decompile: Decompiling Binary Code with Large Language Models

8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·

Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.
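The re-executability metric mentioned above can be illustrated with a minimal sketch. It assumes that a sample counts as re-executable when its decompiled code compiles and passes all of its test assertions, and that "by over 100%" denotes a relative improvement. The function names and the toy pass/fail lists are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def re_executability_rate(passed: Sequence[bool]) -> float:
    """Fraction of samples whose decompiled code passed all of its tests."""
    if not passed:
        raise ValueError("need at least one sample")
    return sum(passed) / len(passed)


def relative_improvement(new_rate: float, baseline_rate: float) -> float:
    """Relative gain of new_rate over baseline_rate (1.0 means +100%)."""
    return (new_rate - baseline_rate) / baseline_rate


# Toy outcomes, invented purely for illustration (not benchmark results).
model_outcomes = [True, True, True, False]
baseline_outcomes = [True, False, False, False]

model_rate = re_executability_rate(model_outcomes)
baseline_rate = re_executability_rate(baseline_outcomes)
gain = relative_improvement(model_rate, baseline_rate)
print(f"model={model_rate:.2f} baseline={baseline_rate:.2f} gain={gain:.0%}")
```

Under this reading, an improvement of over 100% means the model's rate is more than double the baseline's.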

Paper: https://arxiv.org/pdf/2403.05286v3.pdf

Code: https://github.com/albertan017/LLM4Decompile



@Machine_learn
Machine learning books and papers pinned «Author positions 3, 4, and 5 on this project are open for collaboration. Target journal for submission: Finance innovation (IF: 6.5). Anyone interested in taking part can contact me at my ID. @Raminmousa»
Mathematics for Machine Learning

📚 Book

@Machine_learn
Forwarded from Github LLMs
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·

We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:

FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant.

FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications.

Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
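The CER and CERR figures quoted above can be reproduced with a short sketch. It assumes the usual definitions: CER is the character-level edit distance divided by the reference length, and CERR is the relative reduction with respect to a baseline CER. The function names and the example strings are illustrative and do not come from the FireRedASR code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference character
                            curr[j - 1] + 1,      # insert a hypothesis character
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalised by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """CERR: fraction of the baseline error removed by the new system."""
    return (baseline_cer - new_cer) / baseline_cer


# One substituted character out of six (made-up example sentence).
print(cer("今天天气很好", "今天天汽很好"))
# Figures from the abstract: 3.33% baseline CER, 3.05% for FireRedASR-LLM.
print(f"{relative_cer_reduction(3.33, 3.05):.1%}")  # 8.4%
```

The last line confirms that going from 3.33% to 3.05% CER is a reduction of 0.28 percentage points, or about 8.4% relative to the baseline.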

Paper: https://arxiv.org/pdf/2501.14350v1.pdf

Code: https://github.com/fireredteam/fireredasr

Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech

https://www.tg-me.com/deep_learning_proj
One of the good tools I have been able to develop is Stock Ai. It uses 360 indicators. Backtest reports for the tool are available in the videos below.

Author positions 4 and 5 on this paper are still open.
🔹🔹🔹🔹


May 2024:

https://youtu.be/aSS99lynMFQ?si=QSk8VVKhLqO_2Qi3

July 2014:

https://youtu.be/ThyZ0mZwsGk?si=FKPK7Hkz-mRx-752&t=209

We are therefore going to write a paper on this work. Work on the paper will start on 20 Esfand.
Anyone who can help in any way can sign up before the paper starts.
@Raminmousa
⭐️ Light-A-Video: Training-free Video Relighting via Progressive Light Fusion

🖥 Github: https://github.com/bcmi/Light-A-Video

📕 Paper: https://arxiv.org/abs/2502.08590v1

🌟 Dataset: https://paperswithcode.com/task/image-relighting

@Machine_learn
Forwarded from Github LLMs
⚡️ LLM4Decompile

git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n 'llm4decompile' python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt


🟡 Github
🟡 Models
🟡 Paper
🟡 Colab
https://www.tg-me.com/deep_learning_proj
LIMO: Less is More for Reasoning

5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu ·

We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples. Through comprehensive experiments, our proposed model LIMO demonstrates unprecedented performance in mathematical reasoning. With merely 817 curated training samples, LIMO achieves 57.1% accuracy on AIME and 94.8% on #MATH, improving from previous SFT-based models' 6.5% and 59.2% respectively, while only using 1% of the training data required by previous approaches. LIMO demonstrates exceptional out-of-distribution generalization, achieving 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data, challenging the notion that SFT leads to memorization rather than generalization. Based on these results, we propose the Less-Is-More Reasoning Hypothesis (#LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can emerge through minimal but precisely orchestrated demonstrations of cognitive processes. This hypothesis posits that the elicitation threshold for complex reasoning is determined by two key factors: (1) the completeness of the model's encoded knowledge foundation during pre-training, and (2) the effectiveness of post-training examples as "cognitive templates" that show the model how to utilize its knowledge base to solve complex reasoning tasks. To facilitate reproducibility and future research in data-efficient reasoning

Paper: https://arxiv.org/pdf/2502.03387v1.pdf

Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss


@Machine_learn
📄 How natural language processing derived techniques are used on biological data: a systematic review


📎 Study the paper


@Machine_learn
Probabilistic Artificial Intelligence

📄 Link

@Machine_learn
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation


Paper: https://arxiv.org/pdf/2502.05179v1.pdf

Code: https://github.com/foundationvision/flashvideo

@Machine_learn
Forwarded from Papers
Hello. For one of our research projects on wound image classification, we need a third author. In addition to contributing to the work, this person will also need to cover part of the server costs.
Journal: https://www.nature.com/srep/
To coordinate, you can contact me at my ID.

@Raminmousa
Painful intelligence: What AI can tell us about human suffering

📄 Book


@Machine_learn
CapsF: Capsule Fusion for Extracting psychiatric stressors for suicide from Twitter

Mohammad Ali Dadgostarnia, Ramin Mousa, Saba Hesaraki, Mahdi Hemmasian

https://www.sciencedirect.com/science/article/pii/S294971912500010X

@Machine_learn
2025/02/21 03:22:48