LLM4Decompile: Decompiling Binary Code with Large Language Models
8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang ·
Decompilation aims to convert binary code to high-level source code, but traditional tools like Ghidra often produce results that are difficult to read and execute. Motivated by the advancements in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models to decompile binary directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks by over 100% in terms of re-executability rate. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, enabling them to effectively refine the decompiled code from Ghidra and achieve a further 16.2% improvement over the LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary code decompilation, delivering remarkable improvements in readability and executability while complementing conventional tools for optimal results.
Paper: https://arxiv.org/pdf/2403.05286v3.pdf
Code: https://github.com/albertan017/LLM4Decompile
@Machine_learn
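For readers who want to try the End models, here is a minimal inference sketch using the Hugging Face Transformers stack. The checkpoint id and prompt template follow the project README, but both should be treated as assumptions; check the repository for the released variants and exact preprocessing.
```python
# Hedged sketch: direct decompilation with an LLM4Decompile-End checkpoint.
# The model id and prompt template are taken from the project README and may
# change between releases; verify against the repository.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LLM4Binary/llm4decompile-6.7b-v1.5"  # assumed released checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).cuda()

# Disassembly of a single function (e.g. from `objdump -d`), preprocessed to
# keep only the function body, as in the repo's pipeline.
with open("func.asm") as f:
    asm = f.read()
prompt = f"# This is the assembly code:\n{asm}\n# What is the source code?\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```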
Author slots 3, 4, and 5 on this project are open for collaboration. The target journal for submission:
Finance innovation
IF: 6.5
Anyone interested in participating can contact me via my Telegram ID.
@Raminmousa
Forwarded from Github LLMs
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
24 Jan 2025 · Kai-Tuo Xu, Feng-Long Xie, Xu Tang, Yao Hu ·
We present FireRedASR, a family of large-scale automatic speech recognition (ASR) models for Mandarin, designed to meet diverse requirements in superior performance and optimal efficiency across various applications. FireRedASR comprises two variants:
FireRedASR-LLM: Designed to achieve state-of-the-art (SOTA) performance and to enable seamless end-to-end speech interaction. It adopts an Encoder-Adapter-LLM framework leveraging large language model (LLM) capabilities. On public Mandarin benchmarks, FireRedASR-LLM (8.3B parameters) achieves an average Character Error Rate (CER) of 3.05%, surpassing the latest SOTA of 3.33% with an 8.4% relative CER reduction (CERR). It demonstrates superior generalization capability over industrial-grade baselines, achieving 24%-40% CERR in multi-source Mandarin ASR scenarios such as video, live, and intelligent assistant.
FireRedASR-AED: Designed to balance high performance and computational efficiency and to serve as an effective speech representation module in LLM-based speech models. It utilizes an Attention-based Encoder-Decoder (AED) architecture. On public Mandarin benchmarks, FireRedASR-AED (1.1B parameters) achieves an average CER of 3.18%, slightly worse than FireRedASR-LLM but still outperforming the latest SOTA model with over 12B parameters. It offers a more compact size, making it suitable for resource-constrained applications.
Moreover, both models exhibit competitive results on Chinese dialects and English speech benchmarks and excel in singing lyrics recognition.
Paper: https://arxiv.org/pdf/2501.14350v1.pdf
Code: https://github.com/fireredteam/fireredasr
Datasets: LibriSpeech - AISHELL-1 - AISHELL-2 - WenetSpeech
https://www.tg-me.com/deep_learning_proj
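As a quick sanity check of the numbers above: CER is character-level edit distance divided by reference length, and relative CER reduction (CERR) is (baseline − new) / baseline. The sketch below assumes nothing about FireRedASR's own code; it is just the metric.
```python
# Character Error Rate (CER) and relative CER reduction (CERR), as quoted in
# the abstract. This is a plain metric implementation, not FireRedASR code.
def cer(ref: str, hyp: str) -> float:
    """Levenshtein distance between hypothesis and reference, over ref length."""
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]))  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(f"{cer('你好世界', '你好视界'):.2f}")  # 0.25: one substitution in four characters
baseline, ours = 3.33, 3.05                   # CER (%) figures from the abstract
print(f"relative CERR: {(baseline - ours) / baseline:.1%}")  # -> 8.4%
```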
One of the better tools I have managed to develop is Stock Ai, which uses 360 technical indicators. Backtest reports for this tool are available in the videos below.
Author slots 4 and 5 on this paper are still open.
🔹 🔹 🔹 🔹
May 2024 :
https://youtu.be/aSS99lynMFQ?si=QSk8VVKhLqO_2Qi3
July 2024:
https://youtu.be/ThyZ0mZwsGk?si=FKPK7Hkz-mRx-752&t=209
We are therefore going to write a paper on this work; writing starts on 20 Esfand. Anyone who can help in any way can sign up before writing begins.
@Raminmousa
⭐️ Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
🖥 Github: https://github.com/bcmi/Light-A-Video
📕 Paper: https://arxiv.org/abs/2502.08590v1
🌟 Dataset: https://paperswithcode.com/task/image-relighting
@Machine_learn
Forwarded from Github LLMs
⚡️ LLM4Decompile
🟡 Github
🟡 Models
🟡 Paper
🟡 Colab
# Clone the repository and set up an isolated environment (from the project README):
git clone https://github.com/albertan017/LLM4Decompile.git
cd LLM4Decompile
conda create -n llm4decompile python=3.9 -y
conda activate llm4decompile
pip install -r requirements.txt
https://www.tg-me.com/deep_learning_proj
LIMO: Less is More for Reasoning
5 Feb 2025 · Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, PengFei Liu ·
We present a fundamental discovery that challenges our understanding of how complex reasoning emerges in large language models. While conventional wisdom suggests that sophisticated reasoning tasks demand extensive training data (>100,000 examples), we demonstrate that complex mathematical reasoning abilities can be effectively elicited with surprisingly few examples. Through comprehensive experiments, our proposed model LIMO demonstrates unprecedented performance in mathematical reasoning. With merely 817 curated training samples, LIMO achieves 57.1% accuracy on AIME and 94.8% on #MATH, improving from previous SFT-based models' 6.5% and 59.2% respectively, while only using 1% of the training data required by previous approaches. LIMO demonstrates exceptional out-of-distribution generalization, achieving 40.5% absolute improvement across 10 diverse benchmarks, outperforming models trained on 100x more data, challenging the notion that SFT leads to memorization rather than generalization. Based on these results, we propose the Less-Is-More Reasoning Hypothesis (#LIMO Hypothesis): In foundation models where domain knowledge has been comprehensively encoded during pre-training, sophisticated reasoning capabilities can emerge through minimal but precisely orchestrated demonstrations of cognitive processes. This hypothesis posits that the elicitation threshold for complex reasoning is determined by two key factors: (1) the completeness of the model's encoded knowledge foundation during pre-training, and (2) the effectiveness of post-training examples as "cognitive templates" that show the model how to utilize its knowledge base to solve complex reasoning tasks. To facilitate reproducibility and future research in data-efficient reasoning, we release LIMO as a comprehensive open-source suite.
Paper: https://arxiv.org/pdf/2502.03387v1.pdf
Codes:
https://github.com/gair-nlp/limo
https://github.com/zhaoolee/garss
@Machine_learn
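To make the "merely 817 curated training samples" concrete, here is a hedged sketch of what such an SFT run looks like with the Hugging Face TRL stack. The dataset id, column names, and base model are assumptions (the paper fine-tunes a Qwen2.5-32B-class base); see the LIMO repository for the authors' actual configuration.
```python
# Hedged sketch: supervised fine-tuning on a small curated reasoning set,
# in the spirit of LIMO. Dataset/model ids and column names are assumptions;
# consult https://github.com/gair-nlp/limo for the real training setup.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

ds = load_dataset("GAIR/LIMO", split="train")  # assumed HF dataset id (~817 rows)

def to_text(example):
    # Each sample pairs a question with a long-form, step-by-step solution
    # that acts as a "cognitive template" for the model.
    return {"text": f"Question: {example['question']}\n"
                    f"Solution: {example['solution']}"}

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-7B-Instruct",  # smaller stand-in; LIMO used a 32B base
    train_dataset=ds.map(to_text),
    args=SFTConfig(
        output_dir="limo-sft",
        max_seq_length=4096,           # LIMO solutions are long; truncate with care
        num_train_epochs=3,
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
    ),
)
trainer.train()
```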
📄 How natural language processing derived techniques are used on biological data: a systematic review
📎 Study the paper
@Machine_learn
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation
Paper: https://arxiv.org/pdf/2502.05179v1.pdf
Code: https://github.com/foundationvision/flashvideo
@Machine_learn
Forwarded from Papers
Hello. For one of our research projects on wound image classification, we are looking for a third contributor. Besides contributing to the work, this person would also need to cover part of the server cost.
Journal: https://www.nature.com/srep/
For coordination, you can contact me via my Telegram ID.
@Raminmousa
CapsF: Capsule Fusion for Extracting psychiatric stressors for suicide from Twitter
Mohammad Ali Dadgostarnia, Ramin Mousa, Saba Hesaraki, Mahdi Hemmasian
https://www.sciencedirect.com/science/article/pii/S294971912500010X
@Machine_learn