SEE-2-SOUND - a method for generating complex spatial sound based on images and videos
pip install see2sound
GitHub
Hugging Face
Arxiv
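A minimal usage sketch in the spirit of the project's README; the `See2Sound` class, the `config_path` argument, and the `setup`/`run` methods are recalled from the README and may differ, so treat them as assumptions and verify against the GitHub repo:
```python
import see2sound  # pip install see2sound

# Assumed high-level API (names may differ; check the repo README).
model = see2sound.See2Sound(config_path="default_config.yaml")
model.setup()                                               # load model weights
model.run(path="scene.png", output_path="scene_audio.wav")  # image -> spatial audio
```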
@Machine_learn
Seq2Seq: Sequence-to-Sequence Generator
Github: https://github.com/fiy2w/mri_seq2seq
Paper: https://arxiv.org/abs/2407.02911v1
Task: https://paperswithcode.com/task/contrastive-learning
@Machine_learn
Hi everyone. Friends who have a paper and would like to submit it to this journal can do so; as a reviewer there, I can make the introduction.
@Machine_learn
Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling
Github: https://github.com/linghuyuhangyuan/m2s
Paper: https://arxiv.org/abs/2407.05875v1
Task: https://paperswithcode.com/task/denoising
@Machine_learn
LongVA: Long Context Transfer from Language to Vision
Github: https://github.com/EvolvingLMMs-Lab/LongVA
Paper: https://arxiv.org/abs/2406.16852
Project: https://lmms-lab.github.io/posts/longva/
Demo: https://longva-demo.lmms-lab.com/
@Machine_learn
Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation (ECCV 2024)
Github: https://github.com/fanghaook/ovformer
Paper: https://arxiv.org/abs/2407.07427v1
@Machine_learn
Multimodal contrastive learning for spatial gene expression prediction using histology images
Github: https://github.com/modelscope/data-juicer
Paper: https://arxiv.org/abs/2407.08583v1
Dataset: https://paperswithcode.com/dataset/coco
@Machine_learn
An Empirical Study of Mamba-based Pedestrian Attribute Recognition
Github: https://github.com/event-ahu/openpar
Paper: https://arxiv.org/pdf/2407.10374v1.pdf
Dataset: https://paperswithcode.com/dataset/peta
@Machine_learn
Aligning Sight and Sound: Advanced Sound Source Localization Through Audio-Visual Alignment
Github: https://github.com/kaistmm/SSLalignment
Paper: https://arxiv.org/abs/2407.13676v1
Dataset: https://paperswithcode.com/dataset/is3-interactive-synthetic-sound-source
@Machine_learn
MG-LLaVA - a multimodal LLM with advanced capabilities for working with visual information
Researchers from Shanghai University recently released MG-LLaVA, a multimodal LLM (MLLM) that extends visual processing with dedicated components for low- and high-resolution inputs.
MG-LLaVA integrates an additional high-resolution visual encoder to capture fine details, then merges those details with the base visual features through a Conv-Gate fusion network (sketched below).
Trained exclusively on publicly available multimodal data, MG-LLaVA achieves strong results.
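A toy PyTorch sketch of the gated low/high-resolution fusion described above. It assumes nothing about MG-LLaVA's real code beyond the idea of a convolutional gate mixing the two feature streams; all layer and variable names here are hypothetical:
```python
import torch
import torch.nn as nn

class ConvGateFusion(nn.Module):
    """Illustrative gated fusion of low- and high-resolution visual features.

    A 1x1 convolution over the concatenated streams produces a gate in [0, 1]
    that controls how much high-resolution detail is mixed into the base
    (low-resolution) features. Not MG-LLaVA's actual implementation.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * dim, dim, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, low_res: torch.Tensor, high_res: torch.Tensor) -> torch.Tensor:
        # Align the high-res feature map to the low-res grid if shapes differ.
        if high_res.shape[-2:] != low_res.shape[-2:]:
            high_res = nn.functional.interpolate(
                high_res, size=low_res.shape[-2:], mode="bilinear", align_corners=False
            )
        g = self.gate(torch.cat([low_res, high_res], dim=1))  # per-position gate
        return low_res + g * high_res                         # gated residual mix

# Toy shapes: batch of 2, 256-dim features on a 24x24 grid.
fused = ConvGateFusion(256)(torch.randn(2, 256, 24, 24), torch.randn(2, 256, 24, 24))
print(fused.shape)  # torch.Size([2, 256, 24, 24])
```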
MG-LLaVA page
GitHub
@Machine_learn
Dataset: https://paperswithcode.com/dataset/behave
@Machine_learn
EMO-Disentanger
Github: https://github.com/yuer867/emo-disentanger
Paper: https://arxiv.org/abs/2407.20955v1
Dataset: https://paperswithcode.com/dataset/emopia
@Machine_learn
How to Think Like a Computer Scientist: Interactive Edition
https://runestone.academy/ns/books/published/thinkcspy/index.html
@Machine_learn
No learning rates needed: Introducing SALSA - Stable Armijo Line Search Adaptation
Github: https://github.com/themody/no-learning-rates-needed-introducing-salsa-stable-armijo-line-search-adaptation
Paper: https://arxiv.org/abs/2407.20650v1
Dataset: https://paperswithcode.com/dataset/cifar-10
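For background, a sketch of the classic Armijo backtracking line search that SALSA builds on: shrink the step size until the sufficient-decrease condition holds. This shows the textbook deterministic rule, not the paper's stabilized stochastic variant:
```python
import numpy as np

def armijo_line_search(f, grad, x, direction, lr0=1.0, c=1e-4, shrink=0.5, max_backtracks=20):
    """Shrink step t until f(x + t*d) <= f(x) + c * t * <grad(x), d>."""
    fx = f(x)
    slope = np.dot(grad(x), direction)  # negative for a descent direction
    t = lr0
    for _ in range(max_backtracks):
        if f(x + t * direction) <= fx + c * t * slope:
            return t
        t *= shrink
    return t

# Usage on a toy quadratic f(x) = ||x||^2 with the steepest-descent direction.
f = lambda x: float(np.dot(x, x))
grad = lambda x: 2 * x
x = np.array([3.0, -4.0])
t = armijo_line_search(f, grad, x, direction=-grad(x))
print(t, f(x + t * (-grad(x))))  # accepted step and the reduced loss
```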
@Machine_learn
https://research.google/blog/scaling-hierarchical-agglomerative-clustering-to-trillion-edge-graphs/
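For readers unfamiliar with the primitive being scaled: a small hierarchical agglomerative clustering (HAC) example with SciPy. The blog post is about running this same bottom-up merge process on trillion-edge graphs; this sketch only illustrates the algorithm on toy data:
```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two well-separated Gaussian blobs of 20 points each.
rng = np.random.default_rng(0)
points = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

Z = linkage(points, method="average")            # bottom-up merge tree (dendrogram)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
print(labels)                                    # first 20 points in one cluster, rest in the other
```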
Pixart-Sigma - a high-quality, transformer-based text-to-image generation training framework!
Github: https://github.com/PixArt-alpha/PixArt-sigma
Demo: https://huggingface.co/spaces/PixArt-alpha/PixArt-Sigma
@Machine_learn
Recall-Oriented-CL-Framework
Github: https://github.com/bigdata-inha/recall-oriented-cl-framework
Paper: https://arxiv.org/pdf/2403.03082v1.pdf
Dataset: https://paperswithcode.com/dataset/cifar-10
Tasks: https://paperswithcode.com/task/continual-learning
@Machine_learn