Community - AI Unfiltered

[D] Seeking feedback on an arXiv preprint: Unique Viable-Neighbor based Contour Tracing

Dec 16 via r/MachineLearning community

Hey everyone, I'm an independent researcher working in computer vision and image processing. I have developed a novel algorithm extending the traditional Moore-neighbor tracing method, specifically designed for more robust and efficient boundary delineation in high-fidelity stereo pairs. The...

Finally managed to run Qwen-2.5-7B on a 4GB GTX 1050 without CPU offloading (Surgical Memory Alignment)

Dec 16 via r/LocalLLaMA community

Hey everyone, I wanted to share a weekend project that grew into something bigger. Like many of you, I'm stuck with low-end hardware (a glorious GTX 1050 with 4GB VRAM). Every time I tried to load a modern 7B model (like Llama-3 or Qwen-2.5), I hit the dreaded OOM wall. The files were technically...

I was bored

Dec 16 via r/LocalLLaMA community

Being unemployed and having too much hardware and too much time on my hands, I built this.. submitted by /u/MyLovelyAngelKirino

Built a local image hub to organize my 30k+ PNG chaos — v0.10 integrates with A1111, handles ComfyUI workflows & runs 100% offline (v0.10.5 perf update)

Dec 16 via r/LocalLLaMA community

Hey everyone, I posted a while ago on other subs about a tool I built to manage my own mess of AI images, and wanted to share the latest update here since I know this community appreciates local-first software. Quick context: I have over 30k images generated across Invoke, A1111, SwarmUI, etc. My...

Meta announced a new SAM Audio Model for audio editing that can segment sound from complex audio mixtures using text, visual, and time span prompts.

Dec 16 via r/LocalLLaMA community

Source: https://about.fb.com/news/2025/12/our-new-sam-audio-model-transforms-audio-editing/ SAM Audio transforms audio processing by making it easy to isolate any sound from complex audio mixtures using text, visual, and time span prompts. submitted by /u/Difficult-Cap-7527 ...

Allen Institute for AI introduces Molmo 2

Dec 16 via r/LocalLLaMA community

https://allenai.org/molmo I am super impressed by the ability to analyze videos (Video QA, Counting and pointing, Dense captioning), and it's only 8B!! HuggingFace: https://huggingface.co/allenai/Molmo2-8B submitted by /u/Agitated_Camel1886

[P] Using a Vector Quantized Variational Autoencoder to learn Bad Apple!! live, with online learning.

Dec 16 via r/MachineLearning community

I wanted to share something I was working on recently to experiment with VQ-VAEs! The goal of the project was to actively learn “Bad Apple!!” and reconstruct the song in the middle of training without seeing the current frame/audio sample. The song is only around 3 minutes so the VQ-VAE needed to...

Denoising Language Models for Speech Recognition

Dec 16 via r/MachineLearning community

We studied denoising language models (error correction models) as an alternative to standard language models. Denoising LMs use an encoder-decoder architecture, and are trained to reconstruct the original text from a corrupted version of it. We test them for speech recognition, and specifically...

[P] Cyreal - Yet Another Jax Dataloader

Dec 16 via r/MachineLearning community

Looking for a JAX dataloader that is fast, lightweight, and flexible? Try out Cyreal! GitHub | Documentation. Note: This is a new library and probably full of bugs. If you find one, please file an issue. Background: JAX is a great library but the lack of dataloaders has been driving me crazy. I find it...

[P] Plotting ~8000 entity embeddings with cluster tags and ontological colour coding

Dec 16 via r/MachineLearning community

This is a side project I've been working on for a few months. I've designed a trait-based ontology: 32 bits, each representing a yes/no question. I've created trait specifications, including examples and edge cases, for each trait. The user names and describes an entity (anything you can imagine)...