Hey everyone, I'm an independent researcher working in computer vision and image processing. I have developed a novel algorithm extending the traditional Moore-neighbor tracing method, specifically designed for more robust and efficient boundary delineation in high-fidelity stereo pairs. The...
Hey everyone, I wanted to share a weekend project that grew into something bigger. Like many of you, I'm stuck with low-end hardware (a glorious GTX 1050 with 4GB VRAM). Every time I tried to load a modern 7B model (like Llama-3 or Qwen-2.5), I hit the dreaded OOM wall. The files were technically...
Being unemployed and having too much hardware and too much time on my hands, I built this. submitted by /u/MyLovelyAngelKirino
Hey everyone, I posted a while ago on other subs about a tool I built to manage my own mess of AI images, and wanted to share the latest update here since I know this community appreciates local-first software. Quick context: I have over 30k images generated across Invoke, A1111, SwarmUI, etc. My...
Source: https://about.fb.com/news/2025/12/our-new-sam-audio-model-transforms-audio-editing/ SAM Audio transforms audio processing by making it easy to isolate any sound from complex audio mixtures using text, visual, and time span prompts. submitted by /u/Difficult-Cap-7527 ...
https://allenai.org/molmo I am super impressed by the ability to analyze videos (Video QA, Counting and pointing, Dense captioning), and it's only 8B!! HuggingFace: https://huggingface.co/allenai/Molmo2-8B submitted by /u/Agitated_Camel1886
I wanted to share something I was working on recently to experiment with VQ-VAEs! The goal of the project was to actively learn “Bad Apple!!” and reconstruct the song in the middle of training without seeing the current frame/audio sample. The song is only around 3 minutes so the VQ-VAE needed to...
We studied denoising language models (error correction models) as an alternative to standard language models. Denoising LMs use an encoder-decoder architecture, and are trained to reconstruct the original text from a corrupted version of it. We test them for speech recognition, and specifically...
Looking for a JAX dataloader that is fast, lightweight, and flexible? Try out Cyreal! (GitHub, Documentation.) Note: This is a new library and probably full of bugs. If you find one, please file an issue. Background: JAX is a great library, but the lack of dataloaders has been driving me crazy. I find it...
This is a side project I've been working on for a few months. I've designed a trait-based ontology: 32 bits, each representing a yes/no question. I've created trait specifications, including examples and edge cases, for each trait. The user names and describes an entity (anything you can imagine)...