Tools & Open Source · Hacker News Front Page · April 6, 2026

Ghost Pepper Brings 100% Local, Free Speech-to-Text to macOS with Hold-to-Talk Simplicity

More to read

ik_llama.cpp Fork Delivers 26x Faster Prompt Processing for Qwen 3.5 27B via Fused CUDA Kernels

A user running Qwen 3.5 27B (Q4_K_M quantization) on an NVIDIA RTX PRO 4000 Blackwell GPU documented large performance gains after switching from mainline llama.cpp to the ik_llama.cpp fork. The fork reaches 1,122 tok/sec on prompt evaluation (up from ~43 tok/sec, roughly 26x) and 26 tok/sec on generation (up from ~7.5 tok/sec, roughly 3.5x) by implementing fused Gated Delta Network (GDN) CUDA kernels tailored to Qwen 3.5's hybrid SSM/attention architecture. The key change cuts graph splits from 34 to 2, enabling full GPU utilization during inference instead of heavy CPU involvement.
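The headline multiplier can be checked directly from the throughput figures quoted above. The short sketch below only redoes that arithmetic; the "before" numbers are approximate in the original report, so the ratios are approximate too, and the variable and function names are illustrative.

```python
# Throughput figures quoted in the summary (tokens per second).
# The "before" values are approximate ("~43", "~7.5") in the original report.
prompt_before, prompt_after = 43.0, 1122.0  # prompt evaluation
gen_before, gen_after = 7.5, 26.0           # token generation


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return after / before


prompt_speedup = speedup(prompt_before, prompt_after)
gen_speedup = speedup(gen_before, gen_after)

print(f"prompt processing: {prompt_speedup:.1f}x")  # prints "prompt processing: 26.1x"
print(f"generation: {gen_speedup:.1f}x")            # prints "generation: 3.5x"
```

The 26x in the headline refers to prompt processing only; the generation speedup is considerably smaller.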

Reddit r/LocalLLaMA

Tools & Open Source · 4 months ago

Litesearch Brings Karpathy's AI Autoresearch to Budget Consumer GPUs with a Slick GUI

A developer has forked Andrej Karpathy's "autoresearch" project — an agent that autonomously edits and runs LLM training experiments — to make it accessible on consumer GPUs with as little as 4–8GB of VRAM. The fork, called Litesearch, auto-tunes model size, batch size, and sequence length to fit available memory, and adds a dark-mode GUI dashboard for easy monitoring. It is NVIDIA-only, MIT-licensed, and installable via pip or uv.

Reddit r/LocalLLaMA

Tools & Open Source · 4 months ago

Skillware: The Open-Source "App Store" for AI Agent Capabilities

Skillware is an open-source Python framework that packages AI agent capabilities — including logic, cognition, governance, and tool interfaces — into modular, self-contained "Skills" installable via pip. It aims to eliminate fragmentation in the AI ecosystem by providing a standardized registry of reusable agent capabilities that work across major LLM providers like Gemini, Claude, GPT, and Llama. Developed by ARPA Hellenic Logical Systems, it positions itself as the "apt-get for AI know-how," decoupling capability from the underlying intelligence model.

Hacker News Front Page

Tools & Open Source · 4 months ago

Open-Source Benchmark Exposes Gap Between LLMs' Stated and Actual Confidence

A developer has released an open-source benchmark that evaluates whether large language models (LLMs) are as confident in their outputs as they claim to be. It measures calibration, the match between a model's expressed confidence and its actual accuracy, and finds that models frequently overclaim certainty. The findings point to a significant and widespread reliability concern across modern LLMs.
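The summary does not say which metric the benchmark uses. As a generic illustration of what a calibration check can look like, the sketch below computes expected calibration error (ECE), a commonly used measure, over made-up data. The function name, the bin count, and the sample values are all assumptions for illustration and are not taken from the benchmark.

```python
from typing import List


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 10
) -> float:
    """Weighted average gap between stated confidence and accuracy per bin."""
    total = len(confidences)
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 would index past the end, so clamp to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical data: a model that says "90% sure" four times but is right twice.
stated = [0.9, 0.9, 0.9, 0.9]
was_right = [True, True, False, False]
overconfident_ece = expected_calibration_error(stated, was_right)
print(f"ECE: {overconfident_ece:.2f}")  # prints "ECE: 0.40"
```

An ECE of 0 would mean stated confidence matches accuracy in every bin; in this made-up case the model overclaims by 40 percentage points.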

Reddit r/MachineLearning

Tools & Open Source · 4 months ago

Hugging Face's TGI Enters Maintenance Mode — Is It Time to Move to vLLM?

Hugging Face has officially halted new development on its Text Generation Inference (TGI) engine, putting it into maintenance mode. A Reddit user on r/LocalLLaMA highlights this shift, noting their poor experiences with TGI compared to alternatives like llama.cpp and vLLM. This development appears to settle a longstanding debate in the LLM serving community about whether TGI or vLLM is the superior inference engine.

Reddit r/LocalLLaMA

Tools & Open Source · 4 months ago

Atuin v18.13 Brings AI-Powered Shell Commands, Smarter Search, and a Lightweight PTY Proxy

Atuin v18.13 is a major release for the shell history tool, introducing an AI-powered natural language-to-bash assistant, a significantly faster in-memory search daemon, and a new lightweight PTY proxy called "Hex" that improves terminal rendering. The AI feature allows users to describe what they want in plain English and receive safe, privacy-conscious shell commands powered by frontier LLMs and man page datasets. The release also adds new authentication options including Google and GitHub login for the hosted sync service.

Hacker News Front Page