Research & Papers · Reddit r/MachineLearning · March 20, 2026

Medical AI Bias Exposed: Automated Training Labels Make Models 66% Worse — And Benchmarks Miss It


More to read

Research & Papers · about 2 months ago

Llama 8B Rivals 70B on Multi-Hop QA Using Structured Prompting — No Fine-Tuning Required

Researcher Greedy-Teach1533 demonstrated that Llama 3.1 8B can match or exceed Llama 3.3 70B on multi-hop question answering benchmarks using two inference-time techniques: structured chain-of-thought prompting and 60% context compression via graph traversal. The experiments, conducted with Graph RAG (KET-RAG), suggested that in this setup retrieval is largely a solved problem (the answer is present in the retrieved context 77–91% of the time), while reasoning remains the critical bottleneck, accounting for 73–84% of failures. The approach was validated on HotpotQA, MuSiQue, and 2WikiMultiHopQA (500 questions each) at roughly 12x lower cost than running the 70B model.
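To make the retrieval-versus-reasoning split concrete, here is a minimal sketch of how such a failure breakdown could be computed. It is not the researcher's evaluation code: the `QAResult` record, the normalized substring check for "answer in context", and exact-match scoring of predictions are all assumptions for illustration, and the toy examples are invented.

```python
from dataclasses import dataclass


@dataclass
class QAResult:
    """One evaluated multi-hop QA example (hypothetical record format)."""
    gold_answer: str
    retrieved_context: str
    prediction: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(text.lower().split())


def attribute_failures(results: list[QAResult]) -> dict[str, float]:
    """Classify each wrong answer as a retrieval or a reasoning failure.

    Retrieval failure: the gold answer never appears in the retrieved context.
    Reasoning failure: the gold answer is in the context, but the prediction is wrong.
    """
    in_context = retrieval_failures = reasoning_failures = 0
    for r in results:
        gold = normalize(r.gold_answer)
        found = gold in normalize(r.retrieved_context)
        if found:
            in_context += 1
        if normalize(r.prediction) == gold:
            continue  # correct answers are not failures
        if found:
            reasoning_failures += 1
        else:
            retrieval_failures += 1
    failures = retrieval_failures + reasoning_failures
    return {
        "answer_in_context_rate": in_context / len(results),
        "reasoning_share_of_failures": reasoning_failures / failures if failures else 0.0,
    }


# Invented toy data, for illustration only.
results = [
    QAResult("Paris", "The capital of France is Paris.", "Paris"),    # correct
    QAResult("1889", "The tower opened in 1889.", "1887"),            # reasoning failure
    QAResult("Gustave Eiffel", "The tower is in Paris.", "unknown"),  # retrieval failure
    QAResult("Seine", "Paris lies on the Seine.", "Loire"),           # reasoning failure
]
print(attribute_failures(results))
# {'answer_in_context_rate': 0.75, 'reasoning_share_of_failures': 0.6666666666666666}
```

Counting a correct answer as a success even when it was not in the context (the model may know the fact from pretraining) is one design choice; a stricter harness might track those cases separately.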

Reddit r/LocalLLaMA

Research & Papers · about 2 months ago

Chinese AI Model MiniMax M2.7 Actively Participated in Its Own Development

Chinese AI company MiniMax has released M2.7, a model that played an active role in its own development through autonomous optimization loops — updating its knowledge stores, building capabilities, and refining its own reward-based training. Over 100 autonomous rounds, M2.7 independently analyzed failures, adjusted code, and tested results, achieving a reported 30% performance boost on internal evaluations. The model competes closely with leading Western models like GPT-5.4 and Gemini 3.1 Pro across multiple benchmarks, and MiniMax envisions future AI self-evolution progressing toward full autonomy without human involvement.

The Decoder

Research & Papers · about 2 months ago

AI-Powered Smart Wheelchairs Aim to Autonomously Navigate Real-World Obstacles

Researchers at the German Research Center for Artificial Intelligence (DFKI) have developed prototype sensor-equipped smart wheelchairs capable of both semi-autonomous and fully autonomous navigation, using natural language commands, SLAM mapping, and drone-based cameras for obstacle detection. Presented at the CSUN Assistive Technology Conference in Anaheim, the project — called REXASI-PRO — integrates LiDAR, 3D cameras, and open-source navigation systems to guide users safely through complex environments. Experts in the field caution that cost, reliability in real-world conditions, and diverse user needs remain significant barriers to mainstream adoption.

IEEE Spectrum AI

Research & Papers · about 2 months ago

Qwen3.5-9B on MacBook M5 Pro Scores 93.8% on Home Security AI Benchmark — Just 4 Points Behind GPT-5.4

A new domain-specific benchmark called HomeSec-Bench v1 pits local LLMs against cloud models on 96 real-world home security AI tasks across 15 test suites. Running on a MacBook Pro M5 Pro with 64GB unified memory via llama.cpp, the Qwen3.5-9B model achieved a 93.8% pass rate — only 4.1 points behind GPT-5.4 and surpassing GPT-5.4-nano — all with zero API costs and full data privacy. The benchmark covers critical security workflows including threat classification, tool use, prompt injection resistance, and multi-camera event deduplication.
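The headline percentage is simple arithmetic over a fixed task count. This is a minimal sketch, assuming each of the 96 tasks is scored pass/fail and weighted equally (the summary does not say how HomeSec-Bench aggregates across its 15 suites). Under that assumption, 93.8% corresponds to 90 passed tasks, since 90/96 = 93.75%. The suite names and counts below are hypothetical.

```python
def pass_rate(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * passed / total, 1)


def overall_pass_rate(suites: dict[str, tuple[int, int]]) -> float:
    """Micro-average: pool (passed, total) counts across suites so every task weighs the same."""
    passed = sum(p for p, _ in suites.values())
    total = sum(t for _, t in suites.values())
    return pass_rate(passed, total)


# On a 96-task benchmark each task is worth about 1.04 points,
# so 90 passes out of 96 gives 93.75%, reported as 93.8%.
print(pass_rate(90, 96))  # 93.8

# Hypothetical suite names and counts, for illustration only.
suites = {"threat_classification": (10, 10), "tool_use": (8, 10)}
print(overall_pass_rate(suites))  # 90.0
```

Pooling tasks (micro-averaging) weights every task equally; averaging per-suite rates instead (macro-averaging) weights every suite equally and can give a different number when suites differ in size.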

Hacker News Front Page

Research & Papers · about 2 months ago

Qualcomm Shrinks AI Reasoning Chains by 2.4x to Run Thinking Models Directly on Smartphones

Qualcomm AI Research has developed a modular framework that enables reasoning-capable language models to run locally on smartphones by compressing verbose chain-of-thought outputs by an average factor of 2.4x — and up to 8x in some cases — using reinforcement learning. The system builds on a base model (Qwen2.5-7B-Instruct) enhanced with LoRA adapters and 4-bit weight compression, allowing it to switch between fast chat and deep reasoning modes while keeping sensitive data on-device. Despite the impressive technical achievement, deep system integration with apps like email and calendars still relies on cloud-based models in practice.
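Two numbers in this summary are easy to sanity-check by hand: what a chain-of-thought "compression factor" means, and why 4-bit weights matter on a phone. The sketch below is arithmetic only, not Qualcomm's method. It assumes the factor is original reasoning tokens divided by compressed reasoning tokens, and that Qwen2.5-7B-Instruct has roughly 7.6 billion parameters (an approximation; check the model card). The token counts are made-up illustration values.

```python
from statistics import mean


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate storage for model weights in GB (1 GB = 1e9 bytes).

    Ignores quantization metadata (scales, zero-points), LoRA adapter weights,
    activations, and the KV cache, so real on-device usage will be higher.
    """
    return num_params * bits_per_weight / 8 / 1e9


def compression_ratios(original: list[int], compressed: list[int]) -> list[float]:
    """Per-example compression factor: original CoT tokens / compressed CoT tokens."""
    if len(original) != len(compressed):
        raise ValueError("length mismatch")
    return [o / c for o, c in zip(original, compressed)]


# Assumption: ~7.6e9 parameters for Qwen2.5-7B-Instruct (approximate).
PARAMS = 7.6e9
print(f"16-bit weights: {weight_memory_gb(PARAMS, 16):.1f} GB")  # 15.2 GB
print(f" 4-bit weights: {weight_memory_gb(PARAMS, 4):.1f} GB")   # 3.8 GB

# Hypothetical token counts for three reasoning traces.
orig_tokens = [1200, 800, 1600]
comp_tokens = [500, 400, 200]
ratios = compression_ratios(orig_tokens, comp_tokens)
print([round(r, 1) for r in ratios])                    # [2.4, 2.0, 8.0]
print(round(mean(ratios), 2))                           # mean of per-example ratios: 4.13
print(round(sum(orig_tokens) / sum(comp_tokens), 2))    # ratio of totals: 3.27
```

The last two lines show why "average factor" needs a definition: the mean of per-example ratios and the ratio of total tokens can differ noticeably when a few traces compress far more than the rest.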

The Decoder

Research & Papers · about 2 months ago

Psychedelic Clinical Trials Keep Falling Flat — And the Hype May Be to Blame

Two new studies on psilocybin as a treatment for depression found results that were either inconclusive or no better than traditional antidepressants, reigniting concerns about the overhyping of psychedelics in mental health research. A key challenge is the near-impossibility of "blinding" participants in psychedelic trials, since hallucinations make it obvious who received the drug — a flaw that distorts placebo comparisons. Researchers have coined the term "knowcebo effect" to describe how participants who know they received a placebo experience worse outcomes, artificially inflating the perceived effectiveness of psychedelics.

MIT Technology Review