Guozhen AI

Global AI field notes and model intelligence

Realtime AI News

LLM Agent, Memory, Trustworthiness, arXiv

TRUSTMEM: Learning Trustworthy Memory Consolidation for LLM Agents with Long-Term Memory

A new arXiv paper introduces the TRUSTMEM framework, which targets the error accumulation and persistent hallucinations that arise in LLM agents' long-term memory when write, revise, and delete operations are themselves model-generated.

LLM, Multi-Agent, Education, Financial Literacy

Agentic Knowledge Tracing: A Multi-Agent LLM Architecture for Stealth Assessment of Financial Literacy in Serious Games

Researchers propose the Agentic BKT pipeline, a multi-agent LLM architecture that stealthily assesses financial competencies from open-ended gameplay events without disrupting the learning experience.

AI Research, Reinforcement Learning, Energy

Supervised Reinforcement Learning Tackles Distributed Energy Resource Coordination

Researchers propose a supervised reinforcement learning approach for coordinating distributed energy resources (DERs), achieving more efficient energy management under the uncertainty and complexity that challenge traditional optimization methods.

AI Research, Edge AI, Neural Architecture Search

On-Device Neural Architecture Search Enables Edge Devices to Design Their Own Networks

Researchers propose a novel approach that performs lightweight neural architecture search directly on deployment devices, allowing sensor edge devices to redesign tiny neural networks optimized for real-time data.

AI Research, Finance, Benchmark

MacroLens Benchmark Released: Multi-Task Financial Reasoning Under Macroeconomic Scenarios

Researchers release MacroLens, a multi-task benchmark designed for contextual financial reasoning under macroeconomic scenarios, addressing key challenges like data leakage and reporting lags in time-series evaluation.

AI Research, Language Models, Interpretability

Study Reveals 'Readout Blind Spot' in Looped Language Models: Dense Supervision Misses Hidden State Variables

A new study shows that dense per-loop cross-entropy loss in looped language models only controls variables exposed by the readout, not all hidden-state variables active in the recurrent transition, creating a systematic supervision blind spot.

AI Research, AI for Science, Quantum Computing

Human-AI Collaboration Discovers Quantum Algorithms: From Vague Intuition to Mathematical Discovery

A new paper documents how human-AI co-discovery transformed a vague research intuition into concrete sign-embedding quantum algorithms for matrix equations and matrix functions, showing a new paradigm for AI-assisted mathematics.

arXiv, Agents, Evaluation

AgentOdyssey: A New Framework for Evaluating Test-Time Continual Learning in AI Agents

AgentOdyssey procedurally generates open-ended text games to benchmark agents on exploration, knowledge acquisition, memory retention, and long-horizon planning.

OpenAI, Agent, Research

How agents are transforming work

OpenAI publishes a new research paper examining how AI agents are transforming work by handling longer, more complex tasks and expanding productivity across roles.

Research, ARC, Reasoning

DiARC Paper: Distinguishing Positive and Negative Samples Improves LLM Reasoning on ARC Tasks

A new arXiv paper introduces DiARC, a method that improves large language models' performance on the Abstraction and Reasoning Corpus (ARC) by distinguishing positive and negative samples.

Cerebras, Stock, AI Chips, Earnings

Cerebras Stock Plunges After First Earnings Since IPO as CEO Says Margin Outlook Misunderstood

AI chipmaker Cerebras saw its stock plummet after its first earnings report since going public, with a narrower gross margin forecast spooking investors.

Enterprise AI, Industry

Companies scramble to stop employees from burning through AI budgets with small tasks

TechCrunch reports that companies are rushing to stop employees from exhausting AI budgets on low-value small tasks, marking a shift from the 'tokenmaxxing' era to an era of 'token rationing'.

Google, Research, Reasoning, LLM

Thinking to recall: How reasoning unlocks parametric knowledge in LLMs

Google Research explores how the reasoning process activates and retrieves parametric knowledge stored within large language models.

Daily Briefs