RAG Initiative

Knowledge-Oriented Retrieval-Augmented Generation

We move from competition-winning systems to reusable frameworks, datasets, and surveys that improve the robustness, efficiency, and safety of Retrieval-Augmented Generation pipelines.

5+ Active RAG Tracks
Meta KDD '24 🥈 Competition Finish
ACL & CIKM 2025 Acceptances
900★ Agent-R1 Community

Flagship Projects

Systems, Benchmarks, and Knowledge Resources

Each project pushes Retrieval-Augmented Generation forward, from competition-grade systems to public benchmarks and surveys.

KDD Cup CRAG 2024 · Silver Medal

Second-place solution for Meta's CRAG challenge, delivering reliable multi-hop retrieval and controlled generation.

Competition Repo →

PruningRAG · CIKM 2025

Plug-and-play framework with multi-source pruning strategies that filter noisy knowledge before generation; accepted to CIKM 2025.

Paper & Framework →
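Pruning noisy knowledge before generation can happen at two levels: dropping a whole source that looks unreliable, and dropping weak passages inside the sources that remain. The sketch below shows that two-level idea in minimal form. It assumes every passage carries a source label and a relevance score in [0, 1] from some upstream scorer. `Passage`, `prune_sources`, and both thresholds are hypothetical illustrations. They are not the PruningRAG API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Passage:
    source: str   # hypothetical source label, e.g. "web", "kg", "api"
    text: str
    score: float  # assumed relevance score in [0, 1] from an upstream scorer


def prune_sources(passages, source_min_mean=0.5, passage_min=0.3):
    """Drop whole sources whose mean relevance is low, then drop weak passages
    from the surviving sources. Both thresholds are illustrative assumptions."""
    by_source = defaultdict(list)
    for p in passages:
        by_source[p.source].append(p)

    kept = []
    for items in by_source.values():
        mean_score = sum(p.score for p in items) / len(items)
        if mean_score < source_min_mean:
            continue  # source-level pruning: the whole source looks noisy
        # passage-level pruning inside a trusted source
        kept.extend(p for p in items if p.score >= passage_min)

    # Strongest evidence first, so any later context truncation keeps the best passages.
    return sorted(kept, key=lambda p: p.score, reverse=True)


passages = [
    Passage("web", "a", 0.9), Passage("web", "b", 0.2), Passage("web", "c", 0.7),
    Passage("kg", "d", 0.1), Passage("kg", "e", 0.4),
]
print([p.text for p in prune_sources(passages)])
```

In this toy input, the "kg" source is discarded as a whole because its mean score is 0.25. Passage "b" is then removed from the "web" source. Real systems would tune or learn these thresholds per source.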

HoH Benchmark · ACL 2025

Measures how outdated information hurts RAG reliability; accepted to the ACL 2025 Main Conference.

ArXiv →

Knowledge-Oriented Survey

Comprehensive taxonomy and evaluation review for RAG systems, covering mechanisms, applications, and open problems.

Survey Repo →

Agent-R1 · 900★

End-to-end reinforcement learning agent that couples retrieval, planning, and execution for complex tasks; community support now exceeds 900 GitHub stars.

Framework →

Timeline

Milestones in Retrieval-Augmented Generation

2024

Meta KDD Cup · CRAG Track

A pipeline combining multi-stage retrieval, reranking, and controllable decoding earned a silver medal among 1,400+ teams.

Details →

2024

PruningRAG · CIKM 2025

Multi-granularity pruning policies for source selection that reduce hallucinations in deployed RAG agents; accepted to CIKM 2025.

Framework →

2025

HoH Benchmark · ACL 2025

First dataset to stress-test RAG with outdated knowledge, quantifying how information freshness affects downstream reasoning accuracy; accepted to the ACL 2025 Main Conference.

Paper →
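One simple way to quantify a freshness effect is to run the same questions twice. The first run supplies only up-to-date evidence. The second run mixes in outdated evidence. Comparing the two runs shows how accuracy changes. The sketch below assumes each record stores the current answer, the superseded answer, and the model's prediction under each condition. The record keys and function names are hypothetical, and this is not the HoH evaluation script.

```python
def _norm(s: str) -> str:
    return s.strip().lower()


def accuracy(predictions, gold):
    """Exact-match accuracy after case/whitespace normalization."""
    hits = sum(_norm(p) == _norm(g) for p, g in zip(predictions, gold))
    return hits / len(gold)


def freshness_effect(records):
    """records: list of dicts with (assumed) keys
       'current'    - the up-to-date answer
       'outdated'   - the superseded answer
       'pred_clean' - model answer given only current evidence
       'pred_mixed' - model answer given current + outdated evidence
    Returns both accuracies, the accuracy drop, and how often the
    mixed-context answer repeated the outdated value."""
    current = [r["current"] for r in records]
    outdated = [r["outdated"] for r in records]
    clean = [r["pred_clean"] for r in records]
    mixed = [r["pred_mixed"] for r in records]

    acc_clean = accuracy(clean, current)
    acc_mixed = accuracy(mixed, current)
    return {
        "acc_clean": acc_clean,
        "acc_mixed": acc_mixed,
        "drop": acc_clean - acc_mixed,
        "stale_rate": accuracy(mixed, outdated),
    }


records = [
    {"current": "Paris", "outdated": "Lyon", "pred_clean": "Paris", "pred_mixed": "Lyon"},
    {"current": "42", "outdated": "41", "pred_clean": "42", "pred_mixed": "42"},
]
print(freshness_effect(records))
```

Reporting `stale_rate` next to the accuracy drop helps distinguish two failure modes. In one, outdated evidence actively misleads the model. In the other, the model simply fails for other reasons.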

2025

Knowledge-Oriented RAG Survey

Comprehensive taxonomy that catalogs retrieval strategies, evaluation lenses, and safety considerations for production RAG systems.

Survey Repo →

2025

Agent-R1 · 900★ Community

Open-source Agent-R1 crosses 900 stars as researchers adopt its retrieval-aware RL training loop for trustworthy autonomous agents.

Framework →
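"Retrieval-aware RL training loop" can be made concrete by looking at the data such a loop trains on. The policy interleaves tool calls with reasoning, and each finished trajectory receives a reward. The toy sketch below covers only the rollout-and-reward side. A policy alternates between a search tool and a final answer, and the trajectory earns an exact-match outcome reward. `search_tool`, `scripted_policy`, and the reward rule are hypothetical stand-ins. Agent-R1's actual interfaces and its policy-update step are not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Trajectory:
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (action, content)
    answer: str = ""
    reward: float = 0.0


def search_tool(query: str, kb: Dict[str, str]) -> str:
    """Hypothetical retrieval tool: exact key lookup in a dict-based knowledge base."""
    return kb.get(query, "")


def rollout(question: str, gold: str, policy: Callable, kb: Dict[str, str],
            max_turns: int = 4) -> Trajectory:
    traj = Trajectory()
    observation = question
    for _ in range(max_turns):
        action, content = policy(observation, traj.steps)
        traj.steps.append((action, content))
        if action == "answer":
            traj.answer = content
            break
        # Retrieved text is fed back to the policy as its next observation.
        observation = search_tool(content, kb)
        traj.steps.append(("observation", observation))
    # Outcome reward (an assumption): 1.0 for an exact-match final answer, else 0.0.
    traj.reward = float(traj.answer.strip().lower() == gold.strip().lower())
    return traj


def scripted_policy(observation: str, history: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Stand-in for an LLM policy: search once, then answer with what was retrieved."""
    if not history:
        return "search", observation
    return "answer", observation


traj = rollout("capital of France", "Paris", scripted_policy, {"capital of France": "Paris"})
print(traj.steps, traj.reward)
```

In an actual RL setup, many such trajectories would be sampled from the learned policy. Their rewards would then drive a policy-gradient style update, which this sketch deliberately leaves out.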

Resources

Toolchains, Data, and Community

Competition Toolkit

Modular retrievers, rerankers, and decoders extracted from the KDD Cup pipeline for rapid experimentation.
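Modularity here means that each stage sits behind a small interface, so stages can be swapped without touching the rest of the pipeline. The sketch below composes a cheap recall stage with a finer reranking stage, assuming plain callable interfaces. The term-overlap retriever and the Jaccard reranker are toy stand-ins. They are not components from the KDD Cup pipeline, and the type aliases are hypothetical.

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str, List[str], int], List[str]]
Reranker = Callable[[str, List[str]], List[Tuple[str, float]]]


def _terms(text: str) -> set:
    return set(text.lower().split())


def overlap_retriever(query: str, corpus: List[str], k: int) -> List[str]:
    """Stage 1 (recall): keep the k documents sharing the most query terms."""
    q = _terms(query)
    return sorted(corpus, key=lambda d: len(q & _terms(d)), reverse=True)[:k]


def jaccard_reranker(query: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Stage 2 (precision): rescore candidates by Jaccard similarity to the query."""
    q = _terms(query)
    scored = []
    for d in candidates:
        t = _terms(d)
        union = q | t
        scored.append((d, len(q & t) / len(union) if union else 0.0))
    return sorted(scored, key=lambda x: x[1], reverse=True)


def run_pipeline(query: str, corpus: List[str], retriever: Retriever,
                 reranker: Reranker, k_recall: int = 10, k_final: int = 3) -> List[str]:
    candidates = retriever(query, corpus, k_recall)
    return [doc for doc, _ in reranker(query, candidates)[:k_final]]


corpus = ["the cat sat on the mat", "dogs chase cats", "a cat and a hat", "stock prices fell"]
print(run_pipeline("cat on mat", corpus, overlap_retriever, jaccard_reranker, k_recall=2, k_final=1))
```

Swapping in a dense retriever or a cross-encoder reranker would mean supplying another callable with the same signature. Everything else in the pipeline would stay unchanged.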

Pruning Policies

Multi-source heuristics that cull redundant passages before feeding them into the generator.
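Culling redundant passages amounts to greedy near-duplicate removal. The sketch below shows a minimal version. It assumes that passages arrive sorted by relevance and that word-level Jaccard similarity is an acceptable redundancy signal. The 0.8 threshold and the function names are illustrative assumptions, and many systems use embedding similarity instead.

```python
def _words(text: str) -> set:
    return set(text.lower().split())


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cull_redundant(passages, threshold=0.8):
    """Greedy dedup: keep a passage only if it is below `threshold` similarity
    to every passage already kept. Input order (e.g. by relevance) decides
    which copy of a near-duplicate survives."""
    kept, kept_sets = [], []
    for p in passages:
        s = _words(p)
        if all(jaccard(s, k) < threshold for k in kept_sets):
            kept.append(p)
            kept_sets.append(s)
    return kept


passages = [
    "The Eiffel Tower is in Paris",
    "the eiffel tower is in paris",
    "Paris hosts the Louvre",
]
print(cull_redundant(passages))
```

Because the loop is greedy and order-dependent, sorting by relevance beforehand ensures that the best-scored copy of a duplicate is the one that reaches the generator.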

Temporal Benchmarks

HoH-style datasets that capture changing facts and measure how up to date a model's answers are.
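A dataset of changing facts needs to answer two questions. What was true at a given time? Which values have been superseded since then? The sketch below shows a minimal versioned-fact structure that supports both queries. It assumes that facts are keyed by subject and relation and that each version carries a start date. `TemporalFact` and its methods are hypothetical and do not reflect the HoH schema.

```python
import bisect
from dataclasses import dataclass, field
from datetime import date


@dataclass
class TemporalFact:
    subject: str
    relation: str
    # Parallel lists sorted by start date: values[i] holds from starts[i]
    # until the next start date (or indefinitely for the latest version).
    starts: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def add(self, start: date, value: str) -> None:
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.values.insert(i, value)

    def value_at(self, when: date):
        """Value valid on `when`, or None if no version existed yet."""
        i = bisect.bisect_right(self.starts, when) - 1
        return self.values[i] if i >= 0 else None

    def outdated_values(self, when: date):
        """Values that held before `when` but have since been superseded."""
        i = bisect.bisect_right(self.starts, when) - 1
        return self.values[:i] if i > 0 else []


fact = TemporalFact("Acme", "CEO")
fact.add(date(2020, 1, 1), "Alice")
fact.add(date(2023, 6, 1), "Bob")
print(fact.value_at(date(2024, 1, 1)), fact.outdated_values(date(2024, 1, 1)))
```

The `outdated_values` list is the kind of material a freshness benchmark can inject as distractor evidence next to the current value.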

Awesome RAG Papers

Curated reading list and taxonomy for anyone building RAG systems in production.

Why Retrieval Still Matters

We prioritize grounded reasoning by constraining large language models with curated, verifiable context. Our work spans pruning, benchmark design, and agent training to keep answers safe and up-to-date.

Responsible Systems

Multi-source pruning and audits reduce hallucinations and encourage evidence-backed responses.

Continuous Evaluation

Automated freshness tests and safety audits keep RAG deployments aligned with evolving knowledge and risk requirements.
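Freshness tests can take many forms. A simple one audits the index itself by flagging documents that have not been re-verified within an allowed age. The sketch below shows that check. It assumes each indexed document records a domain and a last-verified timestamp. The field names, domains, and age limits are hypothetical policy choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class IndexedDoc:
    doc_id: str
    domain: str
    last_verified: datetime


# Hypothetical policy: volatile domains must be re-verified more often.
MAX_AGE: Dict[str, timedelta] = {
    "finance": timedelta(days=1),
    "sports": timedelta(days=7),
    "encyclopedic": timedelta(days=180),
}
DEFAULT_MAX_AGE = timedelta(days=30)


def stale_docs(docs: List[IndexedDoc], now: datetime) -> List[str]:
    """Return ids of documents older than their domain's allowed age."""
    return [
        d.doc_id
        for d in docs
        if now - d.last_verified > MAX_AGE.get(d.domain, DEFAULT_MAX_AGE)
    ]


now = datetime(2025, 1, 10)
docs = [
    IndexedDoc("a", "finance", datetime(2025, 1, 8)),       # 2 days old, limit 1
    IndexedDoc("b", "sports", datetime(2025, 1, 5)),        # 5 days old, limit 7
    IndexedDoc("c", "other", datetime(2024, 11, 1)),        # 70 days old, default 30
    IndexedDoc("d", "encyclopedic", datetime(2024, 10, 1)), # 101 days old, limit 180
]
print(stale_docs(docs, now))
```

Run on a schedule, a check like this can fail a deployment gate whenever the returned list is non-empty. It can also queue the flagged documents for re-crawling.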

Open Collaboration

Repositories, dataset releases, and surveys lower the barrier for the broader RAG community.