Build, 2026
RefBib
Drop a PDF, get .bib — extract real BibTeX from academic references with zero AI hallucinations, verified by CrossRef, Semantic Scholar, and DBLP
Academic Tools, Python, Next.js, GROBID

Overview
RefBib extracts references from academic PDFs and resolves them into standardized BibTeX entries in one click. No LLMs, no hallucinated metadata. Every citation is checked against real academic databases (CrossRef, Semantic Scholar, and DBLP); references that none of them can resolve fall back to GROBID's structured parsing of the PDF itself.


Pipeline
- Upload — Drop up to 20 PDFs with sequential batch processing. Append more at any stage without losing results.
- Parse — GROBID extracts structured references from each PDF with multi-instance fallback.
- Resolve — Waterfall resolution: CrossRef → Semantic Scholar → DBLP → GROBID fallback. Each entry gets a match status (Matched / Fuzzy / Unmatched).
- Deduplicate — DOI, fingerprint, and bigram similarity detect duplicates across papers. Conflicts enter a resolution queue.
- Export — Download deduplicated .bib or occurrence-preserving BibTeX. Copy-to-clipboard for quick use.
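
The Resolve step can be sketched roughly as follows. This is a minimal illustration rather than RefBib's actual code: the `Resolution` class, the `resolve_waterfall` function, the resolver signature, and both score thresholds are assumptions made for the example, and the stub resolvers stand in for real API clients.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical resolver signature: returns (metadata, confidence in [0, 1])
# or None when the source has no candidate for this reference.
Resolver = Callable[[dict], Optional[Tuple[dict, float]]]


@dataclass
class Resolution:
    source: str   # which stage produced the entry
    status: str   # "Matched", "Fuzzy", or "Unmatched"
    fields: dict  # metadata used to build the BibTeX entry


def resolve_waterfall(
    parsed_ref: dict,
    resolvers: Sequence[Tuple[str, Resolver]],
    matched_at: float = 0.9,  # assumed threshold, not RefBib's real value
    fuzzy_at: float = 0.7,    # assumed threshold, not RefBib's real value
) -> Resolution:
    """Try each source in order and stop at the first confident match."""
    first_fuzzy = None
    for name, resolver in resolvers:
        hit = resolver(parsed_ref)
        if hit is None:
            continue
        fields, score = hit
        if score >= matched_at:
            return Resolution(name, "Matched", fields)
        if score >= fuzzy_at and first_fuzzy is None:
            # Remember the earliest fuzzy hit, but keep looking for a better one.
            first_fuzzy = Resolution(name, "Fuzzy", fields)
    if first_fuzzy is not None:
        return first_fuzzy
    # Nothing matched: fall back to the metadata GROBID parsed from the PDF.
    return Resolution("GROBID", "Unmatched", parsed_ref)


# Usage with stub resolvers standing in for real API clients.
stub_resolvers = [
    ("CrossRef", lambda ref: None),                                     # miss
    ("Semantic Scholar", lambda ref: ({"title": ref["title"]}, 0.95)),  # strong hit
    ("DBLP", lambda ref: ({"title": ref["title"]}, 1.0)),               # never reached
]
result = resolve_waterfall({"title": "An Example Paper"}, stub_resolvers)
print(result.source, result.status)  # Semantic Scholar Matched
```

Keeping the earliest fuzzy hit while later sources still get a chance is one possible policy; returning the highest-scoring fuzzy hit would be an equally reasonable choice.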


Key Features
- Workspace — Local-first reference management with automatic deduplication, conflict resolution, search, and status filtering.
- Analytics — Citation year distribution, venue breakdown, match quality donut chart, and most-cited references ranking.
- Manual Override — BibTeX editor for individual entries, DOI resolution for unmatched refs, Google Scholar links for every reference.
- Light/Dark Theme — Full theme support across all views.
- Self-Hostable — One-command setup (./start.sh), optional password protection, configurable GROBID instances.
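
As an illustration of how DOI, fingerprint, and bigram checks might combine during deduplication, here is a small sketch. The function names, the fingerprint format (normalized title plus year), the Dice coefficient as the similarity measure, and the 0.9 threshold are all assumptions for the example; the real scoring heuristics may differ.

```python
import re


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a leading resolver URL or 'doi:' prefix."""
    doi = doi.strip().lower()
    return re.sub(r"^(https?://(dx\.)?doi\.org/|doi:)", "", doi)


def squash(text: str) -> str:
    """Keep only lowercase letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def fingerprint(ref: dict) -> str:
    """Hypothetical fingerprint: squashed title plus publication year."""
    return squash(ref.get("title", "")) + "|" + str(ref.get("year", ""))


def bigrams(text: str) -> set:
    s = squash(text)
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bigram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigram sets, in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def is_duplicate(a: dict, b: dict, threshold: float = 0.9) -> bool:
    # 1. When both entries carry a DOI, the DOI decides.
    if a.get("doi") and b.get("doi"):
        return normalize_doi(a["doi"]) == normalize_doi(b["doi"])
    # 2. An identical fingerprint catches case and punctuation differences.
    if fingerprint(a) == fingerprint(b):
        return True
    # 3. Otherwise fall back to fuzzy title similarity.
    return bigram_similarity(a.get("title", ""), b.get("title", "")) >= threshold


print(bigram_similarity("night", "nacht"))  # 0.25
```

In this sketch two entries with different DOIs are never merged, even if their titles are identical. A pair like that is a natural candidate for a conflict queue rather than an automatic decision.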

Tech Stack
- Frontend — Next.js (App Router), shadcn/ui, Tailwind CSS, Recharts
- Backend — Python, FastAPI, httpx, lxml
- PDF Processing — GROBID with TEI XML output (local Docker or remote instances)
- Data Sources — CrossRef, Semantic Scholar, DBLP
- Deployment — Fly.io (backend) + Vercel (frontend)
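
To give a sense of what the PDF Processing layer works with, the sketch below pulls a few fields out of a GROBID-style TEI reference list. It uses the standard library's `xml.etree.ElementTree` as a stand-in for lxml so that it runs without extra dependencies. The `parse_references` function, the output dictionary keys, and the sample record are illustrative assumptions, not RefBib's actual parser or data.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample shaped like GROBID's TEI output (not real data).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a" type="main">An Example Paper Title</title>
        <author><persName>
          <forename type="first">Ada</forename><surname>Example</surname>
        </persName></author>
        <idno type="DOI">10.1234/example</idno>
      </analytic>
      <monogr>
        <title level="j">Journal of Examples</title>
        <imprint><date type="published" when="2020-05" /></imprint>
      </monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def parse_references(tei_xml: str) -> list:
    """Return one dict per <biblStruct> with title, authors, year, and DOI."""
    root = ET.fromstring(tei_xml)
    refs = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # Prefer the article-level title; fall back to any title (e.g. a book).
        title = bibl.find(".//tei:title[@level='a']", TEI_NS)
        if title is None:
            title = bibl.find(".//tei:title", TEI_NS)
        date = bibl.find(".//tei:imprint/tei:date", TEI_NS)
        doi = bibl.find(".//tei:idno[@type='DOI']", TEI_NS)
        surnames = [
            s.text
            for s in bibl.iterfind(".//tei:author/tei:persName/tei:surname", TEI_NS)
            if s.text
        ]
        refs.append({
            "title": title.text if title is not None and title.text else "",
            "authors": surnames,
            "year": date.get("when", "")[:4] if date is not None else "",
            "doi": doi.text if doi is not None and doi.text else "",
        })
    return refs


references = parse_references(SAMPLE_TEI)
print(references[0]["title"], references[0]["year"])  # An Example Paper Title 2020
```

Fields that GROBID could not extract come back as empty strings here, which leaves it to a later resolution step to decide how to treat incomplete references.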

My Role
Everything — parsing pipeline, waterfall resolution strategy, scoring heuristics, workspace UX, analytics visualizations, and deployment infrastructure. Built entirely through Claude Code.