arXiv to JSON Converter
Convert any arXiv paper to structured JSON. Full document AST with one arXiv ID.
Free — just enter an arXiv ID. Plus subscription required for JSON export.
Free account required. See pricing for high-volume use.
Built for your workflow
arXiv to JSON conversion that actually works for academic papers.
LLM Pipelines
Build RAG systems over arXiv - query specific sections, equations, or citations
Paper Analysis
Extract all equations, count citations, analyze document structure programmatically
Research Tools
Build tools that understand paper structure, such as literature-review assistants and summarizers
Data Collection
Collect structured paper data for ML training or research datasets
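As a rough illustration of the LLM-pipeline use case, the sketch below turns each top-level section into one text chunk for retrieval. It assumes the field names shown in the sample output on this page ("document", "type", "title", "content"). The function name section_chunks and the stand-in data are hypothetical, not part of the product.

```python
from typing import Any


def section_chunks(paper: dict[str, Any]) -> list[dict[str, str]]:
    """Turn each top-level section into one retrievable text chunk."""
    chunks = []
    for node in paper.get("document", []):
        if not isinstance(node, dict) or node.get("type") != "section":
            continue
        # Section content mixes plain strings with typed nodes; keep the prose only.
        text = " ".join(
            item for item in node.get("content", []) if isinstance(item, str)
        )
        chunks.append({"title": node.get("title", ""), "text": text})
    return chunks


# Tiny hand-written stand-in for a converted paper (not real output).
paper = {
    "document": [
        {
            "type": "section",
            "title": "Introduction",
            "content": [
                "First paragraph.",
                {"type": "figure", "src": "assets/example.png", "caption": "An example."},
                "Second paragraph.",
            ],
        }
    ]
}

print(section_chunks(paper))
```

Keeping the section title next to the text lets a retriever filter or label results by section. Equations and figures could be attached to the same chunk if the pipeline needs them.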
Semantic parsing
We parse LaTeX structure, not just text, so the output keeps sections, equations, and cross-references intact.
Semantic Parsing
Understands LaTeX structure - sections, equations, theorems, figures, tables as semantic elements
Cross-Reference Resolution
Automatically resolves \ref, \cite, and other cross-references to readable formats
Macro Expansion
Expands custom macros and commands so the output is self-contained
Bibliography Support
Includes formatted references with proper numbering and citation links
Instant Conversion
Just enter the arXiv ID - we fetch and convert the source automatically
Type-Safe Schema
Every element has a type field - section, equation, figure, table, citation, etc.
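Because every element carries a type field, client code can branch on it without guessing. The sketch below counts elements by type. It assumes that nested elements live in a "content" list and that plain paragraphs are bare strings, as in the sample output on this page. The helper name count_types and the example data are illustrative only.

```python
from collections import Counter
from typing import Any


def count_types(nodes: list[Any]) -> Counter:
    """Count typed nodes, recursing into nested "content" lists."""
    counts: Counter = Counter()
    for node in nodes:
        if not isinstance(node, dict):
            continue  # plain text paragraphs carry no type field
        counts[node.get("type", "unknown")] += 1
        content = node.get("content")
        # An equation's "content" is a LaTeX string, so only lists are descended into.
        if isinstance(content, list):
            counts.update(count_types(content))
    return counts


# Hand-written example data (not real output).
document = [
    {
        "type": "section",
        "title": "A",
        "content": ["text", {"type": "figure", "src": "x.png", "caption": "c"}],
    },
    {
        "type": "section",
        "title": "B",
        "content": [{"type": "equation", "content": "E = mc^2"}],
    },
]

print(dict(count_types(document)))
```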
See what you get
Real output from converting the “Attention Is All You Need” paper.
{
  "by": "sciencestack.ai",
  "arxivId": "1706.03762",
  "title": "Attention Is All You Need",
  "abstract": "The dominant sequence transduction models...",
  "authors": ["Ashish Vaswani", "Noam Shazeer", "..."],
  "document": [
    {
      "type": "section",
      "title": "Introduction",
      "content": [
        "Recurrent neural networks, long short-term memory \\cite{bib:1}...",
        {
          "type": "figure",
          "src": "assets/transformer.png",
          "caption": "The Transformer model architecture."
        }
      ]
    },
    {
      "type": "section",
      "title": "Attention",
      "content": [
        "An attention function maps a query and key-value pairs...",
        {
          "type": "equation",
          "content": "\\text{Attention}(Q,K,V) = \\text{softmax}(\\frac{QK^T}{\\sqrt{d_k}})V"
        }
      ]
    }
  ],
  "bibliography": [...]
}
Equations, cross-references, and structure: all preserved.
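One way to consume output shaped like the sample above is to walk the document tree and pull out every equation together with the section it appears in. This is a minimal sketch, not an official client. The function name iter_equations is hypothetical, and the embedded JSON is a trimmed copy of the sample with the elided fields left out.

```python
import json
from typing import Any, Iterator


def iter_equations(nodes: list[Any], section: str = "") -> Iterator[tuple[str, str]]:
    """Yield (section title, LaTeX string) for every equation node."""
    for node in nodes:
        if not isinstance(node, dict):
            continue  # skip plain text paragraphs
        if node.get("type") == "equation":
            yield section, node["content"]
        elif isinstance(node.get("content"), list):
            # Descend into nested content, remembering the nearest section title.
            title = node.get("title", section) if node.get("type") == "section" else section
            yield from iter_equations(node["content"], title)


# Trimmed copy of the sample output (raw string, so JSON escapes stay intact).
raw = r'''
{
  "document": [
    {
      "type": "section",
      "title": "Attention",
      "content": [
        "An attention function maps a query and key-value pairs...",
        {
          "type": "equation",
          "content": "\\text{Attention}(Q,K,V) = \\text{softmax}(\\frac{QK^T}{\\sqrt{d_k}})V"
        }
      ]
    }
  ]
}
'''

paper = json.loads(raw)
for title, latex in iter_equations(paper["document"]):
    print(f"[{title}] {latex}")
```

After json.loads, each doubled backslash in the file becomes a single backslash, so the strings can be passed straight to a LaTeX renderer.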
How it works
Enter arXiv ID
Enter an arXiv ID (e.g., 2301.07041 or 2301.07041v2)
Process
We parse the LaTeX semantically — understanding sections, equations, and references
Download
Structured JSON with document hierarchy, equations as LaTeX strings, and rich metadata
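Before submitting IDs in bulk, it can help to check them locally. The sketch below accepts only new-style arXiv identifiers such as 2301.07041 or 2301.07041v2 (four digits, a dot, four or five digits, optional version suffix). Old-style identifiers like hep-th/9901001 are deliberately not handled. Whether the converter itself applies the same rules is an assumption, and parse_arxiv_id is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: new-style IDs only (YYMM.NNNN or YYMM.NNNNN, optional vN suffix).
ARXIV_ID = re.compile(r"(?P<base>\d{4}\.\d{4,5})(?:v(?P<version>[1-9]\d*))?")


def parse_arxiv_id(text: str) -> tuple[str, Optional[int]]:
    """Split an arXiv ID into (base ID, version), or raise ValueError."""
    match = ARXIV_ID.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a new-style arXiv ID: {text!r}")
    version = match.group("version")
    return match.group("base"), int(version) if version else None


print(parse_arxiv_id("2301.07041"))
print(parse_arxiv_id("2301.07041v2"))
```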
Simple pricing
For arXiv conversions
- Full LaTeX parsing
- Equation preservation
- Cross-reference resolution
- Bibliography included
- Structured document tree
JSON export requires a Plus subscription.