Free, no authentication required

Smarkdown API

REST API for converting documents to clean Markdown. Same engine that powers smarkdown.xyz, available to your scripts, CI pipelines, and AI agents.

Quick start

Send a file to /api/convert. Get Markdown back. That is the core of the API.

curl -X POST https://api.smarkdown.xyz/api/convert \
  -F "file=@document.pdf" \
  -F "clean_mode=basic"

No API key, no signup, no SDK to install. Just HTTP.

Base URL

https://api.smarkdown.xyz

Authentication

None. The API is free and unauthenticated.

Rate limited per source IP (default: 30 requests per minute) to keep things fair. Get in touch if your use case needs higher limits.
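A script that sends many requests can pace itself so it stays under this limit. Below is a minimal sketch of a client-side sliding-window throttle, assuming the default of 30 requests per 60 seconds. `SlidingWindowLimiter` is a hypothetical helper, not part of the API. The server may count requests differently, so occasional rejections are still possible. The clock and sleep functions are injectable so the logic can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle: at most `limit` calls in any `window` seconds.

    The defaults mirror the documented per-IP limit of 30 requests per minute.
    """

    def __init__(
        self,
        limit: int = 30,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self._stamps: Deque[float] = deque()  # start times of recent calls, oldest first

    def acquire(self) -> float:
        """Block until another call is allowed; return the total seconds spent waiting."""
        waited = 0.0
        while True:
            now = self.clock()
            # Drop calls that have slid out of the window.
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.limit:
                self._stamps.append(now)
                return waited
            # Window is full: sleep until the oldest recorded call leaves it.
            delay = self.window - (now - self._stamps[0])
            self.sleep(delay)
            waited += delay
```

Call `limiter.acquire()` before each request. It returns immediately while you are under the limit and sleeps otherwise.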

Endpoints

GET /health

Liveness check. Returns 200 OK with a JSON body if the service is up.

curl https://api.smarkdown.xyz/health
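In a CI pipeline it can be useful to gate a conversion step on this liveness check. Below is a minimal standard-library sketch. `is_up` is a hypothetical helper, and it treats anything other than a 200 response (including timeouts and connection errors) as "down".

```python
import urllib.request


def is_up(base_url: str = "https://api.smarkdown.xyz", timeout: float = 5.0) -> bool:
    """Return True if GET {base_url}/health answers 200 within `timeout` seconds."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx), and socket timeouts are all OSError subclasses.
        return False
```

For example, `sys.exit(0 if is_up() else 1)` makes a pipeline step fail fast when the service is unreachable.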

GET /api/formats

List all supported file formats with metadata (extension, MIME type, processing notes).

curl https://api.smarkdown.xyz/api/formats

POST /api/convert

Convert a single file. Returns JSON with the Markdown content and metadata.

Form fields:

  • file (required): the file to convert
  • clean_mode (optional): none, basic (default), or full. Applies to spreadsheet and CSV input; see Clean modes.

curl -X POST https://api.smarkdown.xyz/api/convert \
  -F "file=@report.xlsx" \
  -F "clean_mode=full"
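Where installing `requests` is not an option, the same call works with only the Python standard library. This sketch hand-encodes the `multipart/form-data` body (the format curl's `-F` produces) with the `file` and `clean_mode` fields listed above. `build_multipart` and `convert` are hypothetical helper names, and the code assumes the JSON response carries the Markdown under a `markdown` key, which is how the Python example under Code examples reads it.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


def build_multipart(fields: Dict[str, str], file_field: str, path: Path) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for name, value in fields.items():
        header = f"--{boundary}\r\n" f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + value.encode() + b"\r\n")
    ctype = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{path.name}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n"
    )
    parts.append(file_header.encode() + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def convert(path: str, clean_mode: str = "basic",
            base_url: str = "https://api.smarkdown.xyz") -> dict:
    """POST one file to /api/convert and return the decoded JSON response."""
    if clean_mode not in {"none", "basic", "full"}:
        raise ValueError(f"unknown clean_mode: {clean_mode!r}")
    body, content_type = build_multipart({"clean_mode": clean_mode}, "file", Path(path))
    req = urllib.request.Request(
        f"{base_url}/api/convert",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage mirrors the requests version: `print(convert("report.xlsx", clean_mode="full")["markdown"])`.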

POST /api/convert/raw

Same as /api/convert but returns plain Markdown text in the response body (no JSON wrapper). Convenient for piping straight into a file.

curl -X POST https://api.smarkdown.xyz/api/convert/raw \
  -F "file=@notes.docx" \
  -o notes.md

POST /api/convert/batch

Convert multiple files in a single request. Up to 50 files per batch.

curl -X POST https://api.smarkdown.xyz/api/convert/batch \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.docx" \
  -F "files=@doc3.xlsx"
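With more than 50 files, the work has to be split across several requests. Below is a minimal sketch, assuming the 50-file cap above. `chunked` and `batch_fields` are hypothetical helpers. Each chunk becomes a list of repeated `files` fields, the same shape as the repeated `-F "files=@..."` flags in the curl example.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, Tuple

MAX_BATCH = 50  # documented files-per-request cap for /api/convert/batch


def chunked(paths: Iterable[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` paths, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(paths)
    while batch := list(islice(it, size)):
        yield batch


def batch_fields(batch: List[str]) -> List[Tuple[str, Tuple[str, bytes]]]:
    """Build one repeated ("files", (filename, content)) multipart field per path."""
    return [("files", (Path(p).name, Path(p).read_bytes())) for p in batch]
```

Each `batch_fields(...)` result can be passed as the `files=` argument of `requests.post` against /api/convert/batch, which accepts a list of `(field, (filename, content))` tuples. The batch response format is not described on this page, so inspect it before relying on particular keys. Reading whole files into memory keeps the sketch short; passing open file objects instead avoids holding a large batch in memory at once.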

POST /api/compress

Compress a PDF (synchronous, files up to 50 MB). Returns the compressed PDF binary. The preset field selects one of the presets listed by GET /api/compress/presets.

curl -X POST https://api.smarkdown.xyz/api/compress \
  -F "file=@huge.pdf" \
  -F "preset=web-balanced" \
  -o compressed.pdf

GET /api/compress/presets

List the compression presets and their tradeoffs (quality vs size).

GET /api/system/ocr-status

OCR resource pool status. Useful for batch image-OCR consumers to check capacity before submitting many jobs.
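The response schema of /api/system/ocr-status is not described on this page, so a polling helper is best written against a caller-supplied check. Below is a sketch under that assumption. `wait_for_capacity` is hypothetical; `fetch_status` would GET the endpoint and decode its JSON, and `has_capacity` would inspect whatever fields the endpoint actually returns. Clock and sleep are injectable for testing.

```python
import time
from typing import Any, Callable


def wait_for_capacity(
    fetch_status: Callable[[], Any],
    has_capacity: Callable[[Any], bool],
    poll_interval: float = 5.0,
    timeout: float = 300.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until has_capacity(status) is true; return that status or raise TimeoutError."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if has_capacity(status):
            return status
        if clock() >= deadline:
            raise TimeoutError("OCR pool did not report spare capacity in time")
        sleep(poll_interval)  # back off before asking again
```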

Clean modes

Spreadsheet and CSV conversions accept a clean_mode parameter:

  • none: Raw MarkItDown output. No post-processing.
  • basic (default): Strip NaN markers, add sheet separators between Excel sheets.
  • full: All basic cleanup, plus header detection (handles spreadsheets where the company name occupies row 0), section header recognition (ALL CAPS rows become Markdown headings), acronym preservation (YTD, AP, AR, GL stay uppercase when headings are title-cased), and repair of "Unnamed: X" placeholder column names.
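To make the section-header and acronym rules concrete, here is an illustrative sketch of the idea. It is not the service's actual pipeline: the single-cell rule, the heading level, and the acronym set (taken from the examples above) are assumptions, and `section_heading` is a hypothetical name.

```python
from typing import List, Optional

# Acronyms named in the docs; the service's real list may be longer.
ACRONYMS = {"YTD", "AP", "AR", "GL"}


def section_heading(row: List[str], level: int = 2) -> Optional[str]:
    """Return a Markdown heading if `row` looks like an ALL CAPS section label, else None."""
    cells = [c.strip() for c in row if c and c.strip()]
    if len(cells) != 1:
        return None  # assumption: a section label sits alone in its row
    text = cells[0]
    if not text.isupper():
        return None  # str.isupper() also rejects cells with no letters, like "2024"
    # Title-case each word, but keep known acronyms fully uppercase.
    words = [w if w in ACRONYMS else w.capitalize() for w in text.split()]
    return "#" * level + " " + " ".join(words)
```

With these rules, a row reading "YTD AP SUMMARY" becomes the heading "## YTD AP Summary" rather than "## Ytd Ap Summary".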

Limits

  • Maximum file size: 256 MB
  • Batch size: 50 files per request
  • Rate limit: 30 requests per minute per IP
  • PDF compression: 50 MB max (sync endpoint)
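A script can check these limits locally and skip uploads that would be rejected anyway. Below is a minimal sketch. `preflight` is a hypothetical helper, and it assumes that "MB" means 2^20 bytes and that the size limit applies per file, neither of which this page states.

```python
import os
from typing import List

MB = 1024 * 1024               # assumption: the documented "MB" means 2**20 bytes
MAX_FILE_BYTES = 256 * MB      # documented maximum file size
MAX_COMPRESS_BYTES = 50 * MB   # documented cap for the synchronous /api/compress
MAX_BATCH_FILES = 50           # documented files-per-batch cap


def preflight(paths: List[str], limit_bytes: int = MAX_FILE_BYTES,
              max_files: int = MAX_BATCH_FILES) -> List[str]:
    """List reasons an upload would exceed the limits; an empty list means it looks OK."""
    problems: List[str] = []
    if len(paths) > max_files:
        problems.append(f"{len(paths)} files exceeds the {max_files}-file limit")
    for p in paths:
        size = os.path.getsize(p)
        if size > limit_bytes:  # assumption: a file exactly at the limit is accepted
            problems.append(f"{p}: {size} bytes exceeds {limit_bytes} bytes")
    return problems
```

Before calling /api/compress, pass `limit_bytes=MAX_COMPRESS_BYTES` instead of the default.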

Code examples

Python

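import requests

with open('report.xlsx', 'rb') as f:
    response = requests.post(
        'https://api.smarkdown.xyz/api/convert',
        files={'file': f},
        data={'clean_mode': 'full'},
        timeout=120,  # large files can take a while to convert
    )

response.raise_for_status()  # raise on 4xx/5xx instead of trying to parse an error body
result = response.json()
print(result['markdown'])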

Node.js (fetch)

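// Requires Node.js 18+ (global fetch, FormData, Blob). Run as an ES module (.mjs) for top-level await.
import fs from 'fs';

const formData = new FormData();
formData.append('file', new Blob([fs.readFileSync('report.xlsx')]), 'report.xlsx');
formData.append('clean_mode', 'full');

const response = await fetch('https://api.smarkdown.xyz/api/convert', {
  method: 'POST',
  body: formData,
});

if (!response.ok) {
  throw new Error(`Conversion failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.markdown);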

Shell (one-liner)

# -S still prints errors; -o (unlike a shell redirect) leaves no empty report.md on failure
curl -sSf -X POST https://api.smarkdown.xyz/api/convert/raw \
  -F "file=@report.xlsx" \
  -F "clean_mode=full" \
  -o report.md

Use cases

  • RAG pipelines: convert source documents to Markdown before chunking and embedding. Cleaner inputs produce better retrieval.
  • AI agents: give your agent the ability to "read" any uploaded file format by converting to Markdown on demand.
  • CI / build automation: convert documentation source files (Word, PDF) to Markdown as part of a static-site generator build.
  • ETL for analytics: extract text from a heap of legacy formats into a uniform Markdown corpus for full-text search or NLP.
  • Personal scripts: pipe shell output and downloaded files through the API to get clean text wherever you need it.
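For RAG pipelines, the headings in converted Markdown make natural chunk boundaries. Below is a minimal sketch of heading-based chunking, independent of the API itself: it starts a new chunk at each ATX heading (`#` through `######`) and ignores `#` lines inside fenced code blocks. The function name and the headings-only strategy are illustrative; real pipelines usually also cap chunk length before embedding.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s")    # ATX heading: one to six '#' then whitespace
FENCE_MARKERS = ("`" * 3, "~" * 3)    # code-fence openers (three backticks or three tildes)


def chunk_by_heading(markdown: str) -> List[str]:
    """Split Markdown into chunks that each begin at a heading (plus any preamble)."""
    chunks: List[List[str]] = [[]]
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE_MARKERS):
            # Simplified: any line opening with a fence marker toggles the fence state.
            in_fence = not in_fence
        elif not in_fence and HEADING.match(line) and chunks[-1]:
            chunks.append([])  # a heading outside code starts a new chunk
        chunks[-1].append(line)
    # Join each chunk and drop ones that are empty after trimming whitespace.
    return [text for c in chunks if (text := "\n".join(c).strip())]
```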

Privacy and retention

Files are processed in memory and deleted immediately after the response is sent. We do not log file contents. We do not retain copies. See the Privacy Policy for the full picture.

Source

The API is built on Microsoft's MarkItDown library, with a custom pipeline for spreadsheet cleanup, OCR (Tesseract), and PDF compression (Ghostscript). Source is on GitHub.

Support

Email hello@smarkdown.xyz or use the contact form. Bug reports and feature requests welcome via the GitHub issue tracker.