Smarkdown API Documentation - Free Document Conversion API

Quick start

Send a file to /api/convert. Get Markdown back. That is the whole API.

curl -X POST https://api.smarkdown.xyz/api/convert \
  -F "file=@document.pdf" \
  -F "clean_mode=basic"

No API key, no signup, no SDK to install. Just HTTP.

Base URL

https://api.smarkdown.xyz

Authentication

None. The API is free and unauthenticated.

Requests are rate limited per source IP (default: 30 requests per minute) to keep things fair. Get in touch if your use case needs a higher limit.
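If you call the API in a loop, it is easy to throttle on your side and stay under the limit. The following is a minimal client-side sketch, not part of the API: the class name `SlidingWindowLimiter` and its parameters are hypothetical, and only the default of 30 requests per minute comes from this page.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side limiter: at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()  # times of recent calls, oldest first

    def wait(self):
        """Block until another call fits in the window, then record it."""
        now = self._clock()
        # Forget calls that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._stamps[0]))
            now = self._clock()
            self._stamps.popleft()
        self._stamps.append(now)
```

Create one limiter per process and call `limiter.wait()` right before each request. Several machines behind the same public IP share one server-side budget, so this only helps when a single client is doing the sending.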

Endpoints

`GET /health`

Liveness check. Returns 200 OK with a JSON body if the service is up.

curl https://api.smarkdown.xyz/health
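The same check from Python, as a small sketch using only the standard library. It looks at the status code alone, because the shape of the JSON body is not described here. The function name `is_up` and the `opener` parameter (there to make the function testable without a network) are hypothetical.

```python
import urllib.request

HEALTH_URL = "https://api.smarkdown.xyz/health"


def is_up(url=HEALTH_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with opener(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError, HTTPError (non-2xx), timeouts, and DNS failures.
        return False
```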

`GET /api/formats`

List all supported file formats with metadata (extension, MIME type, processing notes).

curl https://api.smarkdown.xyz/api/formats

`POST /api/convert`

Convert a single file. Returns JSON with the Markdown content and metadata.

Form fields:

file (required): the file to convert
clean_mode (optional): none, basic (default), or full

curl -X POST https://api.smarkdown.xyz/api/convert \
  -F "file=@report.xlsx" \
  -F "clean_mode=full"

`POST /api/convert/raw`

Same as /api/convert but returns plain Markdown text in the response body (no JSON wrapper). Convenient for piping straight into a file.

curl -X POST https://api.smarkdown.xyz/api/convert/raw \
  -F "file=@notes.docx" \
  -o notes.md

`POST /api/convert/batch`

Convert multiple files in a single request. Up to 50 files per batch.

curl -X POST https://api.smarkdown.xyz/api/convert/batch \
  -F "files=@doc1.pdf" \
  -F "files=@doc2.docx" \
  -F "files=@doc3.xlsx"
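With more than 50 files, you need to split them across several requests. A minimal sketch of the splitting step, assuming you already have a list of paths; the helper name `batches` is hypothetical.

```python
from itertools import islice

MAX_BATCH = 50  # per-request file limit stated on this page


def batches(paths, size=MAX_BATCH):
    """Yield successive lists of at most `size` paths."""
    it = iter(paths)
    while chunk := list(islice(it, size)):
        yield chunk
```

Each chunk can then go out as one request with a repeated `files` field per path, as in the curl example above. Every batch request presumably counts toward the per-IP rate limit like any other call.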

`POST /api/compress`

Compress a PDF (synchronous, files up to 50MB). Returns the compressed PDF binary.

curl -X POST https://api.smarkdown.xyz/api/compress \
  -F "file=@huge.pdf" \
  -F "preset=web-balanced" \
  -o compressed.pdf

`GET /api/compress/presets`

List the compression presets and their tradeoffs (quality vs size).

`GET /api/system/ocr-status`

OCR resource pool status. Useful for batch image-OCR consumers to check capacity before submitting many jobs.

Clean modes

Spreadsheet and CSV conversions accept a clean_mode parameter:

none: Raw MarkItDown output. No post-processing.
basic (default): Strip NaN markers, add sheet separators between Excel sheets.
full: All basic cleanup plus header detection (handles spreadsheets where the company name is in row 0), section header recognition (ALL CAPS rows become Markdown headings), acronym preservation (YTD, AP, AR, GL stay uppercase instead of being title-cased), and "Unnamed: X" column repair.
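To make the section-header and acronym rules concrete, here is an illustrative sketch of that kind of transformation. It is not the service's code. The function names, the "exactly one non-empty ALL CAPS cell" test, and the default heading level are all assumptions; only the acronym list comes from the description above.

```python
ACRONYMS = {"YTD", "AP", "AR", "GL"}  # the examples named on this page


def is_section_header(cells):
    """True if a row has exactly one non-empty cell and it is ALL CAPS.

    Assumption: the real detection rule may differ.
    """
    filled = [c.strip() for c in cells if c and c.strip()]
    return len(filled) == 1 and filled[0].isupper()


def to_heading(text, level=2):
    """Title-case an ALL CAPS label but keep known acronyms uppercase."""
    words = [w if w in ACRONYMS else w.capitalize() for w in text.split()]
    return "#" * level + " " + " ".join(words)
```

Under these assumptions, a row containing only `YTD REVENUE SUMMARY` would be treated as a section header, and `to_heading` would turn it into a heading with `YTD` left intact.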

Limits

Maximum file size: 256MB
Batch size: 50 files
Rate limit: 30 requests per minute per IP
PDF compression: 50MB max (sync endpoint)
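Checking sizes locally saves a wasted upload. A minimal sketch, assuming "MB" means decimal megabytes (the stricter reading; this page does not say which). The helper names are hypothetical.

```python
import os

MB = 1_000_000  # assumption: decimal megabytes
MAX_CONVERT_BYTES = 256 * MB
MAX_COMPRESS_BYTES = 50 * MB


def check_size(size_bytes, endpoint="convert"):
    """Return None if the size fits the documented limit, else an error string."""
    limit = MAX_COMPRESS_BYTES if endpoint == "compress" else MAX_CONVERT_BYTES
    if size_bytes > limit:
        return f"{size_bytes} bytes exceeds the {limit // MB}MB limit for {endpoint}"
    return None


def check_file(path, endpoint="convert"):
    """Same check, reading the size from disk."""
    return check_size(os.path.getsize(path), endpoint)
```

Call `check_file("huge.pdf", "compress")` before posting to /api/compress, and skip or split the file if it returns a message.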

Code examples

Python

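import requests

# Upload the spreadsheet as multipart form data with full cleanup enabled.
with open('report.xlsx', 'rb') as f:
    response = requests.post(
        'https://api.smarkdown.xyz/api/convert',
        files={'file': f},
        data={'clean_mode': 'full'},
        timeout=120,  # arbitrary client-side timeout; tune for large files
    )

# Fail loudly on HTTP errors before trying to parse the body.
response.raise_for_status()
result = response.json()
print(result['markdown'])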

Node.js (fetch)

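import fs from 'node:fs';

// Uses the global fetch, FormData, and Blob available in Node.js 18+.
const formData = new FormData();
formData.append('file', new Blob([fs.readFileSync('report.xlsx')]), 'report.xlsx');
formData.append('clean_mode', 'full');

const response = await fetch('https://api.smarkdown.xyz/api/convert', {
  method: 'POST',
  body: formData,
});

// fetch does not reject on HTTP error statuses, so check explicitly.
if (!response.ok) {
  throw new Error(`Conversion failed: HTTP ${response.status}`);
}

const result = await response.json();
console.log(result.markdown);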

Shell (one-liner)

curl -sf -X POST https://api.smarkdown.xyz/api/convert/raw \
  -F "file=@report.xlsx" \
  -F "clean_mode=full" \
  > report.md

Use cases

RAG pipelines: convert source documents to Markdown before chunking and embedding. Cleaner inputs produce better retrieval.
AI agents: give your agent the ability to "read" any uploaded file format by converting to Markdown on demand.
CI / build automation: convert documentation source files (Word, PDF) to Markdown as part of a static-site generator build.
ETL for analytics: extract text from a heap of legacy formats into a uniform Markdown corpus for full-text search or NLP.
Personal scripts: pipe shell output and downloaded files through the API to get clean text wherever you need it.
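For the RAG case, the step after conversion is usually chunking. A minimal sketch that splits converted Markdown at headings, so each chunk stays under its own section title. The function name `split_by_heading` is hypothetical and this is one simple strategy among many.

```python
import re

HEADING = re.compile(r"^#{1,6} ")  # ATX headings: one to six '#' then a space


def split_by_heading(markdown):
    """Split Markdown into chunks, starting a new chunk at each heading."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if HEADING.match(line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

This does not look inside fenced code blocks, so a line such as `# comment` in a code sample would be mistaken for a heading. Handle that case separately if your documents contain code.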

Privacy and retention

Files are processed in memory and deleted immediately after the response is sent. We do not log file contents. We do not retain copies. See the Privacy Policy for the full picture.

Source

The API is built on Microsoft's MarkItDown library, with a custom pipeline for spreadsheet cleanup, OCR (Tesseract), and PDF compression (Ghostscript). Source is on GitHub.

Support

Email hello@smarkdown.xyz or use the contact form. Bug reports and feature requests welcome via the GitHub issue tracker.