Token counter for URLs

Paste any URL to count the tokens in the page. See the raw HTML cost, the Markdown version, and how much an AI agent saves when a site is Markdown-ready.

How to count the tokens in a URL

  1. Paste the URL into the box above and press Count tokens.
  2. Keep fetches the page twice: once with Accept: text/html to see what an LLM crawling the raw site pays for, and once with Accept: text/markdown to see whether the site serves Markdown natively.
  3. Compare the two columns. The headline number is how many tokens disappear when the same content is served as Markdown instead of HTML.
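
The comparison in those steps can be sketched in a few lines. This is an illustrative Python sketch, not Keep's actual code: the header dicts and the savings formula are assumptions based on the description above.

```python
# Illustrative sketch of the HTML-vs-Markdown comparison.
# The two fetches differ only in the Accept header:
HTML_FETCH_HEADERS = {"Accept": "text/html"}
MARKDOWN_FETCH_HEADERS = {"Accept": "text/markdown"}

def savings_percent(html_tokens: int, md_tokens: int) -> float:
    """Headline number: share of tokens that disappear when the
    same content is served as Markdown instead of HTML."""
    return 100.0 * (html_tokens - md_tokens) / html_tokens
```

A page that costs 10,000 tokens as raw HTML and 1,000 as Markdown would show a 90 percent saving.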

Why count tokens at the URL level

An agent fetching a webpage pays for every byte the server returns: nav menus, inlined CSS, tracking scripts, SVG icons, social widgets. None of it is useful for answering a question, yet all of it counts towards the context window. Measuring tokens on the raw HTML is the closest thing there is to a real bill. Measuring tokens on the Markdown version is the closest thing there is to what the same page should cost.

Markdown for Agents, in one paragraph

Markdown for Agents is a set of four conventions that let a website serve clean Markdown to AI agents. A site can honour an Accept: text/markdown header, publish a .md variant at the same path, add a link rel="alternate" type="text/markdown" tag to the HTML head, or send a Link response header pointing to the Markdown version. A site that implements any of the four shows up in this tool with a green Served natively badge. A site that does not gets an amber Simulated badge and the token saving you would see if it did.
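
Two of the four conventions are discoverable by parsing what the site already sends back. Here is a naive stdlib sketch of detecting them; the function names are made up for illustration, and a production checker would handle multi-valued rel attributes and quoting edge cases.

```python
from html.parser import HTMLParser

class MarkdownAlternateFinder(HTMLParser):
    """Finds <link rel="alternate" type="text/markdown" href="..."> in HTML."""
    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "text/markdown"):
            self.href = a.get("href")

def markdown_link_from_html(html: str):
    finder = MarkdownAlternateFinder()
    finder.feed(html)
    return finder.href

def markdown_link_from_header(link_header: str):
    """Naive parse of a Link response header for a text/markdown target."""
    for part in link_header.split(","):
        if 'type="text/markdown"' in part:
            return part.split(";")[0].strip().strip("<>")
    return None
```

The other two conventions (honouring Accept: text/markdown and publishing a .md variant) are checked by making the request and looking at what comes back.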

Which tokenizer the tool uses

The tool uses OpenAI's o200k_base encoding, which is shared across every current OpenAI model, including GPT-4o, the o-series reasoning models, and the GPT-5 family. The tokenizer runs locally on the server and matches the OpenAI Tokenizer playground to within a token or two on any page.

Claude and Gemini token counts

Anthropic and Google do not ship a JavaScript tokenizer that runs on Cloudflare Workers, so the tool does not give you a precise Claude or Gemini number yet. The OpenAI count is a solid proxy because the major tokenizers fall within about 10 to 15 percent of each other on English prose. When a portable tokenizer is available for Claude or Gemini the tool will show those columns too.

Frequently asked questions

How do I count the tokens in a webpage?

Paste the URL into the box above and press Count tokens. Keep fetches the page server-side, counts tokens on the raw HTML, and counts tokens again on the Markdown version so you can see the difference. Nothing is uploaded from your machine.

Which tokenizer is this using?

The o200k_base encoding that OpenAI ships with every current model, from GPT-4o through the GPT-5 family and the o-series reasoning models. It runs locally on the server and is the same tokenizer OpenAI publishes. The numbers should match the OpenAI Tokenizer playground to within a token or two on any page.

Does the number apply to Claude and Gemini too?

Close, but not exact. Anthropic and Google do not ship pure-JavaScript tokenizers that run on Cloudflare Workers, so the tool cannot give you a precise Claude or Gemini count. The number it does give you is a solid proxy because the major tokenizers fall within roughly 10 to 15 percent of each other on English prose. Model-specific counts will land once portable tokenizers are available.

What is the Markdown column showing me?

The tool sends a second request to the same URL with Accept: text/markdown. If the site honours that header and returns real Markdown, the column is green and you see the actual tokens agents get today. If the site returns HTML anyway, the column is amber and the number comes from the Markdown version Keep would extract from the HTML. That is the hypothetical saving if the site added Markdown support.
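
The green-versus-amber decision comes down to the Content-Type of the response to that second request. A minimal sketch, assuming the badge labels used by this tool:

```python
def classify_markdown_response(content_type: str) -> str:
    """Return "native" (green, Served natively badge) if the site answered
    the Accept: text/markdown request with real Markdown, otherwise
    "simulated" (amber badge, extracted Markdown)."""
    # Strip any "; charset=..." parameter before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in ("text/markdown", "text/x-markdown"):
        return "native"
    return "simulated"
```

Accepting text/x-markdown as well is an assumption here; some older servers use that legacy media type.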

Why is Markdown so much smaller than HTML?

HTML carries the page chrome along with the article. Scripts, style tags, nav menus, cookie banners, tracking pixels, SVG icons, inlined social share widgets. None of that is useful to an AI agent. Markdown drops all of it and keeps just the headings, paragraphs, links, and lists. On most content pages that is an 80 to 95 percent reduction.
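
You can see the effect with a crude stdlib strip of just the script and style payloads; real HTML-to-Markdown converters also drop nav, footers, and the rest of the chrome, so this understates the saving.

```python
from html.parser import HTMLParser

class ChromeStripper(HTMLParser):
    """Crude illustration: drop <script> and <style> content, keep text."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0        # >0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 0:
            self.parts.append(data)

def visible_text(html: str) -> str:
    stripper = ChromeStripper()
    stripper.feed(html)
    return "".join(stripper.parts).strip()
```

On a page where most bytes are inlined JavaScript and CSS, the visible text is a small fraction of the raw HTML, which is where the 80 to 95 percent figure comes from.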

Does the site I check know I ran this?

It sees a normal server-side fetch with a regular browser user agent. Two requests per check, one for HTML, one for Markdown. Nothing identifies the request as coming from Keep. The URL you paste is not stored or tied to an account.

Why does the HTML column look bigger than the page feels?

A page that looks short in your browser often has 50 to 100 kilobytes of inlined React, Tailwind, analytics, and CSS keyframes. The tool counts every byte the server returned. If the site is entirely client-rendered, the initial HTML may even be an empty shell and the real content arrives later via JavaScript, in which case the HTML tokens are mostly scripts.

What counts as a token?

A token is roughly three to four characters of English text. Short common words are often one token each. Rare words and code fragments can take three or more tokens. Whitespace, punctuation, and capitalisation all shift the count. The tool uses the official BPE tokenizer so the number matches what the model actually charges.
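
The rules of thumb above can be written down as back-of-the-envelope estimators. These are approximations only; the real number comes from the BPE tokenizer.

```python
def estimate_tokens_by_chars(text: str) -> int:
    # Rule of thumb: roughly 4 characters of English per token.
    return round(len(text) / 4)

def estimate_tokens_by_words(text: str) -> int:
    # Rule of thumb: roughly 3 English words per 4 tokens.
    return round(len(text.split()) * 4 / 3)
```

Expect both estimates to drift on code, rare words, and punctuation-heavy text, where real tokenizers spend extra tokens.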

Can I count tokens in raw text instead of a URL?

The OpenAI Tokenizer playground is the right tool for pasted text. This page specialises in the URL case because that is where the HTML-vs-Markdown comparison matters. If you already have the Markdown and just want a count, paste it into any tokenizer and the number will match.

What is Markdown for Agents?

A set of conventions that let a site serve clean Markdown to AI agents instead of HTML. The four mechanisms are: honouring Accept: text/markdown on the request, publishing a .md variant at the same path, declaring <link rel="alternate" type="text/markdown"> in the HTML head, and adding a Link header pointing to the Markdown version. A site that implements any of the four shows up with a green Served natively badge in this tool.

Is there a Keep API for token counting?

Keep is the upstream product. Every page you bookmark gets stored as clean Markdown in a searchable library, which means you get the token-efficient version without having to check each site yourself. This tool is the public lookup for one URL at a time.

How long a page can it handle?

The tool truncates the Markdown at roughly 10,000 characters for the preview and the savings math. Very long articles are handled fine; they just show a truncation marker in the preview. The token numbers are computed on the truncated Markdown, so the savings percentage stays honest even when the underlying page is huge.
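
The truncation step is simple to sketch. The limit and the marker text here are assumptions based on the description above, not Keep's exact values.

```python
PREVIEW_LIMIT = 10_000            # characters, roughly as described above
TRUNCATION_MARKER = "\n\n[truncated]"   # hypothetical marker text

def truncate_markdown(markdown: str, limit: int = PREVIEW_LIMIT) -> str:
    """Cap the Markdown used for the preview and the token math."""
    if len(markdown) <= limit:
        return markdown
    return markdown[:limit] + TRUNCATION_MARKER
```

Short pages pass through untouched; only pages past the limit gain the marker.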

Save any page as clean Markdown with Keep

Keep turns every URL you bookmark into clean, searchable Markdown in a persistent library you can query or hand to an AI agent. If this tool is useful for checking one page, Keep is the version that does it for every page you save, automatically, without you having to think about it.

Learn more about Keep
