You’re in the middle of drafting a brief summary using an AI tool. You paste in a lengthy contract, ask a few questions — and the response comes back truncated, confused, or oddly incomplete. Nothing broke. Your session is fine. What happened?
You likely hit a token limit.
Understanding how AI language models measure and consume context is one of the most practical things a legal professional can know. It helps you get better results, avoid frustrating dead ends, and make smarter decisions about how your firm uses these tools.
What Is a Token, Exactly?
AI language models don’t read text the way humans do. They process it in chunks called tokens. In English text, a token averages roughly four characters, or about three-quarters of a word. A typical paragraph might be 100–150 tokens. A full legal brief could run to tens of thousands.
Every AI model has a context window, a ceiling on how many tokens it can hold at one time, whether in a single request or across a running conversation. This window includes everything: your instructions, the documents you’ve pasted, the conversation history, and the model’s responses. What happens when you reach that ceiling depends on the tool. The request may be rejected, earlier parts of the conversation may be silently dropped, or the response may be cut short or noticeably weaker.
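For readers who want to see the arithmetic, here is a minimal Python sketch of the rule of thumb above. It assumes roughly four characters per token and about three-quarters of a word per token. Real tools use their own tokenizers, so actual counts will differ. The 8,000-token window and 1,000-token reply reserve are hypothetical figures chosen for illustration, not the limits of any particular product, and `estimate_tokens`, `tokens_from_words`, and `fits_in_window` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    # Ceiling division: any non-empty text counts as at least one token.
    return -(-len(text) // 4)


def tokens_from_words(word_count: int) -> int:
    """Rough token count using the ~0.75 words per token rule of thumb."""
    return round(word_count / 0.75)


def fits_in_window(instructions: str, document: str, history: list[str],
                   window: int = 8_000, reserve_for_reply: int = 1_000) -> bool:
    """Check whether everything sent, plus room for a reply, fits the window.

    The window size and reply reserve are hypothetical example values.
    """
    used = estimate_tokens(instructions) + estimate_tokens(document)
    # Earlier turns in the conversation count against the same window.
    used += sum(estimate_tokens(turn) for turn in history)
    return used + reserve_for_reply <= window
```

By this estimate, a 40-page agreement at an assumed 500 words per page (20,000 words) comes to `tokens_from_words(20_000)`, roughly 26,700 tokens, well beyond the hypothetical 8,000-token window before a single question has been asked.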
Think of it like a whiteboard. Once it’s full, you have to erase something before you can write more.
Why This Matters for Legal Work
Legal work involves a lot of text. Contracts run long. Case files have history. Discovery documents pile up. That’s why token limits tend to affect lawyers and legal administrators more than many other users.
Here are the situations where you’re most likely to run into problems:
Document review. Pasting a full 40-page agreement into a chat window and asking for a summary often exceeds what smaller models can handle in one pass. You may receive a partial response that reflects only the beginning of the document.
Long conversations. A session that started with a simple question and evolved into a deep-dive research thread will eventually push older context out of the model’s working memory. The AI may “forget” what you discussed earlier in the same chat — not because it’s broken, but because that earlier text is no longer in the active window.
Complex multi-part prompts. When you include detailed instructions, background context, multiple questions, and a large document all in one request, you consume tokens quickly, leaving less room for the response.
Iterative drafting. If you’re refining a document through back-and-forth revisions within the same session, the accumulated conversation history consumes tokens with every exchange.
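The "forgetting" in long conversations and the growing cost of iterative drafting can be pictured with a simplified Python sketch. It assumes a tool that keeps only the most recent turns fitting a fixed token budget and silently drops older ones. Actual products handle overflow in different ways (some reject the request, some summarize older turns), so treat this as an illustration of the idea rather than a description of any specific tool. The budget and the helpers `visible_history` and `cumulative_tokens` are hypothetical, and token counts use the rough four-characters-per-token estimate.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return -(-len(text) // 4)


def visible_history(turns: list[str], budget: int) -> list[str]:
    """Return the newest turns whose combined estimate fits within the budget.

    Older turns that no longer fit are dropped, which is one simplified
    way a chat tool can "forget" the start of a long conversation.
    """
    kept: list[str] = []
    used = 0
    # Walk backward from the newest turn; stop at the first one that won't fit.
    for turn in reversed(turns):
        cost = estimate_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def cumulative_tokens(turns: list[str]) -> list[int]:
    """Running total of estimated tokens as a conversation grows."""
    totals: list[int] = []
    running = 0
    for turn in turns:
        running += estimate_tokens(turn)
        totals.append(running)
    return totals
```

With three turns of about 100 tokens each and a 250-token budget, only the two newest turns remain visible. The first exchange has fallen out of the window even though it is still on your screen, and `cumulative_tokens` shows the total climbing with every revision.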
Practical Tips for Getting More Out of Every Session
You don’t need to become a technical expert. A few simple habits will get you significantly better results.
Start fresh for new tasks. Don’t carry unrelated conversations across long sessions. Each new, focused task benefits from a clean context window. Open a new chat when you’re switching from one matter to another.
Summarize before you paste. If you need the AI to work with a long document, consider providing a condensed version — key clauses, relevant sections — rather than the full text. For many tasks, the model doesn’t need every word to give you something useful.
Break large documents into sections. Instead of asking the AI to analyze an entire agreement at once, work through it in logical parts. Ask about the indemnification clause. Then the limitation of liability. Then the termination provisions. Focused inputs produce focused outputs.
Front-load your most important instructions. Place your key context and instructions at the start of your prompt rather than after a long pasted document, so they are less likely to be buried or cut off if the input runs long.
Be explicit about what you need. Vague, open-ended prompts often produce longer, less useful responses that consume more tokens. Specific questions get more targeted answers — and leave more room in the context window for what matters.
Use the tool for one job at a time. Asking five questions in a single prompt splits the model’s attention and inflates the response length. Ask one question, get a clear answer, then ask the next.
Watch for drift in long sessions. If a conversation has been going for a while and the AI seems to be losing track of earlier context, that’s your signal. Either summarize the thread yourself and paste it in as a fresh starting point, or begin a new session with a clean summary of where you are.
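The advice to break large documents into sections can also be sketched in code. This minimal Python example assumes the document’s clauses are separated by blank lines and groups them, in order, into chunks that each stay under a hypothetical token budget, using the rough four-characters-per-token estimate. Each chunk could then go into its own focused request. `chunk_by_paragraph` and the budget value are illustrative assumptions; a real agreement may be better split on its own section headings.

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return -(-len(text) // 4)


def chunk_by_paragraph(document: str, budget: int) -> list[str]:
    """Group blank-line-separated paragraphs into chunks under a token budget.

    The estimate ignores the separators between paragraphs, and a single
    paragraph larger than the budget is kept whole as its own chunk rather
    than being cut mid-clause.
    """
    paragraphs = [p.strip() for p in document.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    used = 0
    for para in paragraphs:
        cost = estimate_tokens(para)
        # Close the current chunk if this paragraph would push it over budget.
        if current and used + cost > budget:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Keeping paragraphs whole and in their original order mirrors the clause-by-clause approach: each request sees complete provisions, such as the indemnification clause on its own, rather than text sliced at an arbitrary character count.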
A Note on AI Governance at Your Firm
Token limits are just one piece of the puzzle when it comes to using AI responsibly in a legal environment. Data confidentiality, approved tool selection, staff training, and acceptable use policies all matter — especially when client information is involved.
Before pasting sensitive client documents into any AI tool, make sure your firm has clear policies in place about what’s permitted, what isn’t, and how AI outputs should be reviewed before use. This isn’t about slowing adoption — it’s about protecting your clients and your firm.
If your firm doesn’t yet have a formal AI policy, that’s worth addressing soon. AI adoption in the legal sector is accelerating, and the firms that establish governance early will be better positioned to use these tools safely and effectively.
The Bottom Line
AI chatbots are genuinely useful for legal work. But like any tool, they work best when you understand how they operate. Token limits aren’t a flaw — they’re simply a characteristic of how these models function. Once you know what to watch for, you can structure your work to stay inside those boundaries and get consistently better results.
A little awareness goes a long way.
Innovative Computing Systems has supported law firms with technology strategy, security, and IT management for more than 35 years. Our Managed Intelligence service helps firms adopt AI in a structured, secure, and practical way — including policy development, tool governance, and staff training. Learn more by contacting us today.
