About Verbit
Hi, I'm Shinjan — a first-year IPM student at IIM Indore. I've always guided aspirants to use carefully curated prompts to leverage the full power of LLMs for verbal ability practice — and it genuinely works well. But the process is inefficient: crafting the right prompt every time, copy-pasting outputs, manually tracking what you've already done. A lot of time gets wasted on logistics instead of actual practice. So I decided to automate the entire workflow into a single platform — one that generates unlimited fresh questions, adapts to your skill level in real time, and costs you absolutely nothing. That's Verbit.
Why It's Free (and will stay free)
This platform is 100% automated — every question, every evaluation, every difficulty adjustment is done by AI. There is no human curation, no editorial board, no team of content writers. Because of that, I don't want personal accountability for the quality of every single output. That's why it's free. I am not going to be another one of those “cracked IPMAT, now let me monetize my rank” type mentors. I have no interest in building a coaching brand off this.
The AI credits (OpenAI API calls) that power every question generation and evaluation are paid out of my own pocket, and I'm fine bearing that cost for as long as I can. When I eventually run out of credits, I'll put up a small donation link — every rupee collected will go directly toward purchasing more API credits so the platform keeps running. No profit, no middlemen.
Daily Limits on RC & Conversation Sets
Reading Comprehension and Conversation Sets are the most token-heavy features on the platform — each set involves generating a full passage plus 4–5 questions with explanations. To keep costs sustainable, each user is limited to 3 sets per day for each of these two topics. The limit resets at 12:00 AM IST.
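For the curious, here's roughly how a midnight-IST reset can work. A minimal sketch in TypeScript; the constant and helper names are illustrative, not the platform's actual code:

```ts
// Daily-limit bookkeeping keyed by the current date in IST (UTC+5:30).
// When the IST date rolls over at 12:00 AM, every counter effectively resets.
const DAILY_LIMIT = 3;
const IST_OFFSET_MS = 5.5 * 60 * 60 * 1000;

function istDayKey(now: Date = new Date()): string {
  // Shift the clock by +5:30, then read the date portion of the shifted time.
  const shifted = new Date(now.getTime() + IST_OFFSET_MS);
  return shifted.toISOString().slice(0, 10); // e.g. "2025-06-01"
}

// Store { userId, topic, dayKey, count } and reject once the cap is hit.
function canStartSet(countForToday: number): boolean {
  return countForToday < DAILY_LIMIT;
}
```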
Honestly, 3 sets a day is a lot more than most people will bother doing in a single sitting. If you're consistently hitting the cap, you're already putting in serious work.
There Will Be Errors
Let me be upfront: there will be mistakes. AI-generated questions are not perfect. You'll encounter questions with ambiguous options, debatable answers, or occasional factual slips. That's the nature of a fully automated system. But I've built an automated review pipeline to deal with it.
When you hit the “Report bad question” button and describe what's wrong, the system doesn't just blindly remove the question. It sends your report and the full question to another AI evaluator that independently assesses whether the question is actually flawed. If your report is valid — say the correct answer is wrong, there are multiple correct options, or the passage contradicts the question — the question gets flagged and permanently removed from the database. The AI's analysis of what went wrong then gets stored as an instruction for subsequent question generation, so the same type of mistake is less likely to happen again. If the AI determines the question is actually fine and your report isn't valid, the question is retained. It's a self-correcting feedback loop.
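In code terms, the loop looks something like this. A minimal sketch: `askEvaluatorLLM`, `removeQuestion`, and `storeGenerationRule` are hypothetical stand-ins for the actual LLM calls and database operations, not Verbit's real function names.

```ts
// Self-correcting report loop (sketch). All helpers are hypothetical stubs.
interface ReportVerdict {
  isFlawed: boolean; // did the independent evaluator agree with the report?
  analysis: string;  // the evaluator's explanation of what went wrong
}

declare function loadQuestion(id: string): Promise<unknown>;
declare function askEvaluatorLLM(q: unknown, report: string): Promise<ReportVerdict>;
declare function removeQuestion(id: string): Promise<void>;
declare function storeGenerationRule(rule: string): Promise<void>;

async function handleReport(questionId: string, userReport: string): Promise<void> {
  const question = await loadQuestion(questionId);

  // A second AI independently assesses whether the question is actually flawed.
  const verdict = await askEvaluatorLLM(question, userReport);

  if (verdict.isFlawed) {
    await removeQuestion(questionId);            // permanently removed
    await storeGenerationRule(verdict.analysis); // fed into future generation prompts
  }
  // Invalid report: the question is retained as-is.
}
```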
How VerScore Works
Your VerScore is a per-topic adaptive rating on a 0–100 scale. Under the hood, it's mapped to a percentile using a new anchor-based scale:
VerScore → Percentile anchors:
- 0 → 50th percentile
- 50 → 90th percentile
- 65 → 95th percentile
- 75 → 98th percentile
- 85 → 99th percentile
- 95 → 99.8th percentile
- 100 → 100th percentile
Mapping is piecewise linear between anchors.
This new mapping more closely matches real exam percentiles and makes progress at the top end much harder.
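Since the anchors are published above, the mapping is easy to reproduce. A sketch of the interpolation in TypeScript:

```ts
// VerScore → percentile, piecewise linear between the published anchors.
const ANCHORS: Array<[score: number, percentile: number]> = [
  [0, 50], [50, 90], [65, 95], [75, 98], [85, 99], [95, 99.8], [100, 100],
];

function verScoreToPercentile(score: number): number {
  const s = Math.min(100, Math.max(0, score)); // clamp to the 0–100 scale
  for (let i = 1; i < ANCHORS.length; i++) {
    const [x0, y0] = ANCHORS[i - 1];
    const [x1, y1] = ANCHORS[i];
    if (s <= x1) return y0 + ((s - x0) / (x1 - x0)) * (y1 - y0);
  }
  return 100;
}

verScoreToPercentile(25); // 70: halfway between the 0 and 50 anchors
```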
Adaptive Difficulty (Elo-inspired)
After every question, your VerScore is updated using an Elo-like system. Your current score and the question's difficulty are both converted to percentiles, and an expected probability of success is computed:
E = 1 / (1 + 10^((Q_percentile − U_percentile) / 10))
The delta is then K × gapScale × (actual − expected), where K = 4.5, and gapScale amplifies updates when the gap between your level and the question's level is large. A speed factor (ratio of ideal time to your actual time, clamped between 0.6 and 1.4) further adjusts the update — solving faster than expected boosts you more, solving slower dampens the gain.
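Putting those pieces together, one plausible implementation looks like this. The exact gapScale formula and how the speed factor composes aren't specified, so both are assumptions in this sketch; K = 4.5 and the clamp range come straight from the description above:

```ts
// Elo-like VerScore update (sketch). gapScale and the way the speed factor
// multiplies in are assumed; K = 4.5 and the clamp range are from the text.
const K = 4.5;

function expectedSuccess(userPct: number, questionPct: number): number {
  return 1 / (1 + Math.pow(10, (questionPct - userPct) / 10));
}

function verScoreDelta(
  userPct: number,
  questionPct: number,
  actual: 0 | 1,        // 1 = correct, 0 = wrong
  idealTimeSec: number,
  actualTimeSec: number,
): number {
  const expected = expectedSuccess(userPct, questionPct);

  // Assumed form: updates grow as the user/question gap widens.
  const gapScale = 1 + Math.abs(questionPct - userPct) / 50;

  // Speed factor: ratio of ideal to actual time, clamped to [0.6, 1.4].
  const speed = Math.min(1.4, Math.max(0.6, idealTimeSec / actualTimeSec));

  return K * gapScale * (actual - expected) * speed;
}
```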
Calibration Phase
When you first start a topic, Verbit doesn't know your level. Instead of starting you at zero and making you grind through easy questions, it runs a calibration phase — a fixed sequence of questions at predetermined difficulty levels spanning the full range.
For most topics, that's 10 questions at difficulties [10, 20, 30, …, 100]. For RC and Conversation Sets, it's 3 sets at [30, 60, 90] (because each set is itself 4–5 questions, so you're still answering 12–15 questions total).
Your initial VerScore is computed using a blend of difficulty-weighted accuracy (60%) and raw accuracy (40%), with speed adjustments applied. Getting hard questions right yields a higher initial score than getting only easy ones right.
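A sketch of that blend follows. Only the 60/40 split comes from the description; the speed adjustment is omitted here and the weighting scheme is an assumption:

```ts
// Initial VerScore from calibration (sketch). Speed adjustment omitted.
interface CalibrationAttempt {
  difficulty: number; // e.g. 10–100
  correct: boolean;
}

function initialVerScore(attempts: CalibrationAttempt[]): number {
  const totalDifficulty = attempts.reduce((sum, a) => sum + a.difficulty, 0);

  // Difficulty-weighted accuracy: hard questions count for more.
  const weightedAcc =
    attempts.reduce((sum, a) => sum + (a.correct ? a.difficulty : 0), 0) /
    totalDifficulty;

  // Raw accuracy: every question counts equally.
  const rawAcc = attempts.filter((a) => a.correct).length / attempts.length;

  return 100 * (0.6 * weightedAcc + 0.4 * rawAcc);
}
```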
Dynamic Question Difficulty (IRT-Bayesian)
Questions aren't static either. Every question in the database has its own difficulty rating that evolves over time based on how users perform on it. This uses an Item Response Theory (IRT)-inspired Bayesian update:
P(success) = 1 / (1 + exp(−(θ − b) / s))
surprise = actual − P(success)
new_difficulty = old − learningRate × surprise × speedFactor
Here θ is the solver's VerScore, b is the question's current difficulty, and s is a scale parameter. If strong users consistently get a question wrong, its difficulty drifts upward. If weak users consistently get it right, it drifts down. The learning rate decays with √(attemptCount), so well-tested questions stabilise over time.
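Those three lines translate almost directly into code. In this sketch the logistic scale `s` and the base learning rate are assumed values, not Verbit's actual constants:

```ts
// IRT-inspired Bayesian difficulty update (sketch). `s` and the base
// learning rate are assumptions; the update rule itself is from the text.
function updateDifficulty(
  oldDifficulty: number, // b: the question's current difficulty
  theta: number,         // θ: the solver's VerScore
  actual: 0 | 1,
  speedFactor: number,   // clamped time ratio, as in the Elo update
  attemptCount: number,
  s = 12,                // assumed logistic scale
  baseLearningRate = 5,  // assumed
): number {
  const pSuccess = 1 / (1 + Math.exp(-(theta - oldDifficulty) / s));
  const surprise = actual - pSuccess;

  // Decaying with √attempts means well-tested questions barely move.
  const learningRate = baseLearningRate / Math.sqrt(attemptCount);

  // Strong user fails → surprise < 0 → difficulty drifts upward.
  return oldDifficulty - learningRate * surprise * speedFactor;
}
```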
RAG Pipeline (Retrieval-Augmented Generation)
Questions aren't generated from thin air. The 274 official IPMAT Indore Verbal Ability PYQs that ground Verbit's RAG pipeline were sourced from AfterBoards, who generously provided them for this project. The PYQs themselves are never distributed; they serve purely as reference material for generating new, exam-style practice. I scanned and extracted the past papers and stored the questions as reference documents in MongoDB. When generating a new question, the system retrieves relevant past questions as few-shot examples and feeds them to the LLM alongside detailed topic-specific prompts. This grounds the output in real exam patterns — the sentence structures, option styles, and difficulty curves all mirror actual IPMAT/CAT verbal sections.
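The retrieval step itself is small. A sketch using the MongoDB Node driver, where the database, collection, and field names are illustrative rather than the actual schema:

```ts
// Retrieve a few topic-matched PYQs and splice them into the generation
// prompt as few-shot examples. Names are illustrative, not the real schema.
import { MongoClient } from "mongodb";

async function buildPrompt(topic: string, basePrompt: string): Promise<string> {
  const client = await MongoClient.connect(process.env.MONGODB_URI!);
  try {
    const pyqs = await client
      .db("verbit")
      .collection("pyq_reference")
      .aggregate([{ $match: { topic } }, { $sample: { size: 3 } }])
      .toArray();

    const fewShot = pyqs
      .map((q, i) => `Example ${i + 1}:\n${q.text}`)
      .join("\n\n");

    return `${basePrompt}\n\nReference questions from past papers:\n${fewShot}`;
  } finally {
    await client.close();
  }
}
```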
Deduplication
Nobody wants to see the same vocabulary word or idiom twice. The system tracks every word/idiom you've already been tested on and actively avoids repeating them — both when sampling from the existing question pool and when generating new ones. For RC and Conversation Sets, it deduplicates at the passage level, tracking passage titles/themes to ensure you get diverse reading material instead of seeing the same domain repeated.
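Conceptually, the word-level check is just set subtraction. A sketch (names illustrative):

```ts
// Word-level deduplication (sketch): never re-serve a tested word or idiom.
function pickUnseen<T extends { word: string }>(
  pool: T[],
  seenWords: Set<string>,
): T | undefined {
  const fresh = pool.filter((q) => !seenWords.has(q.word.toLowerCase()));
  if (fresh.length === 0) return undefined; // pool exhausted: generate instead
  return fresh[Math.floor(Math.random() * fresh.length)];
}

// When generating new questions, the same seen-list becomes a prompt
// constraint along the lines of "do not use any of these words: ...".
```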
Topic Coverage
Verbit covers 8 verbal aptitude topics modelled after the IPMAT and CAT exam patterns:
- Reading Comprehension Sets — full passage + 4–5 MCQs
- Conversation Sets — dialogue-based passage + questions
- Parajumbles — rearrange jumbled sentences
- Vocabulary Usage — contextual word usage
- Paracompletions — complete a paragraph
- Sentence Completions — fill-in-the-blank
- Sentence Correction — identify and fix errors
- Idioms & Phrases — meaning and usage
Each topic has its own detailed prompt engineered to match the exact format observed in real PYQ papers, with 20+ diverse domain examples to ensure variety.
Tech Stack
- Next.js (App Router, TypeScript)
- MongoDB Atlas
- OpenAI GPT-4o-mini
- NextAuth (Google OAuth + credentials)
- Tailwind CSS
- Deployed on Vercel
Built with a lot of caffeine and a little bit of spite for overpriced coaching. If you find a bug, hit the report button. If you like it, tell a friend.