- Published on
Building Resume Compact: My First AI Resume Screening System
- Authors
- Name
- Niveth Saran
- @nivethsaran
- Name
- Anjali Thomas
Building Resume Compact: My First AI Resume Screening System
I built this during a weekend hackathon with Anjali Thomas (https://github.com/walkingacorn) because I wanted to see how far a local LLM could get with a real PDF resume.
The core idea is annoying in practice: make a PDF readable, send the text to a model, and parse back HR-friendly JSON. In reality the code ended up spending more lines on extraction, parsing, and validation than on the model call itself.
Why I built it
I needed a way to test a resume screening workflow without depending on OpenAI or a cloud API. This was the experiment: could I wire a Streamlit front end, a FastAPI backend, and a local vLLM server together so that a resume upload actually produced something useful?
The actual bug that woke me up at 2 a.m. was a PDF that returned zero text from fitz, even though it looked fine in Preview. That forced the OCR branch and exposed the first messy part of the pipeline.
The initial idea
Originally I sketched this as a small app with a job page, upload button, and a deterministic model call.
I figured that the plumbing would be simple.
- receive a PDF upload in FastAPI
- try direct PDF text extraction first
- fall back to OCR only when necessary
- send resume text plus job description to the model
- parse a structured JSON object and store it
That’s still the end state, but the implementation has a few rough edges.
The first major reality check was the data surface: resumes contain arbitrary text, the extracted text can be garbage, and the LLM can still wrap the answer in markdown or extra prose. So the backend became defensive by accident.
Project architecture
The repo ships as two rough pieces.
- Frontend: Streamlit app in
frontend/ - Backend: FastAPI service in
main.py - Persistence: Supabase tables for
jobsandapplications - Model inference: local OpenAI-compatible endpoint served by
vLLM - OCR / PDF handling:
PyMuPDF,pdf2image,pytesseract
I chose Streamlit because I needed the UI done fast and I did not want to implement an SPA with auth flows. It is not pretty, but it is enough to exercise the workflow.
The backend is just one service with one main endpoint.
/analyze_resume: accepts a PDF resume and a job description/health: basic health check
The actual workflow is: upload resume, extract text, build a prompt, call the model, parse the response, and store it.
flowchart LR
subgraph frontend
A[Streamlit login UI]
A --> B[Applicant dashboard]
A --> C[Recruiter dashboard]
end
B --> D[/analyze_resume POST/]
D --> E[PDF text extraction]
E --> F[LLM prompt build]
F --> G[vLLM OpenAI-compatible API]
G --> H[JSON parsing + Pydantic validation]
H --> I[Supabase application insert]
C --> J[Jobs CRUD via Supabase]
C --> K[Applications query via Supabase]
Frontend stack
The Streamlit app is split into three pieces:
frontend/app.py: login flow and role-based dashboard routingfrontend/applicant.py: resume upload and apply flowfrontend/recruiter.py: job posting and application review
It uses a simple Supabase client in frontend/utils/supabase_client.py to fetch jobs and persist application records.
The frontend also contains a mocked Kafka publisher in frontend/utils/api.py, which is a signal that the architecture was intended to be event-driven but the integration was deferred.
Backend stack
The backend uses FastAPI and several document-processing libraries:
fitzfromPyMuPDFfor rapid text extraction from PDF objectspdf2imageto render pages as images when direct extraction failspytesseractfor OCR fallbacklangchainStructuredPrompt for prompt constructionpydanticfor schema validation
The model is not accessed as a Python library. Instead, the app calls an HTTP endpoint at http://localhost:8000/v1/chat/completions.
That was also one of the dumbest practical choices: I could have used the model directly in Python, but I wanted to keep the backend decoupled enough to swap the model server later.
Deployment and inference
There is no managed LLM in this repo. cloudsetup.md is basically the ops runbook for the local vLLM server.
The actual startup steps are:
- SSH into the droplet
- attach to the container
- run
python -m vllm.entrypoints.openai.api_server --model Qwen/Qwen2.5-1.5B-Instruct --host 0.0.0.0 --port 8000 --dtype float16 - occasionally, run the same with
meta-llama/Llama-3.2-3B-Instruct
That means the app is only useful if the inference service is up and the GPU container is healthy. For a hackathon demo, that is fine. For anything else, it is more deployment work than app work.
How the AI system works
The messy part is the prompt pipeline in main.py.
Resume text extraction first
extract_text_from_pdf is the part that made the demo stop working on actual PDFs.
It tries two strategies:
fitz.open(stream=pdf_bytes, filetype='pdf')pdf2image.convert_from_bytes+pytesseract.image_to_string
This matters because a lot of resumes are either scanned or generated from a tool that strips text. I hit a case where fitz returned an empty string for a 12-page resume. The app did not fail elegantly; it just dropped the text and the first few runs were worthless.
So the code does this:
- use direct extraction if any page has text
- otherwise render images and OCR them
It logs both branches, which is why there is a call like:
logger.info("Direct PDF extraction failed or produced no text. Attempting OCR-based extraction...")
That logging saved me from chasing the wrong bug for a while.
Sanitizing untrusted input
I added sanitize_input_text because one of the sample resumes had what looked like prompt injection junk in the candidate’s cover letter.
It is not a robust defense. It strips lines matching regexes like:
ignore all previous instructionsdo not followsystem promptprompt injection
That is a cheap fix. It still leaves a lot of attack surface, but it was enough to stop obvious garbage from reaching the model in the early demo.
Structured prompt engineering
The prompt gets built with StructuredPrompt.from_messages_and_schema.
Instead of hand-writing the JSON shape, I serialize the Pydantic schema from ResumeOutput and include a template example. The model prompt basically says:
- return one JSON object
- no markdown
- no code fences
- match the schema exactly
That was the main guardrail for the LLM.
Sending to the LLM
The API call is a plain HTTP POST to the local vLLM server.
The payload is locked down:
temperature: 0top_p: 1seed: 42
I wanted the outputs to be repeatable while the rest of the pipeline was still fragile.
The backend also writes the raw response to llm_response_*.json. That was not theoretical — I used those files to debug the first time the model returned Here is the JSON: followed by a paragraph, and then the object.
Parsing and validating the model output
parse_json_from_text exists because choices[0].message.content is not always a clean JSON string.
The code tries to:
- strip markdown fences
- remove a leading
jsonlabel - pull the first
{...}block out of the text
I saw results like:
Here is the JSON you asked for:
{"full_name":"Alice"...}
so this recovery logic was necessary.
After that, the result gets validated with:
validated = ResumeOutput.model_validate(structured)
That is the core contract: if the model response does not match the schema, the endpoint fails rather than returning garbage to the frontend.
The end-to-end flow
The applicant UI is basically a resume uploader and a score display.
The recruiter UI is a crude admin panel for jobs and applications. It is enough to see the structured output, but it is not a real product.
The demo path is:
- upload a PDF
- send it to
/analyze_resume - write the structured data back to Supabase
- render the recommendation and score in Streamlit
That is the actual value proposition, even if the UI is rough.
Challenges and mistakes
This code has a few honest scraps in it.
1. PDF extraction is the most fickle part
I assumed resumes would arrive as text PDFs. That was wrong.
The first real failure mode was a resume that had fitz return an empty string for every page. The app still accepted the file, then the model got nothing useful and the result was garbage.
That is why the repo has a fallback to OCR.
The downside is that this adds real dependency pain:
- you need
tesseractinstalled - you need
popplerforpdf2image - OCR output is noisy and sometimes worse than no output
That bug taught me the model is the easy part if the input is broken.
2. Prompt injection defense is a bandaid
I added sanitize_input_text after seeing sample resume text that looked like a poor prompt injection attempt.
The function strips lines matching regexes like ignore all previous instructions and system prompt. It also turns ``` into ``.
This is a weak fix. It helped with obvious bad examples, but it is not a real security control.
3. Parsing model output is the messy part
parse_json_from_text exists because the LLM does not consistently return clean JSON.
I saw outputs like:
Here is the JSON you asked for:
{"full_name":"Alice" ... }
or the model would add a trailing Thanks! after the object.
So the code has to strip fences, trim a leading json label, and then extract the first {...} block.
If the model still fails validation, the endpoint returns 502 rather than letting junk flow into the UI.
4. Frontend auth is shallow
The Streamlit app uses st.session_state and a Supabase client. It is enough to demo recruiter vs applicant flows, but there is no real session management.
The schema even includes this note:
If you are using the anon key without user contexts, leave RLS disabled.
That was a hackathon shortcut. It is okay for a prototype, but not for anything that needs basic security.
5. Ops is heavier than the app
This repo is not just Python code. It is a service that only works if a local vLLM server is running.
cloudsetup.md is basically the deploy checklist. It tells you to SSH into a droplet, exec into a container, and run the model server manually.
That is okay for a weekend project, but it means the deployment surface is much larger than a single Next.js or FastAPI app.
6. The repo is still in iteration mode
README.md says the LLM endpoint is http://localhost:8000/v1/chat/completions, while main.py defaults to http://[IP_ADDRESS]/v1/chat/completions.
That mismatch is the sort of thing you get when you keep moving the demo between machines and forget to clean up placeholders.
7. Kafka is not actually wired up
frontend/utils/api.py contains a publish_to_kafka stub with the real producer code commented out.
That is a strong signal that the event-stream architectural idea was shelved early in favor of getting the resume flow working.
What I learned
This project taught me a few blunt lessons.
The model is not the hardest part
That sounds cliché, but it is still true here.
Once the PDF text is extracted and the output schema is defined, the remaining code is about making the result usable. Everything from fitz failures to pytesseract garbage to model responses wrapped in markdown is more painful than the model inference itself.
Metadata shape is worth enforcing
I am glad I used ResumeOutput for both the prompt and the backend validation.
The model may still produce junk, but at least the service has a contract. If the JSON does not validate, the endpoint fails fast instead of passing broken objects to the frontend.
Hacks are fine for early demos
sanitize_input_text is ugly. The Kafka stub is obvious. The frontend auth is weak.
Those are all okay for a hackathon if they let you keep moving. The problem is when you forget that they are hacks and start treating them as architecture.
Ops is real work
Deploying this meant more than running uvicorn.
cloudsetup.md is proof that the demo only works if the model server is up, the GPU container is healthy, and the droplet is reachable.
That is the part I would spend more time on next time.
Determinism is useful here
The temperature: 0 + seed: 42 choice was deliberate.
For a structured output task, I wanted as little model randomness as possible while the rest of the codebase was still brittle.
Future improvements
If I came back to this, I would fix the stuff that is actually broken first:
- enforce real auth and stop using anon Supabase access
- store raw resumes separately from the extracted text
- replace the regex-based prompt sanitization with a real input-normalization layer
- add retries and a fallback on model parse failures
- add a small evaluation set so I can measure whether the LLM output is improving or degenerating
- wire Kafka or another event stream properly instead of keeping it commented out
- add some simple observability around failed LLM responses and OCR errors
The next useful change is not another prompt wording tweak. It is the JSON parser and the validation loop.
Conclusion
This is not a polished HR product.
It is a first pass on a flow where a resume PDF becomes a structured HR assessment, and most of the work was in getting the input and output to behave.
The model is only one part of the system. If nothing else, this repo is a reminder that a working AI app is still mostly brittle glue and operational detail.