The problem multimodal capture solves
You jotted something on paper, photographed a receipt, recorded a voice reminder while driving, and later saved a PDF contract from email. Four different formats, four different apps, four places to search later.
CleverNote solves this with a single entry point for any content type. Text, photo, audio and PDF go in through the same door and come out organized, searchable and ready to ask questions about.
How it works internally
When you submit a note, CleverNote processes it in two steps:
Step 1, immediate conversion: regardless of format, the system converts everything to text. Photos go through vision AI (semantic OCR), audio is transcribed by Whisper, and PDFs are read page by page, with OCR fallback for scanned files. The result is clean text representing the original content.
Step 2, AI enrichment: with text in hand, the AI creates a title (if you didn’t provide one), classifies the category, generates tags and creates vector embeddings. These embeddings power semantic search later. Everything happens in the background without blocking your flow.
The note is visible and usable immediately after submission, with a subtle “processing” indicator while the AI works.
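The two steps above can be sketched roughly as follows. This is a minimal illustration, not CleverNote's actual code: the converter functions, the `Note` class, `enrich` and `submit` are all hypothetical names, and the enrichment logic is a trivial stand-in for the AI calls.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


def convert_text(payload: bytes) -> str:
    """Plain text needs no conversion beyond decoding."""
    return payload.decode("utf-8")


def convert_placeholder(kind: str) -> Callable[[bytes], str]:
    """Hypothetical stand-in for vision OCR, Whisper or PDF extraction."""
    def convert(payload: bytes) -> str:
        return f"[{kind} converted to text, {len(payload)} bytes]"
    return convert


# Step 1 dispatch table: one converter per content type (assumed names).
CONVERTERS: Dict[str, Callable[[bytes], str]] = {
    "text": convert_text,
    "photo": convert_placeholder("photo"),
    "audio": convert_placeholder("audio"),
    "pdf": convert_placeholder("pdf"),
}


@dataclass
class Note:
    text: str
    title: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    status: str = "processing"  # shown to the user while enrichment runs


def enrich(note: Note) -> None:
    """Step 2: stand-in enrichment; a real system would call an AI model."""
    if note.title is None:
        stripped = note.text.strip()
        note.title = stripped.splitlines()[0][:40] if stripped else "Untitled"
    long_words = {w.lower().strip(".,") for w in note.text.split() if len(w) > 6}
    note.tags = sorted(long_words)[:5]
    note.status = "ready"


def submit(kind: str, payload: bytes,
           title: Optional[str] = None) -> Tuple[Note, threading.Thread]:
    """Convert inline (step 1), then enrich on a background thread (step 2)."""
    note = Note(text=CONVERTERS[kind](payload), title=title)
    worker = threading.Thread(target=enrich, args=(note,))
    worker.start()
    return note, worker


note, worker = submit("text", b"Call the accountant about invoices")
worker.join()  # only needed here so the example can inspect the result
print(note.title, note.tags, note.status)
```

The note object is returned before enrichment finishes, which mirrors the "processing" indicator described above: the text is usable at once, and the title and tags fill in when the background work completes.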
Capturing text
In the web or mobile app, click or tap the capture field at the top of the screen, then type or paste any text. No title or formatting is required. It can be a loose thought, an address, an idea or a task.
When you submit, the AI will:
- Create a descriptive title automatically
- Classify the content type (idea, task, information, event, etc.)
- Extract mentioned entities (people, companies, dates)
- Create embeddings for semantic search
You can always edit the content later. The original is preserved in the version history.
Capturing a photo
On mobile, tap the camera icon in the capture bar. You can take a photo immediately or choose one from your gallery. On web, drag an image file to the capture area or click to select.
CleverNote’s vision AI goes beyond simple OCR: it understands image context. For receipts, it extracts amount, date, establishment and direction (income or expense). For business cards, it extracts name, phone and company. For screenshots of other apps, it discards the UI “chrome” and extracts only the useful content.
Capturing audio
On mobile, tap the microphone icon and record your voice note. Transcription happens automatically via Whisper.
Good use cases for voice notes:
- Remembering something while you drive
- Logging an observation after a meeting
- Capturing an idea before it slips away
The AI extracts reminders, dates and people’s names from the transcription, just like any text note.
Capturing a PDF
Drag the PDF to the capture area or select it via the file button. CleverNote uses PdfPig to read the embedded text of digital PDFs page by page. For scanned PDFs (images inside the PDF), the system rasterizes the pages and applies the same vision process used for photos.
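The fallback for scanned pages can be pictured as a per-page check. The sketch below is an assumption about how such a decision might be made, not a description of CleverNote's implementation: `read_pdf_pages`, the two callables it receives and the `MIN_CHARS` threshold are hypothetical, and the real text extractor and OCR step are replaced by simple lookups.

```python
from typing import Callable, List

# Assumed threshold: a page with less embedded text than this is
# treated as a scanned image and sent to OCR instead.
MIN_CHARS = 20


def read_pdf_pages(
    page_count: int,
    extract_text: Callable[[int], str],
    ocr_page: Callable[[int], str],
) -> List[str]:
    """Return one text string per page, using OCR only where needed."""
    pages = []
    for index in range(page_count):
        text = extract_text(index).strip()
        if len(text) < MIN_CHARS:
            # Little or no embedded text: fall back to the vision/OCR path.
            text = ocr_page(index).strip()
        pages.append(text)
    return pages


# Toy stand-ins: page 0 has embedded text, page 1 is a scanned image.
embedded = ["Contract between Alice and Bob, signed in 2024.", ""]
scanned = ["", "Page two was a scanned image of the signature."]
pages = read_pdf_pages(2, lambda i: embedded[i], lambda i: scanned[i])
print(pages)
```

Deciding page by page means a mixed document, such as a digital contract with a scanned signature page, only pays the OCR cost for the pages that need it.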
Bank statement PDFs are especially useful: CleverNote can extract transactions, amounts and dates, with smart deduplication to prevent registering the same expense twice when importing overlapping statements.
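One common way to deduplicate overlapping statements is to fingerprint each transaction and skip fingerprints already seen. The example below is a sketch under that assumption; the article does not say how CleverNote's deduplication works, and `Transaction`, `fingerprint` and `import_statement` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Transaction:
    posted: date
    amount: Decimal
    description: str


def fingerprint(tx: Transaction) -> str:
    """Hash of date, amount and a case/whitespace-normalized description."""
    normalized = " ".join(tx.description.lower().split())
    raw = f"{tx.posted.isoformat()}|{tx.amount:.2f}|{normalized}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def import_statement(transactions: Iterable[Transaction],
                     seen: Set[str]) -> List[Transaction]:
    """Return only transactions not seen before; `seen` is updated in place."""
    added = []
    for tx in transactions:
        key = fingerprint(tx)
        if key not in seen:
            seen.add(key)
            added.append(tx)
    return added


seen: Set[str] = set()
january = [
    Transaction(date(2024, 1, 30), Decimal("12.50"), "Coffee  Shop"),
    Transaction(date(2024, 1, 31), Decimal("40"), "Groceries"),
]
february = [  # overlaps with January on the 31st
    Transaction(date(2024, 1, 31), Decimal("40.00"), "groceries"),
    Transaction(date(2024, 2, 1), Decimal("9.99"), "Streaming"),
]
print(len(import_statement(january, seen)))
print([tx.description for tx in import_statement(february, seen)])
```

A fingerprint this simple would also merge two genuinely separate purchases with the same date, amount and description, so a production system would likely need an extra signal, such as the bank's own transaction ID or the position within the statement.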
Capturing from other channels
Beyond the app, you can capture content from three additional places:
- Web Clipper: browser extension to save pages and excerpts from websites
- Email: forward any email to your personal @in.clevernote.net address
- iOS Share Sheet: share anything directly to CleverNote from the iPhone share menu
What to do after capturing
With the note enriched, you can:
- Ask: “what was the receipt amount from the restaurant yesterday?”
- Search: find by meaning without remembering exact words
- Review: see all AI extractions (expenses, entities, reminders) and correct anything that is wrong with a single sentence
- Connect: see related notes the system identified automatically
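Searching "by meaning" typically works by comparing the embedding of the query with the embeddings of stored notes. The sketch below assumes cosine similarity as the comparison and uses made-up three-dimensional vectors; real embeddings have hundreds of dimensions, and the article does not state which similarity measure or model CleverNote uses. The `cosine_similarity` and `search` functions are illustrative names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_vec: Sequence[float],
           note_vecs: Dict[str, Sequence[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank notes by similarity to the query, highest first."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in note_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings, invented for illustration only.
notes = {
    "Dinner receipt": [0.9, 0.1, 0.0],
    "Meeting notes": [0.1, 0.9, 0.2],
    "Lease contract": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # imagined embedding of "restaurant bill"
for title, score in search(query, notes):
    print(f"{title}: {score:.2f}")
```

Because the ranking depends on vector direction rather than shared keywords, a query like "restaurant bill" can surface a note titled "Dinner receipt" even though the two share no words.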
See also: How to use the Web Clipper, How to forward emails, Semantic search
Frequently asked questions
- Do I need to organize my notes manually?
  No. CleverNote automatically classifies, extracts data and creates tags for every captured note.
- Does CleverNote transcribe audio in English?
  Yes. The system uses OpenAI's Whisper model, which supports English and detects language automatically.
- How long does AI processing take?
  The note is available immediately after submission. AI enrichment happens in the background, usually within seconds.
Ready to try? CleverNote is free to start — no credit card required.