# speakr

**Repository Path**: tim-tech/speakr

## Basic Information

- **Project Name**: speakr
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: AGPL-3.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-12
- **Last Updated**: 2025-12-12

## Categories & Tags

- **Categories**: Uncategorized
- **Tags**: None

## README

Speakr

Self-hosted AI transcription and intelligent note-taking platform


---

## Overview

Speakr transforms your audio recordings into organized, searchable, and intelligent notes. Built for privacy-conscious groups and individuals, it runs entirely on your own infrastructure, so your sensitive conversations remain private.

## Key Features

### Core Functionality

- **Smart Recording & Upload** - Record directly in browser or upload existing audio files
- **AI Transcription** - High-accuracy transcription with speaker identification
- **Voice Profiles** - AI-powered speaker recognition with voice embeddings (requires WhisperX ASR service)
- **Audio-Transcript Sync** - Click transcript to jump to audio, auto-highlight current text, follow mode for hands-free playback
- **Interactive Chat** - Ask questions about your recordings and get AI-powered answers
- **Inquire Mode** - Semantic search across all recordings using natural language
- **Internationalization** - Full support for English, Spanish, French, German, and Chinese
- **Beautiful Themes** - Light and dark modes with customizable color schemes

### Collaboration & Sharing

- **Internal Sharing** - Share recordings with specific users with granular permissions (view/edit/reshare)
- **Group Management** - Create groups with automatic sharing via group-scoped tags
- **Public Sharing** - Generate secure links to share recordings externally (admin-controlled)
- **Group Tags** - Tags that automatically share recordings with all group members

### Organization & Management

- **Smart Tagging** - Organize with tags that include custom AI prompts and ASR settings
- **Tag Prompt Stacking** - Combine multiple tags to layer AI instructions for powerful transformations
- **Tag Protection** - Prevent specific recordings from being auto-deleted
- **Group Retention Policies** - Set custom retention periods per group tag
- **Auto-Deletion** - Automatic cleanup of old recordings with flexible retention policies

## Real-World Use Cases

Different people use Speakr's collaboration and retention features in different ways:

| Use Case | Setup | What It Does |
|----------|-------|-------------|
| **Family memories** | Create "Family" group with protected tag | Everyone gets access to trips and events automatically, recordings preserved forever |
| **Book club discussions** | "Book Club" group, tag monthly meetings | All members auto-share discussions, can add personal notes about what resonated |
| **Work project group** | Share individually with 3 teammates | Temporary collaboration, easy to revoke when project ends |
| **Daily group standups** | Group tag with 14-day retention | Auto-share with group, auto-cleanup of routine meetings |
| **Architecture decisions** | Engineering group tag, protected from deletion | Technical discussions automatically shared, preserved permanently as reference |
| **Client consultations** | Individual share with view-only permission | Controlled external access, clients can't accidentally edit |
| **Research interviews** | Protected tag + Obsidian export | Preserve recordings indefinitely, transcripts auto-import to note-taking system |
| **Legal consultations** | Group tag with 7-year retention | Automatic sharing with legal group, compliance-based retention |
| **Sales calls** | Group tag with 1-year retention | Whole sales group learns from each call, cleanup after sales cycle |

### Creative Tag Prompt Examples

Tags with custom prompts transform raw recordings into exactly what you need:

- **Recipe recordings**: Record yourself cooking while narrating - tag with "Recipe" to convert messy speech into formatted recipes with ingredient lists and numbered steps
- **Lecture notes**: Students tag lectures with "Study Notes" to get organized outlines with concepts, examples, and definitions instead of raw transcripts
- **Code reviews**: "Code Review" tag extracts issues, suggested changes, and action items in technical language developers can use directly
- **Meeting summaries**: "Action Items" tag ignores discussion and returns just decisions, tasks, and deadlines

### Tag Stacking for Combined Effects

Stack multiple tags to layer instructions:

- "Recipe" + "Gluten Free" = Formatted recipe with gluten substitution suggestions
- "Lecture" + "Biology 301" = Study notes format focused on biological terminology
- "Client Meeting" + "Legal Review" = Client requirements plus legal implications highlighted

The order can matter - start with format tags, then add focus tags for best results.

### Integration Examples

- **Obsidian/Logseq**: Enable auto-export to write completed transcripts directly to your vault using your custom template - no manual export needed
- **Documentation wikis**: Map auto-export to your wiki's import folder for seamless transcript publishing
- **Content creation**: Create SRT subtitle templates from your audio recordings for podcasts or video content
- **Project management**: Extract action items with custom tag prompts, then auto-export for automated task creation

## Quick Start

### Using Docker (Recommended)

```bash
# Create project directory
mkdir speakr && cd speakr

# Download docker-compose configuration:
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/docker-compose.example.yml -O docker-compose.yml

# Choose your transcription method and download the corresponding .env file
# (run only ONE of the three wget commands below; each one overwrites .env):

# Option 1: Standard Whisper API (no speaker diarization):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.whisper.example -O .env

# Option 2: WhisperX ASR with voice profiles (recommended for speaker features):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.whisperx.example -O .env

# Option 3: Basic ASR with diarization (no voice profiles):
wget https://raw.githubusercontent.com/murtaza-nasir/speakr/master/config/env.asr.example -O .env

# Configure your service endpoints and API keys
nano .env  # Set API endpoints (Local/OpenAI/OpenRouter/etc) and add your API keys

# Launch Speakr
docker compose up -d

# Access at http://localhost:8899
```

**Note:** The ASR options (Options 2 and 3 above) require running an additional ASR service container alongside Speakr:

- **For voice profiles & speaker embeddings:** Use [WhisperX ASR Service](https://github.com/murtaza-nasir/whisperx-asr-service) (recommended)
- **For basic speaker diarization:** Use [OpenAI Whisper ASR Webservice](https://github.com/ahmetoner/whisper-asr-webservice)

See the [installation guide](https://murtaza-nasir.github.io/speakr/getting-started/installation#running-asr-service-for-speaker-diarization) for complete setup instructions.

**[View Full Installation Guide →](https://murtaza-nasir.github.io/speakr/getting-started)**

## Documentation

Complete documentation is available at **[murtaza-nasir.github.io/speakr](https://murtaza-nasir.github.io/speakr)**

- [Getting Started](https://murtaza-nasir.github.io/speakr/getting-started) - Quick setup guide
- [User Guide](https://murtaza-nasir.github.io/speakr/user-guide/) - Learn all features
- [Admin Guide](https://murtaza-nasir.github.io/speakr/admin-guide/) - Administration and configuration
- [Troubleshooting](https://murtaza-nasir.github.io/speakr/troubleshooting) - Common issues and solutions
- [FAQ](https://murtaza-nasir.github.io/speakr/faq) - Frequently asked questions

## Latest Release (v0.6.5)

**New Feature** - Separate Chat Model Configuration

- **Separate Chat Model** - Configure different AI models for chat vs background tasks (#143)
- **Custom Datetime Picker** - New themed calendar and time selection modal
- **Bug Fixes** - Audio chunking after refactor (#140), username display (#138)

Fully backward compatible. The `CHAT_MODEL_*` environment variables are optional.

### Previous Release (v0.6.3)

- API Token Authentication for programmatic access
- Multiple auth methods: Bearer token, X-API-Token, query parameter
- Token management UI with expiration settings

### Previous Release (v0.6.2)

- Standardized modal UX with backdrop click and consistent X button
- Recording disclaimer markdown support
- IndexedDB crash recovery and queue cleanup fixes

### v0.5.9 - Major Release

> **⚠️ IMPORTANT:** v0.5.9 introduced significant architectural changes. If upgrading from earlier versions, back up your data first and review the [configuration guide](https://murtaza-nasir.github.io/speakr/getting-started/installation#configuration-updates).

#### Highlights

- **Complete Internal Sharing System** - Share recordings with users with granular permissions (view/edit/reshare)
- **Group Management & Collaboration** - Create groups with auto-sharing via group tags and custom retention policies
- **Speaker Voice Profiles** - AI-powered speaker identification with 256-dimensional voice embeddings
- **Audio-Transcript Synchronization** - Click-to-jump, auto-highlight, and follow mode for interactive navigation
- **Auto-Deletion & Retention System** - Flexible retention policies with global and group-level controls
- **Automated Export** - Auto-export transcriptions to markdown for Obsidian, Logseq, and other note-taking apps
- **Permission System** - Fine-grained access control throughout the application
- **Modular Architecture** - Backend refactored into blueprints, frontend composables for maintainability
- **UI/UX Enhancements** - Compact controls, inline editing, unified toast notifications, improved badges
- **Enhanced Internationalization** - 29 new tooltip translations across all supported languages

## Screenshots

*Screenshots (images not reproduced here): Main Dashboard with Chat, AI-Powered Semantic Search, Interactive Transcription & Chat, Full Internationalization.*

**[View Full Screenshot Gallery →](https://murtaza-nasir.github.io/speakr/screenshots)**

## Technology Stack

- **Backend**: Python/Flask with SQLAlchemy
- **Frontend**: Vue.js 3 with Tailwind CSS
- **AI/ML**: OpenAI Whisper, OpenRouter, Ollama support
- **Database**: SQLite (default) or PostgreSQL
- **Deployment**: Docker, Docker Compose

## Roadmap

### Completed

- ✅ Speaker voice profiles with AI-powered identification (v0.5.9)
- ✅ Group workspaces with shared recordings (v0.5.9)
- ✅ PWA enhancements with offline support and background sync (v0.5.10)
- ✅ Multi-user job queue with fair scheduling (v0.6.0)

### Near-term

- Bulk operations for recordings (mass delete, export, tagging)
- Quick language switching for transcription
- Automated workflow triggers

### Long-term

- Plugin system for custom integrations
- End-to-end encryption option
- Enterprise SSO integration

### Reporting Issues

- [Report bugs](https://github.com/murtaza-nasir/speakr/issues)
- [Request features](https://github.com/murtaza-nasir/speakr/discussions)

## License

This project is **dual-licensed**:

1. **GNU Affero General Public License v3.0 (AGPLv3)**

   [![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0)

   Speakr is offered under the AGPLv3 as its open-source license. You are free to use, modify, and distribute this software under the terms of the AGPLv3. A key condition of the AGPLv3 is that if you run a modified version on a network server and provide access to it for others, you must also make the source code of your modified version available to those users under the AGPLv3.

   * You **must** create a file named `LICENSE` (or `COPYING`) in the root of your repository and paste the full text of the [GNU AGPLv3 license](https://www.gnu.org/licenses/agpl-3.0.txt) into it.
   * Read the full license text carefully to understand your rights and obligations.

2. **Commercial License**

   For users or organizations who cannot or do not wish to comply with the terms of the AGPLv3 (for example, if you want to integrate Speakr into a proprietary commercial product or service without being obligated to share your modifications under AGPLv3), a separate commercial license is available. Please contact **speakr maintainers** for details on obtaining a commercial license.

**You must choose one of these licenses** under which to use, modify, or distribute this software. If you are using or distributing the software without a commercial license agreement, you must adhere to the terms of the AGPLv3.

## Contributing

We welcome contributions to Speakr! There are many ways to help:

- **Bug Reports & Feature Requests**: [Open an issue](https://github.com/murtaza-nasir/speakr/issues)
- **Discussions**: [Share ideas and ask questions](https://github.com/murtaza-nasir/speakr/discussions)
- **Documentation**: Help improve our docs
- **Translations**: Contribute translations for internationalization

### Code Contributions

All code contributions require signing a [Contributor License Agreement (CLA)](CLA.md). This one-time process ensures we can maintain our dual-license model (AGPLv3 and Commercial).

**See our [Contributing Guide](CONTRIBUTING.md) for complete details on:**

- How the CLA works and why we need it
- Step-by-step contribution process
- Development setup instructions
- Coding standards and best practices

The CLA is automatically enforced via GitHub Actions. When you submit your first PR, our bot will guide you through signing.
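
## Appendix: API Token Authentication Sketch

The v0.6.3 release notes list three ways to present an API token: a Bearer token, an `X-API-Token` header, and a query parameter. The sketch below shows what each might look like from a Python client. It only builds the requests and does not send them. The endpoint path `/api/recordings` and the query parameter name `token` are assumptions made for illustration and are not taken from the Speakr documentation; check the official docs for the real values. The base URL uses the port from the Quick Start.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8899"  # port shown in the Quick Start
ENDPOINT = "/api/recordings"        # hypothetical path, for illustration only
QUERY_PARAM = "token"               # hypothetical query parameter name


def build_request(token: str, method: str = "bearer") -> Request:
    """Build (but do not send) a request using one of three auth styles."""
    url = BASE_URL + ENDPOINT
    if method == "bearer":
        # Standard HTTP Authorization header with the Bearer scheme.
        return Request(url, headers={"Authorization": f"Bearer {token}"})
    if method == "header":
        # Custom header named in the v0.6.3 release notes.
        return Request(url, headers={"X-API-Token": token})
    if method == "query":
        # Token passed in the URL; urlencode escapes special characters.
        return Request(f"{url}?{urlencode({QUERY_PARAM: token})}")
    raise ValueError(f"unknown auth method: {method!r}")
```

Once the real endpoint is known, a request built this way can be passed to `urllib.request.urlopen`. Header-based methods are generally preferable to the query parameter, since URLs tend to end up in server logs and browser history.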