The project overview for the Term Extraction Engine (TEE)

By Ashutosh Rana
Tags: term extraction, term extraction engine, tee, nlp

Lab Entry #2

   

Term Extraction Engine (TEE)

   

├── Project Overview
│ ├── Modular, Scalable, Domain-Adaptive
│ ├── Extracts Single-word Terms
│ ├── Extracts Multi-word Expressions
│ ├── Extracts Technical Verbs/Adjectives
│ ├── Extracts Acronyms
│ ├── Target Domains: CS, Blockchain, Web3
│ ├── Offline-First Capability
│ ├── Future LLM Integration
│ └── Future SaaS Deployment
├── Objectives
│ ├── Extract Terms & Acronyms
│ ├── Offline & Efficient Processing
│ ├── Modular/Pluggable Architecture
│ ├── Bootstrapping from Glossaries
│ └── API & CLI Access
├── Input Specifications
│ ├── Supported Formats
│ │ ├── PDF
│ │ ├── .txt
│ │ ├── Markdown (.md)
│ │ └── HTML
│ ├── User-Defined Domain
│ └── Glossary Bootstrapping
├── Core Features
│ ├── PDF Parsing (Metadata)
│ ├── NLP Pipeline
│ │ ├── Tokenization
│ │ ├── POS Tagging
│ │ └── Dependency Parsing
│ ├── Multi-Strategy Candidate Extraction
│ │ ├── N-gram
│ │ └── Patterns
│ ├── Acronym Extraction & Expansion
│ ├── Statistical Scoring
│ │ ├── TF-IDF
│ │ └── C-value
│ ├── Semantic Scoring (Embeddings)
│ │ └── Weirdness
│ ├── Optional LLM Validation/Refinement
│ └── JSON Output (Rich Metadata)
│   ├── Term
│   ├── Type
│   ├── Source
│   ├── Score
│   └── Context
├── Storage and Output
│ ├── Structured JSON Output
│ ├── Local Database
│ │ ├── SQLite
│ │ └── Extensible (MongoDB/PostgreSQL)
│ └── Indexing Support (Filtering/Search)
├── Interface Requirements
│ ├── CLI for Extraction
│ ├── REST/GraphQL API
│ └── Optional Web Interface
├── Future Extensions
│ ├── Knowledge Graph Generation
│ ├── Mind Map Generation
│ ├── Flashcard Generation
│ ├── Interactive Curation Interface
│ ├── Multilingual Support
│ ├── Blockchain Metadata Anchoring (Scrapychain)
│ └── SaaS Deployment Model
└── Design Principles
  ├── Built in Rust (with Python Interop)
  ├── Local-First, Cloud-Scalable
  ├── Modular
  ├── Testable
  ├── Extensible
  └── Domain-Aware NLP Pipeline
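
The JSON Output branch of the outline lists five metadata fields: Term, Type, Source, Score, and Context. As a rough illustration of what one output record could look like, here is a minimal Python sketch (Python because the outline mentions Python interop; the engine itself is planned in Rust). The class name `TermRecord`, the helper `to_json`, the type labels, and the sample values are all assumptions made for illustration, not part of the project.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TermRecord:
    """Hypothetical record shape mirroring the five fields in the outline."""
    term: str      # surface form of the extracted term
    type: str      # assumed labels, e.g. "single_word", "multi_word", "acronym"
    source: str    # document the term was extracted from
    score: float   # ranking score assigned by the scoring stage
    context: str   # snippet of text in which the term appears


def to_json(records):
    """Serialize a list of records into a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2)


# Made-up sample values, for illustration only.
example = TermRecord(
    term="proof of stake",
    type="multi_word",
    source="example.pdf",
    score=0.87,
    context="Validators are selected under proof of stake.",
)

if __name__ == "__main__":
    print(to_json([example]))
```

Keeping the record flat, as above, would also make it straightforward to load the same fields into a local database table.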

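The outline pairs N-gram candidate extraction with C-value scoring. The sketch below shows how the two could fit together, using a simplified form of the commonly cited C-value definition: a candidate's frequency is reduced by the average frequency of the longer candidates that contain it, then weighted by log2 of its length in words. The function names are hypothetical, and a real pipeline would likely filter candidates by POS patterns first, which is skipped here.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=2, n_max=3):
    """Yield every contiguous n-gram of length n_min..n_max as a tuple."""
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_values(freq):
    """Simplified C-value for each candidate in a {tuple: count} mapping."""
    scores = {}
    for a, f_a in freq.items():
        # Frequencies of longer candidates that contain candidate `a`.
        nesting = [f_b for b, f_b in freq.items()
                   if len(b) > len(a) and contains(b, a)]
        if nesting:
            adjusted = f_a - sum(nesting) / len(nesting)
        else:
            adjusted = f_a
        # log2(len) is 0 for single words, so this variant only ranks
        # multi-word candidates.
        scores[a] = math.log2(len(a)) * adjusted
    return scores


if __name__ == "__main__":
    tokens = "smart contract audit".split()
    for term, score in c_values(Counter(ngrams(tokens))).items():
        print(" ".join(term), round(score, 3))
```

In this toy input, the bigrams only ever occur inside the trigram, so their adjusted frequency drops to zero while the trigram keeps its full weight. That is the behaviour C-value is designed for: it penalizes fragments of longer terms.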
   
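For the Acronym Extraction & Expansion item, one simple heuristic is to look for an uppercase token in parentheses and check whether the initials of the words just before it spell the acronym. The sketch below does only that. It is an assumption about one possible approach, not the project's method, and it will miss acronyms whose long form includes stop words or uses inner letters. The name `find_acronyms` is hypothetical.

```python
import re

# An acronym candidate: 2 to 10 uppercase letters inside parentheses.
ACRONYM = re.compile(r"\(([A-Z]{2,10})\)")
# A word: starts with a letter, may contain letters and hyphens.
WORD = re.compile(r"[A-Za-z][A-Za-z-]*")


def find_acronyms(text):
    """Return {acronym: long form} for simple 'Long Form (LF)' patterns."""
    pairs = {}
    for match in ACRONYM.finditer(text):
        short = match.group(1)
        # Take as many preceding words as the acronym has letters.
        words = WORD.findall(text[:match.start()])
        candidate = words[-len(short):]
        # Accept only if each word's initial matches the acronym letter.
        if len(candidate) == len(short) and all(
            w[0].upper() == c for w, c in zip(candidate, short)
        ):
            pairs[short] = " ".join(candidate)
    return pairs


if __name__ == "__main__":
    sample = ("The Term Extraction Engine (TEE) targets "
              "natural language processing (NLP) texts.")
    print(find_acronyms(sample))
```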

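The Storage and Output branch names SQLite as the local database, with indexing support for filtering and search. As a minimal sketch under those assumptions, the snippet below creates a table with the five output fields and an index that supports filtering by type and ordering by score. The table name, column names, index name, and sample rows are all hypothetical.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite store with a hypothetical `terms` table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS terms (
               term    TEXT NOT NULL,
               type    TEXT NOT NULL,
               source  TEXT NOT NULL,
               score   REAL NOT NULL,
               context TEXT
           )"""
    )
    # Index to support filtering by type and sorting by score.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_terms_type_score ON terms (type, score)"
    )
    return conn


def top_terms(conn, term_type, limit=10):
    """Return (term, score) rows of one type, highest score first."""
    rows = conn.execute(
        "SELECT term, score FROM terms WHERE type = ? "
        "ORDER BY score DESC LIMIT ?",
        (term_type, limit),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = open_store()
    # Made-up rows, for illustration only.
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?, ?, ?, ?)",
        [
            ("NFT", "acronym", "example.pdf", 0.4, "An NFT is minted."),
            ("DAO", "acronym", "example.pdf", 0.9, "The DAO votes."),
            ("smart contract", "multi_word", "example.pdf", 0.8, "A smart contract runs."),
        ],
    )
    print(top_terms(conn, "acronym"))
```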