Anomaly Detection & Operational Intelligence
honey includes a state-of-the-art log anomaly detection and operational intelligence pipeline. Inspired by VLDB 2025 research (CoLA and LLMLog), the architecture combines fast heuristic and neural pre-screeners with Large Language Models (LLMs) to deliver timely, context-aware anomaly detection at scale.
┌────────────────────────────────┐
│      Raw Log Stream Tail       │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ LogLSHD Template Preprocessor  │ <--- Collapse Parameter Noise
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│      Two-Tier CoLA Filter      │ <--- Skip LLM if obvious (e.g. < 0.40)
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│    LLMLog Adaptive Selector    │ <--- Dynamic Few-Shot (Postgres/Nginx/Node)
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│        Local/Cloud LLM         │ <--- Deep Semantic Classification & RCA
└────────────────────────────────┘
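The gating step in the diagram can be illustrated with a minimal sketch. This is not honey's actual CoLA filter: the keyword weights, `prescreen_score`, and `classify` are hypothetical names introduced for illustration, and only the 0.40 threshold comes from the diagram annotation. The idea shown is simply that a cheap score decides whether the LLM is called at all.

```python
from typing import Callable

SKIP_BELOW = 0.40  # example threshold taken from the diagram annotation

# Hypothetical keyword weights; a real pre-screener would be tuned or learned.
SUSPICIOUS = {"error": 0.5, "fatal": 0.9, "denied": 0.6, "timeout": 0.5}


def prescreen_score(line: str) -> float:
    """Cheap heuristic score in [0, 1]: the highest weight of any keyword present."""
    lowered = line.lower()
    return max((w for kw, w in SUSPICIOUS.items() if kw in lowered), default=0.0)


def classify(line: str, llm: Callable[[str], str]) -> str:
    """Return 'normal' without calling the LLM when the pre-screen score is low."""
    if prescreen_score(line) < SKIP_BELOW:
        return "normal"
    return llm(line)  # only ambiguous or suspicious lines reach the LLM
```

Lines that score below the threshold never reach the expensive model, which is where most of the latency savings in a pipeline of this shape would come from.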
🚀 Intelligent Preprocessing: LogLSHD
Heterogeneous, multi-source industrial logs (PostgreSQL, Nginx, Node.js) show high lexical variance. Hand-written regular expressions (RE) are rigid and break when formats change.
honey integrates LogLSHD (Locality-Sensitive Hashing with Sequence-Alignment Clustering) to convert raw, unstructured logs into clean templates in real time (less than 1 ms latency).
Why use LogLSHD?
- Parameter Stripping: It automatically collapses variable segments (IPs, timestamps, UUIDs, hex literals) into a single, clean template (e.g. interface <*> is flapping).
- Massive Caching Performance: Stripping parameters enables our internal LRU Cache (up to 10,000 entries) to hit at a 95%+ rate for high-volume logs, bypassing the slow LLM completely and boosting throughput 10×.
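To make the two points above concrete, here is a minimal sketch of why stripping parameters makes caching effective. It deliberately swaps in a handful of regular expressions for the masking step instead of LogLSHD's hashing and clustering, so it only illustrates the template-plus-cache idea. `to_template`, `slow_llm_classify`, and `classify_line` are hypothetical names, and the cache size mirrors the 10,000-entry figure mentioned above.

```python
import re
from functools import lru_cache

# Order matters: mask the most specific patterns before the generic integer rule.
_PATTERNS = [
    re.compile(r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
               r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"),                    # UUID
    re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?Z?"),  # timestamp
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),                          # IPv4
    re.compile(r"\b0x[0-9a-fA-F]+\b"),                                   # hex literal
    re.compile(r"\b\d+\b"),                                              # integers
]


def to_template(line: str) -> str:
    """Replace variable segments with the <*> placeholder."""
    for pattern in _PATTERNS:
        line = pattern.sub("<*>", line)
    return line


def slow_llm_classify(template: str) -> str:
    """Placeholder for the expensive LLM call (assumption, always 'normal')."""
    return "normal"


@lru_cache(maxsize=10_000)
def classify_template(template: str) -> str:
    """Cached per template, so repeated templates skip the LLM entirely."""
    return slow_llm_classify(template)


def classify_line(line: str) -> str:
    return classify_template(to_template(line))
```

With this sketch, `connection from 10.0.0.7 closed after 1532 ms` and `connection from 192.168.1.20 closed after 88 ms` both collapse to `connection from <*> closed after <*> ms`, so the second line is answered from the cache.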
How to enable it:
Add the --anomaly-preprocessor flag:
honey logs myapp --anomaly --anomaly-preprocessor lshd --anomaly-only
🧠 Dynamic Few-Shot Selection: LLMLog (VLDB 2025)
Instead of using static, generic prompts, honey implements LLMLog's Adaptive Demonstration Selection (Algorithm 3) to guide the LLM with contextually relevant, vendor-specific few-shot examples.
How it works:
- Semantic Keyword Coverage: When a new log line arrives, honey runs the greedy token-cover algorithm to scan its local seed pool.
- Context Assembly: It selects the 2 best examples matching the log source's vocabulary (e.g., if you are tailing a PostgreSQL log, the LLM will receive historical PostgreSQL-specific normal and anomalous query demonstrations).
- Near-100% Accuracy: This dynamic guidance reduces LLM hallucinations and false positives and keeps the model aligned with the semantics of the system being monitored.
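The selection step described above can be sketched as a simple greedy token cover. This is a simplified illustration, not a reproduction of LLMLog's Algorithm 3 or of honey's implementation: the `Demo` type, the tokenizer, and `select_demos` are hypothetical, and only the choice of 2 demonstrations comes from the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    line: str
    label: str  # "normal" or "anomaly"


def tokens(text: str) -> set[str]:
    """Lowercased alphabetic tokens; a stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def select_demos(query: str, pool: list[Demo], k: int = 2) -> list[Demo]:
    """Greedily pick up to k demos that cover the most still-uncovered query tokens."""
    uncovered = tokens(query)
    remaining = list(pool)
    chosen: list[Demo] = []
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda d: len(tokens(d.line) & uncovered))
        if not tokens(best.line) & uncovered:
            break  # nothing left in the pool adds coverage
        chosen.append(best)
        remaining.remove(best)
        uncovered -= tokens(best.line)
    return chosen
```

Because each round scores candidates only against tokens that are still uncovered, the second demonstration tends to complement the first instead of repeating it.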
🔌 Integrating Local Feedback & Training
honey features a closed feedback loop. You can correct or refine LLM classifications directly, and honey loads those corrections as demonstrations that steer later classifications.
1. Web UI Logs Feedback Loop
Navigate to the Logs Feedback tab in the Web UI:
- Review Classifications: View a complete table of logged entries with their timestamp, source, raw line, score, and reason.
- Direct Toggles: Flip the Status Switch (Anomaly vs Normal) or fine-tune the decimal score directly.
- Write Reasons: Type in a custom semantic explanation to teach the LLM why this line is benign or critical.
- Save Changes: Saves directly back to your configured feedback file.
2. 💡 AI Suggestions
Unsure if a specific log line represents a security threat or a benign error?
- Click the AI Suggest (Bulb 💡) button in the actions column.
- honey queries your configured LLM, which analyzes the log line, updates the toggle and score automatically, and writes a detailed semantic explanation for you!
3. CLI Learning
To launch honey and load your accumulated feedback definitions:
honey logs myapp --anomaly-feedback-file ~/.config/honey/feedback.jsonl
At startup, your manual corrections are prepended to the seed pool and take immediate precedence during LLMLog selection.
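As a rough illustration of that startup behavior, the sketch below reads a JSON Lines feedback file and places its entries ahead of a built-in seed pool. The on-disk schema is an assumption: the field names (`line`, `status`) are hypothetical, loosely based on the columns shown in the Web UI, and `load_feedback` and `build_seed_pool` are not honey's real functions.

```python
import json
from pathlib import Path


def load_feedback(path: Path) -> list[dict]:
    """Read one JSON object per line, skipping blank or malformed lines."""
    entries: list[dict] = []
    if not path.exists():
        return entries
    with path.open(encoding="utf-8") as fh:
        for raw in fh:
            raw = raw.strip()
            if not raw:
                continue
            try:
                record = json.loads(raw)
            except json.JSONDecodeError:
                continue  # tolerate a partially written or hand-edited file
            if isinstance(record, dict):
                entries.append(record)
    return entries


def build_seed_pool(feedback: list[dict], builtin: list[dict]) -> list[dict]:
    """Manual corrections come first so they are considered before built-in seeds."""
    return feedback + builtin
```

Putting corrections at the front of the pool is one simple way to give them precedence when a selector breaks ties in favor of earlier entries.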