Show HN: Codebased, an AI Search Engine for Code
(codebased.sh) | 16 points by maxconradt 3 days ago | 9 comments
Codebased combines Tree-sitter for code awareness (it finds functions, data structures, constants, etc., not just lines of code), full-text search using SQLite, and semantic search using OpenAI embeddings + FAISS.

Despite being implemented in Python, supporting semantic search, and making multiple API calls for embedding and re-ranking, it is faster than ripgrep at running searches against the Linux kernel (~1 second vs. ~2 seconds; obviously this depends on system, temperature, time of day, tidal forces, etc.).

Up next:
- A Perplexity-like agent for interpreting results, making multiple follow-up searches, etc.
- Custom embedding and re-ranking stack
- An agent for running shell commands, editing code, etc., similar to SWE-agent: https://arxiv.org/pdf/2405.15793
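To make the combination of full-text and semantic search concrete, here is a minimal sketch, not Codebased's actual code. It assumes the SQLite side uses the FTS5 extension (the post only says "SQLite") and that your Python build ships with FTS5. A toy bag-of-words vector stands in for OpenAI embeddings, and a brute-force cosine scan stands in for FAISS. The sample objects, the function names, and the reciprocal rank fusion step are all illustrative assumptions.

```python
import math
import re
import sqlite3
from collections import Counter

# Hypothetical code objects, roughly what a Tree-sitter pass might extract.
OBJECTS = [
    ("parse_config", "function", "def parse_config(path): read a yaml config file from disk"),
    ("MAX_RETRIES", "constant", "MAX_RETRIES = 5  # retry limit for network requests"),
    ("HttpClient", "class", "class HttpClient: send network requests with retry and backoff"),
]

def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(objects):
    """Build an in-memory FTS5 table plus a name -> vector map."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE VIRTUAL TABLE objects USING fts5(name, kind, body)")
    db.executemany("INSERT INTO objects VALUES (?, ?, ?)", objects)
    vectors = {name: toy_embed(body) for name, _, body in objects}
    return db, vectors

def full_text_search(db, query):
    """Keyword search: any query term may match, best FTS5 rank first."""
    terms = re.findall(r"\w+", query)
    if not terms:
        return []
    # Quote each term so it is treated as a literal, not as FTS5 query syntax.
    match = " OR ".join(f'"{t}"' for t in terms)
    rows = db.execute("SELECT name FROM objects WHERE objects MATCH ? ORDER BY rank", (match,))
    return [row[0] for row in rows]

def semantic_search(vectors, query, k=10):
    """Brute-force nearest neighbours by cosine similarity (FAISS stand-in)."""
    q = toy_embed(query)
    scored = [(cosine(q, v), name) for name, v in vectors.items()]
    scored = [(s, name) for s, name in scored if s > 0]
    return [name for s, name in sorted(scored, reverse=True)[:k]]

def hybrid_search(db, vectors, query, k=10):
    """Merge both result lists with reciprocal rank fusion (an assumed choice)."""
    fused = Counter()
    for results in (full_text_search(db, query), semantic_search(vectors, query, k)):
        for rank, name in enumerate(results):
            fused[name] += 1.0 / (60 + rank)
    return [name for name, _ in fused.most_common(k)]

db, vectors = build_index(OBJECTS)
print(hybrid_search(db, vectors, "network requests"))
```

In a real setup the toy vectors would be replaced by model embeddings and the linear scan by an approximate nearest-neighbour index; the fusion step is where a re-ranker would plug in.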
tarasglek a day ago
It would be helpful to document your architecture, e.g. what kind of text search you use (trigrams?), and to include some sort of benchmark for searching with and without the expensive embeddings.
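A benchmark like the one suggested could start from something as small as the sketch below. Everything in it is a hypothetical stand-in: `keyword_only` is a plain substring scan, and `with_embeddings` fakes the cost of an embedding API call with a short sleep. Neither reflects how Codebased actually searches.

```python
import statistics
import time

# Hypothetical corpus and search functions, for illustration only.
CORPUS = [
    "def parse_config(path): ...",
    "MAX_RETRIES = 5  # retry limit",
    "class HttpClient: retry with backoff",
]

def keyword_only(query):
    """Plain substring scan, standing in for a full-text search."""
    return [line for line in CORPUS if query in line]

def with_embeddings(query):
    """Same scan plus a simulated embedding call (the sleep is an assumption)."""
    time.sleep(0.001)
    return keyword_only(query)

def benchmark(search_fn, queries, repeats=5):
    """Return the median wall-clock seconds per query for search_fn."""
    timings = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            search_fn(query)
            timings.append(time.perf_counter() - start)
    return statistics.median(timings)

queries = ["retry", "config"]
for fn in (keyword_only, with_embeddings):
    print(f"{fn.__name__}: {benchmark(fn, queries) * 1000:.3f} ms/query")
```

Reporting the median rather than the mean keeps a single slow call (cold cache, network hiccup) from dominating the result.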