JAY ZENITH
I enjoy training, evaluating, and serving LLM agents - right now that's glyph, a Rust tool-use agent taken from SFT through verifier RL against strict whole-trace evals.
- glyph
Rust tool-use agent on a 4B model, taken from SFT through verifier RL against strict whole-trace evals. I own the whole stack, from synthetic data to pass@k analysis. [blog]
- llama.cpp - CUDA upstream
Added GGML_OP_FILL on the CUDA backend for the Qwen3-Next path; removed a CPU fallback. Merged upstream.
- llama.cpp - sampling hot path
Cut a redundant O(vocab) allocation from token sampling. 1.9-2.2x speedup on the sampling microbenchmark.
- mini-sglang - Mistral support
Sliding-window attention for Mistral-7B, validated against Hugging Face at context lengths past 6k tokens.
- pd disaggregation benchmark
A/B harness comparing a prefill/decode split against colocated serving.