Model
Query-specific compression for RAG and Q&A pipelines.
`latte_v1`
Token-level compression that preserves the tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.
Parameters
- `context`: the text to compress
- `query`: what to preserve; tokens relevant to this query are kept
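In a RAG pipeline, the retrieved passages typically become `context` and the user's question becomes `query`. The sketch below assembles such a request body. The helper `build_compression_request` is hypothetical, and the payload shape (a flat dict with `model`, `context`, `query`, and `target_compression_ratio`) is an assumption that reuses the parameter names on this page; check the actual API reference for the real wire format and client library.

```python
from typing import Any


def build_compression_request(
    chunks: list[str],
    question: str,
    target_compression_ratio: float = 0.5,
) -> dict[str, Any]:
    """Assemble a hypothetical latte_v1 request body for a RAG pipeline.

    Field names mirror the documented parameters; the overall payload
    shape is an assumption, not the documented wire format.
    """
    # Join the non-empty retrieved passages into the text to compress.
    context = "\n\n".join(chunk.strip() for chunk in chunks if chunk.strip())
    return {
        "model": "latte_v1",
        "context": context,  # text to compress
        "query": question,  # tokens relevant to this are preserved
        "target_compression_ratio": target_compression_ratio,  # 0.5 = default
    }


if __name__ == "__main__":
    request = build_compression_request(
        chunks=["Paris is the capital of France.", "France borders Spain."],
        question="What is the capital of France?",
    )
    print(request["context"])
```

The compressed result would then be passed to the downstream LLM together with the same question.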
Best For
- RAG pipelines
- Q&A systems
- Query-aware compression
Compression Ratio
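Set `target_compression_ratio` to control how aggressively the context is compressed. The value is the target fraction of tokens to remove, so higher values compress more.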
| `target_compression_ratio` | Level | Tokens kept |
|---|---|---|
| 0.2 | Light | 80% |
| 0.5 | Balanced (default) | 50% |
| 0.9 | Aggressive | 10% |
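As the table shows, roughly `1 - target_compression_ratio` of the input survives. The hypothetical helper below turns that relationship into a budgeting estimate, for example to check whether compressed context will fit a downstream prompt. It is only an approximation under that assumption; since the parameter is a target, the real output length may differ. The `[0, 1]` range check is our own sanity guard, not a documented API constraint.

```python
def estimated_tokens_kept(input_tokens: int, target_compression_ratio: float) -> int:
    """Estimate how many tokens remain after compression.

    Assumes the ratio is the fraction of tokens removed, as in the
    table above, so about (1 - ratio) of the input is kept.
    """
    # Sanity guard (our assumption): a removed fraction must lie in [0, 1].
    if not 0.0 <= target_compression_ratio <= 1.0:
        raise ValueError("target_compression_ratio must be between 0 and 1")
    return round(input_tokens * (1.0 - target_compression_ratio))


if __name__ == "__main__":
    # Estimate the kept tokens for a 1,000-token context at each table value.
    for ratio in (0.2, 0.5, 0.9):
        print(ratio, estimated_tokens_kept(1000, ratio))
```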