Available Models

Model: latte_v1

Query-specific compression for RAG and Q&A pipelines.

Token-level compression that preserves tokens relevant to a given query. Ideal for RAG pipelines and Q&A systems where you want to keep answer-relevant information while compressing the rest.
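To make the idea concrete, here is a toy sketch of query-aware, token-level compression. It is not latte_v1's actual method, which this page does not describe. The sketch assumes whitespace tokenization and uses plain word overlap with the query as its relevance score. It keeps roughly (1 - target_compression_ratio) of the tokens, preferring query-matching ones, and preserves their original order. The function names `toy_compress` and `_normalize` are hypothetical.

```python
import re


def _normalize(token: str) -> str:
    """Lowercase a token and strip leading/trailing punctuation for matching."""
    return re.sub(r"^\W+|\W+$", "", token.lower())


def toy_compress(context: str, query: str, target_compression_ratio: float = 0.5) -> str:
    """Hypothetical toy compressor, NOT the latte_v1 algorithm.

    Keeps about (1 - target_compression_ratio) of the whitespace tokens,
    preferring tokens that also appear in the query, in original order.
    """
    tokens = context.split()
    if not tokens:
        return ""
    query_terms = {_normalize(t) for t in query.split()} - {""}
    # Token budget: e.g. a ratio of 0.5 keeps about half of the tokens.
    budget = max(1, round(len(tokens) * (1 - target_compression_ratio)))
    # Query-matching tokens sort first (False < True); ties favor earlier tokens.
    ranked = sorted(
        range(len(tokens)),
        key=lambda i: (_normalize(tokens[i]) not in query_terms, i),
    )
    keep = sorted(ranked[:budget])  # restore the original token order
    return " ".join(tokens[i] for i in keep)


print(toy_compress(
    "The Eiffel Tower is located in Paris and was completed in 1889",
    "When was the Eiffel Tower completed?",
))  # The Eiffel Tower is was completed
```

The toy drops the answer, 1889, because that token shares no words with the query. Word overlap is a crude stand-in for relevance, so this only illustrates the shape of the task: a token budget derived from the ratio, relevance-based selection, and order preservation.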

Parameters

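  • context: the text to compress
  • query: the question or prompt whose relevant tokens should be preserved
  • target_compression_ratio (optional): how aggressively to compress; defaults to 0.5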
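This page does not specify the request format. The sketch below assumes requests are sent as a JSON body. That body would carry a `model` field naming latte_v1 alongside the documented `context`, `query`, and `target_compression_ratio` parameters. The `build_request` helper, the `model` key, and the open (0, 1) validation range are assumptions for illustration, not the documented API.

```python
import json


def build_request(context: str, query: str, target_compression_ratio: float = 0.5) -> str:
    """Hypothetical helper: serialize a latte_v1 request body as JSON.

    Parameter names come from the docs; the "model" key and the
    (0, 1) range check are assumptions.
    """
    if not context:
        raise ValueError("context must be non-empty text")
    if not query:
        raise ValueError("query must be non-empty text")
    # Assumed valid range; the documented examples run from 0.2 to 0.9.
    if not 0.0 < target_compression_ratio < 1.0:
        raise ValueError("target_compression_ratio must be between 0 and 1")
    payload = {
        "model": "latte_v1",  # assumption: how the model is selected
        "context": context,
        "query": query,
        "target_compression_ratio": target_compression_ratio,
    }
    return json.dumps(payload)


body = build_request("Long retrieved passage ...", "What is the refund policy?")
print(body)
```

Omitting `target_compression_ratio` falls back to 0.5, mirroring the documented default.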

Best For

  • RAG pipelines
  • Q&A systems
  • Query-aware compression

Compression Ratio

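Set target_compression_ratio to control how aggressively the context is compressed. The value is the target fraction to remove, so roughly 1 - ratio of the context is kept.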

Value   Result
0.2     Light: keeps 80%
0.5     Balanced: keeps 50% (default)
0.9     Aggressive: keeps 10%
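Because each row keeps 1 - ratio of the input, you can estimate the size of the compressed context up front. This is a minimal sketch. It assumes the percentages refer to the token count of `context` and treats the ratio as a target rather than an exact guarantee. `expected_tokens_kept` is a hypothetical helper name.

```python
def expected_tokens_kept(n_tokens: int, target_compression_ratio: float = 0.5) -> int:
    """Estimate how many tokens survive compression (hypothetical helper).

    target_compression_ratio is the fraction removed, so the fraction kept
    is 1 - ratio (0.2 keeps 80%, 0.9 keeps 10%). The ratio is a target,
    so treat the result as an approximation.
    """
    if n_tokens < 0:
        raise ValueError("n_tokens must be non-negative")
    return round(n_tokens * (1 - target_compression_ratio))


for ratio in (0.2, 0.5, 0.9):
    print(ratio, expected_tokens_kept(1000, ratio))
# 0.2 800
# 0.5 500
# 0.9 100
```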