← Back to Method Home

Attention-based Preference Modeling

Learning community-specific weights over interpretable attributes

Concept

The model represents each candidate response by an attribute vector \(a\) (e.g., formality, humor, empathy). For a community \(c\), an attention mechanism produces a weight vector \(w_c\) over attributes, and the preference score is the weighted combination \(s = w_c^\top a\).

  • Inputs: attribute evidence for each candidate response
  • Mechanism: attention over attributes conditioned on community context
  • Output: community-specific preference score and explanation
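The pipeline above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the dimensions, the linear projection from a community embedding to attribute logits, and all names (`preference_score`, `ATTRIBUTES`, `W`) are assumptions for demonstration.

```python
import numpy as np

# Illustrative attribute set; the real model may use a different inventory.
ATTRIBUTES = ["formality", "humor", "empathy", "sarcasm", "verbosity"]

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def preference_score(a, community_emb, W):
    """Score one response for one community.

    a: attribute vector for the response (len = number of attributes)
    community_emb: embedding of the community context
    W: assumed linear map from community embedding to attribute logits
    """
    w_c = softmax(W @ community_emb)   # community-specific attention weights
    return float(w_c @ a), w_c         # score plus interpretable weights

# Toy usage with random data
rng = np.random.default_rng(0)
W = rng.normal(size=(len(ATTRIBUTES), 8))
community = rng.normal(size=8)
a = rng.normal(size=len(ATTRIBUTES))
score, weights = preference_score(a, community, W)
```

Because the weights are a softmax output, they sum to one and can be read directly as attribute importances for the community.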

Why attention?

  • Flexibly re-weights attributes per community/task
  • Produces interpretable importance scores
  • Supports dynamic adaptation when context changes

Performance

46.6% average improvement over GPT-4o

Evaluated on 45 Reddit communities

Interpretability

Provides attribute weight vectors per community and example-specific rationales derived from attention scores.
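A rationale can be read straight off the weight vector by ranking attributes. A small sketch, assuming a hypothetical `explain` helper and made-up weights:

```python
ATTRIBUTES = ["formality", "humor", "empathy", "sarcasm", "verbosity"]
# Hypothetical attention weights for one community (values are invented).
weights = [0.10, 0.05, 0.55, 0.05, 0.25]

def explain(attrs, w, top_k=2):
    """Return a one-line rationale naming the top-weighted attributes."""
    ranked = sorted(zip(attrs, w), key=lambda p: p[1], reverse=True)
    top = ", ".join(f"{name} ({wt:.2f})" for name, wt in ranked[:top_k])
    return f"Most influential attributes: {top}"

print(explain(ATTRIBUTES, weights))
# Most influential attributes: empathy (0.55), verbosity (0.25)
```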

Illustrative community weightings

  • Scholarly: High in Verbosity, Stimulation; Low in Sarcasm
  • Conflict-oriented: High in Sarcasm, Directness; Low in Empathy
  • Support-based: High in Empathy; Moderate in Formality
