← Back to Method Home

Attention-based Preference Modeling

Learning community-specific weights over interpretable attributes

Concept

The model represents each candidate response by an attribute vector \(a\) (e.g., formality, humor, empathy). For a community \(c\), an attention mechanism produces a weight vector \(w_c\) over attributes, and the preference score is the weighted combination \(s = w_c^\top a\).

  • Inputs: attribute evidence for each candidate response
  • Mechanism: attention over attributes conditioned on community context
  • Output: community-specific preference score and explanation
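The pipeline above can be sketched in a few lines. This is a minimal illustration, not the actual implementation: the dimensions, the linear projection from a community embedding to attribute logits, and all names (`preference_score`, `ATTRIBUTES`, `W`) are assumptions for demonstration.

```python
import numpy as np

# Illustrative attribute set; the real model may use a different inventory.
ATTRIBUTES = ["formality", "humor", "empathy", "sarcasm", "verbosity"]

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

def preference_score(a, community_emb, W):
    """Score one response for one community.

    a: attribute vector for the response (len = number of attributes)
    community_emb: embedding of the community context
    W: assumed linear map from community embedding to attribute logits
    """
    w_c = softmax(W @ community_emb)   # community-specific attention weights
    return float(w_c @ a), w_c         # score plus interpretable weights

# Toy usage with random data
rng = np.random.default_rng(0)
W = rng.normal(size=(len(ATTRIBUTES), 8))
community = rng.normal(size=8)
a = rng.normal(size=len(ATTRIBUTES))
score, weights = preference_score(a, community, W)
```

Because the weights are a softmax output, they sum to one and can be read directly as attribute importances for the community.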

Why attention?

  • Flexibly re-weights attributes per community/task
  • Produces interpretable importance scores
  • Supports dynamic adaptation when context changes

Performance

46.6% average improvement over GPT-4o

Evaluated on 45 Reddit communities

Interpretability

Provides attribute weight vectors per community and example-specific rationales derived from attention scores.
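A rationale can be read straight off the weight vector by ranking attributes. A small sketch, assuming a hypothetical `explain` helper and made-up weights:

```python
ATTRIBUTES = ["formality", "humor", "empathy", "sarcasm", "verbosity"]
# Hypothetical attention weights for one community (values are invented).
weights = [0.10, 0.05, 0.55, 0.05, 0.25]

def explain(attrs, w, top_k=2):
    """Return a one-line rationale naming the top-weighted attributes."""
    ranked = sorted(zip(attrs, w), key=lambda p: p[1], reverse=True)
    top = ", ".join(f"{name} ({wt:.2f})" for name, wt in ranked[:top_k])
    return f"Most influential attributes: {top}"

print(explain(ATTRIBUTES, weights))
# Most influential attributes: empathy (0.55), verbosity (0.25)
```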

Illustrative community weightings

  • Scholarly: High in Verbosity, Stimulation; Low in Sarcasm
  • Conflict-oriented: High in Sarcasm, Directness; Low in Empathy
  • Support-based: High in Empathy; Moderate in Formality
