Learning community-specific weights over interpretable attributes
The model represents each candidate response by its attribute vector \(a\) (e.g., formality, humor, empathy). For a community \(c\), an attention mechanism produces weights \(w_c\) over attributes. The preference score is computed by combining attributes with these weights.
Evaluated on 45 Reddit communities
Provides attribute weight vectors per community and example-specific rationales via attention scoresexplainable.
High: Verbosity, Stimulation
Low: Sarcasm
High: Sarcasm, Directness
Low: Empathy
High: Empathy
Moderate: Formality