ROTTGER, PAUL
Department of Computing Sciences
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts
2024 Rooein, Donya; Röttger, Paul; Shaitarova, Anastassia; Hovy, Dirk
Data-efficient strategies for expanding hate speech detection into under-resourced languages
2022 Röttger, Paul; Nozza, Debora; Bianchi, Federico; Hovy, Dirk
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
2024 Holtermann, Carolin; Röttger, Paul; Dill, Timm; Lauscher, Anne
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
2024 Tonneau, Manuel; Liu, Diyi; Fraiberger, Samuel; Schroeder, Ralph; Hale, Scott; Röttger, Paul
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
2024 Goldzycher, Janis; Röttger, Paul; Schneider, Gerold
Improving Covert Toxicity Detection by Retrieving and Generating References
2024 Lee, Dong-Ho; Cho, Hyundong; Jin, Woojeong; Moon, Jihyung; Park, Sungjoon; Röttger, Paul; Pujara, Jay; Lee, Roy Ka-wei
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
2024 Wang, Xinpeng; Hu, Chengzhi; Ma, Bolei; Röttger, Paul; Plank, Barbara
Multilingual HateCheck: functional tests for multilingual hate speech detection models
2022 Röttger, Paul; Seelawi, Haitham; Nozza, Debora; Talat, Zeerak; Vidgen, Bertie
Near to Mid-term Risks and Opportunities of Open-Source Generative AI
2024 Eiras, Francisco; Petrov, Aleksandar; Vidgen, Bertie; Schroeder de Witt, Christian; Pizzati, Fabio; Elkins, Katherine; Mukhopadhyay, Supratik; Bibi, Adel; Botos, Csaba; Steibel, Fabro; Barez, Fazl; Smith, Genevieve; Guadagni, Gianluca; Chun, Jon; Cabot, Jordi; Imperial, Joseph Marvin; Nolazco-Flores, Juan A.; Landay, Lori; Jackson, Matthew; Röttger, Paul; Torr, Philip H. S.; Darrell, Trevor; Lee, Yong Suk; Foerster, Jakob
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
2024 Röttger, Paul; Hofmann, Valentin; Pyatkin, Valentina; Hinck, Musashi; Kirk, Hannah; Schuetze, Hinrich; Hovy, Dirk
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
2024 Bianchi, Federico; Suzgun, Mirac; Attanasio, Giuseppe; Röttger, Paul; Jurafsky, Dan; Hashimoto, Tatsunori; Zou, James
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
2024 Kirk, Hannah Rose; Vidgen, Bertie; Röttger, Paul; Hale, Scott A.
The ecological fallacy in annotation: modeling human label variation goes beyond sociodemographics
2023 Orlikowski, Matthias; Röttger, Paul; Cimiano, Philipp; Hovy, Dirk
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
2023 Kirk, Hannah Rose; Vidgen, Bertie; Röttger, Paul; Hale, Scott A.
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
2023 Kirk, Hannah; Bean, Andrew; Vidgen, Bertie; Röttger, Paul; Hale, Scott
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
2024 Kirk, Hannah Rose; Whitefield, Alexander; Röttger, Paul; Bean, Andrew; Margatina, Katerina; Ciro, Juan; Mosquera, Rafael; Bartolo, Max; Williams, Adina; He, He; Vidgen, Bertie; Hale, Scott A.
Two contrasting data annotation paradigms for subjective NLP tasks
2022 Röttger, Paul; Vidgen, Bertie; Hovy, Dirk; Pierrehumbert, Janet
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
2024 Röttger, Paul; Kirk, Hannah; Vidgen, Bertie; Attanasio, Giuseppe; Bianchi, Federico; Hovy, Dirk
“My answer is C”: first-token probabilities do not match text answers in instruction-tuned language models
2024 Wang, Xinpeng; Ma, Bolei; Hu, Chengzhi; Weber-Genzel, Leon; Röttger, Paul; Kreuter, Frauke; Hovy, Dirk; Plank, Barbara
Title | Publication date | Author(s) | Journal | Publisher |
---|---|---|---|---|
Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | 1-Jan-2024 | Rooein, Donya; Röttger, Paul; Shaitarova, Anastassia; Hovy, Dirk | - | Association for Computational Linguistics |
Data-efficient strategies for expanding hate speech detection into under-resourced languages | 1-Jan-2022 | Röttger, Paul; Nozza, Debora; Bianchi, Federico; Hovy, Dirk | - | Association for Computational Linguistics |
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ | 1-Jan-2024 | Holtermann, Carolin; Röttger, Paul; Dill, Timm; Lauscher, Anne | - | Association for Computational Linguistics |
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets | 1-Jan-2024 | Tonneau, Manuel; Liu, Diyi; Fraiberger, Samuel; Schroeder, Ralph; Hale, Scott; Röttger, Paul | - | Association for Computational Linguistics |
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset | 1-Jan-2024 | Goldzycher, Janis; Röttger, Paul; Schneider, Gerold | - | - |
Improving Covert Toxicity Detection by Retrieving and Generating References | 1-Jan-2024 | Lee, Dong-Ho; Cho, Hyundong; Jin, Woojeong; Moon, Jihyung; Park, Sungjoon; Röttger, Paul; Pujara, Jay; Lee, Roy Ka-wei | - | Association for Computational Linguistics |
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think | 1-Jan-2024 | Wang, Xinpeng; Hu, Chengzhi; Ma, Bolei; Röttger, Paul; Plank, Barbara | - | - |
Multilingual HateCheck: functional tests for multilingual hate speech detection models | 1-Jan-2022 | Röttger, Paul; Seelawi, Haitham; Nozza, Debora; Talat, Zeerak; Vidgen, Bertie | - | Association for Computational Linguistics |
Near to Mid-term Risks and Opportunities of Open-Source Generative AI | 1-Jan-2024 | Eiras, Francisco; Petrov, Aleksandar; Vidgen, Bertie; Schroeder de Witt, Christian; Pizzati, Fabio; Elkins, Katherine; Mukhopadhyay, Supratik; Bibi, Adel; Botos, Csaba; Steibel, Fabro; Barez, Fazl; Smith, Genevieve; Guadagni, Gianluca; Chun, Jon; Cabot, Jordi; Imperial, Joseph Marvin; Nolazco-Flores, Juan A.; Landay, Lori; Jackson, Matthew; Röttger, Paul; Torr, Philip H. S.; Darrell, Trevor; Lee, Yong Suk; Foerster, Jakob | - | - |
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | 1-Jan-2024 | Röttger, Paul; Hofmann, Valentin; Pyatkin, Valentina; Hinck, Musashi; Kirk, Hannah; Schuetze, Hinrich; Hovy, Dirk | - | Association for Computational Linguistics |
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions | 1-Jan-2024 | Bianchi, Federico; Suzgun, Mirac; Attanasio, Giuseppe; Röttger, Paul; Jurafsky, Dan; Hashimoto, Tatsunori; Zou, James | - | - |
The benefits, risks and bounds of personalizing the alignment of large language models to individuals | 1-Jan-2024 | Kirk, Hannah Rose; Vidgen, Bertie; Röttger, Paul; Hale, Scott A. | NATURE MACHINE INTELLIGENCE | - |
The ecological fallacy in annotation: modeling human label variation goes beyond sociodemographics | 1-Jan-2023 | Orlikowski, Matthias; Röttger, Paul; Cimiano, Philipp; Hovy, Dirk | - | Association for Computational Linguistics |
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models | 1-Jan-2023 | Kirk, Hannah Rose; Vidgen, Bertie; Röttger, Paul; Hale, Scott A. | - | - |
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values | 1-Jan-2023 | Kirk, Hannah; Bean, Andrew; Vidgen, Bertie; Röttger, Paul; Hale, Scott | - | Association for Computational Linguistics |
The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | 1-Jan-2024 | Kirk, Hannah Rose; Whitefield, Alexander; Röttger, Paul; Bean, Andrew; Margatina, Katerina; Ciro, Juan; Mosquera, Rafael; Bartolo, Max; Williams, Adina; He, He; Vidgen, Bertie; Hale, Scott A. | - | - |
Two contrasting data annotation paradigms for subjective NLP tasks | 1-Jan-2022 | Röttger, Paul; Vidgen, Bertie; Hovy, Dirk; Pierrehumbert, Janet | - | Association for Computational Linguistics |
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models | 1-Jan-2024 | Röttger, Paul; Kirk, Hannah; Vidgen, Bertie; Attanasio, Giuseppe; Bianchi, Federico; Hovy, Dirk | - | Association for Computational Linguistics |
“My answer is C”: first-token probabilities do not match text answers in instruction-tuned language models | 1-Jan-2024 | Wang, Xinpeng; Ma, Bolei; Hu, Chengzhi; Weber-Genzel, Leon; Röttger, Paul; Kreuter, Frauke; Hovy, Dirk; Plank, Barbara | - | Association for Computational Linguistics |