| 1 |
Projective Psychological Assessment of Large Multimodal Models Using Thematic Apperception Tests |
Uses Thematic Apperception Tests to assess the personality traits of large multimodal models |
multimodal |
|
|
| 2 |
Large Language Models Persuade Without Planning Theory of Mind |
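Proposes a new ToM task to evaluate LLM persuasion ability and finds that LLMs persuade effectively without relying on theory of mind |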
large language model |
|
|
| 3 |
AIDG: Evaluating Asymmetry Between Information Extraction and Containment in Multi-Turn Dialogue |
AIDG: evaluates the asymmetry between information extraction and information containment in multi-turn dialogue |
large language model, instruction following |
|
|
| 4 |
Learning to Stay Safe: Adaptive Regularization Against Safety Degradation during Fine-Tuning |
Proposes an adaptive regularization framework to address safety degradation in language models during fine-tuning |
instruction following |
|
|
| 5 |
Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation |
Reveals the fragility of LLMs under lexical and syntactic perturbations and underscores the importance of robustness testing |
large language model |
|
|
| 6 |
Persona2Web: Benchmarking Personalized Web Agents for Contextual Reasoning with User History |
Persona2Web: proposes a benchmark for personalized web agents that reason contextually over user history |
large language model |
|
|
| 7 |
What Language is This? Ask Your Tokenizer |
UniLID: a language identification method based on the UnigramLM tokenizer that improves performance in low-resource settings |
large language model |
|
|
| 8 |
Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems |
Uses large language models to label correctness at the knowledge-component level in open-ended coding problems |
large language model |
|
|
| 9 |
Quantifying and Mitigating Socially Desirable Responding in LLMs: A Desirability-Matched Graded Forced-Choice Psychometric Study |
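Proposes a psychometric framework to quantify and mitigate socially desirable responding in LLMs, improving the reliability of questionnaire-based assessment |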
large language model |
|
|
| 10 |
The Emergence of Lab-Driven Alignment Signatures: A Psychometric Framework for Auditing Latent Bias and Compounding Risk in Generative AI |
Proposes a psychometric framework for auditing latent bias and compounding risk in generative AI |
large language model |
|
|