Taxonomies of Human-AI Feedback and Debate Architectures for Alignment

Citations

Debating with More Persuasive LLMs Leads to More Truthful Answers

https://arxiv.org/html/2402.06782v4

On scalable oversight with weak LLMs judging strong LLMs — AI Alignment Forum

https://www.alignmentforum.org/posts/Qn3ZDf9WAqGuAjWQe/on-scalable-oversight-with-weak-llms-judging-strong-llms

An overview of 11 proposals for building safe advanced AI — AI Alignment Forum

https://www.alignmentforum.org/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai

Writeup: Progress on AI Safety via Debate — AI Alignment Forum

https://www.alignmentforum.org/posts/Br4xDbYu4Frwrb64a/writeup-progress-on-ai-safety-via-debate-1

Scalable AI Safety via Doubly-Efficient Debate — arXiv

https://arxiv.org/pdf/2311.14125

Let’s use AI to harden human defenses against AI manipulation — AI Alignment Forum

https://www.alignmentforum.org/posts/zxmzBTwKkPMxQQcfR/let-s-use-ai-to-harden-human-defenses-against-ai

Debate update: Obfuscated arguments problem — LessWrong

https://www.lesswrong.com/posts/PJLABqQ962hZEqhdB/debate-update-obfuscated-arguments-problem

Educating Intelligence: What Jewish Tradition Can Teach Us About Aligning AI — Michael Zibulevsky, Medium (Aug 2025)

https://medium.com/@michaelzibulevsky/educating-intelligence-what-jewish-tradition-can-teach-us-about-aligning-ai-03b94bf8250a

Clarifying Factored Cognition — AI Alignment Forum

https://www.alignmentforum.org/posts/eCWkJrFff7oMLwjEp/clarifying-factored-cognition

https://aclanthology.org/2024.acl-long.331.pdf

AI safety via market making — LessWrong

https://www.lesswrong.com/posts/YWwzccGbcHMJMpT45/ai-safety-via-market-making
