6 Comments
Balint

I'm definitely not an expert, but I'm not sure the conclusion is quite right here - my sense is that LLMs will generally give you the median position of news sources and educated elites (like you suggest) — UNLESS it thinks you might believe something else, in which case it tries to agree with you. This paper from Anthropic shows this kind of 'sycophancy' clearly (a result of RLHF trying to maximise human preference scores): https://www-cdn.anthropic.com/e4f69aacd8c0905030172bc6eb480c252ea7d6ad/model-written-evals.pdf#page=28 (see section 4 especially). Agree that from your prompts alone it is surprising how vehement its responses were, though! Perhaps the initial prompt/slightly leading follow-ups were enough, or perhaps it has remembered other info about you from other chats?

Balint

Looks like this problem of sycophancy took off in more extreme directions with (undisclosed) updates more recently, which they've since reversed (why don't they ever report updates??). I'm really worried about alignment and the future, given the degree of susceptibility some people have to AI-generated suggestions. There's a slightly mad write-up of some of these cases from Rolling Stone: https://www.rollingstone.com/culture/culture-features/ai-spiritual-delusions-destroying-human-relationships-1235330175/.

PS. Just to be clear, I'm not saying I disagree with any of ChatGPT's points in your post above, just with the interpretation that this is what the model's reproducible viewpoint actually is...

Hari Sood

Love this additional insight Balint, thank you! It's really helpful to consider the context of sycophancy here, especially in terms of reproducibility and whether answers are standardised across the board, rather than tailored to specific people.

Ultimately I haven't really used ChatGPT all that much, and the times I have, it's been primarily for fun (e.g. 'create this poem about my friend') rather than anything serious or political, and I definitely hadn't discussed Palestine before. There may be some hidden pattern work that I'm not noticing immediately myself; I guess this can become reductive and blurry quite quickly.

I guess this then leads to whether the initial prompts were enough to send it down this route, which (if this is the primary reason we ended up with these responses) raises an interesting question. The prompts were about liberty and freedom, which I would argue are fundamental human endeavours. If, prompted to think through the lens of liberation, these responses arise, is that enough of an argument to say 'cut out all the other noise: when it comes to what really matters on a human level, there is only one way to think about what is happening in Palestine'? Yes, there are endlessly two sides on many metrics, but if you want to focus on freedom for people, is there only one?

Balint

Well said Hari!

Varut Subchareon

Your first thought was also mine exactly: that ChatGPT is basically echoing what most of us think and feel. My immediate concern was also that they would somehow alter it. Very interesting experiment though. It really got my head spinning, thinking about where we are in history. Sometimes I find it hard to believe that we live in a time where we have technology like this.

Hari Sood

Indeed Varut - I have taken screenshots/screen recordings so it is documented in case it changes!
