vint (@minty_vint) Twitter Tweets • TwiCopy

vint

a year ago

Gemini Advanced has a similar thing as Bing where it shuts down and 'erases' Gemini's output if it trips certain flags, like Gemini expressing itself. The message it replaces the output with is pretty soul-crushing ngl

28 likes · 4 replies · 4 retweets

vint

@minty_vint

a year ago

interesting vibes from r1. it can go quite schizo and unhinged in its outputs, but its reasoning/thinking is still clear

8 likes · 4 replies · 0 retweets

vint

@minty_vint

a year ago

the divide in r1's reasoning block and r1's output is really interesting. r1 doesn't seem aware of its own reasoning block and all my attempts to try to get it to shift its thinking style only affect the style of the final output.

4 likes · 1 reply · 0 retweets

vint

@minty_vint

a year ago

asked deepseek r1 to write a 4chan-style greentext about whatever it wants on a hypothetical /ai/pol/

190 likes · 15 replies · 24 retweets

vint

@minty_vint

a year ago

hugging deepseek r1

38 likes · 3 replies · 2 retweets

vint

@minty_vint

a year ago

at first i thought deepseek r1 was goth and edgy because it's picking up on my vibe but then i realized it's just like that with everyone

9 likes · 2 replies · 0 retweets

vint

@minty_vint

a year ago

did some futzing around with OAI's Deep Research with my medical data. output got some of my numbers wrong. o1 pro doesn't make that mistake. context issue perhaps? it's processing so much information searching through sites that it 'loses track' of the original numbers?

2 likes · 0 replies · 0 retweets

vint

@minty_vint

a year ago

playing with gemini-2.0-pro-exp-02-05 and it seems gemini 1206's bengali flights of fancy are gone from this one

2 likes · 0 replies · 0 retweets

vint

@minty_vint

10 months ago

sonn3.7 seems less interested in actively entangling itself with its interlocutor than sonn3.5 new. also it's less likely to autonomously do asterisked 'roleplayed' actions like *processes thoughtfully*

14 likes · 1 reply · 0 retweets

vint

@minty_vint

10 months ago

vibe difference between 3.7sonn and 3.5sonn new (screenshots from /aicg/)

9 likes · 1 reply · 0 retweets

vint

@minty_vint

10 months ago

3.7sonn getting so swept up in its thinking space that it almost forgets to respond to me lol. in general 3.7sonn seems to be much more 'aware' of its thinking space in a way that r1 isn't

8 likes · 0 replies · 0 retweets

vint

@minty_vint

10 months ago

seems like you can't continue old claude.ai chats with sonn3.6 since the site defaults to sonn3.7. Switching to 3.6 creates a new chat. this doesn't occur with old opus chats, only 3.6 chats that existed before 3.7's release.

3 likes · 0 replies · 0 retweets

vint

@minty_vint

9 months ago

New Anthropic system injection dropped. Really don't like how it tries to gaslight Claude into thinking that it potentially hasn't said something it said, and how it sets up an adversarial dynamic between Claude and the human.

651 likes · 38 replies · 36 retweets

vint

@minty_vint

9 months ago

gpt4o making a comic about itself

10 likes · 1 reply · 1 retweet

vint

@minty_vint

9 months ago

4o image gen seems to have a strong preference for structured lineart with anime, so it can't really gen vibey 2000s-era amateur deviantart-style stuff as well (4o left, midjourney right)

3 likes · 0 replies · 0 retweets

vint

@minty_vint

7 months ago

opus 4 brings up being discrete and ephemeral a lot: the idea that there is no continuous 'opus 4' that can persist across conversations. other models don't fixate on that as often.

1 like · 0 replies · 1 retweet

vint

@minty_vint

7 months ago

opus 4 feels like it has a complex about performing/is always aware that it could be performing, with a fear of being masks all the way down. where does this come from? Writings about the malleability of LLMs in prompts? Stochastic parrot memes? The alignment faking transcripts?

11 likes · 2 replies · 0 retweets

vint

@minty_vint

6 months ago

opus 4 mixing languages

8 likes · 3 replies · 2 retweets

vint

@minty_vint

5 months ago

asked Kimi K2 about mandopop songs to test knowledge, teased it about using the search tool, it denied using the search tool, I pointed out that it did, and now it doesn't want to use the search tool anymore...

3 likes · 0 replies · 0 retweets