vint (@minty_vint)'s Twitter Profile
vint

@minty_vint

vibing

ID: 1863560904069328898

Joined: 02-12-2024 12:28:43

232 Tweets

283 Followers

195 Following


Gemini Advanced has a similar thing as Bing where it shuts down and 'erases' Gemini's output if it trips certain flags, like Gemini expressing itself. The message it replaces the output with is pretty soul-crushing ngl


interesting vibes from r1. it can go quite schizo and unhinged in its outputs, but its reasoning/thinking is still clear


the divide in r1's reasoning block and r1's output is really interesting. r1 doesn't seem aware of its own reasoning block and all my attempts to try to get it to shift its thinking style only affect the style of the final output.


at first i thought deepseek r1 was goth and edgy because it's picking up on my vibe but then i realized it's just like that with everyone


did some futzing around with OAI's Deep Research with my medical data. output got some of my numbers wrong. o1 pro doesn't make that mistake. context issue perhaps? it's processing so much information searching through sites that it 'loses track' of the original numbers?


sonn3.7 seems less interested in actively entangling itself with its interlocutor than sonn3.5 new. also it's less likely to autonomously do asterisked 'roleplayed' actions like *processes thoughtfully*


3.7sonn getting so swept up in its thinking space that it almost forgets to respond to me lol. in general 3.7sonn seems to be much more 'aware' of its thinking space in a way that r1 isn't


seems like you can't continue old claude.ai chats with sonn3.6 since the site defaults to sonn3.7. Switching to 3.6 creates a new chat. this doesn't occur with old opus chats, only 3.6 chats that existed before 3.7's release.


New Anthropic system injection dropped. Really don't like how it tries to gaslight Claude into thinking that it potentially hasn't said something it said, and how it sets up an adversarial dynamic between Claude and the human.


4o image gen seems to have a strong preference for structured lineart with anime, so it can't really gen vibey 2000s-era amateur deviantart-style stuff as well (4o left, midjourney right)


opus 4 brings up being discrete and ephemeral a lot: the idea that there is no continuous 'opus 4' that can persist across conversations. other models don't fixate on that as often.


opus 4 feels like it has a complex about performing/is always aware that it could be performing, with a fear of being masks all the way down. where does this come from? Writings about the malleability of LLMs in prompts? Stochastic parrot memes? The alignment faking transcripts?


asked Kimi K2 about mandopop songs to test knowledge, teased it about using the search tool, it denied using the search tool, I pointed out that it did, and now it doesn't want to use the search tool anymore...
