Alex Trott (@alexrtrott)'s Twitter Profile
Alex Trott

@alexrtrott

Research @DbrxMosaicAI. Neuroscience PhD in a previous life. Whispering models into sentience one parameter at a time. (opinions are my own.)

ID: 765618116063887360

Joined: 16-08-2016 18:35:58

144 Tweets

725 Followers

283 Following

Andriy Burkov (@burkov)'s Twitter Profile Photo

The DBRX model is very strong out of the box. I predict it will be similar to or even beat Mistral Large (it hallucinates less than Mistral Large, and even less than Opus, in my tests). If you have enough large GPUs to be able to finetune it, the result will be the best among the open LLMs.

Richard Socher (@richardsocher)'s Twitter Profile Photo

Try out the new <a href="/databricks/">Databricks</a> open source model. Does very well on code!
First on you.com @YouSearchEngine

Great to have a new top open source model. Amazing work by their team!
Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

<a href="/DbrxMosaicAI/">Databricks Mosaic Research</a> DBRX outperforms <a href="/OpenAI/">OpenAI</a> GPT-4 on realistic, domain-specific benchmark datasets. For example, on a customer support summarization use case 👇👇👇

Still neck and neck, but it shows that open models can be the no-brainer choice for actual enterprise applications.
Bill Yuchen Lin (@billyuchenlin)'s Twitter Profile Photo

🆕 Check out the recent update of 𝕎𝕚𝕝𝕕𝔹𝕖𝕟𝕔𝕙! We have included a few more models, including DBRX-Instruct <a href="/databricks/">Databricks</a> and Starling-LM-beta (7B) <a href="/NexusflowX/">Nexusflow</a>, which are both super powerful! DBRX-Instruct is indeed the best open LLM; Starling-LM 7B outperforms a lot of even
Ali Madani (@thisismadani)'s Twitter Profile Photo

Can AI rewrite our human genome? ⌨️🧬 Today, we announce the successful editing of DNA in human cells with gene editors fully designed with AI. Not only that, we've decided to freely release the molecules under the Profluent OpenCRISPR initiative. Lots to unpack👇

Matei Zaharia (@matei_zaharia)'s Twitter Profile Photo

Thrilled that Forrester named Databricks a Leader in their report on AI Foundation Models in enterprise! databricks.com/blog/databrick… We help organizations build the best AI for *their* domain and data, using the best techniques available, with a world-class research team to back it.

Cody Blakeney (@code_star)'s Twitter Profile Photo

Pretraining data experiments are expensive, as measuring the impact of data on emergent tasks requires large FLOP scales. How do you determine which subsets of your data are important for the mixture of tasks you care about?

We present Domain upsampling: a strategy to better
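The tweet above is truncated, but the general idea of domain upsampling can be sketched as reweighting how often each data domain is sampled during pretraining. A minimal toy sketch — the domain names, documents, and weights below are illustrative, not taken from the paper:

```python
import random

def sample_batch(domains, weights, batch_size, rng):
    """Draw a batch by first picking a domain according to `weights`,
    then picking a document uniformly from within that domain."""
    names = list(domains)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=batch_size)
    return [rng.choice(domains[n]) for n in picks]

# Toy corpus: three domains with a handful of placeholder documents.
domains = {
    "web": ["web doc 1", "web doc 2"],
    "code": ["code doc 1"],
    "math": ["math doc 1"],
}
# Upsample code and math relative to their natural share of the corpus.
weights = {"web": 0.4, "code": 0.35, "math": 0.25}

batch = sample_batch(domains, weights, batch_size=8, rng=random.Random(0))
```

In a real pretraining run the weights would be tuned so that the resulting mixture improves the downstream tasks you care about, rather than fixed by hand as here.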
Zack Ankner (@zackankner)'s Twitter Profile Photo

Excited to announce our new work: Critique-out-Loud (CLoud) reward models. CLoud reward models first produce a chain of thought critique of the input before predicting a scalar reward, allowing reward models to reason explicitly instead of implicitly! arxiv.org/abs/2408.11791
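The two-stage mechanism described above — generate a critique first, then predict a scalar reward conditioned on it — can be sketched in miniature. In this toy version, simple heuristics stand in for the learned LLM components of the actual CLoud model, and all function names are illustrative:

```python
def critique(prompt: str, response: str) -> str:
    """Stage 1: produce an explicit, human-readable critique of the response."""
    notes = []
    if len(response.split()) < 4:
        notes.append("too brief to be a complete answer")
    if "?" in response:
        notes.append("answers a question with a question")
    return "; ".join(notes) if notes else "addresses the prompt adequately"

def reward(prompt: str, response: str) -> float:
    """Stage 2: scalar reward head, conditioned on the critique text
    rather than on the raw (prompt, response) pair alone."""
    c = critique(prompt, response)
    # Fewer flagged issues -> higher reward.
    n_issues = 0 if c.startswith("addresses") else c.count(";") + 1
    return 1.0 - 0.4 * n_issues
```

The point of the design is that the reasoning is made explicit in stage 1, so the scalar in stage 2 is grounded in an inspectable critique instead of an opaque forward pass.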

Jaehun Jung (@jaehunjung_com)'s Twitter Profile Photo

Data curation is crucial for LLM reasoning, but how do we know if our dataset is not overfit to one benchmark and generalizes to unseen distributions? 🤔

𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key, when measured correctly—it strongly predicts model generalization in reasoning tasks! 🧵
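One simple way to quantify data diversity, sketched here as a toy example, is the mean pairwise dissimilarity between example embeddings. Bag-of-words vectors stand in for real embeddings, and this particular metric is an illustrative assumption, not necessarily the one used in the work above:

```python
import math
from collections import Counter

def embed(text):
    # Bag-of-words stand-in for a learned embedding.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def diversity(texts):
    """Average pairwise dissimilarity (1 - cosine) over all example pairs."""
    embs = [embed(t) for t in texts]
    pairs = [(i, j) for i in range(len(texts)) for j in range(i + 1, len(texts))]
    return sum(1 - cosine(embs[i], embs[j]) for i, j in pairs) / len(pairs)
```

A dataset of near-duplicates scores near 0, while one covering disjoint vocabulary scores near 1; the claim in the thread is that higher (correctly measured) diversity predicts better generalization.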
Jonathan Frankle (@jefrankle)'s Twitter Profile Photo

I'm at ICML 🇨🇦 and I'm hiring at Databricks. Visit our booth if you're interested. My scientific focus: It's 1972 in AI, there's an AI crisis, Dijkstra isn't here to save us, and maybe RL can. Why Databricks? The long road to AGI is being paved here and we have the real evals 🧵

Jonathan Frankle (@jefrankle)'s Twitter Profile Photo

Not that I have a favorite recent project, but... 🧵

LLM judges are the popular way to evaluate generative models. But they have drawbacks. They're:
* Generative, so slow and expensive.
* Nondeterministic.
* Uncalibrated. They don't know how uncertain they are.

Meet PGRM!