Juraj Vladika (@JurajVladika)

Glad to share our survey on automated Scientific Fact-Checking, accepted to Findings of #ACL2023! 🎉🇨🇦

We analyze the datasets & approaches, discuss discovered challenges, and provide future directions for this emerging #NLProc task.🔎🔬

📜: arxiv.org/abs/2305.16859 
#ACL2023nlp
Elizabeth Clark (@eaclark07)

We are excited to release Seahorse 🌊🐴, a ✨multilingual, multifaceted summarization evaluation dataset✨
96,000+ human ratings to enable faster progress in training and evaluating learnt metrics for summarization!
Preprint: arxiv.org/abs/2305.13194
Data: goo.gle/seahorse

Omer Ali Bayraktar (@bayraktar_lab)

Here comes WebAtlas: our 'Google Maps' for tissue atlases, integrating single-cell and spatial transcriptomics so that anyone, anywhere can access and explore atlas datasets in a web browser. Fantastic collab with Muzlifah Haniffa, led by Tong LI 李彤, Dave Horsfall & Daniela. Links 👇

Yao Fu (@Francis_YAO_)

If you look at some really challenging datasets like Chain-of-Thought Hub, you will see how large the gap between small and large models is.

github.com/FranxYao/chain…

Again, pushing forward open source is awesome, but one should be honest with themselves.

Oscar Sainz (@osainz59)

As an example, for popular datasets like CoNLL03, ChatGPT is capable of generating the training, validation, and even test splits. It turns out that ChatGPT has been evaluated as a zero-shot or few-shot NER system on this dataset by multiple papers.

🧵2/5

Gleb Mikhaylov (@glebmikha)

Yesterday, one of my data analysis students told me about the ChatGPT plugin for Noteable. It's amazing! I'm no longer waiting for the Code Interpreter plugin; I can chat with my datasets right now!

Oscar Sainz (@osainz59)

⚠️ Did ChatGPT cheat on your test? Probably yes.

Many papers have evaluated ChatGPT on various benchmarks. However, it is important to consider that LLMs might have seen and memorized these datasets during pretraining.

Read our latest blog post: hitz-zentroa.github.io/lm-contaminati…
🧵1/5
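The memorization concern in this thread can be illustrated with a crude check: prompt a model to reproduce a dataset split, then measure how many word n-grams of the real examples reappear in its output. The sketch below is a hypothetical illustration, not the method from the linked blog post; whitespace tokenization, the helper names `ngrams` and `overlap_ratio`, and the default `n=8` are all assumptions chosen for the example.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_ratio(generated, reference, n=8):
    """Fraction of the reference's word n-grams that also occur in the generated text.

    Hypothetical helper: whitespace tokenization and n=8 are illustrative assumptions.
    A value near 1.0 means the model reproduced the reference almost verbatim.
    """
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0  # reference shorter than n tokens: nothing to compare
    gen = ngrams(generated.split(), n)
    return len(ref & gen) / len(ref)


if __name__ == "__main__":
    reference = "the quick brown fox jumps over the lazy dog"  # stand-in for a test example
    generated = "a quick brown fox jumps over a sleepy cat"  # stand-in for model output
    # 3 of the 7 reference trigrams also appear in the generated text
    print(overlap_ratio(generated, reference, n=3))
```

A high ratio on a test split suggests, but does not prove, contamination: short or formulaic sentences can overlap by chance, so any threshold would need calibrating on data the model cannot have seen.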

Isabelle Mohr (@isabelle_mohr)

Looking for a fresh, visual solution to your search needs? 👀

Explore the Multimodal Search Map: our map-based search interface, ready to be customized to unique datasets and use cases. Really fun for 🤩

Dive in 👉 jina.ai/multimodal-sea…

Edward Beeching (@edwardbeeching)

We have a new leader on the Open LLM leaderboard.

Congrats to ausboss/llama-30b-supercot!

They combined chain-of-thought datasets, code explanations and instructions, snippets, logical deductions and Alpaca GPT-4 prompts.

Check it out here: huggingface.co/spaces/Hugging…

PYOFLIFE.COM (@Parajulisaroj16)

Hypothesis testing is a statistical technique used to make decisions about a population based on a sample of data. pyoflife.com/how-to-perform…
#DataScience #rstats #DataAnalytics #r #programming #Statistics #datasets
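As a concrete illustration of the idea, here is a two-sample permutation test in plain Python. The null hypothesis is that both samples come from the same population, so randomly shuffling the group labels should often produce a difference in means as large as the observed one; if it rarely does, the p-value is small. This is a minimal sketch, not taken from the linked article; the function name, the sample values, and the 10,000 resamples are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    H0: a and b are drawn from the same population, so group labels are exchangeable.
    Returns (observed difference in means, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value estimate strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


if __name__ == "__main__":
    treatment = [12.1, 11.8, 12.5, 12.9, 12.3]  # hypothetical measurements
    control = [11.2, 11.5, 11.0, 11.7, 11.4]
    diff, p = permutation_test(treatment, control)
    print(f"difference in means = {diff:.2f}, p = {p:.4f}")
    # Reject H0 at the 5% significance level if p < 0.05.
```

A permutation test makes fewer distributional assumptions than a t-test at the cost of extra computation; with samples this small, every relabeling could also be enumerated exactly instead of sampled.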
Ziming Liu (@ZimingLiu11)

Many scientific problems hinge on finding interpretable formulas that fit data, but neural networks are the outright opposite! Check out our recent work that makes neural networks modular and interpretable. If you have interesting datasets at hand, we're happy to collaborate!

Google AI (@GoogleAI)

Introducing a new differentially-private algorithm for clustering hierarchical graphs plus open-source code for a scalable differentially-private k-means algorithm, which can be applied to large datasets using distributed computing. Learn more at: goo.gle/3WP9NLz
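To make the idea of differentially-private k-means concrete, the sketch below implements one iteration of a generic noisy Lloyd's algorithm: Laplace noise is added to each cluster's point count and coordinate sums before the centroids are recomputed. This is a textbook-style illustration, not Google's algorithm or its open-source code. It assumes every coordinate lies in [0, 1] and add/remove-one neighboring datasets, so one point changes a single count by 1 and a single cluster's sum vector by at most d in L1 norm; the helper names and the even split of epsilon are illustrative choices.

```python
import random


def laplace(rng, scale):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def dp_lloyd_step(points, centroids, epsilon, rng):
    """One epsilon-DP Lloyd iteration, assuming all coordinates lie in [0, 1]."""
    k, d = len(centroids), len(points[0])
    counts = [0] * k
    sums = [[0.0] * d for _ in range(k)]
    for point in points:
        j = nearest(point, centroids)
        counts[j] += 1
        for i, x in enumerate(point):
            sums[j][i] += x

    half = epsilon / 2  # half the budget for counts, half for sums
    new_centroids = []
    for j in range(k):
        # Adding/removing one point changes one count by 1 ...
        noisy_count = counts[j] + laplace(rng, 1.0 / half)
        # ... and one cluster's sum vector by at most d in L1 norm.
        noisy_sum = [s + laplace(rng, d / half) for s in sums[j]]
        if noisy_count < 1.0:
            # Too few (noisy) points: keep the previous centroid (post-processing).
            new_centroids.append(list(centroids[j]))
        else:
            # Clip back into the assumed [0, 1] domain (also post-processing).
            new_centroids.append([min(1.0, max(0.0, s / noisy_count)) for s in noisy_sum])
    return new_centroids


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: two blobs near (0.2, 0.2) and (0.8, 0.8).
    pts = [(0.2 + rng.uniform(-0.05, 0.05), 0.2 + rng.uniform(-0.05, 0.05)) for _ in range(500)]
    pts += [(0.8 + rng.uniform(-0.05, 0.05), 0.8 + rng.uniform(-0.05, 0.05)) for _ in range(500)]
    cs = [[0.4, 0.4], [0.6, 0.6]]
    for _ in range(5):
        cs = dp_lloyd_step(pts, cs, epsilon=1.0, rng=rng)
    print(cs)
```

Under basic composition, running five iterations at epsilon=1.0 each spends a total budget of 5.0; scaling this to large datasets with distributed computing, as the tweet describes, is beyond this sketch.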

Mishig (@mishig25)

RefinedWeb is a massive English web dataset on which falcon-40b was pretrained. Atm, falcon-40b holds the first position on the Open LLM Leaderboard, ahead of llama-65b.

Explore the dataset through datasets-viewer: huggingface.co/datasets/tiiua…

Tom Jobbins (@TheBlokeAI)

New 30B model from OpenAccess AI Collective (Wing Lian): Hippogriff 30B Chat
'an experiment that builds on Manticore with new datasets, while removing a few more instruction and chat datasets.'
huggingface.co/TheBloke/hippo…
huggingface.co/TheBloke/hippo…
Main repo: huggingface.co/openaccess-ai-…

Min Choi (@minchoi)

4. More Efficient AI Training

Training large AI models is a super resource- and time-consuming task. With DGX GH200's ability to work with terabyte-scale datasets, developers can conduct research at a larger scale and at much faster speeds.

Metaverse Things | Official (@MetaThingsMett)

The results for each object will keep improving as new datasets are added. These are users' test results. In addition, we have started adding datasets for clothes, pants, shoes, and especially for game objects such as swords, armor, and helmets.
$METT website metathingstech.com

Zach Horn ⟁ (@zacharyhorn)

“We find that Gorilla significantly outperforms GPT-4…”

This feels like the beginning of a momentum shift away from massive closed models and towards application-specific fine-tuning of open models with custom datasets.

lancedb (YC W22) (@lancedb)

Lance is now the fastest growing open source columnar data format 🔥

Who's using Lance?
🚗 Self-driving company to store 1+ petabyte of data/day
🖼️ Generative AI company to train LLMs on 1PB+ datasets
🛒 E-commerce company running billion-scale vector search at 100x lower cost

1/2

Aran Komatsuzaki (@arankomatsuzaki)

The False Promise of Imitating Proprietary LLMs

Open-source LLMs are adept at mimicking ChatGPT’s style but not its factuality. There is a substantial capabilities gap, which requires better base LMs to close.

arxiv.org/abs/2305.15717

Angela Francis (@Angela_Francis_)

💡 Are you making the most of your marketing data? Dive deep into analytics to gain valuable insights about your customers' behaviors and preferences. Use these insights to refine your strategies and drive better results. #MarketingAnalytics #DataDrivenInsights