Santhosh Thottingal (@santhoshtr)'s Twitter Profile
Santhosh Thottingal

@santhoshtr

Principal software engineer - Language, Wikimedia Foundation. Typeface designer-Creator of Chilanka, Manjari, Nupuram & Malini Malayalam fonts. @smcproject

ID:14854476

Link: https://thottingal.in
Joined: 21-05-2008 09:46:02

2.2K Tweets

2.4K Followers

1.3K Following

Santhosh Thottingal (@santhoshtr)

There’s been a recent surge in news reports about schools, including those in Kerala, incorporating AI into classrooms.

I've written an article discussing my concerns about this trend, along with potential approaches to consider.
thottingal.in/blog/2024/04/2…

/gʁafematik/ Conference (@grafematik_conf)

The first CFP for the 2024 edition of *Grapholinguistics in the 21st Century* conference is out (grafematik2024.sciencesconf.org). Please, ① spread the word, ② submit a paper, ③ join us in Venice, Italy (physically or virtually) on October 23-25, 2024!

Grady Booch (@Grady_Booch)

In a world with abundant computational resources where nothing is forgotten and where we are connected in pervasive, unexpected ways beyond our choice, it is reasonable to stop and ask ourselves just what kind of world we hope to create.

Santhosh Thottingal (@santhoshtr)

This reminds me of the stories in Aithihyamala in which people who went to cut grass in the forest saw blood flow when their sickle struck a tree, and that is how temples came to be built there.

Anoop Kunchukuttan (@anoopk)

How can we extend the capabilities of English LLMs to other languages? Sharing a survey that I did recently of the literature in this area:

anoopkunchukuttan.gitlab.io/publications/p…

Santhosh Thottingal (@santhoshtr)

'Manjummel Boys' reached the ₹200 crore club, becoming the highest grossing Malayalam film ever. 👏

A little behind-the-scenes tidbit: those Malayalam subtitles you've been noticing during the Tamil dialogue scenes? Yep, they're in the Manjari font that I designed!

Santhosh Thottingal (@santhoshtr)

Kuttippencil is a short film directed by Hena Chandran.

It features the fonts I designed: Nupuram Calligraphy in the title and Chilanka for credits.
youtube.com/watch?v=FgRaiP…

AI4Bharat (@ai4bharat)

🚀IndicLLMSuite Launch Announcement!🚀

We're thrilled to unveil IndicLLMSuite: A collection of data resources and tools for developing Indic LLMs.

📜 Paper: arxiv.org/abs/2403.06350
🌐 Blog (the way forward): ai4bharat.iitm.ac.in/blog/indicllm-…
💻 Resources: github.com/AI4Bharat/Indi…
(1/n)

The developersIndia Community (@devsinindia)

Clear up your schedule for this Saturday, because we have a new AMA incoming!

We are excited to have Santhosh Thottingal, Principal Engineer at the Wikimedia Foundation, for an AMA on language computing & typeface design.

Learn more about Santhosh in our full announcement 👇🏼
reddit.com/r/developersIn…

Sumit (@_reachsumit)

Is Cosine-Similarity of Embeddings Really About Similarity?

Netflix cautions against blindly using cosine similarity as a measure of semantic similarity between learned embeddings, as it can yield arbitrary and meaningless results.

📝arxiv.org/abs/2403.05440
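
One way such arbitrariness can arise is sketched below, as an illustration only and not as the paper's experiment. It assumes a dot-product model in which each embedding dimension can be rescaled for users and inversely rescaled for items: the predicted scores stay the same, yet the cosine similarity between two item embeddings changes. The vectors and the scale factors are made-up values.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

# Hypothetical 2-dimensional embeddings (illustrative values only).
users = [[1.0, 2.0], [3.0, 1.0]]
items = [[1.0, 0.0], [1.0, 1.0]]

# Rescale each dimension for users and apply the inverse scale to items.
scale = [10.0, 0.1]
users_scaled = [[x * s for x, s in zip(u, scale)] for u in users]
items_scaled = [[x / s for x, s in zip(i, scale)] for i in items]

# The model's predictions (user-item dot products) are unchanged...
scores = [[dot(u, i) for i in items] for u in users]
scores_scaled = [[dot(u, i) for i in items_scaled] for u in users_scaled]

# ...but the cosine similarity between the two items is not.
cos_before = cosine(items[0], items[1])
cos_after = cosine(items_scaled[0], items_scaled[1])
print(scores, scores_scaled)
print(cos_before, cos_after)
```

Both sets of embeddings fit the same scores equally well, so nothing in the training objective of such a model picks one cosine value over the other.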

Kavya Manohar (കാവ്യ) (@kavya_manohar)

Thank you College of Engineering Trivandrum and APJ Abdul Kalam Technological University for hosting the most memorable graduation ceremony. 😀

Usually I don't do a photo dump, but if not now when? So here it is.

PhDone!!

SMC Project (@smcproject)

We are organizing an online talk on 'മലയാളത്തിന്റെ ഡിജിറ്റൽ സൗന്ദര്യം' ('The Digital Beauty of Malayalam') this Saturday, March 9th at 7PM.

The talk will be about digital Malayalam typography. You can find the slides here: santhoshtr.github.io/malayalam-digi…

The link to the talk will be posted in our Telegram & Matrix groups

Kārtik | 茶烏冰 (@SandalBurn)

Since I've been away from this space for a while — are there any longform publications out there actively interested in commissioning pieces on Indian history, language, and identity? With a focus on ties across the Indian Ocean or Central Asia

kepano (@kepano)

Google's new policy if you want to enable AI features:

'Please do not include sensitive, confidential, or personal information that can be used to identify you or others'

Pretty soon this will be stuffed into the T&Cs of many cloud-based apps, that you agree to implicitly.

Vik Paruchuri (@VikParuchuri)

An update on surya text recognition - I'm happy with the data/architecture, and I'm ready to scale up training.

Here are some results from a (very) early checkpoint. Left is original, right is OCR (Malayalam)

SMC Project (@smcproject)

It's been a while since we posted monthly newsletters. Here is the newsletter of January 2024! ✨

blog.smc.org.in/smc-monthly-re…

Kavya Manohar (കാവ്യ) (@kavya_manohar)

Loud Rant:

I came to know that the surprisingly low WER in ASR for Malayalam reported in the Hugging Face fine-tuning event last year was just because the evaluation script removed all the vowel signs before computing WER!!! 😡
And the leaderboard now means NOTHING
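
The effect can be reproduced in a toy setting. The sketch below is not the event's evaluation script. It assumes a normalizer that strips every Unicode combining mark (a broader stand-in for removing vowel signs) and two hypothetical one-word transcripts that differ only in their vowel sign.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Drop every Unicode combining mark (general categories Mn, Mc, Me).

    Assumption: this stands in for the reported vowel-sign removal; it also
    removes the virama and other marks.
    """
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1] / len(ref)

reference = "കാട്"   # hypothetical reference transcript ("forest")
hypothesis = "കൂട്"  # hypothetical ASR output with the wrong vowel sign ("nest")

print(wer(reference, hypothesis))                            # scored on the real text
print(wer(strip_marks(reference), strip_marks(hypothesis)))  # scored after stripping marks
```

Once the marks are stripped, the two different words collapse to the same consonant skeleton, so the recognition error no longer counts against the model.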

Yann LeCun (@ylecun)

I've made that point before:
- LLM: 1E13 tokens x 0.75 word/token x 2 bytes/token = 1E13 bytes.
- 4 year old child: 16k wake hours x 3600 s/hour x 1E6 optical nerve fibers x 2 eyes x 10 bytes/s = 1E15 bytes.

In 4 years, a child has seen 50 times more data than the biggest LLMs.…
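
The figures in the post are order-of-magnitude estimates. As a rough check, the sketch below multiplies the quoted factors exactly as written. The variable names are illustrative and not from the post, and no attempt is made to correct or reinterpret the units.

```python
# LLM side: factors as quoted in the post.
llm_tokens = 1e13
words_per_token = 0.75
bytes_factor = 2  # quoted in the post as "2 bytes/token"
llm_bytes = llm_tokens * words_per_token * bytes_factor

# Child side: factors as quoted in the post.
wake_hours = 16_000
seconds_per_hour = 3600
optic_nerve_fibers = 1e6
eyes = 2
bytes_per_second = 10
child_bytes = (wake_hours * seconds_per_hour * optic_nerve_fibers
               * eyes * bytes_per_second)

ratio = child_bytes / llm_bytes
print(f"LLM:   {llm_bytes:.3e} bytes")
print(f"Child: {child_bytes:.3e} bytes")
print(f"Ratio: {ratio:.1f}")
```

Taken literally, the products come to about 1.5E13 and 1.15E15 bytes, a ratio in the tens and of the same order of magnitude as the factor of 50 quoted in the post.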
