Kyle Weller (@KyleJWeller)

#ApacheDoris is a benchmarking machine 💪 I saw the Doris vs #Trino benchmark this week so I was curious and I found many more... If you want some 🍿 on a Fri afternoon, I linked the ones I found in 🧵below
#apachespark #elasticsearch #clickhouse #duckdb #apachepinot #starrocks

Khuyen Tran (@KhuyenTran16)

Retrieving all rows from a large dataset into memory can cause out-of-memory errors. #ApacheSpark DataFrame delays computations until collect() is called, allowing for row reduction through filtering or aggregating.

This results in more efficient memory usage.
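
The deferred-execution idea can be illustrated without a cluster. The sketch below is a plain-Python analogy, not Spark code: generators stand in for a lazily evaluated DataFrame, and `list()` plays the role of `collect()`. The names `read_rows`, `rows_read`, and the sample data are hypothetical.

```python
from typing import Iterator

rows_read = 0  # counts how many source rows have actually been produced


def read_rows(n: int) -> Iterator[dict]:
    """Hypothetical data source that yields rows one at a time."""
    global rows_read
    for i in range(n):
        rows_read += 1
        yield {"id": i, "amount": i * 10}


# Building the pipeline does no work yet, much like chaining
# DataFrame transformations before an action is called.
source = read_rows(1000)
filtered = (row for row in source if row["amount"] >= 9950)
rows_before = rows_read  # still 0: nothing has been evaluated

# Materializing is the analogue of collect(): the source is scanned,
# but only the rows that pass the filter are held in memory.
result = list(filtered)
```

All 1000 source rows are scanned once, but only the filtered rows end up in `result`, which is the memory saving the post describes.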

Matthew Powers (@neapowers)

Did you know that you can query PySpark DataFrames with SQL now without creating a temporary table/view?

This is a huge quality of life improvement for #pyspark users and shows how @ApacheSpark is continuously improving.

Luca Canali (@LucaCanaliDB)

🚀 New Blog Post: 'Building an Apache Spark Performance Lab: Tools and Techniques for Spark Optimization.' Get tips, tools, and short demos to boost your Spark performance. Ideal for developers! 🛠️ #ApacheSpark #Performance

Read more: db-blog.web.cern.ch/node/195

No Priors (@NoPriorsPod)

New🔥 Ep#11: sarah guo // conviction & Elad Gil talk to Matei Zaharia, founder Databricks, creator of Apache Spark, Stanford University CS professor:
- Dolly, betting on small models
- scaling asymptotes
- LLMs in the enterprise
- academic -> founder/CTO of $1B+ revenue co
🎙no-priors.com

Kyle Weller (@KyleJWeller)

Amazon revealed the data arch of their package delivery platform. Since working from home, I witness a steady stream of packages on my porch and I'm starting to wonder how many GBs of data my spouse has contributed to this dataset... 💸
#apachehudi #apachespark

🧵link below👇

Khuyen Tran (@KhuyenTran16)

#ApacheSpark 3.5 added new array helper functions that simplify the process of working with array data. Below are a few examples showcasing these new array functions.

🚀 View other array functions: bit.ly/4c0txD1
⭐️ Bookmark this post: bit.ly/3TnNCM3

Anna Geller (@anna__geller)

New blog post: a deep dive into dataframes and table abstractions featuring @DataPolars, @duckdb, @pandas_dev, @getdbt, @ApacheSpark, @dask_dev, @ponderdata, @fugue_project, ... — when to use which framework and how do they compare or integrate with each other

Khuyen Tran (@KhuyenTran16)

Duplicated code in #SQL queries can lead to inconsistencies if changes are made to one instance of the duplicated code but not to others.

@ApacheSpark UDFs can help address these issues by encapsulating complex logic that is reused across multiple SQL queries.
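
The same define-once, reuse-everywhere pattern can be sketched with Python's built-in `sqlite3` module as a stand-in for Spark SQL. This is an analogy under stated assumptions, not Spark code: in Spark the registration step would go through `spark.udf.register`. The `orders` table, the `price_with_tax` function, and the 20% rate are hypothetical.

```python
import sqlite3


def price_with_tax(price: float, rate: float = 0.2) -> float:
    """Hypothetical business rule, defined once and reused by every query."""
    return round(price * (1 + rate), 2)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, price REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

# Register the Python function under a SQL name that takes one argument.
conn.create_function("price_with_tax", 1, lambda p: price_with_tax(p))

# Two different queries share the same logic instead of duplicating the formula.
totals = conn.execute(
    "SELECT id, price_with_tax(price) FROM orders ORDER BY id"
).fetchall()
expensive = conn.execute(
    "SELECT id FROM orders WHERE price_with_tax(price) > 20"
).fetchall()
```

Changing the rule now means editing one function rather than hunting down every query that repeats the expression.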

Matthew Giglia (@matthewgiglia)

Early bird catches the worm 🐦

Save $400 by registering for the Databricks before April 30! You’ll explore the latest advances in , , , , , , and more! sprou.tt/1OcnUCtPPN1

Wolfgang Strasser (@wstrasser)

The new #MicrosoftFabric runtime 1.2 is available
📣 Apache Spark 3.4.1
📣 Delta Lake 2.4.0
📣 R: 4.2.2
....
read more in the documentation: learn.microsoft.com/en-us/fabric/d…

Picture powered by DALL-E3 (chatGPT plus)

#ApacheSpark #PowerBI

Luca Canali (@LucaCanaliDB)

🚀 Just dropped a fresh blog post! Dive into the world of Apache Spark optimization with flame graphs, featuring a hands-on example with Grafana Pyroscope. 🔥📈 🔗 db-blog.web.cern.ch/node/193

#ApacheSpark #FlameGraphs #Pyroscope