Luca Canali (@LucaCanaliDB)

🚀 New Blog Post: 'Building an Apache Spark Performance Lab: Tools and Techniques for Spark Optimization.' Get tips, tools, and short demos to boost your Spark performance. Ideal for developers! 🛠️ #ApacheSpark #Performance

Read more: db-blog.web.cern.ch/node/195

Coiled (@CoiledHQ)

How does @dask_dev compare to @ApacheSpark? We’ve re-run the TPC-H benchmarks and noticed some interesting results 🧵

docs.coiled.io/blog/spark-vs-…

No Priors (@NoPriorsPod)

New 🔥 Ep #11: Sarah Guo (Conviction) & Elad Gil talk to Matei Zaharia, founder of Databricks, creator of Apache Spark, and Stanford University CS professor:
- Dolly, betting on small models
- scaling asymptotes
- LLMs in the enterprise
- going from academic to founder/CTO of a $1B+ revenue company
🎙no-priors.com

Kyle Weller (@KyleJWeller)

Amazon revealed the data architecture of their package delivery platform. Since working from home, I witness a steady stream of packages on my porch and I'm starting to wonder how many GBs of data my spouse has contributed to this dataset... 💸
#apachehudi #apachespark

🧵 link below 👇

Matthew Powers (@neapowers)

Did you know that you can query PySpark DataFrames with SQL now without creating a temporary table/view?

This is a huge quality-of-life improvement for #pyspark users and shows how @ApacheSpark is continuously improving.

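A minimal sketch of the feature, assuming Spark 3.4 or later, where spark.sql() accepts DataFrames as keyword arguments:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("Alice", 34), ("Bob", 28)], ["name", "age"])

# Since Spark 3.4, spark.sql() accepts DataFrames as keyword arguments,
# so {df} is substituted directly -- no temporary view required.
spark.sql("SELECT name FROM {df} WHERE age > 30", df=df).show()
```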
Khuyen Tran (@KhuyenTran16)

Retrieving all rows from a large dataset into memory can cause out-of-memory errors. An #ApacheSpark DataFrame delays computation until an action like collect() is called, allowing rows to be reduced through filtering or aggregation first.

This results in more efficient memory usage.

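A minimal sketch of that lazy-evaluation behavior, assuming a local Spark session:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# A wide range; nothing is computed when the DataFrame is defined.
df = spark.range(100_000_000)

# Transformations are lazy too: Spark only records the query plan.
small = df.filter(F.col("id") % 1_000_000 == 0)

# collect() is the action that triggers execution; only the 100
# filtered rows reach driver memory, not all 100 million.
rows = small.collect()
print(len(rows))
```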
Matei Zaharia (@matei_zaharia)

One of my favorite announcements: English SDK for @ApacheSpark! No more need to remember weird syntax, just chain transformations in natural language with the familiar Spark API. So many fun examples.
databricks.com/blog/introduci…

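For a rough idea of what this looks like, here is a sketch based on the pyspark-ai package; the exact API may vary by version, and an LLM backend (e.g., an OpenAI key) is assumed to be configured:

```python
from pyspark.sql import SparkSession
from pyspark_ai import SparkAI

spark = SparkSession.builder.getOrCreate()

spark_ai = SparkAI()  # assumes an LLM backend (e.g., OPENAI_API_KEY) is set up
spark_ai.activate()   # adds the .ai accessor to DataFrames

df = spark.createDataFrame(
    [("India", 1_428), ("US", 340), ("Japan", 123)],
    ["country", "population_millions"],
)

# Describe the transformation in natural language instead of Spark syntax.
df.ai.transform("keep only countries with more than 500 million people").show()
```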
Khuyen Tran (@KhuyenTran16)

In PySpark, parametrized queries enable the same query structure to be reused with different inputs, without rewriting the SQL.

Additionally, they safeguard against SQL injection attacks by treating input data as parameters rather than as executable code.

#ApacheSpark
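
A minimal sketch, assuming Spark 3.4+ where spark.sql() accepts an args mapping for named parameter markers:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [("laptop", 1200), ("mouse", 25), ("monitor", 300)], ["item", "price"]
)

# :min_price is bound via args, so the value is treated as data --
# it is never spliced into the SQL string as executable code.
spark.sql(
    "SELECT item, price FROM {products} WHERE price > :min_price",
    products=df,
    args={"min_price": 100},
).show()
```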
Jacek Laskowski (@jaceklaskowski)

And we know what's coming in #ApacheSpark 4.0.0. This version surely makes all of us long-time Spark users feel soooo OLD! 😆

And I wouldn't be surprised if some of my tricks have already become outdated 😉

Named parameters in SQL statements are already available since 3.5.

Khuyen Tran (@KhuyenTran16)

Duplicated code in #SQL queries can lead to inconsistencies if changes are made to one instance of the duplicated code but not to others.

@ApacheSpark UDFs can help address these issues by encapsulating complex logic that is reused across multiple SQL queries.

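A minimal sketch of the pattern, with hypothetical normalize_email logic standing in for the shared code: register the function once as a UDF, then reuse it from any SQL query.

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Shared logic lives in exactly one place.
def normalize_email(email: str) -> str:
    return email.strip().lower() if email else None

# Register it once so every SQL query can call it by name.
spark.udf.register("normalize_email", normalize_email, StringType())

df = spark.createDataFrame([("  Alice@Example.COM ",)], ["email"])
df.createOrReplaceTempView("users")

spark.sql("SELECT normalize_email(email) AS email FROM users").show()
```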
Anna Geller (@anna__geller)

New blog post: a deep dive into dataframes and table abstractions featuring @DataPolars, @duckdb, @pandas_dev, @getdbt, @ApacheSpark, @dask_dev, @ponderdata, @fugue_project, ... — when to use which framework and how they compare or integrate with each other

Khuyen Tran (@KhuyenTran16)

Spark enables scaling of your pandas workloads across multiple nodes. However, learning PySpark syntax can be daunting for pandas users.

Pandas API on Spark enables leveraging Spark's capabilities for big data while retaining a familiar pandas-like syntax.

#apachespark #pandas
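
A minimal sketch of the pandas API on Spark (shipped with Spark since 3.2):

```python
import pyspark.pandas as ps

# pandas-like syntax, but execution is distributed by Spark.
psdf = ps.DataFrame({"category": ["a", "b", "a"], "value": [1, 2, 3]})

# Familiar pandas idioms work unchanged on big data.
print(psdf.groupby("category")["value"].sum())
```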
Khuyen Tran (@KhuyenTran16)

#ApacheSpark 3.5 added new array helper functions that simplify working with array data. Below are a few examples showcasing these functions.

🚀 View other array functions: bit.ly/4c0txD1
⭐️ Bookmark this post: bit.ly/3TnNCM3

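A quick sketch of a few of these helpers; as far as I recall, array_append and array_compact landed in 3.4, with array_prepend following in 3.5:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([([1, 2, None, 3],)], ["nums"])

df.select(
    F.array_compact("nums").alias("no_nulls"),     # drop NULL elements
    F.array_append("nums", 4).alias("appended"),   # add to the end
    F.array_prepend("nums", 0).alias("prepended"), # add to the front
).show(truncate=False)
```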
Eléa (@EleaPetton)

Full house at #VeryTechTrip for Claire and Adrien's talk on distributed image processing in the service of #IA (AI)! A real time-saver for data processing 😉... @OVHcloud_Tech

#apachespark #dataprocessing