Twitter #apachespark hashtag • TwiCopy

Kyle Weller

@KyleJWeller

1 month ago

#ApacheDoris is a benchmarking machine 💪 I saw the Doris vs #Trino benchmark this week so I was curious and I found many more... If you want some 🍿 on a Fri afternoon, I linked the ones I found in 🧵below
#apachespark #elasticsearch #clickhouse #duckdb #apachepinot #starrocks

Likes: 11

Retweets: 2

account_circle

Retrieving all rows from a large dataset into memory can cause out-of-memory errors. #ApacheSpark DataFrames defer computation until an action such as collect() is called, so rows can first be reduced through filtering or aggregating.

This results in more efficient memory usage.
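
The idea of deferring work until results are requested can be sketched in plain Python with generators. This is only an analogy, not Spark's API: the `read_rows` source, the `stats` counter, and the filter threshold are all hypothetical, and the final `list(...)` call merely plays the role that collect() plays in the post above.

```python
def read_rows(n, stats):
    """Yield rows one at a time (hypothetical data source, not a Spark API)."""
    for i in range(n):
        stats["read"] += 1  # count how many rows were actually produced
        yield {"id": i, "amount": i * 10}


stats = {"read": 0}

# Describing the pipeline does no work yet: the generator body has not started.
pipeline = (row for row in read_rows(1_000_000, stats) if row["amount"] >= 9_999_900)
read_before_collect = stats["read"]

# Materializing the result is the step analogous to collect().
# Only the rows that pass the filter are kept in memory.
result = list(pipeline)

print(read_before_collect, stats["read"], len(result))
```

Because the filter runs as rows stream through, only the matching rows are ever held in the final list, which is the memory saving the post describes.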

Matthew Powers

@neapowers

8 months ago

Did you know that you can query PySpark DataFrames with SQL now without creating a temporary table/view?

This is a huge quality of life improvement for #pyspark users and shows how Apache Spark is continuously improving.

Luca Canali

@LucaCanaliDB

4 days ago

🚀 New Blog Post: 'Building an Apache Spark Performance Lab: Tools and Techniques for Spark Optimization.' Get tips, tools, and short demos to boost your Spark performance. Ideal for developers! 🛠️ #ApacheSpark #Performance

Read more: db-blog.web.cern.ch/node/195

Likes: 13

Retweets: 3

No Priors

@NoPriorsPod

1 year ago

New 🔥 Ep #11: Sarah Guo (Conviction) and Elad Gil talk to Matei Zaharia, Databricks founder, creator of Apache Spark, and Stanford University CS professor:
- Dolly, betting on small models
- scaling asymptotes
- LLMs in the enterprise
- academic -> founder/CTO of $1B+ revenue co
🎙no-priors.com

Kyle Weller

@KyleJWeller

3 months ago

Amazon revealed the data arch of their package delivery platform. Since working from home, I witness a steady stream of packages on my porch and I'm starting to wonder how many GBs of data my spouse has contributed to this dataset... 💸
#apachehudi #apachespark

🧵link below👇

Likes: 12

Retweets: 3

Igor De Souza

@Igfasouza

3 weeks ago

Scaling AI/ML Infrastructure at Uber.

#apachekafka
#apacheflink
#apachespark

uber.com/en-IE/blog/sca…

Khuyen Tran

@KhuyenTran16

1 month ago

#ApacheSpark 3.5 added new array helper functions that simplify the process of working with array data. Below are a few examples showcasing these new array functions.

🚀 View other array functions: bit.ly/4c0txD1
⭐️ Bookmark this post: bit.ly/3TnNCM3

Anna Geller

@anna__geller

8 months ago

New blog post: a deep dive into dataframes and table abstractions featuring @DataPolars, @duckdb, @pandas_dev, @getdbt, @ApacheSpark, @dask_dev, @ponderdata, @fugue_project, ... — when to use which framework and how do they compare or integrate with each other

Khuyen Tran

@KhuyenTran16

2 weeks ago

Duplicated code in #SQL queries can lead to inconsistencies if changes are made to one instance of the duplicated code but not to others.

Apache Spark UDFs can help address these issues by encapsulating complex logic that is reused across multiple SQL queries.

Analytics Insight

@analyticsinme

18 hours ago

Apache Spark vs. Jupyter: The Ultimate Data Science Battle!

tinyurl.com/bdwas6we

#ApacheSparkVsJupyter #BestDataScienceTool #DataScienceTool #ApacheSpark #JupyterNotebook #AINews #AnalyticsInsight #AnalyticsInsightMagazine

Likes: 0

Matthew Giglia

@matthewgiglia

5 days ago

Early bird catches the worm 🐦

Save $400 by registering for the Databricks #DataAISummit before April 30! You’ll explore the latest advances in #ApacheSpark, #DeltaLake, #MLflow, #LangChain, #PyTorch, #dbt, and more! #DAIS sprou.tt/1OcnUCtPPN1

Likes: 1

DTS (DegenTogetherStrong)

@DTSCapital

2 months ago

Found a low cap DeSci project backed by PolygonDAO and partnered with TensorFlow, Apache Spark, Cerebras & many more
Sounds interesting?

Technavik Solutions

@technaviksolns

6 days ago

Unlocking Insights with Databricks: Technavik Solutions Review.

To read our full review, click the link below:
linkedin.com/feed/update/ur…

#technavi_productshowcase #techcurator #DataScienceEngineering #DataAnalytics #ApacheSpark #CollaborativeWorking

Likes: 0

Dirk Van den Poel

@dirkvandenpoel

6 months ago

Today’s online lecture of my #BigData class is on introducing #PySpark for data science #MachineLearning #orms #python #DataScience #dataanalytics #ApacheSpark #SQL

Likes: 6

Retweets: 1

Bigdata Engineer

@bigdata_engnr

1 day ago

Big Data Visualization Tools

1) Apache Superset
2) Jupyter Notebook
3) Apache Zeppelin
4) Metabase

Course Link: buff.ly/3AkZjK5

#bigdata #apachespark #hadoop #programming #programmer #developer #code #codinglife #100DaysOfCode #100daysofcodechallenge #100DaysOfMLCode

Likes: 0

Wolfgang Strasser

@wstrasser

5 months ago

The new #MicrosoftFabric runtime 1.2 is available
📣 Apache Spark 3.4.1
📣 Delta Lake 2.4.0
📣 R 4.2.2
....
read more in the documentation: learn.microsoft.com/en-us/fabric/d…

Picture powered by DALL-E3 (chatGPT plus)

#ApacheSpark #PowerBI

Likes: 9

Retweets: 2

Luca Canali

@LucaCanaliDB

7 months ago

🚀 Just dropped a fresh blog post! Dive into the world of Apache Spark optimization with flame graphs, featuring a hands-on example with Grafana Pyroscope. 🔥📈 🔗 db-blog.web.cern.ch/node/193

#ApacheSpark #FlameGraphs #Pyroscope

Likes: 52

Retweets: 8