Use case
Can you use Polars to develop Spark UDFs instead of pandas?
Actually you can. For example, you can use Polars to write Arrow UDFs, because Polars allows zero-copy creation of its DataFrames from PyArrow RecordBatches and back. At the moment there is only mapInArrow, but applyInArrow has already been added to the master branch of PySpark and will be available in Spark 4.0. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.mapInArrow.html Polars UDFs can be much faster than pandas UDFs: when I tried it, I saw roughly a 1.5x-2x speedup.
A detailed tutorial on the best ways to use Polars:
https://kevinheavey.github.io/modern-polars/tidy.html#pivot-and-melt