Use case
Can you use Polars to develop Spark UDFs instead of pandas?
Actually you can. For example, you can use Polars to write Arrow UDFs, because Polars allows zero-copy creation of its DataFrames from PyArrow RecordBatches and back. At the moment there is only mapInArrow, but applyInArrow has already been added to the master branch of PySpark and will be available in Spark 4.0. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.mapInArrow.html Polars UDFs can be much faster than pandas UDFs: when I tried it, I saw roughly a 1.5x-2x speedup.
A detailed tutorial on the best ways to use Polars:
https://kevinheavey.github.io/modern-polars/tidy.html#pivot-and-melt