1dHoOmVQz7hB1Fv7dyu5sZ2JxQEOB82Iy?usp=sharing by ON

PySpark

Spark Component Modules

Loading Dataset

DataFrame : 2-dimensional labeled data structure with columns of potentially different types
Ways to view a DataFrame in PySpark
- df.take(n), df.collect(), df.show(), df.limit(n)
Viewing DataFrame columns : df.columns
DataFrame schema : df.dtypes, df.printSchema()
Inferring Schema Implicitly : inferSchema=True
Defining Schema Explicitly : column labels and types (one StructField each) → StructType → pass schema=StructType instance

DataFrame Operations on Columns

Selecting Columns
- df.select(df.ColumnName).show()
Selecting Multiple Columns
- df.select(df['c1'], df['c2']).show()