Some notes from using Spark 2
Getting started
The entry point to all of Spark's functionality is the SparkSession class. To create a basic SparkSession, just use SparkSession.builder():
```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions such as converting RDDs to DataFrames
import spark.implicits._
```
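With the session in hand, the DataFrame and SQL APIs are all reached through it. A minimal sketch of typical first steps (the JSON path and the name/age columns come from the stock Spark example and are assumptions here):

```scala
// Read a JSON file into a DataFrame; the path below is hypothetical.
val df = spark.read.json("examples/src/main/resources/people.json")

df.printSchema()
df.select("name").show()
// The $-column syntax needs the import spark.implicits._ from above.
df.filter($"age" > 21).show()
```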
Some configuration
```
spark.master            spark://5.6.7.8:7077
spark.executor.memory   4g
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer
```
These use the spark-defaults.conf format; all available options are documented at https://spark.apache.org/docs/latest/configuration.html
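The same settings can also be applied programmatically instead of through a properties file; a minimal sketch using the standard builder API, reusing the values above:

```scala
import org.apache.spark.sql.SparkSession

// Equivalent to the spark-defaults.conf entries above.
val spark = SparkSession
  .builder()
  .master("spark://5.6.7.8:7077")
  .config("spark.executor.memory", "4g")
  .config("spark.eventLog.enabled", "true")
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()
```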
Some problems
Netty version conflict
```
Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()I
```
Pin the Netty version with the dependencyManagement block below, then resolve the remaining jar conflicts (the Maven Helper plugin makes them easy to find) by excluding the old netty from whichever dependencies drag it in:
```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
      <version>4.1.18.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```
Then declare the dependency itself; the version is inherited from dependencyManagement:

```xml
<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-all</artifactId>
</dependency>
```
Janino version problem

```
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
```
Add the dependency below and exclude the older version:
```xml
<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
  <version>3.0.8</version>
</dependency>
```
Converting an RDD to a DataFrame
Converting directly with toDF:
```scala
// sc here is the SparkSession (its createDataFrame is used below)
import sc.implicits._
stuRDD.toDF().show()
```
```
Exception in thread "main" scala.reflect.internal.Symbols$CyclicReference: illegal cyclic reference involving object InterfaceAudience
```
The cause is unknown; InterfaceAudience is a Hadoop annotation class, so it looks like Scala's runtime reflection is tripping over Hadoop-annotated types reachable from the RDD's element class.
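One workaround that often sidesteps this kind of reflection issue is to map the elements into a plain case class before calling toDF. A sketch, assuming hypothetical getName/getAge accessors on the element type:

```scala
// Hypothetical case class mirroring just the fields we need;
// Scala reflection handles plain case classes without touching Hadoop types.
case class Patient(name: String, age: Int)

import sc.implicits._ // sc is the SparkSession

// Convert each element into the case class first, then to a DataFrame.
stuRDD.map(p => Patient(p.getName, p.getAge)).toDF().show()
```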
Using createDataFrame:
```scala
sc.createDataFrame(stuRDD, classOf[PatientTest]).show()
```
```
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
```

This is the same Janino problem as above; adding the janino 3.0.8 dependency fixes it.
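If the bean-class overload keeps causing trouble, the standard API also accepts an RDD of Rows plus an explicit schema. A minimal sketch; the field names and getters are assumptions about PatientTest:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Explicit schema; field names and types are assumed for illustration.
val schema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("age", IntegerType, nullable = true)
))

// Turn each PatientTest into a Row matching the schema (getters are hypothetical).
val rowRDD = stuRDD.map(p => Row(p.getName, p.getAge))

sc.createDataFrame(rowRDD, schema).show()
```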