Spark2 Notes, Part 1

Some notes from working with Spark 2.

Getting Started

The entry point to all of Spark's functionality is the SparkSession class. To create a basic SparkSession, just use SparkSession.builder():

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

// For implicit conversions like converting RDDs to DataFrames
import spark.implicits._
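
A quick way to confirm the session works (a minimal smoke test, assuming nothing beyond the session itself):

// Sanity check: the session can build and display a small Dataset
spark.range(5).show()

// spark.implicits._ also lets local collections become DataFrames
Seq(("a", 1), ("b", 2)).toDF("name", "count").show()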

Some Configuration

spark.master             spark://5.6.7.8:7077
spark.executor.memory    4g
spark.eventLog.enabled   true
spark.serializer         org.apache.spark.serializer.KryoSerializer

All configuration options are listed at https://spark.apache.org/docs/latest/configuration.html.
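
These entries normally live in conf/spark-defaults.conf; the same settings can also be applied in code. A sketch, reusing the values above:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Equivalent settings applied programmatically instead of via spark-defaults.conf
val conf = new SparkConf()
  .setMaster("spark://5.6.7.8:7077")
  .set("spark.executor.memory", "4g")
  .set("spark.eventLog.enabled", "true")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

val spark = SparkSession.builder().config(conf).getOrCreate()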

Some Issues

Netty version conflict

Exception in thread "main" java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.defaultNumHeapArena()I

Add the following to the POM, then resolve the jar conflict (e.g. with the Maven Helper plugin) by excluding the older Netty, as sketched after the two snippets below:

<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
      <version>4.1.18.Final</version>
    </dependency>
  </dependencies>
</dependencyManagement>

Then declare the dependency itself; the version is picked up from dependencyManagement:

<dependency>
  <groupId>io.netty</groupId>
  <artifactId>netty-all</artifactId>
</dependency>
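
To exclude an older Netty that arrives transitively, add an exclusion block to the offending dependency. A sketch only; hadoop-client here is an illustrative culprit, substitute whatever Maven Helper flags in your build:

<!-- Illustrative only: exclude the old Netty from whichever dependency pulls it in -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.3</version> <!-- whatever version your build uses -->
  <exclusions>
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>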

Janino version conflict

Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException

Add the following dependency and exclude the old version:

<dependency>
  <groupId>org.codehaus.janino</groupId>
  <artifactId>janino</artifactId>
  <version>3.0.8</version>
</dependency>

Converting an RDD to a DataFrame

Converting directly with toDF():

import spark.implicits._  // implicits live on the SparkSession, not the SparkContext
stuRDD.toDF().show()
Exception in thread "main" scala.reflect.internal.Symbols$CyclicReference: illegal cyclic reference involving object InterfaceAudience

Cause unknown~

Using createDataFrame:

spark.createDataFrame(stuRDD, classOf[PatientTest]).show()  // createDataFrame is on the SparkSession
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/janino/InternalCompilerException
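
This is the same Janino issue as above and should be fixed the same way, by adding the janino 3.0.8 dependency. For reference, the conversion that usually works without surprises goes through a case class, so that spark.implicits._ can derive the schema. A minimal sketch; Patient stands in for the PatientTest bean:

import org.apache.spark.sql.SparkSession

// Patient stands in for the PatientTest bean; with a case class,
// spark.implicits._ derives the DataFrame schema from the field names.
case class Patient(id: Int, name: String, age: Int)

object RddToDfExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd-to-df")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val patientRDD = spark.sparkContext.parallelize(Seq(
      Patient(1, "alice", 30),
      Patient(2, "bob", 41)))

    patientRDD.toDF().show()  // schema inferred from the case class fields
    spark.stop()
  }
}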