from pyspark import SparkContext, SparkConf

i_data = "../wordcount/input/xaa"
o_data = "../wordcount/output_ex"

if __name__ == "__main__":
    # create a Spark context with the necessary configuration
    conf = SparkConf().setAppName("PySpark Word Count Example").setMaster("local")
    # conf.set("spark.driver.memory", "5g")  # uncomment to raise driver memory
    sc = SparkContext(conf=conf)
    # read the text file and split each line into words
    words = sc.textFile(i_data).flatMap(lambda line: line.split(" "))
    # count the occurrences of each word
    wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    # save the counts to the output directory
    wordCounts.saveAsTextFile(o_data)
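For reference, the same flatMap → map → reduceByKey pipeline can be sketched in plain Python (no Spark required) to see what each stage computes. The sample lines below are made up for illustration; they are not the actual contents of `i_data`:

```python
from collections import Counter

# hypothetical sample lines standing in for the input file
lines = ["hello spark hello", "spark word count"]

# flatMap: split each line into words and flatten into one list
words = [w for line in lines for w in line.split(" ")]

# map + reduceByKey: pair each word with 1, then sum the 1s per word
# (Counter does both steps at once in plain Python)
word_counts = Counter(words)

print(dict(word_counts))  # {'hello': 2, 'spark': 2, 'word': 1, 'count': 1}
```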
Run the code above with spark-submit, trying these variants:

1. no options
2. yarn option
3. cluster mode option
4. client mode option
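The four spark-submit variants above might look like the following. The script name `wordcount.py` is a placeholder for this file, and the YARN variants assume a configured Hadoop environment (e.g. `HADOOP_CONF_DIR` set); none of these are verified commands from the original post:

```
# 1. no options: runs with the master configured in the script (local here)
spark-submit wordcount.py

# 2. submit to a YARN resource manager
spark-submit --master yarn wordcount.py

# 3. YARN cluster mode: the driver runs inside the cluster
spark-submit --master yarn --deploy-mode cluster wordcount.py

# 4. YARN client mode: the driver runs on the submitting machine
spark-submit --master yarn --deploy-mode client wordcount.py
```

Note that when a master is passed on the command line, it takes precedence over `setMaster("local")` only if the script does not hard-code it, so the `setMaster` call would need to be dropped from the code for the YARN variants to take effect.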
Next I need to decide whether to keep setting up Spark like this or to consider a different application setup.