from pyspark import SparkContext, SparkConf

i_data = "../wordcount/input/xaa"
o_data = "../wordcount/output_ex"

if __name__ == "__main__":
    # create a Spark context with the necessary configuration
    conf = SparkConf().setAppName("PySpark Word Count Example").setMaster("local")
    # conf.set("spark.driver.memory", "5g")  # uncomment to raise driver memory
    sc = SparkContext(conf=conf)
    # read the text file and split each line into words
    words = sc.textFile(i_data).flatMap(lambda line: line.split(" "))
    # count the occurrences of each word
    wordCounts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
    # save the counts to the output directory
    wordCounts.saveAsTextFile(o_data)
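For reference, the same flatMap → map → reduceByKey pipeline can be sketched in plain Python (no Spark required) to see what each stage computes. The sample lines below are made up for illustration; they are not the actual contents of `i_data`:

```python
from collections import Counter

# hypothetical sample lines standing in for the input file
lines = ["hello spark hello", "spark word count"]

# flatMap: split each line into words and flatten into one list
words = [w for line in lines for w in line.split(" ")]

# map + reduceByKey: pair each word with 1, then sum the 1s per word
# (Counter does both steps at once in plain Python)
word_counts = Counter(words)

print(dict(word_counts))  # {'hello': 2, 'spark': 2, 'word': 1, 'count': 1}
```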
Run the code above with spark-submit, trying these variants:

1. no options
2. yarn option
3. cluster mode option
4. client mode option
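The four spark-submit variants above might look like the following. The script name `wordcount.py` is a placeholder for this file, and the YARN variants assume a configured Hadoop environment (e.g. `HADOOP_CONF_DIR` set); none of these are verified commands from the original post:

```
# 1. no options: runs with the master configured in the script (local here)
spark-submit wordcount.py

# 2. submit to a YARN resource manager
spark-submit --master yarn wordcount.py

# 3. YARN cluster mode: the driver runs inside the cluster
spark-submit --master yarn --deploy-mode cluster wordcount.py

# 4. YARN client mode: the driver runs on the submitting machine
spark-submit --master yarn --deploy-mode client wordcount.py
```

Note that when a master is passed on the command line, it takes precedence over `setMaster("local")` only if the script does not hard-code it, so the `setMaster` call would need to be dropped from the code for the YARN variants to take effect.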
Next I need to decide whether to keep setting up Spark like this or to consider a different application setup.