spark-wordcount quick example (Spark streaming programming guide)

https://spark.apache.org/docs/latest/streaming-programming-guide.html

Spark Streaming - Spark 3.1.1 Documentation

Spark Streaming Programming Guide Overview Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Kinesis, or T

spark.apache.org

위 링크의 a quick example 참조

코드(../wordcount/network_wordcount.py)

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

# Create a local StreamingContext with two working thread and batch interval of 1 second
sc = SparkContext("local[2]", "NetworkWordCount")
ssc = StreamingContext(sc, 1)

# Create a DStream that will connect to hostname:port, like localhost:9999
lines = ssc.socketTextStream("localhost", 9999)

# Split each line into words
words = lines.flatMap(lambda line: line.split(" "))

# Count each word in each batch
pairs = words.map(lambda word: (word, 1))
wordCounts = pairs.reduceByKey(lambda x, y: x + y)

# Print the first ten elements of each RDD generated in this DStream to the console
wordCounts.pprint()

ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate

실행

terminal1에서

spark-submit ../wordcount/network_wordcount.py localhost 9999

terminal2에서

nc -lk 9999

실행 후 terminal2에 count할 문장을 치고 엔터

실행 후

terminal2의 문장을 terminal1에서 카운트함

terminal1은 1초 단위로 계속 돌아감, terminal에서 인풋이 있는 경우 받아서 카운트하고 다음 작업으로 넘어감

>>로 log를 따면

이런 식으로 카운트 결과만 프린트 됨.

terminal1에서 나오는 ms단위의 처리시간은 보이지 않음

처리 시간 확인을 위해 다른 방법을 찾아봐야 할 듯

'개발일기 > DataBase' 카테고리의 다른 글

pycharm install & setting (0)	2021.06.18
redis tuning (0)	2021.06.04
spark tuning.. (0)	2021.05.25
Spark Tuning (0)	2021.05.24
Spark 기초 (0)	2021.05.24

jstory

spark-wordcount quick example (Spark streaming programming guide)

'개발일기 > DataBase' 카테고리의 다른 글

티스토리툴바

spark-wordcount quick example (Spark streaming programming guide)

'개발일기 > DataBase' 카테고리의 다른 글

'개발일기/DataBase' Related Articles

티스토리툴바