Python / Grouping and calculating values from a CSV file

2012. 5. 11. 16:34

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)

# open the csv file and iterate over its rows. the enumerate()
# function gives us an incrementing row number
for i, row in enumerate(csv.reader(open('python_test.csv', 'rb'))):
    # skip the header line and any empty rows
    # we take advantage of the first row being indexed at 0
    # i=0 which evaluates as false, as does an empty row
    if not i or not row:
        continue

    # unpack the columns into local variables
    _, zipcode, level = row
    # for each zipcode, add the level to the list
    data[zipcode].append(float(level))

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.iteritems():
    print zipcode, sum(levels) / float(len(levels))

[python_test.csv]

ID	ZIPCODE	RATE
1	19003	27.5
2	19003	31.33
3	19083	41.4
4	19083	17.9
5	19102	21.4

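The source code turns the input data above into the following result (it runs correctly on Python 2.5 through 2.7; the print statement and dict.iteritems() it uses do not exist in Python 3).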

19003 29.415
19083 29.65
19102 21.4
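
For readers on Python 3, the same grouping can be sketched roughly as below. This is only a minimal sketch, not the code from the original post: the function name average_by_zipcode and the in-memory sample are assumptions made for illustration, and it assumes comma-separated input (pass delimiter='\t' if the file is tab-separated, as the listing above is displayed).

```python
import csv
import io
from collections import defaultdict


def average_by_zipcode(lines, delimiter=","):
    """Group the RATE column by ZIPCODE and return {zipcode: mean rate}."""
    groups = defaultdict(list)
    reader = csv.reader(lines, delimiter=delimiter)
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # csv.reader yields [] for blank lines
            continue
        _, zipcode, rate = row
        groups[zipcode].append(float(rate))
    # true division in Python 3, so no float() cast is needed
    return {zipcode: sum(rates) / len(rates) for zipcode, rates in groups.items()}


# Assumed sample: the rows from python_test.csv, written with commas.
sample = io.StringIO(
    "ID,ZIPCODE,RATE\n"
    "1,19003,27.5\n"
    "2,19003,31.33\n"
    "3,19083,41.4\n"
    "4,19083,17.9\n"
    "5,19102,21.4\n"
)
averages = average_by_zipcode(sample)
for zipcode, mean in averages.items():
    print(zipcode, mean)
```

To read the real file instead of the sample, something like `with open('python_test.csv', newline='') as f: average_by_zipcode(f)` should work, since csv.reader accepts any iterable of lines.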

Advanced Source code

import csv
from collections import defaultdict

# a dictionary whose value defaults to a list.
data = defaultdict(list)
data2 = defaultdict(list)

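f = open('python_test.txt', 'r')

# split each non-blank line into its tab-separated columns
fList = []

for line in f.readlines():
    # a blank line would split into ['\n'], which is not an empty row
    if line.strip():
        fList.append(line.split('\t'))

f.close()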

# iterate over the rows parsed above. the enumerate()
# function gives us an incrementing row number

for i, row in enumerate(fList):
    # skip the header line and any empty rows
    # we take advantage of the first row being indexed at 0
    # i=0 which evaluates as false, as does an empty row
    if not i or not row:
        continue

    # unpack the columns into local variables
    _, zipcode, level, info = row
    # for each zipcode, add the level to the list
    data[zipcode].append(float(level))
    data2[zipcode].append(info.strip())

# loop over each zipcode and its list of levels and calculate the average
for zipcode, levels in data.iteritems():
    print zipcode, sum(levels) / float(len(levels)), '|'.join(data2[zipcode])
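
The advanced variant can be sketched for Python 3 in a similar way. This is a hedged illustration rather than the post's own code: the function name summarize_by_zipcode, the INFO header, and the info values in the sample are all made up, since the post does not show the contents of python_test.txt. It also swaps the manual line.split('\t') for csv.reader with delimiter='\t', which behaves the same for plain tab-separated text but treats double quotes specially.

```python
import csv
import io
from collections import defaultdict


def summarize_by_zipcode(lines):
    """Return {zipcode: (mean rate, 'info1|info2|...')} from tab-separated rows."""
    rates = defaultdict(list)
    infos = defaultdict(list)
    reader = csv.reader(lines, delimiter="\t")
    next(reader, None)  # skip the header row
    for row in reader:
        if not row:  # blank lines come back as []
            continue
        _, zipcode, rate, info = row
        rates[zipcode].append(float(rate))
        infos[zipcode].append(info.strip())
    return {
        zipcode: (sum(values) / len(values), "|".join(infos[zipcode]))
        for zipcode, values in rates.items()
    }


# Hypothetical sample: the INFO column values are invented for illustration.
sample = io.StringIO(
    "ID\tZIPCODE\tRATE\tINFO\n"
    "1\t19003\t27.5\talpha\n"
    "2\t19003\t31.33\tbeta\n"
    "3\t19102\t21.4\tgamma\n"
)
summary = summarize_by_zipcode(sample)
for zipcode, (mean, info) in summary.items():
    print(zipcode, mean, info)
```

Keeping both lists keyed by the same zipcode mirrors the data and data2 dictionaries above; a single defaultdict holding (rate, info) pairs would be an equally reasonable design.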

Reference

<defaultdict>
http://docs.python.org/release/2.5.2/lib/defaultdict-examples.html

http://stackoverflow.com/questions/5328971/python-csv-need-to-group-and-calculate-values-based-on-one-key
