How to apply a 'moving window' to analyse chunks of text sequentially in Python?
I want to calculate a simple moving-window average type/token ratio (TTR) of a text sample. I know how to calculate the TTR of the whole text, or to select the first 50 words and calculate the TTR of that. I think I need to create a loop that iterates over 50 words at a time, with the start moving +1 each time the window moves through the text, appending the resulting TTR for each window to a list that I can then average. It's the looping/chunking/+1 part I'm stuck on.
This is (I think) what I want inside the loop. The text has already been lowercased etc.:
    window = text[0:50]
    wordcount = collections.Counter(window)
    uniquewords = list(wordcount.keys())
    ttr = len(uniquewords) / len(window)
    windowsttr.append(ttr)
I have read other answers here, and the documentation for enumerate and itertools.islice, but I still can't seem to solve the problem. Any help gratefully received; I'm new to Python.
Parametrize the loop body according to the start position, then iterate through all possible start positions.
    import collections

    window_width = 50
    last_index = len(text) - window_width
    windowsttr = []
    # + 1 so the final full window, which starts at last_index, is included
    for start in range(last_index + 1):
        window = text[start:start + window_width]
        wordcount = collections.Counter(window)
        uniquewords = list(wordcount.keys())
        ttr = len(uniquewords) / len(window)
        windowsttr.append(ttr)
If you need to take larger steps through the text, parametrize that as well:
    window_width = 50
    last_index = len(text) - window_width
    step = 4  # shift 4 positions at a time
    for start in range(0, last_index + 1, step):
        ...  # same loop body as in the previous snippet