Voluntary Coding #1

Problem: 3. A simple measure of how complex a sequence is would be the count of the most frequent character n-gram, divided by the count of all n-grams. For example, if n is 3, then the sequence ATATATATAG contains 8 overlapping 3-grams: 4x ATA, 3x TAT and 1x TAG. The proportion is thus 4/8 = 0.5. The higher this number, the more repetitive (and so the less complex) the sequence.
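
The worked example can be checked with a short sketch. The names seq, grams, tally and proportion are illustrative only and are not part of the exercise.

```python
# Enumerate the overlapping 3-grams of the example sequence and tally them.
seq = "ATATATATAG"
n = 3

# A sequence of length L has L - n + 1 overlapping n-grams (here 10 - 3 + 1 = 8).
grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]

tally = {}
for gram in grams:
    tally[gram] = tally.get(gram, 0) + 1

# Proportion = count of the most frequent n-gram / count of all n-grams.
proportion = max(tally.values()) / float(len(grams))

print(grams)
print(tally)
print(proportion)
```

The tally should match the counts quoted in the problem statement: 4 for ATA, 3 for TAT and 1 for TAG.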

Write a function simple(s,n) where s is a sequence and n is the length of the n-gram to consider. The function will return the proportion described above.

Solution:

def simple(s,n):
  # Collect every overlapping n-gram of s, in order of appearance.
  chargram = []
  for i in range(0, len(s)-n+1):
    chargram.append(s[i:i+n])
  
  # Count how often each distinct n-gram occurs.
  counts = {}
  for gram in chargram:
    counts[gram] = counts.get(gram, 0) + 1
  # temp3 holds one count per distinct n-gram.
  temp3 = list(counts.values())

  # The counts of the distinct n-grams add up to the total number of n-grams.
  totalnoOfGram = float(sum(temp3))
  # Guard (an assumption, not covered by the problem statement): with no
  # n-grams at all, e.g. n > len(s), return 0.0 instead of failing in max().
  if totalnoOfGram == 0:
    return 0.0
  freqGram = max(temp3)
  return freqGram/totalnoOfGram
  
# end of the function simple(s,n)

print(simple("ATATATATAG",3))  # 0.5, as in the example above
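
For comparison, the same proportion can be computed more compactly with collections.Counter from the standard library. This is an alternative sketch, not the solution above. The name simple_counter is hypothetical, and returning 0.0 when the sequence yields no n-grams is an assumption the problem statement does not cover.

```python
from collections import Counter

def simple_counter(s, n):
    """Count of the most frequent n-gram divided by the count of all n-grams."""
    grams = [s[i:i + n] for i in range(len(s) - n + 1)]
    if not grams:
        # Assumption: no n-grams (e.g. n > len(s)) gives a proportion of 0.0.
        return 0.0
    # most_common(1) returns a one-element list of (n-gram, count) pairs.
    top_count = Counter(grams).most_common(1)[0][1]
    return top_count / float(len(grams))

print(simple_counter("ATATATATAG", 3))  # 0.5
print(simple_counter("AAAA", 2))        # 1.0
```

Counter tallies each n-gram in a single pass, so this version avoids comparing every n-gram against every other one.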