python docstrings
[Update] I just found out that the gedit plugin had a terrible bug that made the wizardry work only once. This should now be fixed in 1.0b2. [Update]
[Update] I find this script useful enough that I turned it into a gedit plugin and hosted it on Google Code. [Update]
I work on a few little projects. When I get going, I focus solely on the new feature or on squashing that bug, not on publishable code. This is a flaw in my working process, but it's just who I am. The consequence of that behavior is that I neglect proper documentation, and by proper documentation I mean epydoc-conformant docstrings.
When the time comes and you want to release something, especially when you want people to look at your code and work with it, you realise that you need proper documentation. So I sat down and started to write the docstrings. Writing documentation can actually be fun! What certainly is NO fun is writing the overhead: """ to open and close every docstring, and @param and @type for each variable. There had to be a way to avoid this boring and purely repetitive task. So I searched the intarwebs and, to my surprise, found nothing. While I still refuse to believe that everybody writes their docstrings on the fly or uses an IDE that generates the stubs, my buddy and I also refused to write all that donkey code ourselves. So we started on what I want to share with you:
A Python script that inserts docstring stubs into a file. We used the opportunity to revive our regular expression skills. (Kudos to Kodos! If you want to develop a regex on the fly, Kodos is for you…)
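For illustration, this is the kind of stub the script is meant to produce. The class `Player` and its method `move(self, dx, dy)` are hypothetical and only serve as an example: the docstring goes on the line right after the def, indented one level deeper, with a FIXME placeholder and one @param/@type pair per parameter. `self` is skipped. The fields are left empty for you to fill in.

```python
class Player(object):
    # hypothetical class, only here to show what a generated stub looks like
    x = 0
    y = 0

    def move(self, dx, dy):
        """
        FIXME
        @param dx:
        @type dx:
        @param dy:
        @type dy:
        """
        self.x += dx
        self.y += dy


# the stub is an ordinary docstring, so it ends up in __doc__
print(Player.move.__doc__)
```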
So here goes:
```python
import re

# file names to process
baseFile = "Game.py"
outputFile = "Game_enhanced.py"

# configuration parameters
# method parameters to skip
skip = ["self"]
# number of spaces to insert for indentation
ident = 4

# we have to keep track of how many chars we already inserted because
# we are modifying the string while processing it
inserted = 0


def insert(original, new, pos):
    """
    inserts new into original at pos
    @type original: string
    @type new: string
    @type pos: integer
    """
    return original[:pos] + new + original[pos:]


def newline(current_ident):
    """
    returns a newline followed by current_ident + ident spaces
    @type current_ident: integer
    @param current_ident: the indentation of the line we are wrapping
    """
    return "\n" + " " * (current_ident + ident)


# regular expressions
# a "def" at the start of a line (after optional indentation), up to line end
method_pattern = re.compile(r'^[ \t]*def .*', re.MULTILINE)
# the (...) part of the signature
variable_pattern = re.compile(r'[(].*[)]')
variable_name_pattern = re.compile(r',')
# optional whitespace (including the newline) followed by """
docstring_pattern = re.compile(r'\s*"""')

# read in the file to be processed
with open(baseFile, "r") as f:
    text = f.read()

# find all methods; finditer keeps working on the original, unmodified text
method_iter = method_pattern.finditer(text)

# process each method
for item in method_iter:
    # the position of the docstring
    # here we have to take into account what has already been inserted
    pos = item.end() + inserted
    # the indentation of the method is the distance from the beginning of the
    # line till the "d"
    current_ident = item.group().find("d")
    # test if there already is a docstring: the next non-whitespace should be """
    if docstring_pattern.match(text, pos):
        # debug/info notice on the console
        # print(item.group(), "already has a docstring")
        continue
    # search for the (...) section; signatures spanning several lines are skipped
    params = variable_pattern.search(item.group())
    if params is None:
        continue
    # get rid of the parentheses
    params = params.group().replace("(", "").replace(")", "")
    variables = []
    # split the variables by commas
    for var in variable_name_pattern.split(params):
        # keep only the name: drop default values ("x=5", "x = 5") and spaces
        var = var.split("=")[0].strip()
        # skip empty entries (as in "def f():") and excluded names
        if var and var not in skip:
            variables.append(var)
    # generate the docstring
    docstring = (newline(current_ident) + '"""' + newline(current_ident)
                 + 'FIXME' + newline(current_ident))
    # add the @ lines
    for var in variables:
        docstring += ('@param ' + var + ':' + newline(current_ident)
                      + '@type ' + var + ':' + newline(current_ident))
    # close it
    docstring += '"""'
    # insert the docstring into the text
    text = insert(text, docstring, pos)
    # record how many chars have been inserted
    inserted += len(docstring)

# write the output
with open(outputFile, "w") as fo:
    fo.write(text)
```
Now the trained eye notices a couple of things:
- It is neither very clever nor complicated. All it does is search for method declarations, check whether a docstring is already present, parse the parameter names (stripping off default values), generate a docstring with @param and @type lines and paste it into the text.
- It completely ignores @return and @rtype, because once we considered nested functions we quickly realised that detecting returns is not trivial. The "return" of a function does not have to come before the next "def", nor does the indentation of the def line give sufficient information, and that is all our script looks at. So we skipped it. To detect returns properly we would have to parse the whole method body (including any inner functions, i.e. everything until the indentation drops back to that of the def) and decide which return belongs to which def based on all the indentation levels in between. If another def lies between a def and a return, the return belongs to the first def if the return is indented no deeper than the second def (less than or equal to its indentation).
- It indents with spaces, not tabs.
- It is not a usable module. Ideally, it would be a module providing a function to which you pass a list of files and some configuration options.
- Again: It’s simple!
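The indentation rule from the @return point can be sketched roughly like this. It is only our illustration and not part of the script: the helper names `indentation` and `find_owning_def` are made up, and the sketch assumes space indentation and one statement per line, ignoring multi-line strings and continuation lines. Starting at the return, it walks upwards. Lines indented at least as deeply as the current level are siblings or nested blocks (including an inner def whose body has already ended) and are skipped. A less-indented non-def line is an enclosing header and lowers the level. The first less-indented def is the owner.

```python
def indentation(line):
    """Number of leading spaces (we assume spaces, not tabs)."""
    return len(line) - len(line.lstrip(" "))


def find_owning_def(lines, return_index):
    """
    Return the index of the def line that owns the statement at
    lines[return_index], or None if no enclosing def is found.
    """
    limit = indentation(lines[return_index])
    for i in range(return_index - 1, -1, -1):
        stripped = lines[i].strip()
        # blank lines and comments say nothing about block structure
        if not stripped or stripped.startswith("#"):
            continue
        ind = indentation(lines[i])
        if ind >= limit:
            # a sibling statement or a deeper block (e.g. an inner def whose
            # body has already ended): it does not enclose the return
            continue
        if stripped.startswith("def "):
            return i
        # an enclosing if/for/while/... header: continue one level further out
        limit = ind
    return None


source = [
    "def outer(a):",
    "    def inner(b):",
    "        if b:",
    "            return b * 2",
    "        return 0",
    "    return inner(a)",
]

print(find_owning_def(source, 3))  # 1: the return inside "if b:" belongs to inner
print(find_owning_def(source, 5))  # 0: same indentation as inner's def, so outer
```

The last call shows the "less than or equal" case: the final return sits at the same indentation as `def inner`, so it belongs to `outer`.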
Comments and suggestions are very welcome. And I would love to have a "D'Oh!!" moment when somebody points me to a way we could have achieved this in five minutes…
Tags: (k)ubuntu
30 November 2008 at 19:24