Blog by Sumana Harihareswara, Changeset founder

06 Nov 2013, 7:06 a.m.

Top, Iterators and Generators, and Git, Emacs, and REPL Tips

Hi, reader. I wrote this in 2013 and it's now more than five years old. So it may be very out of date; the world, and I, have changed a lot since I wrote it! I'm keeping this up for historical archive purposes, but the me of today may 100% disagree with what I said then. I rarely edit posts after publishing them, but if I do, I usually leave a note in italics to mark the edit and the reason. If this post is particularly offensive or breaches someone's privacy, please contact me.

Dumping into a post some things I've learned recently, trying to disregard the potential "you didn't know that already?!?!" surprise, feigned or genuine, that people might impose on me.*

  • How did I never use top before? Magic! "Why in the world is my fan so loud? [run top] Epiphany, I closed that tab minutes ago, why are you still going like gangbusters? Fine, I'll quit and restart you."
  • Lots of data types in Python are iterables. Like, say, lists, or strings. If you call the built-in iter function with an object of that type as the argument, you get an iterator -- if you want to do stuff with that, then you give it a name. An iterator (holy crap) is like a function that holds onto state, so that it remembers where it was the last time you accessed it! The point of an iterator is to traverse the iterable from beginning to end, handing back one value each time you call .next() (or similar) on it, then saying all done by raising a StopIteration exception. Like this:
    >>> a = [3,6,9]
    >>> a
    [3, 6, 9]
    >>> iter(a)
    <listiterator object at 0x7f5c8c8da490>
    >>> s = iter(a)
    >>> type(s)
    <type 'listiterator'>
    >>> s.next()
    3
    >>> s.next()
    6
    >>> s.next()
    9
    >>> s.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> r = "captain"
    >>> w = iter(r)
    >>> w.next()
    'c'
    >>> w.next()
    'a'
    >>> w.next()
    'p'
    >>> w.next()
    't'
    
  • If the body of a Python function includes the keyword yield (instead of return), then you've just made a generator function. A generator function creates an iterator that performs your whims! Again, you don't just use it directly; you call the generator function and assign the result to a variable, with the same syntax as you'd use if you wanted to make an instance of a class, and then you have a generator object, which is an iterator that you treat as you would any other iterator. Lemme show you:
    >>> def foo():
    ...     yield "first"
    ...     yield "second"
    ...     yield "last"
    ... 
    >>> b = foo()
    >>> type(b)
    <type 'generator'>
    >>> b.next()
    'first'
    >>> b.next()
    'second'
    >>> b.next()
    'last'
    >>> b.next()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    

    I read (bits of) so very very many pages about this, and fellow students tried to help me get this (thank you, Joe and Gideon), and then yesterday I paired with Jessica McKellar and she sealed the deal and I think I get it now! As she explained, you might want to use generators if you want to get an infinite series of values (e.g., all the even numbers up to infinity). Or if you're crunching numbers and it takes hella resources to do this particular part of the crunching on THE WHOLE DATASET ALL AT ONCE, with generators you can just crunch one input at a time and yield it up, then move to the next input in the sequence seamlessly when needed. You can speed up bottlenecks in your assembly line by doing particular computations in a just-in-time way.

  • Meta-g g in Emacs takes me to a specific line number in the file. Putting (setq column-number-mode t) in ~/.emacs.d/init.el ensures that the statusbar at the bottom of the editor displays column number along with line number. These tips together make it much easier for me to seek out whatever discrepancies git or pep8 have brought to my attention.
  • The xmlrpclib module makes it pretty easy to access XML-RPC web APIs, e.g. the Trac API as accessible on the Django project's site. However! The IPython and bpython REPLs may attempt to nicely autocomplete not-really-discoverable method names ... across the network ... and choke. And maybe crash. So if you want to play with it, just use the regular Python REPL. (But for everything else, oh wow, the bpython REPL is pretty snazzy.)
  • git grep is great! It's automatically recursive, and only searches "the tracked files in the work tree, blobs registered in the index file, or blobs in given tree objects" (quoting from the man page). Just like with grep, if you use -n, then with every matching line you also get the line number. Or set lineNumber = True in the [grep] section of ~/.gitconfig to always have that on. If you miss colored output, use the --color=always option, or (as I just discovered) you should check out git configuration options, e.g. color.ui=true, to make LOTS OF OUTPUT colored and useful!
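
To make the iterator and generator bullets above a bit more concrete, here is a minimal sketch of the two uses mentioned there: an endless series of even numbers, and crunching one input at a time. The function names evens and squares are made up for illustration and are not from any library. It is written for Python 3, where you call the built-in next(it) rather than it.next() as in the Python 2 sessions above.

```python
from itertools import islice


def evens():
    """Yield 0, 2, 4, ... forever; nothing is computed until asked for."""
    n = 0
    while True:
        yield n
        n += 2


def squares(numbers):
    """Square one input at a time instead of building a whole list first."""
    for n in numbers:
        yield n * n


# The built-in next() works on any iterator, including a string iterator.
letters = iter("captain")
first_two = [next(letters), next(letters)]

# islice takes a finite slice out of the infinite generator.
first_evens = list(islice(evens(), 5))

# Chained generators: each square is computed only when it is pulled.
first_squares = list(islice(squares(evens()), 4))

print(first_two, first_evens, first_squares)
```

Without islice (or some other stopping condition), list(evens()) would never finish, which is the flip side of a generator that can go on forever.
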
* The magic of Hacker School: no one at Hacker School will do that. Nor well-actually me about this post! Random internet commenters might, and I may delete them.

Comments

Tikitu
http://ww.logophile.org/
06 Nov 2013, 13:59 p.m.

Good stuff! If you like top you might also like htop (same idea but prettier and easier to interact with) and iotop (same idea but for "what is churning my disk all of a sudden?!"). And ack-grep might be useful too: a non-git-specific semi-equivalent to git grep.

C. Scott Ananian
http://cscott.net
06 Nov 2013, 14:58 p.m.

For git color options, "auto" is usually a better bet than "true" -- that makes sure that "git grep foo " doesn't get a bunch of random color escape characters in it.

I have the following in my ~/.gitconfig:

[color]
    diff = auto
    log = auto
    commit = auto
    branch = auto
[log]
    decorate = short
[rebase]
    autosquash = true

Avram
http://grumer.org/
06 Nov 2013, 22:19 p.m.

Usually when I run top, my #1 CPU-occupying process is top.

Avram
http://grumer.org/
06 Nov 2013, 22:21 p.m.

(Nooooo, the comments form ate my <kbd> tags!)

Jed
http://www.kith.org/journals/jed
11 Nov 2013, 12:36 p.m.

Nifty! Thanks for posting these!

I didn't know about most of them. I had encountered iterators in other contexts (in Java, I think), but had had to figure out what they were and how they worked from context; neat to see it laid out so simply and clearly. And I hadn't heard of generators before; super-cool. Can you also use a generator to create an iterator that iterates in nonlinear order, like random order or alphabetical order or something? If I'm understanding right, it sounds related to the concept of implementing your own comparison functions for use by sort functions.