Blog by Sumana Harihareswara, Changeset founder

16 Feb 2024, 11:10 a.m.

Celebrate Beautiful Soup's 20th Anniversary

Please join a celebration of the 20th anniversary of the software project Beautiful Soup on May 19th, 2024! For twenty years, this screen-scraping library has made it easier to get data out of HTML.

You didn't write that awful page. You're just trying to get some data out of it. Beautiful Soup is here to help. Since 2004, it's been saving programmers hours or days of work on quick-turnaround screen scraping projects.

Back in March 2004, Leonard Richardson asked:

Can someone point me to an HTML parser that turns an HTML document into a nested data structure like what I sketched out below? I'm sick of having to jump through hoops to collect the text of a link. I know there's something similar for XML because I heard about it at EuroPython. Stop me before I write my own!

Leonard went on to create Beautiful Soup, announcing it on May 19th, 2004, and has maintained it ever since. The current release is Beautiful Soup 4.12.3, published a month ago.

A few months ago, I spoke with Leonard (who is now my spouse) about what he would like for the 20th anniversary.

Tell us your stories

Leonard would love for us to put together an anthology in which people share stories of how they've used Beautiful Soup, and what that's enabled in their lives. Kind of like a festschrift. So many people have informally said "The first programming project I ever did was in Beautiful Soup" or "We couldn't have done this art/activism project without Beautiful Soup" to us, but very few people write it down. And Leonard isn't just looking for Hall of Fame-worthy projects that are inherently awesome -- he also wants to know how using it has changed your life. Was using it an important step toward thinking of yourself as a programmer, or starting a company, or triumphing over a bureaucracy?

Please comment or email me (sumanah+bs20@fastmail.com) to let me know if you can contribute! I figure mostly people will write textual essays, but if you want to make a video, a song, an interactive fiction piece, or some other cool piece of art, I want to know about it! We will probably need your materials by April 23rd to edit, design, print, and ship a book in time for May 19th.

I'd prefer for someone else to do the legwork of compiling and editing it -- the How-To we put together for our last anthology may come in handy -- and I'll update this section if someone else steps up to do that.

Parties

Please feel free to hold your own anniversary parties! But also, since the anniversary overlaps with PyCon US this year, I'm hoping to use a one-hour open space slot that weekend for a celebration. Leonard would make a short speech, and a few people would give five-minute lightning talks about cool things they did with Beautiful Soup, or how using it was an inflection point in their lives. And then if you want to say hi to Leonard in person and thank him for his work, you could do that. If we manage to get some copies of the anthology to the convention, we'll bring them, and you can get Leonard to autograph yours.

Leonard doesn't usually go to conferences and conventions, so I'm glad he's open to doing this.

Please comment or email me (sumanah+bs20@fastmail.com) by May 9th to let me know if you're going to be at PyCon US in May and could give a 5-minute talk.

Funding

Here's what we could use:

  • Maybe a few hundred dollars to get books printed
  • A few hundred to a thousand dollars, or some Amtrak or frequent flyer points, if we want to upgrade the PyCon travel (New York to Pittsburgh and back) to something better than coach/economy class (he is tall)

So email me (sumanah+bs20@fastmail.com) if you can help with either of those things.

Leonard's also working on some big documentation improvements, so if your company wants to spend a few thousand dollars to sponsor that, go ahead and email me.

Evergreen requests

And, as Leonard reminds us on the Beautiful Soup homepage:

If you have questions, send them to the discussion group. If you find a bug, file it on Launchpad. If it's a security vulnerability, report it confidentially through Tidelift.
If you use Beautiful Soup as part of your work, please consider a Tidelift subscription. This will support many of the free software projects your organization depends on, not just Beautiful Soup.
If Beautiful Soup is useful to you on a personal level, you might like to read Tool Safety, a short zine I wrote about what I learned about software development from working on Beautiful Soup. Thanks!

As Leonard noted a few years ago, Tidelift support has helped him get back to being interested in doing Beautiful Soup work.

Reminiscences

(Feel free to skip this bit; it's really just me being sentimental about someone I admire a lot.)

Leonard, May 19th, 2004:

I don't do the grand vision thing well. In my experience, a grand vision is largely a pheromonal advertisement for the person with the vision. I see instead a lot of little strategies and experiments. The world doesn't need another wannabe visionary, but I figure if I can make some incremental improvements and implement some ideas, I can hold my head high with the rest of them.

Leonard, May 20th, 2004:

I wrote my own library, which I like to call Beautiful Soup and which I mentioned in passing yesterday. It's the HTML parser that just doesn't care. If you give it perfect HTML, it'll give you a perfect data structure, just like the big-name parsers. But other parsers know too much about HTML. They choke on or try to rewrite bad markup. They assume you care about the whole document. A pirate might make you walk the plank, but only a parser would make you walk the whole tree.

Leonard, May 21st, 2004:

Because Beautiful Soup's reception greatly exceeded expectations (I think I tapped a real market need here), I made a cute little web page for it with lots of examples and Tenniel graphics. I finally get to pay tribute to the line from Carroll that made me actually roll around on the floor laughing when I was 9 and my father was reading The Annotated Alice aloud, one chapter a week, complete with all the incomprehensible-to-children annotations.

(My heart hurts too much to quote what Leonard wrote in January 2013 in memory of our friend Aaron Swartz: "For Aaron" and "429 Too Many Requests".)

Leonard, September 16th, 2010:

I'm also not in the business of saving each of my users ten seconds. I want to save you three hours, or a week would be better. That's why I do what I do—you can see this thread running through every job I've had, both my nonfiction books, and my most successful open source projects. Check the intro to Ruby Cookbook: I just realized I wrote the same damn thing in there.
This book is meant to save you time. Time, as they say, is money, but a span of time is also a piece of your life. Our lives are better spent creating new things than fighting our own errors, or trying to solve problems that have already been solved. We present this book in the hope that the time it saves, distributed across all its readers, will greatly outweigh the time we spent creating it.
Beautiful Soup is showing its age, but I don't think it's totally crazy to estimate that it's saved thirty or forty person-years since 2004. Someday I'll die, and I don't want the main thing people say about me to be "The time he saved other people was a greater-than-unity multiple of his own lifespan." But that is an important part of my personal philosophy. It's Tim O'Reilly's "Create more value than you capture" applied to time.

If Beautiful Soup has saved you time, I hope you'll contribute to this celebration! Thanks.

Comments

-dsr-
https://blog.randomstring.org
11 Mar 2024, 3:18 p.m.

I just realized that two of the tools I use frequently -- the blogging software Pelican and the fanfiction-to-epub converter FanFicFare -- both depend on BeautifulSoup.

So thanks muchly! These things make my life better.