Comments on: Creating Data Where There is None
https://www.statedecoded.com/2012/06/creating-data/
Legal codes, for humans.

By: Waldo Jaquith https://www.statedecoded.com/2012/06/creating-data/#comment-441 Fri, 29 Jun 2012 02:34:57 +0000

Your idea about the CA bills is interesting but I think there’s an insurmountable problem: each bill maps to many statutes.

I don’t think that’s actually an insurmountable problem. And unlike the rest of my proposal, this bit I say from experience, because I already map legislation to laws on Virginia Decoded, using Richmond Sunlight (which I also run, conveniently :). There are a great many bills that map to only one statute. In Virginia, something like half of all bills that affect the Code of Virginia affect just one section of the code. From an admittedly brief perusal of the California State Legislature’s website, I’ve found examples there of bills that affect just one statute. I think a large sample would be necessary to estimate how many bills affect just one statute, though. It’s quite possible that the total number is too low to be valuable. I also suspect that there are clever solutions waiting to be employed to deal with legislation that affects multiple sections (given a large enough sample of bills that each affect a very small number of sections, for instance), but I sure haven’t thought that through.
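To make the "how many bills affect just one section" question concrete, here is a minimal sketch of the count, assuming you already have a mapping from each bill to the set of code sections it amends. The function name, the bill identifiers, and the section labels are all made up for illustration; none of this is real Virginia or California data, and it is not how Richmond Sunlight or Virginia Decoded actually do it.

```python
def single_section_share(bill_sections):
    """Return (count, share) of bills that affect exactly one code section.

    bill_sections maps a bill identifier to the set of section labels
    that the bill amends. Both are hypothetical placeholders here.
    """
    if not bill_sections:
        return 0, 0.0
    single = [bill for bill, sections in bill_sections.items()
              if len(sections) == 1]
    return len(single), len(single) / len(bill_sections)


# Invented sample, for illustration only
sample = {
    "bill-1": {"sec-A"},
    "bill-2": {"sec-B", "sec-C", "sec-D"},
    "bill-3": {"sec-E"},
    "bill-4": {"sec-F", "sec-G"},
}

count, share = single_section_share(sample)
print(f"{count} of {len(sample)} bills ({share:.0%}) affect exactly one section")
```

Run over a large enough sample of real bills, the share would indicate whether single-section bills are common enough to be worth using.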

I’ve thought about doing something like that and then using it as input to a captcha-type voting system. People would be asked to vote up the best candidate title.

Good idea!

The California Law Revision Commission processes a steady stream of CA law, implementing various kinds of fixes and improvements. And I discovered that they create titles for internal use, publishing them in their reports. The content is unfortunately buried in PDFs, and not highly structured, but it’s a possibility.

Wow, you’re right, that’s really great stuff! It might be in PDFs, but at least it’s actual text in there, instead of images, so it’s eminently scrape-able. Given a few years of annual reports, I’ll bet you could get a huge chunk of the California Code covered! You might consider asking the California Law Revision Commission if they have a listing of titles, either for the entire code or just for all of the sections that they have had cause to assign titles to. They may well have a big spreadsheet they’d be happy to give you. My experience with the equivalent Virginia organization has been overwhelmingly positive: they’re eager to give me all of the information that I ask for and more. Perhaps California would be as forthcoming?

By: Robb Shecter https://www.statedecoded.com/2012/06/creating-data/#comment-440 Fri, 29 Jun 2012 02:14:25 +0000

Hey Waldo,

You wrote a great description of the problem. Your idea about the CA bills is interesting but I think there’s an insurmountable problem: each bill maps to many statutes.

I like the idea of scavenging names from unconventional sources like these, though, including textual analysis. I’ve thought about doing something like that and then using it as input to a captcha-type voting system. People would be asked to vote up the best candidate title.
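The tallying side of a voting system like that could be sketched roughly as follows. This is only an illustration under simple assumptions: each vote is recorded as the candidate title string that was chosen, and the function name and sample titles are hypothetical.

```python
from collections import Counter


def top_candidate(votes):
    """Return (title, vote_count) for the most-voted candidate title.

    votes is an iterable of title strings, one per vote cast.
    Returns None when no votes have been cast. On a tie, the title
    that was seen first wins, which is how Counter.most_common orders
    equal counts.
    """
    tally = Counter(votes)
    if not tally:
        return None
    return tally.most_common(1)[0]


# Invented votes for one section, for illustration only
votes = ["Candidate title A", "Candidate title B", "Candidate title A"]
print(top_candidate(votes))
```

A real system would presumably also want a minimum number of votes before accepting a winner, but that threshold is a design choice left out here.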

Now, similar to your CA bills idea, I found another small but steady source of titles: The California Law Revision Commission processes a steady stream of CA law, implementing various kinds of fixes and improvements. And I discovered that they create titles for internal use, publishing them in their reports. The content is unfortunately buried in PDFs, and not highly structured, but it’s a possibility. Take a look at this, beginning at page 43: http://clrc.ca.gov/pub/Printed-Reports/Pub235-AR.pdf
