Version 0.9 of The State Decoded is now available on GitHub. This release completes the work, begun in v0.8, to store multiple editions of a legal code. Since we have made fundamental changes to how laws are stored, potentially breaking previous installations, we have added this as an additional pre-1.0 release.
Site admins can now choose to import a new edition of the legal code, or to replace the current edition. This allows multiple editions of each code to live in parallel with the others – so when your locality updates its code, you can keep a historical record of previous versions. Right now you can browse and search previous editions of the code, and all related data is kept isolated to each edition. In the future this will enable us to compare and show the differences between versions of the code, as Wikipedia and GitHub do.
We now allow users to choose between the default Solr search engine and a new, optional MySQL search adapter. The MySQL adapter uses the built-in database to search, with no additional packages needed. This means that The State Decoded can now be run on most standard hosting without any extra setup. If your host can run WordPress, you can run State Decoded. Solr is still a far more powerful and robust solution, but users now have an easier setup option.
To support this, we’ve refactored the search system to better abstract the code. As a result, data is imported into the search system directly from the database, instead of having to jump through XSLT hoops. This abstraction also means that new search engines can be supported easily, by writing a new search interface adapter. For example, if anyone wanted to write an adapter for Elasticsearch, that’s now much easier.
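The adapter idea described above can be illustrated with a minimal interface (a Python sketch for illustration only; the actual State Decoded adapters are written in PHP, and these class and method names are invented):

```python
from abc import ABC, abstractmethod

class SearchAdapter(ABC):
    """Engine-agnostic interface: the importer indexes laws through this,
    and the front end queries through it, without knowing which engine
    (Solr, MySQL, Elasticsearch...) sits behind it."""

    @abstractmethod
    def index(self, law_id, text): ...

    @abstractmethod
    def search(self, query): ...

class InMemorySearchAdapter(SearchAdapter):
    """Stand-in for a MySQL- or Solr-backed adapter."""

    def __init__(self):
        self.docs = {}

    def index(self, law_id, text):
        self.docs[law_id] = text

    def search(self, query):
        # Naive substring match; a real adapter delegates to its engine.
        return [law_id for law_id, text in self.docs.items()
                if query.lower() in text.lower()]

engine = InMemorySearchAdapter()
engine.index('2.2-4119', 'The agency may appoint a negotiated rulemaking panel.')
print(engine.search('rulemaking'))  # ['2.2-4119']
```

Swapping engines then means swapping which concrete adapter the application instantiates, with no changes to the importer or the front end.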
There is now a new command line tool, statedecoded, that comes with the package. This command can be used to run various tasks, including updating the database when a new version of State Decoded comes out, and more. Data imports can be run via the tool as well, so they can easily be scheduled with cron to run automatically on a regular basis. You can also create your own tasks to run using the tool.
The database structure itself can now be updated with this tool. This removes the need to delete the entire database and install from scratch every time the structure is changed.
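As an illustration, a scheduled import might look like this in a crontab (the subcommand name and the paths here are assumptions, not the tool’s documented interface; check the tool’s built-in help for the actual task names):

```shell
# Hypothetical crontab entry: re-import the legal code at 2 a.m. every Sunday.
# The `import` subcommand and both paths are assumptions for illustration.
0 2 * * 0 /var/www/statedecoded/statedecoded import >> /var/log/statedecoded-import.log 2>&1
```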
We’ve added an example parser for working with Municipal Code Corporation’s XML format, to go along with our parser for American Legal Publishing Corporation’s data, and our default, generic parser. This makes it easier for localities to import data from their current codifier.
We’ve also made a large number of other bug fixes, design tweaks, and accessibility improvements.
As always, this work wouldn’t be possible without the generous support of The John S. and James L. Knight Foundation, who fund The OpenGov Foundation’s work on The State Decoded. We also couldn’t have done this without the help of Grant Ingersoll of Lucidworks fixing our Solr issues, and support from other amazing members of the Solr community, including Doug Turnbull, Erik Hatcher, and John Berryman. Design suggestions were, as always, made by John Athayde of Meticulous.
John Athayde and Lynn Wallenstein of Meticulous were responsible for dropping the old design and creating a new one from scratch. They’ve rendered the site almost unrecognizably different, with a highly modular, flexible design that looks good on screens of all sizes, is easy to customize, and will continue to grow with the State Decoded project. The layout is coded in Sass, which is handy for a lot of designers, and employs a pluggable template system that makes it easy to drop in custom designs. It even has a nice print stylesheet for folks who still like to have a hard copy in their hand. There’s no overstating what a huge improvement this is.
It’s really insulting to call this a mere search engine. The team at Open Source Connections—John Berryman and Doug Turnbull, specifically—designed an implementation of Apache’s Solr search platform that’s optimized for legal data. Of course, we used this to add a site search engine with spellcheck and live search suggestions, but having legal data indexed by Solr (and the underlying Lucene system) facilitates all sorts of exciting new features in the realms of natural language processing, information retrieval, content recommendations, and machine learning.
The complexity of The State Decoded and the needs of its users outgrew the code base and data structure that had functioned from v0.1 through v0.7. Long-time State Decoded contributor Bill Hunt took on this project, completing it in just a few weeks, implementing a routing table, moving a lot of functionality into controllers, and routing all queries to a permalinks table. The old approach looks pretty clunky compared to what Bill built.
We’ve tested out The State Decoded on the major Linux distributions and hosting platforms, identified all of the changes needed to allow it to work smoothly in those environments, and modified the software accordingly. There’s now an automated environment test suite to make sure that The State Decoded will work properly, with multiple paths to accomplish the same task, requiring as little server configuration as possible. The installation process has fewer steps than ever, as everything that can possibly be automated is automated.
There’s an entire workflow to handle new editions of codes, bulk downloads are created automatically, a sample XSLT is included for State Decoded XML, there’s a default home page that doesn’t require any customization, it now supports non-unique section identifiers (!), there’s an API method for search, there’s a proper administrative section now, we have an assets repository for the design’s Photoshop files, a sitemap.xml is automatically generated, every law now has Dublin Core embedded, and dozens of other things.
The work done by Bill Hunt, Meticulous, and Open Source Connections wouldn’t be possible without the generous support of The John S. and James L. Knight Foundation, whose funding made it possible to hire them. And Bill’s ongoing work on The State Decoded is courtesy of his employer, The OpenGov Foundation, who now employs him to contribute to the project and to implement The State Decoded in cities and states around the U.S.
Thanks, too, to Chris Birk, Karl Nicholas, Nick Skelsey, Josh Tauberer, and Rostislav Tsiomenko for their suggestions, testing, and contributed code.
There’s a great deal more to be done with the documentation. It needs to be organized, structured narratively, enhanced with illustrations, and simply cover more material. But it’s not bad, and now it’s easy for others to help make it better.
The first is Subsection Identifier, which turns theoretically structured text into actually structured text. It is common for documents in outline form (contracts, laws, and other documents that need to be able to cross-reference specific passages) to be provided in a format in which the structural labels flow into the text. For example:
A. The agency may appoint a negotiated rulemaking panel (NRP) if a regulatory action is expected to be controversial.
B. An NRP that has been appointed by the agency may be dissolved by the agency when:
1. There is no longer controversy associated with the development of the regulation;
2. The agency determines that the regulatory action is either exempt or excluded from the requirements of the Administrative Process Act; or
3. The agency determines that resolution of a controversy is unlikely.
One of the helpful features of The State Decoded is that it breaks up this text, understanding not just that every labelled line can stand alone, but also that the final line, despite being labelled “3,” is actually “B3,” since “3” is a subset of “B.” That functionality has been forked from The State Decoded, and now stands alone as Subsection Identifier, which accepts passages of text and turns them into well-structured text, like so:
(
    [0] => stdClass Object
    (
        [prefix_hierarchy] => stdClass Object
        (
            [0] => A
        )
        [prefix] => A.
        [text] => The agency may appoint a negotiated rulemaking panel (NRP) if a regulatory action is expected to be controversial.
    )
    [1] => stdClass Object
    (
        [prefix_hierarchy] => stdClass Object
        (
            [0] => B
        )
        [prefix] => B.
        [text] => An NRP that has been appointed by the agency may be dissolved by the agency when:
    )
    [2] => stdClass Object
    (
        [prefix_hierarchy] => stdClass Object
        (
            [0] => B
            [1] => 1
        )
        [prefix] => B.1.
        [text] => There is no longer controversy associated with the development of the regulation;
    )
    [3] => stdClass Object
    (
        [prefix_hierarchy] => stdClass Object
        (
            [0] => B
            [1] => 2
        )
        [prefix] => B.2.
        [text] => The agency determines that the regulatory action is either exempt or excluded from the requirements of the Administrative Process Act; or
    )
    [4] => stdClass Object
    (
        [prefix_hierarchy] => stdClass Object
        (
            [0] => B
            [1] => 3
        )
        [prefix] => B.3.
        [text] => The agency determines that resolution of a controversy is unlikely.
    )
)
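The inference behind that output can be sketched briefly (a Python sketch for illustration; the real Subsection Identifier is PHP and handles many more labeling schemes, such as roman numerals, lowercase letters, and parenthesized labels):

```python
def prefix_hierarchy(prefixes):
    """Track the structural position implied by each outline label."""
    stack, out = [], []
    for p in prefixes:
        label = p.rstrip('.')
        if label.isalpha():
            stack = [label]        # a letter starts a new top-level item
        elif stack and stack[-1].isdigit():
            stack[-1] = label      # a number replaces its sibling number
        else:
            stack.append(label)    # a number nests under the letter above it
        out.append(list(stack))
    return out

print(prefix_hierarchy(['A.', 'B.', '1.', '2.', '3.']))
# [['A'], ['B'], ['B', '1'], ['B', '2'], ['B', '3']]
```

Note how the “3.” on the last line resolves to the hierarchy B, 3, exactly as in the structured output above.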
The second mini-project is Definition Scraper, which extracts defined terms from passages of text. Many legal documents begin by defining words that are then used throughout the document, and knowing those definitions can be crucial to understanding that document. So it can be helpful to be able to extract a list of terms and their definitions. Definition Scraper need only be handed a passage of text, and it will determine whether it contains defined terms and, if it does, return a dictionary of those terms and their definitions.
Running this passage through Definition Scraper:
“The Program” refers to any copyrightable work licensed under this License.
A “covered work” means either the unmodified Program or a work based on the Program.
Yields the following two-entry dictionary:
(
[program] => “The Program” refers to any copyrightable work licensed under this License. Each licensee is addressed as “you”. “Licensees” and “recipients” may be individuals or organizations.
[covered work] => “covered work” means either the unmodified Program or a work based on the Program.
)
Definition Scraper is also a core function of The State Decoded, but warrants becoming its own project because it is so clearly useful for applications outside of the framework of The State Decoded.
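The core pattern match can be sketched in a few lines (a much-simplified Python sketch; the real Definition Scraper is PHP, recognizes many more phrasings, and normalizes terms, e.g. stripping the leading article from “The Program”):

```python
import re

def scrape_definitions(text):
    """Pull '"term" means/refers to <definition>' pairs out of a passage."""
    pattern = r'[“"]([^”"]+)[”"]\s+(?:means|refers to)\s+(.+)'
    return {term.lower(): definition.strip()
            for term, definition in re.findall(pattern, text)}

sample = '''“The Program” refers to any copyrightable work licensed under this License.
A “covered work” means either the unmodified Program or a work based on the Program.'''

for term, definition in scrape_definitions(sample).items():
    print(term, '=>', definition)
```

This naive version keys the first entry as “the program” rather than “program”; handling articles, “shall mean” phrasings, and multi-sentence definitions is where the real project earns its keep.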
The decision to spin off these projects was prompted by a report by the John S. and James L. Knight Foundation, the organization that funds The State Decoded, which evaluated the success of their News Challenge winners. They found several common attributes among the more successful funded projects, including this:
Projects that achieved strong use and adoption of their code often built and released their software in individual components, knowing that certain elements had value and a wide range of uses beyond the main focus of their project.
As development on The State Decoded continues, we may well spin off more mini-projects, if it becomes clear that more components of the overall project could be useful stand-alone tools.
The State Decoded now has a fully fleshed-out RESTful, JSON-based API. It has three methods: Law, Structure, and Dictionary. Law provides all available information about a given law. Structure provides all available information about a given structural unit (the various organizational units of legal codes—”titles,” “chapters,” “parts,” etc.). And Dictionary provides the definition (or definitions) for a term within a legal code. The data for these comes directly from the internal API that drives the site—what’s available publicly is what drives the site privately. In fact, I’m toying with the idea of having the site consume its own API, using internal APIs solely to serve data to the external API, and having every other part of the site get its data from that external API.
For a quick start trying this out on the Virginia site, you can use the trial key, 4l6dd9c124ddamq3 (though don’t build any applications using that key, or they will break when it expires), and see the API documentation to put it to work. For a really quick start, you can just browse the Code of Virginia via the API, check out a list of definitions, or read the text of a law. If you decide that you like what you see, register for a key and put this API to work.
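To sketch how the three methods map onto requests (a Python sketch; the base URL and path layout here are assumptions based on the description above, so consult the API documentation for the real paths before building anything):

```python
from urllib.parse import quote, urlencode

# Assumed base URL for the Virginia site's API; verify against the docs.
API_BASE = 'https://vacode.org/api'

def api_url(method, identifier, key):
    """Build a request URL for the law, structure, or dictionary method."""
    assert method in ('law', 'structure', 'dictionary')
    return f"{API_BASE}/{method}/{quote(identifier)}/?{urlencode({'key': key})}"

# Using the trial key from above; don't ship applications against it.
print(api_url('law', '2.2-4119', '4l6dd9c124ddamq3'))
```

A real client would then fetch that URL and decode the JSON response.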
Personally, this is the release that I’ve been waiting for. There’s an extent to which the purpose of The State Decoded project is really just to provide an API for legal codes; the fact that there’s a pretty website atop that API is just icing on the cake.
A significant obstacle to implementing The State Decoded has been the need to customize the parser for each installation. Every legal code is different—there are no standards—with some all in one big SGML file, others stored in thousands of XML files, and others still needing to be scraped off of webpages. That necessitated modifying the State Decoded parser to interface the data from the source files with the internal API. That’s really not an obstacle to people and organizations who are serious about implementing The State Decoded. But plenty of people might be serious if they could just try it out first. There’s a huge gradient between “huh, looks interesting” and “I must get my city/state/country laws online!” It’s foolish to assume that people won’t just want to try it out first. After all, that’s how I prefer to get started with software and projects.
The solution was to establish an XML standard for importing legal codes into The State Decoded, to provide a low-barrier-to-entry path in addition to the more complex path. To be clear, this is not an attempt to create an XML standard for legal codes. This is a loosely typed standard, used solely as an import format for The State Decoded. Many legal codes are already stored as XML—that’s the most common file format—so getting those codes into The State Decoded now only requires writing a bit of XSLT. This is a much lower barrier to entry.
The XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<law>
    <structure>
        <unit label="" identifier="" order_by="" level=""></unit>
    </structure>
    <section_number></section_number>
    <catch_line></catch_line>
    <order_by></order_by>
    <text>
        <section prefix=""></section>
    </text>
    <history></history>
</law>
Several of those fields are optional, too. There will certainly be legal codes and organizations for which this won’t do the trick—they’ll need to modify the parser to handle some unusual types of data, fields, third-party data sources, etc. But for most people, this will be a big improvement.
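Producing that import format from whatever source you have can be a small script rather than a parser modification. A Python sketch using only the standard library (the sample values are invented, and a real import would loop over every law in the source code):

```python
import xml.etree.ElementTree as ET

def law_to_xml(section_number, catch_line, sections, history='', units=()):
    """Emit one <law> element in the State Decoded import format."""
    law = ET.Element('law')
    structure = ET.SubElement(law, 'structure')
    for unit in units:  # each unit: dict of label, identifier, order_by, level
        ET.SubElement(structure, 'unit', unit)
    ET.SubElement(law, 'section_number').text = section_number
    ET.SubElement(law, 'catch_line').text = catch_line
    ET.SubElement(law, 'order_by').text = section_number
    text = ET.SubElement(law, 'text')
    for prefix, body in sections:
        ET.SubElement(text, 'section', prefix=prefix).text = body
    ET.SubElement(law, 'history').text = history
    return ET.tostring(law, encoding='unicode')

xml = law_to_xml(
    '2.2-4119',
    'Negotiated rulemaking panels',
    [('A', 'The agency may appoint a negotiated rulemaking panel...')],
    units=[{'label': 'chapter', 'identifier': '40', 'order_by': '40', 'level': '2'}],
)
print(xml)
```

An XSLT stylesheet accomplishes the same thing declaratively for codes that are already XML; this is just the same idea in script form.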
The Code of Virginia is available as State Decoded XML, so if you’ve been considering playing with The State Decoded, it just got a whole lot easier to deploy a test site. Just download that XML and follow the installation instructions.
Thanks to Tom MacWright, Andrew Nacin, Daniel Trebbien, and Chad Robinson for their pull requests, wiki edits, and trouble tickets.
Here are the most interesting changes:
Most of these changes are, in one way or another, moving the project towards standardization, automation, and normalization, to make it easier to deploy, maintain, and use. It should all be a lot easier to understand for a programmer diving into it for the first time.
The next release is version 0.6, dedicated to API improvements. That will comprise a relatively small number of issues, but they’re big ones: creating a RESTful JSON-based API, and supporting a crudely typed XML input format to simplify the process of parsing new codes. The latter is important, because the present arrangement requires that one know enough PHP to modify the parser to suit their own code’s unique storage and formatting. The idea here is that you can, alternately, use the tools of your choice to create an XML version of that code, and as long as that XML is of the style expected by the parser, it can be imported without having to edit a line of PHP in The State Decoded. Note that v0.6 was supposed to be the release in which the Solr search engine was integrated deeply into the software. That has now been pushed back—it’ll probably be v0.9—in order to accommodate a vendor’s schedule.
There are a few big changes:
A list of all closed issues is available for those who want specifics. And for those who are suckers for details, this is the first release for which a detailed Git commit log is available, with relatively detailed comments for all 68 commits that comprise this release.
This release is two weeks late, almost entirely because of time spent on a pernicious and difficult parsing bug that, it only occurred to me today, shouldn’t have blocked this release because, while an important problem, it has absolutely nothing to do with definitions. (The problem that is being wrestled with is how to handle subsections of laws that span paragraphs. Easy to describe, difficult to solve, at least for those state codes that pretend that a paragraph and a section are one and the same. I’m looking at you, Virginia.) That issue has been moved back to v0.5, and I’ll go right back to wrestling with it on Monday.
Next up, version 0.5 will be another general-enhancements release. Version 0.6 will be the Solr release—the version in which the popular search software becomes integrated deeply into the project. Version 0.7 will be the API release, where the nascent API gets built out to full functionality and documented properly. Version 0.8 will be the user interface release, in which the design will be overhauled, a responsive design will be implemented, serious work will go into the typography, an intercode navigation system will be implemented, contextual help and explanations will be embedded throughout, and the results of some light UI testing will be incorporated. Version 0.9 will be dedicated to optimizations—making everything go faster and be more fault-tolerant, both through improving the code base and supporting the APC and Varnish caching systems. And, finally, version 1.0 will be the first release in which State Decoded becomes a platform that facilitates the sort of analysis and data exchange that makes this project so full of possibility—things like flexible content export, visualizations, user portfolios of interesting laws, and surely lots of other things.
As with the prior two releases, this is an alpha release—there’s no installer, documentation, or administrative backend. With this release the gap between the released packages and the version of the software powering Virginia Decoded and Sunshine Statutes (Florida) is smaller than ever, and I’m hopeful that I can port those sites over to run on v0.3 of The State Decoded.
With this release, the project is back on the monthly release schedule that was started in June. A roadmap is emerging for the next few releases. Version 0.4 will be dedicated almost entirely to enhancements to the dictionary system that makes laws self-documenting, and that’s due out on September 1. Version 0.5 will be another general-enhancements release, due out on October 1. Version 0.6 will be the Solr release—the version in which the popular search software becomes integrated deeply into the project, due out on November 1. (Solr functionality, by the way, is made possible by a generous contribution from Open Source Connections, specifically David Dodge, Joseph Featherston, and Kasey McKenna, who recently spent a great deal of time setting up Solr to support The State Decoded.) And version 0.7 will be the API release, where the nascent API gets built out to full functionality and documented properly, and that’s due out on December 1.
As with v0.1, this version is an alpha release—there’s no installer, documentation, or administrative backend. It’s only for the brave or curious. Virginia Decoded and Sunshine Statutes (Florida) are not running on this release but, instead, a modified version. Each release will get closer to providing all of the functionality of these live sites with the flexibility and abstraction to provide the same functions for other states, and I remain optimistic that v0.3 will be the release that can be installed on these live sites, so that I can eat my own dog food.
The State Decoded started off as Virginia Decoded, a project that I spent nights and weekends on for a year or so. The Knight Foundation’s $165,000 grant funds taking that Virginia-specific code and abstracting it sufficiently to apply to other states, cities, or even countries. In the three months that this has been my full-time job, much of my time has been spent on the very specific task of de-Virginia-ing the code, so that it can work anywhere.
This v0.1 release is the result of that streamlining. It required some significant architectural changes. The biggest change was providing support for the widely varying structures of legal codes. The Code of Virginia is broken into titles, which are broken into chapters, which are broken into articles, and those are made up of individual sections.* Accordingly, I had tables in the database for each of these structures. As a result, the software could only work for legal codes that used the same three-tiered structural system. That had to be tossed out and rewritten. There were a series of changes of this nature, all of which simplified and normalized the software’s functionality.
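The standard fix for a fixed-tier schema like that is a single self-referencing table, so a code can nest to any depth. A sketch of the idea using SQLite via Python (the table and column names are illustrative, not the actual State Decoded schema):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('''CREATE TABLE structure (
    id INTEGER PRIMARY KEY,
    name TEXT,
    label TEXT,                            -- "title", "chapter", "article"...
    parent_id INTEGER REFERENCES structure(id)
)''')

# One row per structural unit, regardless of how many tiers the code uses.
conn.executemany('INSERT INTO structure VALUES (?, ?, ?, ?)', [
    (1, 'Title 2.2', 'title', None),
    (2, 'Chapter 40', 'chapter', 1),
    (3, 'Article 1', 'article', 2),
])

# Walk from a unit up to the root with a recursive query.
rows = conn.execute('''
    WITH RECURSIVE ancestry(id, name, parent_id) AS (
        SELECT id, name, parent_id FROM structure WHERE id = 3
        UNION ALL
        SELECT s.id, s.name, s.parent_id
        FROM structure s JOIN ancestry a ON s.id = a.parent_id
    )
    SELECT name FROM ancestry
''').fetchall()
print([name for (name,) in rows])  # ['Article 1', 'Chapter 40', 'Title 2.2']
```

With this shape, a two-tier municipal code and a four-tier state code live in the same tables without schema changes.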
Anybody looking to launch their own implementation of the State Decoded with this v0.1 release would be disappointed by the awkwardness of the process. There’s no installer, instructions, or clever administration system. But it is functional, structurally intact, and extremely informative for anybody interested in putting it to work.
I’m marching towards v0.2, scheduled for release one month from now, and working on getting Virginia Decoded using the live release of the State Decoded software, instead of the State Decoded software being merely derived from Virginia Decoded. The two should converge somewhere around v0.3, and that’s the next major developmental milestone in this project.
* The official SGML file of the Code of Virginia has no representation of articles and, as a result, they are not employed on Virginia Decoded. That is not a permanent problem, but that is the present state of things.