Today, version 0.4 of The State Decoded was tagged on GitHub and bundled up for download, the result of six weeks of work. This release is dedicated (almost) exclusively to enhancements to the dictionary system. Eighteen issues comprise the changes in this release, sixteen of which pertain to the built-in automatic, custom dictionary system, which finds defined terms within legal codes and stores them in a dictionary, using that data to embed contextual definitions that are relevant to each law.
There are a few big changes:
- The State Decoded comes with a built-in dictionary of general legal terms. Using several different non-copyrighted, government-created legal dictionaries, a collection of nearly 500 terms have been put together, which will help people to understand common legal terms that are rarely defined within legal codes, such as “mutatis mutandis,” “tort,” “pro tem,” and “cause of action.”
- Dictionary terms are now identified more aggressively, which means that for many states, the size and scope of the custom dictionary is going to expand substantially. In the case of Virginia there was a 49% increase (a leap from 7,681 to 11,504 definitions), a striking difference that could be observed immediately when browsing the site.
- The problem of nested/overlapping definitions has been solved. When one definition was nested within another (e.g., if we have definitions for both “robbery” and “armed robbery”), then mousing over “robbery” would yield a pair of pop-up definitions, one obscuring the other. Now only the definition for the longest term is defined under those circumstances.
- Internal terminology has been standardized. In various places the dictionary and its components were all called different things (glossary, definitions, dictionary, terms, etc.) in different places. Now the collection of words is called a “dictionary,” each defined word is a “term,” and the description of that that term means is a “definition.”)
- The retrieval and display of definitions is substantially faster—they take about half the time that they used to. This is a result of optimizing and simplifying the structure of the database table in which definitions are stored.
A list of all closed issues is available for those who want specifics. And for those who are suckers for details, this is the first release for which a detailed Git commit log is available, with relatively detailed comments for all 68 commits that comprise this release.
This release is two weeks late, almost entirely because of time spent on a pernicious and difficult parsing bug that, it only occurred to me today, shouldn’t have blocked this release because, while an important problem, it has absolutely nothing to do with definitions. (The problem that is being wrestled with is how to handle subsections of laws that span paragraphs. Easy to describe, difficult to solve, at least for those state codes that pretend that a paragraph and a section are one and the same. I’m looking at you, Virginia.) That issue has been moved back to v0.5, and I’ll go right back to wrestling with it on Monday.
Next up, version 0.5 will be another general-enhancements release. Version 0.6 will be the Solr release—the version in which the popular search software becomes integrated deeply into the project. Version 0.7 will be the API release, where the nascent API gets built out to full functionality and documented properly. Version 0.8 will be the user interface release, in which the design will be overhauled, a responsive design will be implemented, serious work will go into the typography, an intercode navigation system will be implemented, contextual help and explanations will be embedded throughout, and the results of some light UI testing will be incorporated. Version 0.9 will be dedicated to optimizations—making everything go faster and be more fault-tolerant, both through improving the code base and supporting the APC and Varnish caching systems. And, finally, version 1.0 will be the first release in which State Decoded becomes a platform that facilitates the sort of analysis and data exchange that makes this project so full of possibility—things like flexible content export, visualizations, user portfolios of interesting laws, and surely lots of other things.