What regular expression would match Library of Congress call numbers?
I'm looking for something that can help with my awful reporting
interface.
Has anyone created a macro or a program that will strip out internal
spaces, except for the space before the date and recognize all of the
different ways in which LoC call numbers can appear, creating a list in
shelf order?
In short, I need to normalize my LC call number data.
Brett
Comments
- jonsca: Are you looking for code? I'm not sure that will be on topic here, but
see what others think. If you had something that you had started, you
could present it as a question on SO or DBA.SE, which I think have a
firmer basis with normalization and regex.
Answer by Bill Dueber
Short answer: it's really, really hard. LC Call Numbers are a freakin'
mess.
Check out this discussion for some links to existing code.
http://www.libcatcode.org/questions/5/normalized-lcc-for-sorting
Comments
- Joe: as there are only a few links of significance in that discussion (to the
Perl, Perl/Python and Java code), you may want to reproduce them here,
just in case the intermediary site should disappear.
- Brett: Thanks. I've been perusing code4lib and the limited options out there,
and I think a python parser may be the only option. That's a larger job
than I can tackle right now.