Professional Ethics & Data Exhaust
written by Jessica Rodrigues as an assignment in fall 2020 for LS 566 at the University of Alabama
Libraries are obligated to collect some patron data, such as names, addresses, and contact information, in order to steward the collection and issue library cards pursuant to local policy. Some libraries choose to accumulate additional information for demographic purposes -- gender, birthdates, perhaps even linguistic preference or racial identity. Patrons can generate additional stored data through charge history preferences or computer use in the building. These, combined with use metadata generated from everyday library interactions, create a large trove of potentially personally identifiable data. As a result, libraries must be intentional and purposeful about which data they collect, what kind of data exhaust is generated in regular library use, and how they will safely store or eliminate this data and metadata.
Some applications of use metadata may seem innocuous at first glance -- for example, noticing that a patron regularly checks out cozy mysteries and recommending a hold on the new Joanne Fluke before being asked -- but the practice becomes invasive when it touches on more sensitive topics, like health or sex, that a patron would rather not have affiliated with their identity. In this example, while the patron may appreciate the thought behind the new Joanne Fluke novel, she may then decide to self-censor and avoid checking out titles about a sensitive medical condition or steamy romance paperbacks because she now perceives that someone is watching what she is reading and that this information could get out. As the ALA put it, “When users recognize or fear that their privacy or confidentiality is compromised, true freedom of inquiry no longer exists.” So while this Amazon-style approach to automated suggested reading may be enticing to some users, it should remain optional and very private.
Perhaps the simplest solution to protecting patron data is not keeping any more than is necessary -- “if a library doesn’t have a particular piece of data, it cannot be misused by someone” (Corrado, 2020, p. 51). For example, there was a library in the Midwest (which I will not name) that, rather than use a library card registration form that asked for patrons’ addresses and birthdays, decided it would be simpler to just scan and retain copies of patrons’ driver’s licenses. However, this created extraneous, sensitive data that the library was then burdened with protecting, like ID numbers and even which patrons were using special driver IDs for undocumented immigrants. While the library had no indications of a data breach, it decided to end that practice and destroy all of the unnecessary records. Similarly, while some patrons may appreciate having an option to view their checkout history, any such function should be set up in such a way that it is not viewable to staff or other users.
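The idea of keeping only what is necessary can be sketched in a few lines of Python. This is a minimal illustration, not a description of any real library system: the field names, the `PatronRecord` class, and the `minimize` function are all hypothetical, and the list of required fields simply assumes a card policy that asks for a name, address, and birthdate.

```python
from dataclasses import dataclass

# Hypothetical: the only fields this library's card policy requires.
REQUIRED_FIELDS = ("name", "address", "birthdate")


@dataclass(frozen=True)
class PatronRecord:
    name: str
    address: str
    birthdate: str


def minimize(intake: dict) -> PatronRecord:
    """Build a record from the required fields only; extras are never stored."""
    missing = [field for field in REQUIRED_FIELDS if field not in intake]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return PatronRecord(**{field: intake[field] for field in REQUIRED_FIELDS})


# Hypothetical intake that carries more than the library needs,
# as a scanned driver's license would.
scanned_id = {
    "name": "Pat Example",
    "address": "1 Main St",
    "birthdate": "1990-01-01",
    "license_number": "X0000000",
    "license_class": "standard",
}

record = minimize(scanned_id)
```

Because the license number and license class never enter the stored record, there is nothing for the library to protect, leak, or be compelled to disclose later.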
The Seattle Public Library chose to accumulate additional demographic information about patrons to better understand their use patterns and to target perceived patron needs. As Seattle librarian Becky Yoose stated in the Journal of Intellectual Freedom and Privacy in 2017, “Libraries should -- if not must -- be sanctuaries from this kind of default detailed data collection,” yet, she continues, “libraries need to gain insights into their patron populations.” If a library, such as the Seattle Public Library, insists on collecting unnecessary data for marketing or decision-making purposes, it must do so in an ethical way that protects privacy, and must first sincerely ask itself whether there is a better way.
Librarians do not generally go into their career with an eye towards cybersecurity or information technology, but if “personal data is as hot as nuclear waste,” as Cory Doctorow suggested in The Guardian in 2008, then libraries are veritable plutonium refineries. Doctorow was referring to efforts by the London Underground to accumulate additional identifiable information about its users, which created a data exhaust that could viably be used to construct narratives about their lives and choices, but libraries are similarly situated. Library workers often know a great deal about patrons from basic interactions -- imagine these inferences being drawn on a large scale using machine learning. Libraries have increasingly come under attack from phishing and ransomware schemes that endanger private records, and it is impossible to create a truly airtight data network. It is important, therefore, for libraries to intentionally limit the amount of data and metadata they request, store, and generate, to protect the institution itself, its patrons’ privacy, and the confidence of its patrons.
Corrado, E. M. (2020). Libraries and protecting patron privacy. Technical Services Quarterly, 37(1), 44-54. doi:10.1080/07317131.2019.1691761
Doctorow, C. (2008, January 15). Personal data is as hot as nuclear waste. The Guardian. Retrieved from https://www.theguardian.com/technology/2008/jan/15/data.security
Pomerantz, J. (2015). Metadata. Cambridge, MA: The MIT Press.
Privacy: An Interpretation of the Library Bill of Rights. (2020, February 05). Retrieved September 7, 2020, from http://www.ala.org/advocacy/intfreedom/librarybill/interpretations/privacy
Yoose, B. (2017). Balancing privacy and strategic planning needs: A case study in de-identification of patron data. Journal of Intellectual Freedom and Privacy, 2(1), 15-22.