Standards for Privacy of Individually Identifiable Health Information

G. Section 164.514--Other Requirements Relating to Uses and Disclosures of Protected Health Information

1. De-Identification of Protected Health Information

December 2000 Privacy Rule

At Sec. 164.514(a)-(c), the Privacy Rule permits a covered entity to de-identify protected health information so that such information may be used and disclosed freely, without being subject to the Privacy Rule's protections. Health information is de-identified, or not individually identifiable, under the Privacy Rule, if it does not identify an individual and if the covered entity has no reasonable basis to believe that the information can be used to identify an individual. In order to meet this standard, the Privacy Rule provides two alternative methods for covered entities to de-identify protected health information.

First, a covered entity may demonstrate that it has met the standard if a person with appropriate knowledge and experience applying generally acceptable statistical and scientific principles and methods for rendering information not individually identifiable makes and documents a determination that there is a very small risk that the information could be used by others to identify a subject of the information. The preamble to the Privacy Rule refers to two government reports that provide guidance for applying these principles and methods, including describing types of techniques intended to reduce the risk of disclosure that should be considered by a professional when de-identifying health information. These techniques include removing all direct identifiers, reducing the number of variables on which a match might be made, and limiting the distribution of records through a "data use agreement" or "restricted access agreement" in which the recipient agrees to limits on who can use or receive the data.

Alternatively, covered entities may choose to use the Privacy Rule's safe harbor method for de-identification. Under the safe harbor method, covered entities must remove all of a list of 18 enumerated identifiers and have no actual knowledge that the information remaining could be used, alone or in combination, to identify a subject of the information. The identifiers that must be removed include direct identifiers, such as name, street address, and social security number, as well as other identifiers, such as birth date, admission and discharge dates, and five-digit zip code. The safe harbor requires removal of geographic subdivisions smaller than a State, except for the initial three digits of a zip code if the geographic unit formed by combining all zip codes with the same initial three digits contains more than 20,000 people. In addition, age, if less than 90, gender, ethnicity, and other demographic information not listed may remain in the information. The safe harbor is intended to provide covered entities with a simple, definitive method that does not require much judgment by the covered entity to determine if the information is adequately de-identified.
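
As a rough illustration of the safe harbor paragraph above, the following Python sketch drops only the identifiers that the paragraph names and leaves other demographic fields in place. The field names (`name`, `street_address`, `ssn`, and so on) are hypothetical, the full list of 18 enumerated identifiers is not reproduced, and the three-digit zip code exception is not handled here. Replacing an age of 90 or more with a single "90 or older" category is an assumption of this sketch; the paragraph itself says only that age, if less than 90, may remain. The "no actual knowledge" condition is a judgment that code cannot check.

```python
# Hypothetical field names for the identifiers named in the paragraph above.
# This is NOT the full list of 18 enumerated safe harbor identifiers.
NAMED_IDENTIFIER_FIELDS = frozenset({
    "name",
    "street_address",
    "ssn",
    "birth_date",
    "admission_date",
    "discharge_date",
    "zip5",
})


def strip_named_identifiers(record: dict) -> dict:
    """Return a copy of record without the identifier fields named above."""
    cleaned = {
        field: value
        for field, value in record.items()
        if field not in NAMED_IDENTIFIER_FIELDS
    }
    age = cleaned.get("age")
    if isinstance(age, int) and age >= 90:
        # Assumption: collapse ages of 90 and above into one category.
        cleaned["age"] = "90 or older"
    return cleaned
```

The function returns a new dictionary rather than editing the input, so the covered entity's original record is left unchanged.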

The Privacy Rule also allows for the covered entity to assign a code or other means of record identification to allow de-identified information to be re-identified by the covered entity, if the code is not derived from, or related to, information about the subject of the information. For example, the code cannot be a derivation of the individual's social security number, nor can it be otherwise capable of being translated so as to identify the individual. The covered entity also may not use or disclose the code for any other purpose, and may not disclose the mechanism (e.g., algorithm or other tool) for re-identification.
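
One way to picture a code that is "not derived from, or related to, information about the subject" is a randomly generated value held in a lookup table that stays with the covered entity. The Python sketch below is an illustration under that assumption, not a method prescribed by the Rule; the function name `assign_reid_codes` and the 16-character code length are hypothetical choices.

```python
import secrets


def assign_reid_codes(record_ids):
    """Map each internal record id to a random code unrelated to the id.

    The returned crosswalk is the re-identification mechanism; in this
    sketch it is assumed to be retained by the covered entity only.
    """
    crosswalk = {}
    used_codes = set()
    for record_id in record_ids:
        if record_id in crosswalk:
            continue  # one code per record, even if the id repeats
        code = secrets.token_hex(8)  # 16 random hex characters
        while code in used_codes:
            code = secrets.token_hex(8)  # regenerate on a collision
        used_codes.add(code)
        crosswalk[record_id] = code
    return crosswalk
```

Because each code comes from a random source rather than from the record, nothing about the individual can be computed from the code alone; re-identification requires the crosswalk.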

The Department is cognizant of the increasing capabilities and sophistication of electronic data matching used to link data elements from various sources and from which, therefore, individuals may be identified. Given this increasing risk to individuals' privacy, the Department included in the Privacy Rule the above stringent standards for determining when information may flow unprotected. The Department also wanted the standards to be flexible enough so the Privacy Rule would not be a disincentive for covered entities to use or disclose de-identified information wherever possible. The Privacy Rule, therefore, strives to balance the need to protect individuals' identities with the need to allow de-identified databases to be useful.

March 2002 NPRM

The Department heard a number of concerns regarding the de-identification standard in the Privacy Rule. These concerns generally were raised in the context of using and disclosing information for research, public health purposes, or for certain health care operations. In particular, concerns were expressed that the safe harbor method for de-identifying protected health information was so stringent that it required removal of many of the data elements that were essential to analyses for research and these other purposes. The comments, however, demonstrated little consensus as to which data elements were needed for such analyses and were largely silent regarding the feasibility of using the Privacy Rule's alternative statistical method to de-identify information.

Based on the comments received, the Department was not convinced of the need to modify the safe harbor standard for de-identified information. However, the Department was aware that a number of entities were confused by potentially conflicting provisions within the de-identification standard. These entities argued that, on the one hand, the Privacy Rule treats information as de-identified if all listed identifiers on the information are stripped, including any unique, identifying number, characteristic, or code. Yet, the Privacy Rule permits a covered entity to assign a code or other record identification to the information so that it may be re-identified by the covered entity at some later date.

The Department did not intend such a re-identification code to be considered one of the unique, identifying numbers or codes that prevented the information from being de-identified. Therefore, the Department proposed a technical modification to the safe harbor provisions explicitly to except the re-identification code or other means of record identification permitted by Sec. 164.514(c) from the listed identifiers (Sec. 164.514(b)(2)(i)(R)).

Overview of Public Comments

The following provides an overview of the public comment received on this proposal. Additional comments received on this issue are discussed below in the section entitled, "Response to Other Public Comments."

All commenters on our clarification that the safe harbor re-identification code is not an enumerated identifier supported the proposed regulatory clarification.

Final Modifications

Based on the Department's intent that the re-identification code not be considered one of the enumerated identifiers that must be excluded under the safe harbor for de-identification, and the public comment supporting this clarification, the Department adopts the provision as proposed. The re-identification code or other means of record identification permitted by Sec. 164.514(c) is expressly excepted from the listed safe harbor identifiers at Sec. 164.514(b)(2)(i)(R).

Response to Other Public Comments

Comment: One commenter asked if data can be linked inside the covered entity and a dummy identifier substituted for the actual identifier when the data is disclosed to the external researcher, with control of the dummy identifier remaining with the covered entity.

Response: The Privacy Rule does not restrict linkage of protected health information inside a covered entity. The model that the commenter describes for the dummy identifier is consistent with the re-identification code allowed under the Rule's safe harbor so long as the covered entity does not generate the dummy identifier using any individually identifiable information. For example, the dummy identifier cannot be derived from the individual's social security number, birth date, or hospital record number.

Comment: Several commenters who supported the creation of de-identified data for research based on removal of facial identifiers asked if a keyed-hash message authentication code (HMAC) can be used as a re-identification code even though it is derived from patient information, because it is not intended to re-identify the patient and it is not possible to identify the patient from the code. The commenters stated that use of the keyed-hash message authentication code would be valuable for research, public health and bio-terrorism detection purposes where there is a need to link clinical events on the same person occurring in different health care settings (e.g., to avoid double counting of cases or to observe long-term outcomes).

These commenters referenced Federal Information Processing Standard (FIPS) 198: "The Keyed-Hash Message Authentication Code." This standard describes a keyed-hash message authentication code (HMAC) as a mechanism for message authentication using cryptographic hash functions. The HMAC can be used with any iterative approved cryptographic hash function, in combination with a shared secret key. A hash function is an approved mathematical function that maps a string of arbitrary length (up to a pre-determined maximum size) to a fixed length string. It may be used to produce a checksum, called a hash value or message digest, for a potentially long string or message.

According to the commenters, the HMAC can only be breached when the key and the identifier from which the HMAC is derived and the de-identified information attached to this code are known to the public. It is common practice that the key is limited in time and scope (e.g., only for the purpose of a single research query) and that data not be accumulated with such codes (with the code needed for joining records being discarded after the de-identified data has been joined).

Response: The HMAC does not meet the conditions for use as a re-identification code for de-identified information. It is derived from individually identified information and it appears the key is shared with or provided by the recipient of the data in order for that recipient to be able to link information about the individual from multiple entities or over time. Since the HMAC allows identification of individuals by the recipient, disclosure of the HMAC violates the Rule. It is not solely the public's access to the key that matters for these purposes; the covered entity may not share the key to the re-identification code with anyone, including the recipient of the data, regardless of whether the intent is to facilitate re-identification or not.

The HMAC methodology, however, may be used in the context of the limited data set, discussed below. The limited data set contains individually identifiable health information and is not a de-identified data set. Creation of a limited data set for research with a data use agreement, as specified in Sec. 164.514(e), would not preclude inclusion of the keyed-hash message authentication code in the limited data set. The Department encourages inclusion of the additional safeguards mentioned by the commenters as part of the data use agreement whenever the HMAC is used.
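
For readers unfamiliar with the mechanism, the following Python sketch shows how a keyed-hash linkage code of the kind the commenters describe might be computed for a limited data set. It is a minimal illustration, not a recommended implementation: the choice of SHA-256 as the underlying hash, the normalization step, and the function name `linkage_code` are assumptions of this sketch, and key management (including the time and scope limits the commenters mention) is outside its scope.

```python
import hashlib
import hmac


def linkage_code(secret_key: bytes, identifier: str) -> str:
    """Return a keyed hash of identifier as a hexadecimal string.

    The same key and identifier always yield the same code, which is
    what lets records for one person be linked across settings.
    """
    # Assumption: trim and upper-case so trivially different spellings
    # of the same identifier produce the same code.
    normalized = identifier.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()
```

The code is derived from the identifier, which is why, as the response above explains, it cannot serve as a re-identification code for de-identified information; anyone holding the key can recompute the code from a known identifier.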

Comment: One commenter requested that HHS update the safe harbor de-identification standard with prohibited 3-digit zip codes based on 2000 Census data.

Response: The Department stated in the preamble to the December 2000 Privacy Rule that it would monitor such data and the associated re-identification risks and adjust the safe harbor as necessary. Accordingly, the Department provides such updated information in response to the above comment. The Department notes that these three-digit zip codes are based on the five-digit Zip Code Tabulation Areas created by the Census Bureau for the 2000 Census. This new methodology also is briefly described below, as it will likely be of interest to all users of data tabulated by zip code.

The Census Bureau will not be producing data files containing U.S. Postal Service zip codes either as part of the Census 2000 product series or as a post Census 2000 product. However, due to the public's interest in having statistics tabulated by zip code, the Census Bureau has created a new statistical area called the Zip Code Tabulation Area (ZCTA) for Census 2000. The ZCTAs were designed to overcome the operational difficulties of creating a well-defined zip code area by using Census blocks (and the addresses found in them) as the basis for the ZCTAs. In the past, there has been no correlation between zip codes and Census Bureau geography. Zip codes can cross State, place, county, census tract, block group and census block boundaries. The geographic entities the Census Bureau uses to tabulate data are relatively stable over time. For instance, census tracts are only defined every ten years. In contrast, zip codes can change more frequently. Because of the ill-defined nature of zip code boundaries, the Census Bureau has no file (crosswalk) showing the relationship between US Census Bureau geography and US Postal Service zip codes.

ZCTAs are generalized area representations of U.S. Postal Service (USPS) zip code service areas. Simply put, each one is built by aggregating the Census 2000 blocks, whose addresses use a given zip code, into a ZCTA which gets that zip code assigned as its ZCTA code. They represent the majority USPS five-digit zip code found in a given area. For those areas where it is difficult to determine the prevailing five-digit zip code, the higher-level three-digit zip code is used for the ZCTA code. For further information, go to: http://frwebgate.access.gpo.gov/cgi-bin/leaving.cgi?from=leavingFR.html&log=linklog&to=http://www.census.gov/geo/www/gazetteer/places2k.html.

Utilizing 2000 Census data, the following three-digit ZCTAs have a population of 20,000 or fewer persons. To produce a de-identified data set utilizing the safe harbor method, all records with three-digit zip codes corresponding to these three-digit ZCTAs must have the zip code changed to 000. The 17 restricted zip codes are: 036, 059, 063, 102, 203, 556, 692, 790, 821, 823, 830, 831, 878, 879, 884, 890, and 893.
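
The zip code step described above can be sketched in a few lines of Python. The list of restricted prefixes is taken directly from the paragraph above; the function name `safe_harbor_zip` and the handling of malformed input are assumptions of this sketch.

```python
# Three-digit ZCTAs with 20,000 or fewer persons (2000 Census), as listed above.
RESTRICTED_ZIP3 = frozenset({
    "036", "059", "063", "102", "203", "556", "692", "790", "821",
    "823", "830", "831", "878", "879", "884", "890", "893",
})


def safe_harbor_zip(zip_code: str) -> str:
    """Reduce a zip code to its first three digits, or "000" if restricted."""
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) < 3:
        # Assumption: an unusable value is suppressed rather than guessed at.
        return "000"
    prefix = digits[:3]
    return "000" if prefix in RESTRICTED_ZIP3 else prefix
```

For example, a record with zip code 10001 would keep the prefix 100, while a record with zip code 10200 falls in restricted prefix 102 and would be changed to 000.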

 
