
API Documentation

The API enables a list of breached accounts (usernames and email addresses) to be quickly searched via a RESTful service.

Overview

This is the current version of the API.

Authorization

Authorization is required for all APIs that search Breached.Me for breached usernames and email addresses. Authorization requires a Breached.Me subscription key, which can be obtained on the API key page. The key is then passed in a “breached.me-api-key” header like this:

GET https://breached.me/api/v1/{service}/{parameter}
breached.me-api-key: [your key]

The HTTP response code indicates the result of the API call, for example:

Code Description
401 Unauthorized – the API key provided is not valid

Specifying the API version

Version 1 of the API is consumable only by specifying the API version in the URL.

Versioning via the URL

This method can easily be invoked directly by requesting the URL with an appropriate user agent string.

GET https://breached.me/api/v1/{service}/{parameter}

Specifying the user agent

Each request to the API must be accompanied by a user agent request header. Typically this should be the name of the app consuming the service. A missing user agent will result in an HTTP 403 response. A valid request would look like:

GET https://breached.me/api/v1/{service}/{parameter}
user-agent: [your app name]

The user agent should accurately describe the nature of the API consumer so that it can be clearly identified in the request. Failure to do so may result in the request being blocked.
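Both required headers can be attached with Python's standard library. The sketch below only builds a request and does not send it; `build_request` is a hypothetical helper, the key and app name are placeholders, and the header names follow the examples above (“breached.me-api-key” and “user-agent”). It uses the documented single-breach service, `breach/{name}`, as the example.

```python
from urllib.parse import quote
from urllib.request import Request

# Versioned base URL from the pattern https://breached.me/api/v1/{service}/{parameter}.
API_ROOT = "https://breached.me/api/v1"


def build_request(service: str, parameter: str, api_key: str, app_name: str) -> Request:
    """Build (but do not send) a GET request carrying both required headers."""
    # URL encode the parameter so characters such as "@" are safe in the path.
    url = f"{API_ROOT}/{service}/{quote(parameter, safe='')}"
    headers = {
        "breached.me-api-key": api_key,  # subscription key from the API key page
        "user-agent": app_name,          # should name the consuming app
    }
    return Request(url, headers=headers, method="GET")


req = build_request("breach", "Adobe", "[your key]", "MyBreachChecker")
print(req.full_url)  # https://breached.me/api/v1/breach/Adobe
```

urllib normalizes header names to an initial capital (for example `User-agent`), which is harmless because HTTP header names are case-insensitive. The request can then be sent with `urllib.request.urlopen`.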

Getting all breaches for an account

The most common use of the API is to return a list of all breaches a particular account has been involved in. The API takes a single parameter, which is the account to be searched for. The account is not case sensitive and any leading or trailing whitespace will be trimmed. The account should always be URL encoded. This is an authenticated API and a BM API key must be supplied with the request.

GET https://haveibeenpwned.com/api/v3/breachedaccount/{account}
bm-api-key: [your key]

By default, only the name of each breach is returned rather than the complete breach data, reducing the response body size by approximately 98%. The name can then be used either to retrieve a single breach or to look it up in the list of all breaches in the system. If you’d like complete breach data returned in the API call, a non-truncated response can be requested via the following query string parameter:

Parameter          Example                    Description
truncateResponse   ?truncateResponse=false    Returns the full breached model.

The result set can also be filtered by supplying the following query string parameter:

Parameter   Example             Description
domain      ?domain=adobe.com   Filters the result set to only breaches against the domain specified. It is possible for one site (and consequently domain) to be compromised on multiple occasions.

Note: The public API won’t return accounts from any breaches flagged as sensitive or retired. By default, the API returns breaches flagged as unverified; however, these can be excluded using the following parameter:

Parameter           Example                    Description
includeUnverified   ?includeUnverified=false   Set to false to exclude breaches that have been flagged as “unverified”. By default, both verified and unverified breaches are returned when performing a search.
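The query string options can be combined on one request URL. This sketch assumes the account search follows the versioned `https://breached.me/api/v1/{service}/{parameter}` pattern with `breachedaccount` as the service name (taken from the example URL above); treat that path, and the helper name `breached_account_url`, as assumptions. Only parameters that differ from the defaults are added.

```python
from typing import Optional
from urllib.parse import quote, urlencode

# Assumption: the service path is "breachedaccount" under the versioned
# breached.me root; confirm the exact path before relying on it.
BREACHED_ACCOUNT_ROOT = "https://breached.me/api/v1/breachedaccount"


def breached_account_url(account: str,
                         truncate: bool = True,
                         domain: Optional[str] = None,
                         include_unverified: bool = True) -> str:
    """Build a search URL, adding only the parameters that differ from the defaults."""
    # URL encode the account; the service also trims whitespace itself.
    path = quote(account.strip(), safe="")
    params = {}
    if not truncate:
        params["truncateResponse"] = "false"   # return the full breached model
    if domain:
        params["domain"] = domain              # only breaches against this domain
    if not include_unverified:
        params["includeUnverified"] = "false"  # drop unverified breaches
    query = "?" + urlencode(params) if params else ""
    return f"{BREACHED_ACCOUNT_ROOT}/{path}{query}"


print(breached_account_url("test@example.com", truncate=False, domain="adobe.com"))
# https://breached.me/api/v1/breachedaccount/test%40example.com?truncateResponse=false&domain=adobe.com
```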

Getting all breached sites in the system

A “breach” is an instance of a system being compromised by an attacker. For example, eBay was a breach, Canva was a breach, etc. It is possible to return the details of every breach in the system, which currently stands at 487 breaches.

By version in URL:

GET https://breached.me/api/v1/breaches

The result set can also be filtered by supplying the following query string parameter:

Parameter   Example             Description
domain      ?domain=adobe.com   Filters the result set to only breaches against the domain specified. It is possible for one site (and consequently domain) to be compromised on multiple occasions.

Getting a single breached site

Sometimes only a single breach is needed, and it can be retrieved by the breach “name”. “Name” is a stable value which may or may not be the same as the “title” (which can change). See the breached model below for more information.

By version in URL:

GET https://breached.me/api/v1/breach/{name}

Getting all data classes in the system

A “data class” is an attribute of a record compromised in a breach. For example, many breaches expose data classes such as “Email addresses” and “Passwords”. The values returned by this service are ordered alphabetically in a string array and will expand over time as new breaches expose previously unseen classes of data.

By version in URL:

GET https://breached.me/api/v1/dataclasses

The breached model

Attribute Type Description
Name string A Pascal-cased name representing the breach, which is unique across all breaches. This value never changes and may be used to name dependent assets (such as images) but must not be shown directly to end users (see the “Title” attribute instead).
Title string A descriptive title for the breach suitable for displaying to end users. It’s unique across all breaches, but individual values may change in the future (e.g. if another breach occurs against an organization already in the system). If a stable value is required to reference the breach, refer to the “Name” attribute instead.
Domain string The domain of the primary website where the breach happened. This may be used to identify other assets that external systems hold for the site.
BreachDate date The date (only) the breach originally happened (in ISO 8601 format). This is not always accurate — frequently, breaches are discovered and reported long after the original incident. This is just a guide.
AddedDate datetime The (precise) date and time the breach was added to the system (in ISO 8601 format).
ModifiedDate datetime The (precise) date and time the breach was last modified (in ISO 8601 format). This will only differ from the AddedDate attribute if other attributes have changed or the breach data itself has changed (e.g. additional data is identified and loaded). It is always either equal to or later than the AddedDate attribute.
PwnCount integer The total number of accounts loaded into the system.
Description string An overview of the breach in HTML markup. This may include markup such as emphasis and strong tags as well as hyperlinks.
DataClasses string[] This attribute describes the nature of the data compromised in the breach and contains an alphabetically ordered string array of impacted data classes.
IsVerified boolean Indicates that the breach is considered verified. An unverified breach may not have been hacked from the indicated website. An unverified breach is still loaded into BM when there’s sufficient confidence that a significant portion of the data is legitimate.
IsFabricated boolean Indicates that the breach is considered fabricated. A fabricated breach is unlikely to have been hacked from the indicated website and usually contains a large amount of manufactured data. However, it still contains legitimate email addresses and asserts that the account owners were compromised in the alleged breach.
IsSensitive boolean Indicates if the breach is considered sensitive. The public API will not return any accounts for a breach flagged as sensitive.
IsRetired boolean Indicates if the breach has been retired. This data has been permanently removed and will not be returned by the API.
IsSpamList boolean Indicates if the breach is considered a spam list. This flag has no impact on any other attributes but it means that the data has not come as a result of a security compromise.
LogoPath string A URI that specifies where a logo for the breached service can be found. Logos should always be in PNG format.

Sample breach response

All responses return breached models either in a collection (breaches for an account or all breaches in the system) or as a single item (retrieving a breach by name). When a collection is returned, it’s sorted alphabetically by the title of the breach.

[
  {
    "Name": "Adobe",
    "Title": "Adobe",
    "Domain": "adobe.com",
    "BreachDate": "2013-10-04",
    "AddedDate": "2013-12-04T00:00Z",
    "ModifiedDate": "2013-12-04T00:00Z",
    "PwnCount": 152445165,
    "Description": "In October 2013, 153 million Adobe accounts were breached with each containing an internal ID, username, email, encrypted password and a password hint in plain text. The password cryptography was poorly done and many were quickly resolved back to plain text. The unencrypted hints also disclosed much about the passwords adding further to the risk that hundreds of millions of Adobe customers already faced.",
    "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"],
    "IsVerified": true,
    "IsFabricated": false,
    "IsSensitive": false,
    "IsRetired": false,
    "IsSpamList": false,
    "LogoPath": "https://haveibeenpwned.com/Content/Images/PwnedLogos/Adobe.png"
  },
  {
    "Name": "BattlefieldHeroes",
    "Title": "Battlefield Heroes",
    "Domain": "battlefieldheroes.com",
    "BreachDate": "2011-06-26",
    "AddedDate": "2014-01-23T13:10Z",
    "ModifiedDate": "2014-01-23T13:10Z",
    "PwnCount": 530270,
    "Description": "In June 2011 as part of a final breached data dump, the hacker collective \"LulzSec\" obtained and released over half a million usernames and passwords from the game Battlefield Heroes. The passwords were stored as MD5 hashes with no salt and many were easily converted back to their plain text versions.",
    "DataClasses": ["Passwords", "Usernames"],
    "IsVerified": true,
    "IsFabricated": false,
    "IsSensitive": false,
    "IsRetired": false,
    "IsSpamList": false,
    "LogoPath": "https://haveibeenpwned.com/Content/Images/PwnedLogos/BattlefieldHeroes.png"
  }
]
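A full (non-truncated) collection response can be mapped onto a typed structure with the standard `json` module. The sketch keeps only a few attributes of the breached model; `Breach` and `parse_breaches` are hypothetical names, and a truncated response would carry only `Name`, so it is not handled here.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class Breach:
    """A subset of the breached model; field names mirror the JSON attributes."""
    name: str
    title: str
    pwn_count: int
    data_classes: List[str] = field(default_factory=list)
    is_verified: bool = False
    is_spam_list: bool = False


def parse_breaches(body: str) -> List[Breach]:
    """Convert a full (non-truncated) collection response into Breach objects."""
    return [
        Breach(
            name=item["Name"],
            title=item["Title"],
            pwn_count=item["PwnCount"],
            data_classes=item.get("DataClasses", []),
            is_verified=item.get("IsVerified", False),
            is_spam_list=item.get("IsSpamList", False),
        )
        for item in json.loads(body)
    ]


# Trimmed copy of the sample response.
sample = """[
  {"Name": "Adobe", "Title": "Adobe", "PwnCount": 152445165,
   "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"],
   "IsVerified": true, "IsSpamList": false},
  {"Name": "BattlefieldHeroes", "Title": "Battlefield Heroes", "PwnCount": 530270,
   "DataClasses": ["Passwords", "Usernames"],
   "IsVerified": true, "IsSpamList": false}
]"""

breaches = parse_breaches(sample)
# Titles of verified breaches that exposed passwords.
exposed = [b.title for b in breaches if b.is_verified and "Passwords" in b.data_classes]
print(exposed)  # ['Adobe', 'Battlefield Heroes']
```

Titles are used for display and names for lookups, in line with the guidance on the “Name” and “Title” attributes.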

Breached Passwords overview

Breached Passwords is a collection of more than half a billion passwords that have previously been exposed in data breaches. The service is detailed in the launch blog post. The entire data set is both downloadable and searchable online via the Breached Passwords page.

Each password is stored as an SHA-1 hash of a UTF-8 encoded password. The downloadable source data delimits the full SHA-1 hash and the password count with a colon (:) and each line with a CRLF.
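The hashing rule and download format can be reproduced in a few lines of Python. This is a sketch: `password_hash` and `parse_download` are hypothetical helpers, and hashes are normalized to uppercase hexadecimal to match the sample range response further down.

```python
import hashlib
from typing import Dict


def password_hash(password: str) -> str:
    """SHA-1 of the UTF-8 encoded password, as uppercase hexadecimal."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def parse_download(text: str) -> Dict[str, int]:
    """Parse 'HASH:COUNT' records separated by CRLF into a {hash: count} dict."""
    counts = {}
    for line in text.split("\r\n"):
        if not line:
            continue  # tolerate a trailing CRLF at the end of the file
        full_hash, count = line.split(":")
        counts[full_hash.upper()] = int(count)
    return counts


print(password_hash("password"))  # 5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8
```

For the full download, streaming the file line by line is preferable to building one dictionary in memory; the dictionary here only suits small extracts.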

Searching by range

In order to protect the value of the source password being searched for, Breached Passwords also implements a k-Anonymity model that allows a password to be searched for by partial hash. This allows the first 5 characters of an SHA-1 password hash (not case-sensitive) to be passed to the API:

GET https://api.breachedpasswords.com/range/{first 5 hash chars}

When a password hash with the same first 5 characters is found in the Breached Passwords repository, the API will respond with an HTTP 200 and include the suffix of every hash beginning with the specified prefix, followed by a count of how many times it appears in the data set. The API consumer can then search the results of the response for the presence of their source hash and if not found, the password does not exist in the data set. A sample response for the hash prefix “21BD1” would be as follows:

0018A45C4D1DEF81644B54AB7F969B88D65:1
00D4F6E8FA6EECAD2A3AA415EEC418D38EC:2
011053FD0102E94D6AE2F8B83D76FAF94F6:1
012A7CA357541F0AC487871FEEC1891C49C:2
0136E006E24E7D152139815FB0FC6A50B15:2
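The client side of the range search can be sketched as follows: hash locally, send only the first five characters, then scan the returned suffixes for the remaining 35. The function names are hypothetical and the HTTP call itself is left out; `count_in_range` can be fed the body of the sample response above.

```python
import hashlib
from typing import Tuple

RANGE_ROOT = "https://api.breachedpasswords.com/range"


def split_hash(password: str) -> Tuple[str, str]:
    """Return the 5-character prefix and 35-character suffix of the SHA-1 hash."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def range_url(prefix: str) -> str:
    # Only the prefix is sent; the full hash never leaves the client.
    return f"{RANGE_ROOT}/{prefix}"


def count_in_range(suffix: str, body: str) -> int:
    """Look for the suffix in a range response; 0 means it is not in the data set."""
    for line in body.splitlines():
        candidate, _, count = line.partition(":")
        if candidate.strip().upper() == suffix.upper():
            return int(count)
    return 0


prefix, suffix = split_hash("password")
print(range_url(prefix))  # https://api.breachedpasswords.com/range/5BAA6
```

The comparison is case-insensitive because the hash prefix is documented as not case-sensitive.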

A range search typically returns approximately 500 hash suffixes, although this number will differ depending on the hash prefix being searched for and will increase as more passwords are added. There are 1,048,576 different hash prefixes between 00000 and FFFFF (16^5) and every single one will return HTTP 200; there is no circumstance in which the API should return HTTP 404.

Code   Body                       Description
200    Hash suffixes and counts   Ok — all password hashes beginning with the searched prefix are returned alongside prevalence counts

Introducing padding

To further strengthen privacy, padding can be added to responses so that anyone able to intercept encrypted responses from the API cannot determine which hash prefix was searched for by observing the response size. Padding is enabled via a request header and ensures that all responses contain between 800 and 1,000 results regardless of the number of hash suffixes returned by the service.

Header        Example             Description
Add-Padding   Add-Padding: true   Pads out responses to ensure all responses contain a random number of records between 800 and 1,000.

Note: Padded entries always have a password count of 0 and can be discarded once received.
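Because padded entries always report a count of 0, they can be dropped while parsing. A minimal sketch; `parse_range` and `PADDING_HEADERS` are hypothetical names, and the header dictionary would be merged into the request's other headers.

```python
from typing import Dict

# Request header that asks the service to pad the response.
PADDING_HEADERS = {"Add-Padding": "true"}


def parse_range(body: str, drop_padding: bool = True) -> Dict[str, int]:
    """Parse 'SUFFIX:COUNT' lines, discarding padded entries (count 0) by default."""
    results = {}
    for line in body.splitlines():
        if not line.strip():
            continue
        suffix, _, count = line.partition(":")
        prevalence = int(count)
        if drop_padding and prevalence == 0:
            continue  # padding never represents a real breached password
        results[suffix.strip().upper()] = prevalence
    return results
```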

Getting all pastes for an account

The API takes a single parameter, which is the email address to be searched for. The email address is not case sensitive and any leading or trailing whitespace will be trimmed. The email address should always be URL encoded. This is an authenticated API and a BM API key must be supplied with the request.

GET https://breached.me/api/v1/dataclasses
bm-api-key: [your key]

The paste model

Each paste contains a number of attributes describing it. The current attributes are:

Attribute Type Description
Source string The paste service the record was retrieved from. Current values are: Pastebin, Pastie, Slexy, Ghostbin, QuickLeak, JustPaste, AdHocUrl, PermanentOptOut, OptOut
Id string The ID of the paste as it was given at the source service. Combined with the “Source” attribute, this can be used to resolve the URL of the paste.
Title string The title of the paste as observed on the source site. This may be null and if so will be omitted from the response.
Date date The date and time (precision to the second) that the paste was posted. This is taken directly from the paste site when this information is available but may be null if no date is published.
EmailCount integer The number of emails that were found when processing the paste. Emails are extracted by using the regular expression \b[a-zA-Z0-9\.\-_\+]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z]+\b
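The extraction rule can be reproduced with Python's `re` module, which accepts the pattern as written. Whether duplicate addresses in one paste are counted once or every time they appear is not stated, so this sketch simply counts every match; `email_count` is a hypothetical helper.

```python
import re

# The extraction pattern quoted in the EmailCount description.
EMAIL_PATTERN = re.compile(r"\b[a-zA-Z0-9\.\-_\+]+@[a-zA-Z0-9\.\-_]+\.[a-zA-Z]+\b")


def email_count(paste_text: str) -> int:
    """Count pattern matches in a paste (assumption: duplicates are not removed)."""
    return len(EMAIL_PATTERN.findall(paste_text))


print(email_count("contact a@b.com or c.d+x@mail.example.org today"))  # 2
```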

Sample paste response

Searching an account for pastes always returns a collection of the paste entity. The collection is sorted chronologically with the newest paste first.

[
  {
    "Source": "Pastebin",
    "Id": "8Q0BvKD8",
    "Title": "syslog",
    "Date": "2014-03-04T19:14:54Z",
    "EmailCount": 139
  },
  {
    "Source": "Pastie",
    "Id": "7152479",
    "Date": "2013-03-28T16:51:10Z",
    "EmailCount": 30
  }
]
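The paste model maps naturally onto a small immutable record. This sketch assumes the JSON shape of the sample above; `Paste` and `parse_pastes` are hypothetical names, and a missing `Title` or null `Date` becomes None.

```python
import json
from datetime import datetime
from typing import List, NamedTuple, Optional


class Paste(NamedTuple):
    source: str
    paste_id: str
    title: Optional[str]        # omitted from the response when null
    posted: Optional[datetime]  # may be null if the site publishes no date
    email_count: int


def parse_pastes(body: str) -> List[Paste]:
    """Convert a paste collection response into Paste records (order preserved)."""
    pastes = []
    for item in json.loads(body):
        raw = item.get("Date")
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so map it to +00:00.
        posted = datetime.fromisoformat(raw.replace("Z", "+00:00")) if raw else None
        pastes.append(Paste(item["Source"], item["Id"], item.get("Title"),
                            posted, item["EmailCount"]))
    return pastes


sample = """[
  {"Source": "Pastebin", "Id": "8Q0BvKD8", "Title": "syslog",
   "Date": "2014-03-04T19:14:54Z", "EmailCount": 139},
  {"Source": "Pastie", "Id": "7152479",
   "Date": "2013-03-28T16:51:10Z", "EmailCount": 30}
]"""
pastes = parse_pastes(sample)
```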

Cross-origin resource sharing (CORS)

CORS is only supported for non-authenticated APIs. When it is supported, all origins are accepted, so the API can be called from websites on any domain.

HTTPS

All API endpoints must be invoked over HTTPS. Any requests over HTTP will result in a 301 response with a redirect to the same path on the secure scheme. Only TLS versions 1.2 and 1.3 are supported; older versions of the protocol will not allow a connection to be made.
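A client can enforce both requirements before any request is made. A minimal sketch using Python's `ssl` module; `api_tls_context` and `require_https` are hypothetical helpers, and the context would be passed as `urllib.request.urlopen(request, context=ctx)`.

```python
import ssl


def api_tls_context() -> ssl.SSLContext:
    """Client TLS context that refuses protocol versions older than TLS 1.2."""
    context = ssl.create_default_context()  # certificate and hostname checks enabled
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def require_https(url: str) -> str:
    """Reject plain HTTP URLs up front instead of relying on the 301 redirect."""
    if not url.startswith("https://"):
        raise ValueError("API endpoints must be called over HTTPS")
    return url


ctx = api_tls_context()
```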

Response codes

Semantic HTTP response codes are used to indicate the result of the search:

Code Description
200 Ok — everything worked and there’s a string array of pwned sites for the account
400 Bad request — the account does not comply with an acceptable format (e.g. it’s an empty string)
401 Unauthorized — either no API key was provided or it wasn’t valid
403 Forbidden — no user agent has been specified in the request
404 Not found — the account could not be found and has therefore not been pwned
429 Too many requests — the rate limit has been exceeded
503 Service unavailable — usually returned by Cloudflare if the underlying service is not available
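For an account search, 404 is a normal outcome rather than a failure. A minimal sketch of turning a status code and body into a result; `breaches_from_response` and `BreachedMeError` are hypothetical names. Because the 200 row mentions a string array while truncated responses are described as containing breach names, the parser accepts either plain strings or objects with a `Name` attribute (an assumption).

```python
import json
from typing import List


class BreachedMeError(Exception):
    """Hypothetical error type for responses that signal a problem."""


# Descriptions condensed from the response code table.
ERRORS = {
    400: "bad request: the account is not in an acceptable format",
    401: "unauthorized: no API key was provided or it was not valid",
    403: "forbidden: no user agent was specified",
    429: "too many requests: the rate limit has been exceeded",
    503: "service unavailable",
}


def breaches_from_response(status: int, body: str) -> List[str]:
    """Return breach names for an account search, or raise on error codes."""
    if status == 200:
        # Accept either plain names or objects carrying a "Name" attribute.
        return [item if isinstance(item, str) else item["Name"]
                for item in json.loads(body)]
    if status == 404:
        return []  # the account was not found, so it has not been pwned
    raise BreachedMeError(ERRORS.get(status, f"unexpected status {status}"))
```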

Abuse

There’s not much point: if you want to build up a treasure trove of breached email addresses or usernames, go and download the dumps (they’re usually just a Google search away) and save yourself the hassle and time of trying to enumerate an API one account at a time. Use of the API should stay within the acceptable use guidelines below.

Rate limiting

Requests to the breaches and pastes APIs are each limited to one every 1,500 milliseconds from any given BM API key (a key may call both APIs within this period). Any request that exceeds the limit will receive an HTTP 429 “Too many requests” response. The response also includes an accompanying “retry-after” response header expressing the number of seconds remaining before another API call can succeed (the value is rounded up to the next whole second).

The response body explains the rate limit and refers to the acceptable use documentation.

HTTP/1.1 429
retry-after: 2

{ "statusCode": 429, "message": "Rate limit is exceeded. Try again in 2 seconds." }

The retry period may change: querying the API more aggressively than the allowed rate causes the retry period to restart with each failed request. It’s advisable to avoid querying the API at exactly the rate limit, as variations in network latency may result in some requests arriving within the retry period and causing a 429. Adding an additional 100-millisecond delay between requests on top of the rate limit should normally ensure this won’t happen.
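A client can stay under the limit by spacing requests at the documented 1,500 milliseconds plus the suggested 100-millisecond margin, and by honoring “retry-after” when a 429 does arrive. `Pacer` is a hypothetical helper; the clock and sleep functions are injectable so the timing can be checked without actually waiting.

```python
import time
from typing import Callable, Optional

RATE_LIMIT = 1.5     # seconds between requests, per the documented limit
SAFETY_MARGIN = 0.1  # the suggested extra 100 milliseconds


class Pacer:
    """Spaces out requests made with one BM API key (a sketch, not an official client)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # the first request may go out immediately

    def wait(self) -> None:
        """Block until the next request may be sent, then reserve the following slot."""
        delay = self._next_allowed - self._clock()
        if delay > 0:
            self._sleep(delay)
        self._next_allowed = self._clock() + RATE_LIMIT + SAFETY_MARGIN

    def back_off(self, retry_after: Optional[str]) -> None:
        """After an HTTP 429, wait out the retry-after value (whole seconds) plus margin."""
        seconds = float(retry_after) if retry_after else RATE_LIMIT
        self._next_allowed = self._clock() + seconds + SAFETY_MARGIN
```

Calling `wait()` before every breaches or pastes request, and `back_off()` with the “retry-after” header value whenever a 429 is returned, keeps a single-threaded client from restarting the retry period.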

When the rate limit is consistently exceeded, further defenses may be employed to limit the ability to query the API. These defenses include blocks or JavaScript challenges by Cloudflare which may result in an HTTP 503 “Service Unavailable” response.

There is no rate limit for the Breached Passwords API.

Acceptable use

The API has been designed to make it easy for people to do awesome things with it. Things that are not awesome include:

  • Querying the data for purposes that are intended to cause more harm to the victims of data breaches
  • Anything deliberately intended to limit service availability such as denial of service attacks
  • Deliberate attempts to bypass measures designed to ensure acceptable use
  • Failing to identify the user agent in a way that accurately describes the consumer of the API
  • Misrepresenting the consuming client by impersonating other user agents in an attempt to obscure the origin of API requests
  • Creating other services designed to fraudulently represent the Breached Me name or brand
  • Misrepresenting the source of the data as originating from somewhere other than Breached Me
  • Not adhering to the Creative Commons Attribution License as described below
  • Automating the consumption of other APIs not explicitly documented on this page
  • Using the service in a fashion that brings Breached Me into disrepute

License — breach & paste APIs

This work is licensed under a Creative Commons Attribution 4.0 International License.

In other words, you’re welcome to use the public API to build other services, but you must identify Breached Me as the source of the data. Clear and visible attribution with a link to breached.me should be present anywhere data from the service is used, including when searching breaches or pastes and when representing breach descriptions. It doesn’t have to be prominent, but the interface should clearly attribute Breached Me as the source per the Creative Commons Attribution 4.0 International License.

To help maximize adoption, there are no licensing or attribution requirements on the Breached Passwords API; if you’d like one, you can contact us.
