You can use the form above to quickly get terms, and see the kind of output you can expect.
To use this tool from within your own application, have a look at our examples page.
To extract terms from a given text, use the following request URL:
The following parameters can be used:
| Parameter | Value | Description |
|---|---|---|
| text | string | The text to extract terms from (UTF-8 encoded). English is the only supported language. |
| output | json, xml, txt, php, html | The format to return the terms. |
| terms_only | 1 or 0 (default) | Set this to 1 if you're only interested in the terms (not the occurrence and term word count). Only applies to JSON output. |
| max | number (default 50) | The maximum number of terms to return. |
| lowercase | 1 or 0 (default) | Set this to 1 to have all extracted terms converted to lowercase. |
| callback | string | For JSONP: the name of your JavaScript function to receive the JSON response. This has no effect unless JSON output has been requested. The following characters are allowed: A-Z a-z 0-9 . [] and _. |
| url | string | This can be used instead of 'text' or 'text_or_url', to point to a web article. |
| text_or_url | string | For convenience, this parameter can be used instead of the 'text' or 'url' parameters to accept either a URL (on its own) or some text. |
| key | string | Access key. Required only if you've set one up in custom_config.php. |
| yahoo | 1 or 0 (default) | Set this to 1 to enable Yahoo mode (output format matching that used by Yahoo's Term Extraction service). Alternatively, you can simply call yahoo.php instead of extract.php to enable Yahoo mode. |
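
As a rough illustration of how the parameters above combine into a request, here is a minimal Python sketch. The base URL is a placeholder (substitute the address where your copy of extract.php is hosted), and `build_extract_url` and `is_valid_callback` are hypothetical helper names, not part of Term Extraction. The callback check simply mirrors the allowed characters listed in the table.

```python
import re
from urllib.parse import urlencode

# Placeholder endpoint (an assumption): replace with your own extract.php URL.
BASE_URL = "http://example.org/term-extraction/extract.php"

# Characters the table lists as allowed in a JSONP callback name.
_CALLBACK_RE = re.compile(r"[A-Za-z0-9.\[\]_]+")


def is_valid_callback(name):
    """Return True if name uses only A-Z a-z 0-9 . [ ] and _."""
    return _CALLBACK_RE.fullmatch(name) is not None


def build_extract_url(text, output="json", max_terms=50, lowercase=False,
                      terms_only=False, callback=None, key=None):
    """Assemble a request URL from the documented parameters."""
    params = [("text", text), ("output", output), ("max", str(max_terms))]
    if lowercase:
        params.append(("lowercase", "1"))
    if terms_only:
        params.append(("terms_only", "1"))
    if callback is not None:
        if not is_valid_callback(callback):
            raise ValueError("callback contains characters that are not allowed")
        params.append(("callback", callback))
    if key is not None:
        params.append(("key", key))
    # urlencode percent-encodes the values, so arbitrary UTF-8 text is safe.
    return BASE_URL + "?" + urlencode(params)


print(build_extract_url("free software", max_terms=10))
```

For longer texts a POST request may be more practical than a query string; whether your installation accepts one is something to check against your own copy.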
These parameters can be used to filter the results.
| Parameter | Value | Description |
|---|---|---|
| min_occurrence | number (default 1) | The minimum number of times a single-word (unigram) term must appear for it to be included in the output. |
| max_strength | number (default 3) | Strength is the number of words in the term, so to reduce results to terms with a maximum of 2 words, set this to 2. |
| keep_if_strength | number (default 2) | Keep a term if the term's word count is equal to or greater than this, regardless of occurrence. |
| exc[] | string | Check terms for this string, and exclude term if there's a match or partial match. This can appear multiple times. |
| filter | 1 (default) or 0 | Set this to 0 to disable filtering (overriding the four parameters above). |
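
The filter parameters can be added to the query string in the same way. The sketch below is only an illustration: `build_filter_query` is a hypothetical helper name, and the excluded strings are made-up examples. It shows one way to repeat `exc[]` once per string to exclude.

```python
from urllib.parse import urlencode, parse_qs


def build_filter_query(min_occurrence=1, max_strength=3, keep_if_strength=2,
                       exclude=(), filter_enabled=True):
    """Encode the documented filter parameters as a query string."""
    params = [
        ("min_occurrence", str(min_occurrence)),
        ("max_strength", str(max_strength)),
        ("keep_if_strength", str(keep_if_strength)),
    ]
    # exc[] can appear multiple times: one entry per string to exclude.
    params.extend(("exc[]", value) for value in exclude)
    if not filter_enabled:
        # filter=0 disables filtering, overriding the parameters above.
        params.append(("filter", "0"))
    return urlencode(params)


query = build_filter_query(max_strength=2, exclude=["copyright", "cookie"])
print(query)

# Decoding the query again shows both exc[] values survive the round trip.
print(parse_qs(query)["exc[]"])
```

The square brackets are percent-encoded as `%5B%5D` by `urlencode`; PHP decodes these back to `exc[]` when it parses the request.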
These additional parameters can be used instead of the 'key' and 'text' parameters above. They are here for compatibility with Yahoo's Term Extraction service.
| Parameter | Value | Description |
|---|---|---|
| appid | string | Same as 'key' |
| context | string | Same as 'text' |
Either text, url, or text_or_url must be supplied.
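
A client can check these rules before sending a request. The following is a minimal sketch under stated assumptions: `normalize_params` is a hypothetical helper, and letting the native name win when both a Yahoo-style name and its native equivalent are present is a choice made for this example, not documented behaviour of the service.

```python
# Yahoo-compatible names and the native parameters they stand in for.
YAHOO_ALIASES = {"appid": "key", "context": "text"}

# At least one of these must be present in a request.
TEXT_SOURCES = ("text", "url", "text_or_url")


def normalize_params(params):
    """Map Yahoo-style names to native ones and check a text source exists."""
    # Copy native parameters first.
    normalized = {name: value for name, value in params.items()
                  if name not in YAHOO_ALIASES}
    # Then fill in aliases without overwriting a native value (assumption).
    for alias, native in YAHOO_ALIASES.items():
        if alias in params:
            normalized.setdefault(native, params[alias])
    if not any(normalized.get(name) for name in TEXT_SOURCES):
        raise ValueError("one of text, url, or text_or_url must be supplied")
    return normalized


print(normalize_params({"appid": "abc", "context": "Some text to analyse"}))
```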
In addition to the options above, Term Extraction comes with a configuration file which allows you to control how the application works.
To change the configuration, save a copy of config.php as custom_config.php and make any changes you like to it.
If everything works fine, feel free to modify this page by following the steps below:
Next time you load this page, it will automatically load custom_index.php instead.
Check our help centre if you need help. You can also email us at help@fivefilters.org.
Thanks for downloading and setting up the Term Extraction web service. This software is developed and maintained by FiveFilters.org.
Term Extraction from FiveFilters.org is a free software project to help you perform term extraction through a web service. Given some text it will return a list of terms with (hopefully) the most relevant first. Terms can be returned in a number of formats. The application is intended to be a simple, free alternative to Yahoo's Term Extraction service. English is the only language supported at the moment.
Note: 'Free' as in 'users have the freedom to run, copy, distribute, study, change and improve the software' (see the free software definition).
If you're the owner of this site and you plan to offer this service to others through your hosted copy, please keep a download link so users can grab a copy of the code if they want it (you can either offer a free download yourself, or link to the purchase option on fivefilters.org to support us).
For full details, please refer to the license.
If you're not the owner of this site (i.e. you're not hosting this yourself), you do not have to rely on an external service if you don't want to. You can download your own copy of Term Extraction under the AGPL license.
Term Extraction is written in PHP and relies on the following primary components:
Depending on your use, these secondary components may also be used:
PHP 5.2 or above is required. A simple shared web hosting account should work fine.
Download from FiveFilters.org
Term Extraction is licensed under the AGPL version 3. More information about why we use this license can be found on FiveFilters.org.
The software components in this application are licensed as follows...