

This page helps you create an automated web page scraper to generate and update Mix'n'match catalogs.
The goal is to build a list of URLs, iterate through them, and scrape the respective pages to generate Mix'n'match entries.
[See example]

Catalog

Add a scraper to an existing catalog (give its ID), or create a new catalog (leave the ID empty).
Note: only the original catalog creator can save to an existing catalog, but anyone can add a new one.
Note: if you enter the property first and then click another field, some information will be filled in automatically.

Levels

A URL can be constructed from a static part and one or more variables, here called levels. Each level can be a defined list of keys (e.g., letters), a range (numeric from-to, plus step size), or follow (get URLs listed on a page and follow them). The last level is run through completely before the level above it ticks ahead by one step and the last level resets.
So, if the first level is keys A-Z, and the second is range 1-100 (step size 1), URLs will use A/1, A/2, ... A/100, B/1, B/2, ... Z/100.
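The odometer-style iteration above can be sketched as follows. This is an illustration of the ordering only, not Mix'n'match's actual code; the URL prefix is a made-up example.

```python
from itertools import product

def iterate_levels(levels):
    """Yield one tuple of level values per URL; the last level ticks fastest."""
    return product(*levels)

# Level 1: keys (shortened here to A-B), level 2: range 1-3, step size 1.
levels = [["A", "B"], [str(n) for n in range(1, 4)]]
urls = ["https://example.org/" + "/".join(vals) for vals in iterate_levels(levels)]
# → .../A/1, .../A/2, .../A/3, .../B/1, .../B/2, .../B/3
```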


Scraper

Now use the level values to construct the URLs to be scraped, then define block- and entry-level regular expressions to extract the data.
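As a rough sketch of the block/entry approach: the block RegEx isolates the part of the page that holds the entries, and the entry RegEx is then applied repeatedly inside that block. The HTML, tag names, and field names below are illustrative assumptions, not a real catalog.

```python
import re

# Hypothetical HTML of one scraped page.
html = """
<ul>
  <li><a href="/person/p1">Ada Lovelace</a> - mathematician</li>
  <li><a href="/person/p2">Alan Turing</a> - computer scientist</li>
</ul>
"""

# Block RegEx: narrow the page down to the list of entries.
block_re = re.compile(r"<ul>(.*?)</ul>", re.S)
# Entry RegEx: capture external ID, name, and description per entry.
entry_re = re.compile(r'<a href="/person/([^"]+)">([^<]+)</a> - ([^<]+)')

block = block_re.search(html).group(1)
entries = [{"ext_id": m[0], "name": m[1], "desc": m[2].strip()}
           for m in entry_re.findall(block)]
# entries[0] == {"ext_id": "p1", "name": "Ada Lovelace", "desc": "mathematician"}
```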

Resolve

Use $1, $2, ... for the capture groups of the entry RegEx, and $L1, $L2, ... for the current values of the levels.
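A minimal sketch of how such placeholders could be resolved, assuming a hypothetical helper (the pattern syntax mirrors the description above, but the function itself is not the tool's API):

```python
import re

def resolve(pattern, groups, level_values):
    """Substitute $L1, $L2, ... (level values) and $1, $2, ... (RegEx groups)."""
    out = re.sub(r"\$L(\d+)", lambda m: level_values[int(m.group(1)) - 1], pattern)
    out = re.sub(r"\$(\d+)", lambda m: groups[int(m.group(1)) - 1], out)
    return out

url = resolve("https://example.org/$L1/person/$1", ["ada-lovelace"], ["A"])
# → "https://example.org/A/person/ada-lovelace"
```

Note that `$L1` is substituted before `$1`, so the `L`-prefixed placeholders are never mistaken for plain group references.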

Options

Testing and saving

{{can_save_message}} Test is running... The scraper for this catalog was successfully saved; the run can take minutes or hours (days in rare cases).
Test results
URL used: {{test_results.last_url}}
HTML of page
Scraped entries: {{test_results.results.length}} entries found on page

ID | Name | Description | Type
{{r.id}} | {{r.name}} | {{r.desc}} | {{r.type}}

ERROR: {{test_results.status}}
