This page uses content from Wikipedia and is licensed under CC BY-SA.

Wikipedia:Bots/Noticeboard

This is a message board for coordinating and discussing bot-related issues on Wikipedia (including other programs that interact with the MediaWiki software). Although this page is frequented mainly by bot owners, any user is welcome to leave a message or join the discussion here.

If you want to report an issue or bug with a specific bot, follow the steps outlined in WP:BOTISSUE first. This is not the place for requests for bot approvals or for requesting that tasks be done by a bot. General questions about the MediaWiki software (such as the use of templates, etc.) should be asked at Wikipedia:Village pump (technical).


CommonsDelinker and Filedelinkerbot

Filedelinkerbot was created to supplement CommonsDelinker, which was performing inadequately, with a lot of unaddressed bugs (including, off the top of my head, breaking templates and galleries) and limited maintenance. Is there any continued need for CommonsDelinker that cannot be met by Filedelinkerbot? There are some issues which I'd like to raise, such as the removal of images from discussion archives (which should really be left as red links), and having a single location to discuss such issues would really be preferable. --Paul_012 (talk) 03:46, 27 January 2018 (UTC)

Slow-burn bot wars

Moved from WP:ANI#Slow-burn bot wars Primefac (talk) 15:44, 27 January 2018 (UTC)

Does anyone know why two bots edit war over which links to use for archived web refs? By way of example, the edit history of Diamonds Are Forever (novel) shows InternetArchiveBot and GreenC bot duking it out since September 2017. I've seen it on a couple of other articles too, but I can't be that bothered to dig them out. Although no real harm is done, it's mildly annoying when they keep cluttering up my watchlist. Cheers - SchroCat (talk) 14:17, 27 January 2018 (UTC)

That would have to be resolved by the bot owners, probably at Wikipedia:Bots/Noticeboard. NinjaRobotPirate (talk) 14:35, 27 January 2018 (UTC)

I added a {{cbignore}} (respected by both bots) until we figure it out. Notifying us on our talk pages or at WP:BO is easiest. -- GreenC 15:40, 27 January 2018 (UTC)

This appears to be an issue with GreenC bot. IABot is repairing the archive link and the URL fragment, and GreenC bot is removing it for some reason.—CYBERPOWER (Chat) 16:55, 27 January 2018 (UTC)
GreenC bot gets the URL from the WebCite API as data authority - this is what WebCite says the archive is saved under. -- GreenC 17:35, 27 January 2018 (UTC)
GreenC bot could use the |url= as the data authority, but most of the time it is the other way around: the data in |url= is truncated and the data from WebCite is more complete. Example, example. So I went with WebCite as being more authoritative, since that is how it's saved on their system. -- GreenC 17:47, 27 January 2018 (UTC)
That's not the problem though. It's removing the fragment from the URL. It shouldn't be doing that.—CYBERPOWER (Chat) 18:15, 27 January 2018 (UTC)
It's not removing the fragment. It's synchronizing the URL with how it was saved on WebCite. If the fragment is not there, it's because it was never there when captured at WebCite, or WebCite removed it during the capture. The data authority is WebCite. This turns out to be a good method, as seen in the examples, because often the URL in the |url= field is missing information. -- GreenC 20:20, 27 January 2018 (UTC)
I'm sorry, but that makes no sense. Why would WebCite, or any archiving service, save the fragment into the captured URL? The fragment is merely a pointer for the browser to go to a specific page anchor. IABot doesn't capture the fragments when reading URLs, but carries them through to archive URLs when adding them.—CYBERPOWER (Chat) 20:27, 27 January 2018 (UTC)
Why is IABot carrying the fragment through into the archive URL? It's not used by the archive (except archive.is in certain cases where the '#' is a '%23'). -- GreenC 21:26, 27 January 2018 (UTC)
Do you understand what the fragment is for? It's nothing a server ever needs to worry about, so it's just stripped on their end. It is a browser pointer. If the original URL had a fragment, attaching the same fragment to the archive URL makes sense so the browser goes straight to the relevant section of the page as it did in the original URL.—CYBERPOWER (Chat) 21:39, 27 January 2018 (UTC)
Yeah, I know what a fragment does (though I was temporarily confused and forgot they worked at other services). But fragments don't work with WebCite URLs. We tack the "?url=.." on for RFC long-URL reasons, but it is dropped when doing a replay (example). So there is no inherent reason to retain fragments at WebCite. However, I can see the logic in keeping them for some future purpose we can't guess at, and it has already been done, by and large. So I will see about modifying GreenC bot to retain the fragment for WebCite (it already does so for other services).
There is the other problem as noted: IABot -> GreenCbot - any idea what might have caused it? -- GreenC 22:17, 27 January 2018 (UTC)
Well, even if it is dropped by the server, as it should be, it still doesn't change the fact that the page anchors exist. I'll give you an example of what I mean.—CYBERPOWER (Chat) 22:23, 27 January 2018 (UTC)
The fragment is not the part after the ?, that is the query string. The fragment is the part after the #. --Redrose64 🌹 (talk) 22:24, 27 January 2018 (UTC)
  • @GreenC: here is what I'm trying to explain. Suppose you have the live URL with a fragment ([en.wikipedia.org]), which in this case goes to a section of the page above us. Suppose said original URL dies and IABot adds an archive URL. It will add the archive, and carry over the fragment, [web.archive.org], so that when a user clicks it, they are still taken to the relevant section of the page. If you dropped the fragment, either in the original or the archive, you will still get the same page, but the browser won't take the user straight to the relevant content that was originally being cited.—CYBERPOWER (Chat) 22:31, 27 January 2018 (UTC)
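The behaviour described above is just string handling around the URL's #fragment. For illustration only, here is a minimal Python sketch of that carry-over (this is not IABot's actual code; the function name and example URLs are made up):

```python
from urllib.parse import urlsplit, urlunsplit

def archive_with_fragment(original_url: str, archive_url: str) -> str:
    """Carry the #fragment from the original URL over to the archive URL,
    so the browser still jumps to the cited section of the archived page."""
    fragment = urlsplit(original_url).fragment            # text after '#', if any
    scheme, netloc, path, query, _ = urlsplit(archive_url)
    return urlunsplit((scheme, netloc, path, query, fragment))

# The fragment "#Section" is appended to the archive link unchanged.
print(archive_with_fragment(
    "http://example.com/page#Section",
    "https://web.archive.org/web/20180127000000/http://example.com/page",
))
```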
Yes, I understand, but it's different with WebCite URLs: fragments don't work there, for reasons noted above. Try it: [www.webcitation.org] . Also, on a different matter, what about this edit sequence? IABot -> GreenCbot -- GreenC 23:36, 27 January 2018 (UTC)
GreenC bot is now carrying through the fragment in-line with IABot per above. -- GreenC 00:47, 28 January 2018 (UTC)
Oh I see what you mean. The anchors don't actually work there, despite the fragment. In any event, IABot doesn't selectively remove them from WebCite URLs, as fragment handling happens while archives are added, during the final stages of page analysis, when new strings are being generated to replace the old ones. I personally don't see the need to bloat the code to "fix" that, but then there's the question: what's causing the edit war?—CYBERPOWER (Chat) 00:52, 28 January 2018 (UTC)
GreenC bot is fixed so it won't strip the fragment; there shouldn't be any more edit wars over it, but there are probably other edit wars over other things we don't know about. Not sure how to find edit wars. -- GreenC 04:31, 28 January 2018 (UTC)
Not sure how to find edit wars. Perhaps your bots could look at the previous edit to a page, and if it was made by its counterpart, log the edit somewhere for later analysis. It won't catch everything, and it might turn up false positives, but it's something. ​—DoRD (talk)​ 14:42, 28 January 2018 (UTC)
GreenC bot targets pages previously edited by IABot, so there is always overlap. -- GreenC 15:01, 28 January 2018 (UTC)
Maybe a pattern of the two previous edits being GreenC and IAbot? Galobtter (pingó mió) 15:09, 28 January 2018 (UTC)
And/or the edit byte sizes being the same... but it would take a program to trawl through tens of thousands of articles and hundreds of thousands of diffs; it wouldn't be trivial to create. But a general bot-war detector would be useful for the community to have. -- GreenC 15:18, 28 January 2018 (UTC)
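A detector along the lines being suggested could be prototyped against the MediaWiki API by checking whether a page's recent history consists entirely of the two bots trading edits. A minimal, purely hypothetical sketch (not an existing tool; the thresholds are arbitrary, and it will both miss wars and raise false positives):

```python
import requests

API = "https://en.wikipedia.org/w/api.php"
BOTS = {"InternetArchiveBot", "GreenC bot"}

def looks_like_bot_war(title: str, depth: int = 6) -> bool:
    """Flag a page whose last few revisions are all by the two bots
    and involve both of them (i.e. they keep trading edits)."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvlimit": depth,
        "rvprop": "user|size|timestamp",
        "format": "json",
        "formatversion": 2,
    }
    page = requests.get(API, params=params).json()["query"]["pages"][0]
    users = [rev["user"] for rev in page.get("revisions", [])]
    return len(users) >= 4 and set(users) == BOTS

print(looks_like_bot_war("Diamonds Are Forever (novel)"))
```

Comparing the byte sizes returned in the same query would let the check also require that the two bots keep making same-sized (i.e. mutually reverting) edits.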
  • Many thanks to all. I never knew this board existed (thus the original opening at ANI), but thanks to all for sorting this out. Cheers - SchroCat (talk) 09:53, 28 January 2018 (UTC)
  • It looks like the bots disagree over something else too, according to this history and this. Is this the same sort of issue? Thanks - SchroCat (talk) 10:00, 3 February 2018 (UTC)
For Isabella Beeton it looks like the bot war will continue. @Cyberpower678: do you know why IABot made this edit? [1] - it removed the "url=" portion from the URL. -- GreenC 22:46, 3 February 2018 (UTC)
For Moonraker (novel), IABot is double-encoding the fragment, i.e. %3A -> %253A (%25 is the code for %). Although this is in the |url=, it is garbage data. So I manually changed the |url= [2], removing the double encoding, re-ran IABot, and it reports modifying the link, but no diff shows up. -- GreenC 23:11, 3 February 2018 (UTC)
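To spell out the double encoding: the percent sign of an already-encoded %3A gets encoded again as %25, producing %253A. Undoing one layer of it is a small string fix; a hedged sketch only (not what either bot actually does):

```python
def undo_double_encoding(url: str) -> str:
    """Heuristically collapse one layer of double percent-encoding,
    e.g. %253A -> %3A, while leaving normal escapes such as %20 alone.
    A URL that legitimately contains an encoded literal '%' would be
    over-corrected, so this is illustrative rather than production logic."""
    return url.replace("%25", "%") if "%25" in url else url

print(undo_double_encoding("http://example.com/page%253ASection"))
# -> http://example.com/page%3ASection
```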
Based on the above, there is garbage data and this is causing the bots to disagree. I believe GreenC bot's data is accurate - it gets it from the WebCite API, and there is no way it is wrong because it is what the snapshot was created with. The data in the |url= field used by IABot is problematic. Ideally IABot would also use the WebCite API. Failing that, I can see trying to fix both the |archiveurl= and |url= fields, which might keep IABot from reverting. -- GreenC 23:20, 3 February 2018 (UTC)
Update: Made two changes to WaybackMedic that should help some. 1. When modifying the WebCite URL, also modify the |url= to match. This will keep IABot from reverting in some cases, but not all. 2. Log changes, and when the same change occurs in the same article, it will notify me of a possible bot war. No idea how many this will be. Worst case, I'll just add a {{cbignore}}. -- GreenC 16:46, 4 February 2018 (UTC)
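For change 1, syncing |url= with the WebCite link amounts to reading the target embedded in the archive URL's ?url= parameter mentioned earlier. A simplified sketch (not WaybackMedic's actual code; the WebCite ID shown is made up, and an embedded URL containing '&' or '=' would need more careful handling):

```python
from typing import Optional
from urllib.parse import urlsplit, parse_qs

def target_from_webcite(archive_url: str) -> Optional[str]:
    """Return the original target URL embedded in a WebCite archive
    link's ?url= parameter, or None if the parameter is absent."""
    values = parse_qs(urlsplit(archive_url).query).get("url")
    return values[0] if values else None

archive = "http://www.webcitation.org/5xyzABCDE?url=http://example.com/page"
print(target_from_webcite(archive))   # -> http://example.com/page
# A bot could then set |url= to this value so the two fields stay in sync.
```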

Need someone with a mass rollback script now.

Would someone who has a mass rollback script handy please revert InternetArchiveBot's edits going all the way back to the timestamp in this diff? Kind of urgent. IABot destroyed roughly a thousand articles, due to some communication failure with Wikipedia.—CYBERPOWER (Chat) 22:29, 29 January 2018 (UTC)

 Done Nihlus 23:07, 29 January 2018 (UTC)
@Cyberpower678: When you say "destroyed", this means...? --Redrose64 🌹 (talk) 23:38, 29 January 2018 (UTC)
It deleted chunks of articles or stuffed chunks of them into the references section, by making massive references out of them.—CYBERPOWER (Chat) 23:40, 29 January 2018 (UTC)
That does not seem to be the case with all articles it edited, e.g. this change was OK [en.wikipedia.org] Graeme Bartlett (talk) 02:03, 30 January 2018 (UTC)
Most were bad; it was easier to just have that batch rolled back entirely. IABot will eventually pass over it again.—CYBERPOWER (Around) 02:19, 30 January 2018 (UTC)

I started an RfC regarding IABot

Please see Wikipedia:Village_pump_(proposals)#Disable_messages_left_by_InternetArchiveBot if interested.—CYBERPOWER (Be my Valentine) 19:55, 11 February 2018 (UTC)

ARBCOM: Amendment request: Magioladitis 2

There's an amendment request for the Magioladitis 2 case. Feel free to comment or not. Headbomb {t · c · p · b} 15:58, 16 February 2018 (UTC)

The request was declined and the discussion is now closed. Headbomb {t · c · p · b} 21:36, 20 February 2018 (UTC)

KolbertBot

I have some minor concerns about KolbertBot (talk · contribs) editing outside of the scope set forth at Wikipedia:Bots/Requests for approval/KolbertBot, as it is editing other editors' comments that contain an http link on Template:Did you know nominations/ subpages ([3] [4] [5]). I personally don't feel the bot should be altering these without approval (as it was never approved for discussion-type pages). This can mainly be attributed to the weird DYK nomination process, but I feel it should be discussed before the bot is allowed to continue. I asked Jon Kolbert to bring it up for discussion or to stop editing these pages but was essentially ignored and told "it's better for the readers", so I am bringing it up myself. Nihlus 19:34, 20 February 2018 (UTC)

I don't see anything in the BRFA that restricts KolbertBot from editing DYK nominations. It's specifically listed as allowed to edit in the mainspace and template space. I suppose you could argue that this wasn't the intention, but either way I can't find any attempt at discussing this with the bot op as outlined in WP:BOTISSUE, so I suggest that venue first.
That being said, I don't see the issue with making discussion URLs point to the correct http/https protocol, nor is this something that would warrant putting KolbertBot on hold while this is being resolved, assuming consensus is against KolbertBot making those edits in the DYK space. The bot is breaking nothing; at best you have a very mild annoyance. Headbomb {t · c · p · b} 21:33, 20 February 2018 (UTC)
It was discussed elsewhere, as I have already stated. Bots should not edit outside their purview, and if others have concerns then they should be addressed. I don't think the bot should be editing other people's comments regardless, as it was not approved and not something detailed in its BRFA. If he wants to extend the scope of his bot's actions, then he should get it approved like the rest of us would normally do. I only brought it here because I was ignored in my direct request. Nihlus 22:15, 20 February 2018 (UTC)
The BRFA specifically says to change http->https in the mainspace and template namespaces. Your examples are from the template namespace (primarily because nobody wants to fix this legacy process; these discussions aren't templates and don't really belong there at all, but it is an entrenched process). Now, is this slightly outside the intent? Sure, I guess. And asking @Jon Kolbert: to update his code to skip pages starting with Template:Did you know nominations/ seems reasonable. Jon, can this be easily accommodated? — xaosflux Talk 22:33, 20 February 2018 (UTC)
Yes, and that was my only request. I understand what the template namespace is, but I doubt the oddly placed DYK nominations were considered to be "templates", and I don't believe they should be. There is a reason this bot is not running in the Talk or Project namespaces. Nihlus 22:36, 20 February 2018 (UTC)
Nihlus commented on IRC that they thought it wasn't appropriate for KolbertBot to be editing past DYK noms and that it wasn't approved. I responded by saying it was approved by being in the Template namespace, but not as the result of a BRFA specifically dedicated to editing them. Nihlus suggested I should opt to just skip them, to which I disagreed, the reasoning being that anyone who refers to past DYKs would benefit from having HTTPS links. I suggested that if they were still opposed to it, to start a discussion so there's consensus to edit those pages or consensus not to edit those pages. I think it's beneficial because not only does it secure more links, some sites switch around their URL format, and having those updates made by KolbertBot helps prevent linkrot. To be clear, I'm open to either option (whichever gets consensus), but I prefer continuing to modify the links. Skipping them would not be hard to implement. Jon Kolbert (talk) 23:13, 20 February 2018 (UTC)
@Jon Kolbert: thanks for the note, as these are de facto non-templates it would be best to skip them - if it is not a difficult code change for you. Those pages are unlikely to be seen by most "readers". — xaosflux Talk 23:16, 20 February 2018 (UTC)
@Xaosflux: If that's the desired behaviour, sure. I'll add an exception when I'm back on my desktop. Jon Kolbert (talk) 01:28, 21 February 2018 (UTC)
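For reference, the exception being discussed can be as small as a title-prefix check before the bot touches a page. A minimal sketch under the assumption that the bot sees the full page title as a string (not KolbertBot's actual code):

```python
SKIP_PREFIXES = ("Template:Did you know nominations/",)

def should_process(title: str) -> bool:
    """Return False for pages the bot should leave alone, such as the
    DYK nomination subpages that happen to live in the Template namespace."""
    return not title.startswith(SKIP_PREFIXES)

print(should_process("Template:Did you know nominations/Example"))  # False
print(should_process("Template:Cite web"))                          # True
```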
Thanks Jon Kolbert, if your task changes to be "all namespaces" or the like in the future, these can be done again. (Note: I have always hated those being in ns-10, it screws up lots of things!) — xaosflux Talk 02:18, 21 February 2018 (UTC)