Wikipedia:Cleanup Drive/Tidy project (ויקיפדיה:מסדר ניקיון/מיזם Tidy)

Here is a classification of diffs into different categories along with example titles, detailed description where useful, and proposed resolutions (to be filled out in all cases).

Self-closing tags like <b/>, <div/>, etc.

Tidy strips self-closing tags such as <b/> and <div/>, but an HTML5 parser treats them as the opening tags <b>, <div>, etc.

Sample searches to find these on your wiki:

  1. To find <b/> paste this into the search box: insource:/\<b\/\>/
  2. To find <p/> paste this into the search box: insource:/\<p\/\>/
  3. To find <td/> paste this into the search box: insource:/\<td\/\>/
  4. To find <div/> paste this into the search box: insource:/\<div\/\>/ (Note that this will find only "empty" divs.)
  5. To find <span/> paste this into the search box: insource:/\<span\/\>/ (Note that this will find only "empty" spans.)

A longer list of searches that have yielded results on the English Wikipedia follows. Note that regular expression searches do not always return reliable results. See Phabricator bug phab:T106685 for details.

  1. insource:/\<\/*b *\/\>/
  2. insource:/\<\/*big *\/\>/
  3. insource:/\<\/*blockquote *\/\>/
  4. insource:/\<\/br *>/
  5. insource:/\<\s*\/*\s*b r\s*\|*\s*\/*\s*\>/
  6. insource:/\<\s*\/*\s*br\s*[\\\?]+\s*\/*>/
  7. insource:/\<\s*\/*\s*br\s*[\|\/\?\.]+\s*\/\>/
  8. insource:/\<\s*\/\s*br[\s\|\/\?\.]*\>/
  9. insource:/\<\s*\<\/*\s*br\s*\/*>/
  10. insource:/\<\/*center *\/\>/
  11. insource:/\<\/*del *\/\>/
  12. insource:/\<\/*div *\/\>/
  13. insource:/(\<div class=\"[a-zA-Z0-9_:; #%\-]+\" *)\/\>/
  14. insource:/(\<div id=[\'\"][ça-zA-Z0-9\-_ ]+[\'\"] +style=\"[a-zA-Z0-9_:; #%\-]+[\'\"] *)\/\>/
  15. insource:/(\<div id=[\'\"]*[ça-zA-Z0-9\-_ ]+[\'\"]* *)\/\>/
  16. insource:/(\<div style=\"[a-zA-Z0-9_:; #%\-]+\" *)\/\>/
  17. insource:/\<font *\/\>/
  18. insource:/\<font color=\"[a-z ]+\"*\/\>/
  19. insource:/\<font style=\"*[a-z ]+\"*\/\>/
  20. insource:/\<\/*h1 *\/\>/
  21. insource:/\<\/*h2 *\/\>/
  22. insource:/\<\/*h3 *\/\>/
  23. insource:/\<\/*h4 *\/\>/
  24. insource:/\<\/*h5 *\/\>/
  25. insource:/\<\/*i *\/\>/
  26. insource:/\<\/*p *\/\>/
  27. insource:/(\<p id=[\'\"][ça-zA-Z0-9\-_ ]+[\'\"] *)\/\>/
  28. insource:/\<\/*s *\/\>/
  29. insource:/\<\/*small *\/\>/
  30. insource:/\<\/span *\/\>/
  31. insource:/(\<span class\s*=\s*\"*[ça-zA-Z0-9\-_ ]+\"* *)\/\>/
  32. insource:/(\<span id\s*=\s*\"*[ça-zA-Z0-9\-_ \(\)\–\.\,\:\&\'\"\;\/\%\!]+\"* *)\/\>/
  33. insource:/\<span *\/\>/
  34. insource:/\<span style=\"color\"*\/\>/
  35. insource:/\<\/*strike *\/\>/
  36. insource:/\<\/*sub *\/\>/
  37. insource:/\<\/*sup *\/\>/
  38. insource:/\<\/*td *(colspan=\d+)\/\>/
  39. insource:/\<\/*td *\/\>/
  40. insource:/(\<td style=\"[a-zA-Z0-9_:; #%\-=]+\" *)\/\>/
  41. insource:/\<\/*th *\/\>/
  42. insource:/\<\/*tr *\/\>/
  43. insource:/\<\/*u *\/\>/
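
The searches above work inside the wiki search box. For checking a piece of wikitext offline, the same idea can be sketched in Python. This is only an illustration: the function name `find_self_closed_tags` and the tag list (taken from the searches above) are assumptions, and the regular expression is deliberately simpler than the search patterns.

```python
import re

# HTML tags from the search list above. Extension tags such as <nowiki/> or
# <references/> are legitimately self-closing in wikitext, so they are not listed.
HTML_TAGS = {
    "b", "big", "blockquote", "center", "del", "div", "font",
    "h1", "h2", "h3", "h4", "h5", "i", "p", "s", "small", "span",
    "strike", "sub", "sup", "td", "th", "tr", "u",
}

# Matches tags such as <b/>, <div id="x"/> or <span class="a" />.
SELF_CLOSED = re.compile(r"<([A-Za-z][A-Za-z0-9]*)\b([^<>]*?)/>")


def find_self_closed_tags(wikitext):
    """Return (tag_name, matched_text) for every self-closed tag in HTML_TAGS."""
    hits = []
    for match in SELF_CLOSED.finditer(wikitext):
        name = match.group(1).lower()
        if name in HTML_TAGS:
            hits.append((name, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = 'a <b/> b <div id="Perlis v ATM"/> c <nowiki/> <br/>'
    for name, text in find_self_closed_tags(sample):
        print(name, text)
```

Each hit still needs a human decision: some of these tags can simply be deleted, while others should become an explicit open/close pair.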
Example titles: enwiki:Horse; dewiki:Nachwachsender Rohstoff
Detailed description: Self-closing bold tags "<b/>" were used in the {{hands}} template as wikitext syntax modifiers, for example to prevent interpretation of punctuation at the start of the line. HTML 5 specifies that "<b/>" is to be treated the same as "<b>", but with a parse error emitted, whereas Tidy treats them as empty elements and removes them. So in Depurate, bold formatting ran on to the end of the article. The standard solution is <nowiki/>, but the author chose <b/> instead in order to reduce the post-expand include size. Further discussion: w:User talk:Wikid77#Empty_bold_tags
The dewiki page likewise has a <b/> in the first paragraph of the article. It serves no purpose and can be safely deleted.
Proposed resolution: Update wikitext

Example titles: enwiki:Villafranchian; ruwiki pages using Template:Автомобиль (e.g. ruwiki:Renault Espace)
Detailed description: A self-closing div tag in the middle of {{Neogene ELMA}} is interpreted as <div> instead of <div></div>, causing the remainder of the article to move inside the timeline box. The ruwiki template has the same problem.
Proposed resolution: Update wikitext?

Example titles: enwiki:2016 Malaysia Premier League and possibly other pages
Detailed description: Use of <div id="Perlis v ATM"/> and many other divs like it causes the content following that section to be swallowed into the <div> tags.
Proposed resolution: (not yet filled in)

Example titles: Most of the itwikisource pages that are showing diffs
Detailed description: {{IncludiIntestazione}} uses <span class="interwiki-info" id="el" title="(orig.)" style="display:none;" />, which Tidy strips and HTML5depurate does not. This accounts for the large rendering diffs.
Proposed resolution: (not yet filled in)

Trailing whitespace migration from inline tags like <span>, <b>, etc.

Tidy moves trailing whitespace out of inline tags like <span>, <b>, etc. and places it after the closing tag. This is broken Tidy behavior; an HTML5 parser will not do this.
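
To make the difference concrete, here is a rough Python imitation of the migration. It is a sketch for illustration only, not Tidy's actual algorithm; the function name `migrate_trailing_whitespace` and the short list of inline tags are assumptions.

```python
import re

# Illustrative subset of inline tags; Tidy's real list is longer.
INLINE_TAGS = ("span", "b", "i", "small", "a")

# Whitespace immediately followed by a run of inline closing tags.
_TRAILING_WS = re.compile(
    r"([ \t\n]+)((?:</(?:%s)>)+)" % "|".join(INLINE_TAGS)
)


def migrate_trailing_whitespace(html):
    """Move whitespace that precedes a run of inline closing tags to after the run."""
    return _TRAILING_WS.sub(lambda m: m.group(2) + m.group(1), html)


if __name__ == "__main__":
    # Two adjacent spans, each ending in a space.
    print(migrate_trailing_whitespace("<span>a </span><span>b </span>"))
```

Once the space sits between the spans instead of inside them, a browser is free to break the line there, which is why pages that rely on "white-space: nowrap" spans render differently.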

Example titles: ruwiki:Миллер, Боде (Bode Miller) and lots of other ruwiki pages
Detailed description: The template {{Обладатели Кубка мира по горнолыжному спорту в общем зачёте}} is displayed with a width of 7400 pixels due to it consisting of a series of adjacent spans with "white-space: nowrap". There is a space at the end of each span's contents, which Tidy moves outside the span, allowing the browser to break the line. With Depurate, the space is not moved, so the line is not broken.
Proposed resolution: Update wikitext

Example titles: itwikivoyage:Avezzano and possibly others
Detailed description: The Tidy version has "</span></a></span></b> <span" and the HTML5depurate version has "</span></a> </span></b><span". That accounts for the red diffs running down the page in the lower-right quarter of the upright diff.
Proposed resolution: (not yet filled in)

Wikitext markup errors

Examples: unclosed tables; nested tables in fosterable position; <small>…<small> instead of <small>…</small>; etc. Tidy and HTML5depurate fix these up differently. There is nothing to do in HTML5depurate itself; the obvious fix is to correct the affected templates and pages.
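
Errors like <small>…<small> can be spotted mechanically before a page is edited. The following Python sketch counts opening and closing tags with the standard library's `html.parser`; the class and function names are made up for this example. It only looks at HTML-style tags, so wikitext tables ({| … |}) and tags produced by template expansion are outside what it can see.

```python
from html.parser import HTMLParser


class TagBalance(HTMLParser):
    """Track how many openers of each watched tag are still unclosed."""

    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = {tag: 0 for tag in self.tags}
        self.stray_closers = []  # closing tags seen with no opener pending

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth[tag] += 1

    def handle_endtag(self, tag):
        if tag in self.tags:
            if self.depth[tag] == 0:
                self.stray_closers.append(tag)
            else:
                self.depth[tag] -= 1


def unclosed_tags(wikitext, tags=("small", "i", "b", "table")):
    """Return {tag: count} for watched tags that are opened more often than closed."""
    checker = TagBalance(tags)
    checker.feed(wikitext)
    checker.close()
    return {tag: count for tag, count in checker.depth.items() if count > 0}


if __name__ == "__main__":
    print(unclosed_tags("<small>note<small> rest of the page"))
```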

Example titles: eswiki:Bob Esponja
Detailed description: There is an unclosed table that then runs into a new section with another table.
Proposed resolution: Update wikitext

Example titles: enwiki:2015-16 Odense Bulldogs season; enwiki:2015-16 ABA League
Detailed description: Unclosed <small> tags in http://en.wikipedia.org/w/index.php?title=Template:2015%E2%80%9316%20Metal%20Ligaen%20table
Unclosed <small> tag in http://en.wikipedia.org/w/index.php?title=2015–16_ABA_League&action=edit&section=6
Proposed resolution: Update wikitext

Example titles: Pretty much all the svwiki page diffs, e.g. svwiki:Kugelstein
Detailed description: Template:klimatöversikt has an HTML table, and all the svwiki pages have markup of the form
{| border="1"
{{klimatöversikt
...
}}
|}
This markup is parsed as 2 separate tables by an HTML5 parser, since the inner <table> is in fosterable position, whereas Tidy fixes up the HTML differently and introduces nested tables.
Proposed resolution: ?

Example titles: itwiki:Juventus_Football_Club_1982-1983 and several other itwiki sports pages
Detailed description: Template https://it.wikipedia.org/w/index.php?title=Template:Incontro_di_club has a nested table in fosterable position, causing different fixups in Tidy and HTML5 parsers.
Proposed resolution: (not yet filled in)

Example titles: ruwiki:Флатт,_Рэйчел (Rachael Flatt); ruwiki:Сабликова, Мартина (Martina Sáblíková)
Detailed description: Navbox templates such as {{Чемпионы мира по фигурному катанию среди юниоров}} use {{nowrap begin}}/{{nowrap end}}, with items delimited with {{·w}}, which theoretically should break the nowrap span with a space outside the nowrap section. But doBlockLevels() inserts a misnested paragraph tag which starts inside the first nowrap span and ends inside the last nowrap span. Tidy fixes this by splitting the spans, whereas Depurate moves the whole paragraph inside the first nowrap span, causing the whole list to be nowrapped.
Proposed resolution: Remove the div tag around nowrap begin/end

Example titles: enwiki:Wildcat (comics)
Detailed description: An unclosed <i> in a heading causes the contents of every block starting from halfway through the TOC to be wrapped in a separate <i>, thanks to active formatting element (AFE) reconstruction.
Proposed resolution: (not yet filled in)

P-tags wrapping newlines

<p> tags that wrap only newlines are emitted by HTML5depurate but stripped by Tidy. They cause minor whitespace and margin diffs and seem to be a source of a lot of noise in the visual diff output.

NOTE for editors: HTML5depurate is taking care of this automatically right now. Eventually we might remove this compatibility pass, but this won't be an issue in the initial Tidy removal rollout.
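
For readers comparing the two outputs by hand, the effect of such a compatibility pass can be approximated in a few lines of Python. This is an assumption-laden sketch, not the code HTML5depurate actually runs; it only handles attribute-less <p> tags, and the function name is hypothetical.

```python
import re

# A <p> element whose whole content is whitespace (for example "<p>\n</p>").
_WHITESPACE_P = re.compile(r"<p>\s*</p>")


def strip_whitespace_paragraphs(html):
    """Drop paragraphs that contain nothing but whitespace."""
    return _WHITESPACE_P.sub("", html)


if __name__ == "__main__":
    print(strip_whitespace_paragraphs("<p>\n</p><p>\n</p><p>text</p>"))
```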

Example titles: frwiki:Kefteji and many others
Detailed description: There are 3 <p>\n</p> tags before the image that is showing up in the visual diff image.
Proposed resolution: (not yet filled in)

display:inline list wrapping diffs because of inter-element white-space (possibly a concern in other rendering scenarios?)

In the Tidy version, there seem to be \n chars between </li> and the next <li>. In the HTML5depurate version, in some cases, they are not present. This seems to cause rendering differences in wrapping of lists.
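
When triaging diffs, it can help to know whether two renderings differ only in this inter-item whitespace. A small Python sketch under that assumption follows; both function names are hypothetical, and the comparison is purely textual.

```python
import re

# Optional whitespace between a closing </li> and the next opening <li ...>.
_BETWEEN_ITEMS = re.compile(r"</li>\s*<li\b")


def collapse_list_item_gaps(html):
    """Remove whitespace between adjacent list items."""
    return _BETWEEN_ITEMS.sub("</li><li", html)


def differ_only_in_list_gaps(tidy_html, depurate_html):
    """True if the two strings differ, but only in whitespace between list items."""
    return (
        tidy_html != depurate_html
        and collapse_list_item_gaps(tidy_html) == collapse_list_item_gaps(depurate_html)
    )


if __name__ == "__main__":
    tidy = '<ul><li style="display:inline">A</li>\n<li style="display:inline">B</li></ul>'
    depurate = '<ul><li style="display:inline">A</li><li style="display:inline">B</li></ul>'
    print(differ_only_in_list_gaps(tidy, depurate))
```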

Example titles: ptwiki:José Serra; it is not clear whether other pages are affected similarly.
Detailed description: The last line with portals is a list where every element is styled as 'display:inline'. The Tidy version has "</li>\n<li…". The HTML5depurate version has "</li><li…". The missing newline causes the list to render as one long line, which makes the entire page flow and render differently and produces larger visual diffs. This could potentially be a concern in other scenarios.
It looks like http://pt.base.wikitextexp.wmflabs.org/w/index.php?title=Template:Portal3&action=edit is the template that generates this list. It doesn't look like a newline should be rendered between list items, but Tidy adds the newlines on its own, causing rendering diffs.
Proposed resolution: This looks like a Tidy bug and requires fixing the pages and templates that rely on this behavior. See Phab:T74416 for an example where an editor fixed a template on frwiki.

Kickoff

Errors

  • Bogus file options (3,971 errors)
  • Table tag that should be deleted (102 errors)
  • Fostered content (204 errors)
  • Misnested tags (2,374 errors)
  • Missing end tag (69,122 errors)

Warnings

  • Obsolete HTML tags (55,830 errors)
  • Paragraph wrapping bug workaround (2 errors)
  • Self-closed tags (3 errors)
  • Stripped tags (18,435 errors)

Suggestions for bots

Images

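  1. A natural number less than 800 -> append px
  2. לא ממוזער (not thumbnail) -> delete
  3. ממוזער (thumbnail) twice -> remove one occurrence
  4. Empty option -> delete
  5. תמונה (image) -> delete
  6. משמאל (from the left) -> שמאל (left)
  7. px followed by a number -> move px to the other side of the number
  8. ממורכז (centered) -> מרכז (center)
  9. אמצע (middle) -> מרכז (center)
  10. A number followed by pt -> replace with the same number followed by px
  11. קישור (link) -> delete
  12. מימין (from the right) -> ימין (right)
  13. A number followed by x -> replace with the same number followed by px
  14. רגיל (normal) -> ממוזער (thumbnail)
  15. A number followed by px that is not entirely lowercase -> convert to lowercase
  16. שמאל (left) twice -> remove one occurrence
  17. ימין (right) twice -> remove one occurrence
  18. left -> שמאל (left)
  19. right -> ימין (right)
  20. center -> מרכז (center)
  21. קובץ (file) -> delete
  22. שמאלה (leftward) -> שמאל (left)
  23. ימינה (rightward) -> ימין (right)
  24. ממוסגר (framed) -> ממוזער (thumbnail)
  25. A number followed by p -> the same number followed by px
  26. upleft -> שמאל (left)
  27. alt -> delete
  28. ממוזער (thumbnail) followed by text -> insert a pipe between them
  29. גבול (border) -> ממוזער (thumbnail)
  30. כיתוב תמונה (image caption) -> delete
  31. יישור= (align=) followed by some value -> replace with just that value
  32. upright -> ימין (right)
  33. default -> not known yet
  34. לשונית עריכה (edit tab) -> not known yet
  35. אשף תבניות, אשף התבניות (template wizard) -> not known yet
  36. noicon -> not known yet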
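
A bot could apply many of these rules one option at a time. The Python sketch below covers a subset (rules 1 to 27 and 29 to 31, without the undecided ones) to show the shape such a bot might take. All names in it are hypothetical. It assumes the caller has already split the file link on pipes and passes in only the option parts, not the file name or the real caption.

```python
import re

# Keyword renames from the list above (rules 6, 8, 9, 12, 14, 18-20, 22-24, 26, 29).
RENAMES = {
    "משמאל": "שמאל", "שמאלה": "שמאל", "left": "שמאל", "upleft": "שמאל",
    "מימין": "ימין", "ימינה": "ימין", "right": "ימין",
    "ממורכז": "מרכז", "אמצע": "מרכז", "center": "מרכז",
    "רגיל": "ממוזער", "ממוסגר": "ממוזער", "גבול": "ממוזער",
}
# Options to delete outright (rules 2, 4, 5, 11, 21, 27, 30).
DELETE = {"", "לא ממוזער", "תמונה", "קישור", "קובץ", "alt", "כיתוב תמונה"}
# Keywords that should appear at most once (rules 3, 16, 17).
DEDUPE = {"ממוזער", "שמאל", "ימין"}


def normalize_option(option):
    """Apply the single-option rules; return None when the option should be dropped."""
    opt = option.strip()
    if opt in DELETE:
        return None
    if opt in RENAMES:
        return RENAMES[opt]
    match = re.fullmatch(r"יישור\s*=\s*(.+)", opt)  # rule 31
    if match:
        return normalize_option(match.group(1))
    if re.fullmatch(r"[0-9]+", opt):  # rule 1
        return opt + "px" if 0 < int(opt) < 800 else opt
    match = re.fullmatch(r"px([0-9]+)", opt, re.IGNORECASE)  # rule 7
    if match:
        return match.group(1) + "px"
    match = re.fullmatch(r"([0-9]+)(?:px|pt|p|x)", opt, re.IGNORECASE)  # rules 10, 13, 15, 25
    if match:
        return match.group(1) + "px"
    return opt


def normalize_options(options):
    """Normalize a list of options and drop repeated keywords from DEDUPE."""
    result, seen = [], set()
    for option in options:
        fixed = normalize_option(option)
        if fixed is None:
            continue
        if fixed in DEDUPE:
            if fixed in seen:
                continue
            seen.add(fixed)
        result.append(fixed)
    return result


if __name__ == "__main__":
    print(normalize_options(["ממוזער", "משמאל", "250", "רגיל", ""]))
```

Rule 28 and the "not known yet" entries are left out on purpose, since they need either surrounding context or a decision that has not been made.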

Notes