The number would also depend on whether one considers homonyms to be separate words: would "bow" be listed once, or five times (if the meaning is not to be considered)? Along with inflections (any verb yields at least four "words"), I think these should be listed out in a list of "all existing words".

Guys, you missed the key phrase "all existing" in the thread's name! AFAIK, Oxford specialists put the number of words in English in the range of 500,000-750,000 (with several stipulations). My amateurish (non-lexicographical) experience tells me that the number is closer to 2 million. By the way, there are about 12,000,000 distinct words used in English Wikipedia. Of course, we can (arbitrarily) say that 3/4 of those are not strong candidates; still, 3 million words remain. If we count only the main entries of any huge dictionary, we can hardly reach the 200,000 mark, and inflected forms do not add that many. Webster's Third New International Dictionary, Unabridged, together with its 1993 Addenda Section, includes some 470,000 entries. Adding Latin (think of all the flora and fauna designations) or Japanese (untranslatable or unique) culture terms, we can easily aim at 3 million unique words.

As others have pointed out, any such list is necessarily incomplete the moment it is published. Some sort of bias and filtering is therefore necessary to compile a list that is manageable, let alone useful. The most practical scope would be to derive the corpus from published writing, and to pre-select the publications according to the times and readerships one wishes to study. Another approach would be to examine the displayed text of web sites in the .edu top-level domain and order the words by frequency of occurrence. There are a number of such lists that draw from various corpora, so a search on "word frequency" might also be useful for someone seeking to build a practical vocabulary in English.
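To make the "order the words by frequency of occurrence" idea concrete, here is a minimal Python sketch. The function name `word_frequencies`, the sample sentence, and the tokenizing rule (lowercase runs of letters, with internal apostrophes allowed) are all assumptions made for illustration; a real corpus study would need a more careful definition of what counts as a word.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs in `text` (hypothetical helper)."""
    # Assumption: a "word" is a run of ASCII letters, optionally with
    # internal apostrophes (e.g. "don't"). Case is ignored.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)

# Made-up sample text; "bow" appears with more than one meaning.
sample = "The bow of the ship. She took a bow; the bow was tied."
freq = word_frequencies(sample)

# Print the word forms from most to least frequent.
for word, count in freq.most_common():
    print(word, count)
```

Note that a count like this works on spellings only, so every sense of "bow" collapses into a single entry, and each inflected form is counted as its own word.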
Another search term that might yield results is "corpus". In another thread, the collection known as "Google Books" has also been mentioned as a text corpus.

By varying the number of lists a word must appear in (from 1 to 12), I got word lists of varying size and "quality". The lists include the ICWSM 2009 blog corpus (top 400K words) and the Westbury Lab Usenet corpus (top 400K words).

Update: In March 2018 I updated the word lists. Previously I used 10 word lists, but several had problems that kept some common words like "and", as well as words with apostrophes, from appearing in the intersection involving 9 or 10 of the lists. In the process of fixing this, I removed the American national and the 20 newsgroups word lists, and I added new word lists from blog, Usenet, W3C, and Wiktionary data. If you need the old lists for some reason, they are still available here.
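The "number of lists a word must appear in" filter can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's actual script: the function name `words_in_at_least` and the three tiny lists are invented, and each source list is assumed to be already loaded as a list of strings.

```python
from collections import Counter

def words_in_at_least(word_lists, k):
    """Return the words that occur in at least `k` of the given lists."""
    counts = Counter()
    for words in word_lists:
        # Use a set so a word repeated within one list still counts once.
        counts.update(set(words))
    return {word for word, n in counts.items() if n >= k}

# Invented toy data standing in for the real source lists.
lists = [
    ["and", "the", "don't", "zyzzyva"],
    ["and", "the", "don't"],
    ["and", "the", "teh"],
]

loose = words_in_at_least(lists, 1)   # union: largest, noisiest list
strict = words_in_at_least(lists, 3)  # intersection: smallest list
print(len(loose), sorted(strict))
```

Raising `k` trades size for agreement between sources. It also shows why a single flawed list (for example, one that drops words with apostrophes) can remove common words from the strictest intersections.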