Leiden Weibo Corpus
Welcome to the home page of the Leiden Weibo Corpus, which consists of 5,103,566 messages posted on Sina Weibo in January 2012.
Sina Weibo is China's most popular microblogging service, and its 300 million users post 100+ million messages a day. This corpus was designed to make it easier to explore the wealth of data these users generate. If you're interested, you can read about how the corpus was built, see which words are most frequently used on Sina Weibo, look at a few random messages or a map representation of our data, but you can also go right ahead and use the search functionality below to start exploring.
Enjoy, and if you have any questions, suggestions for improvements, or comments, please feel free to get in touch.
Messages
Message ID
Word
Grammar (help)
Region
Any
Běijīng 北京
Tiānjīn 天津
Héběi 河北
Shānxī 山西
Nèi Ménggǔ 内蒙古
Liáoníng 辽宁
Jílín 吉林
Hēilóngjiāng 黑龙江
Shànghǎi 上海
Jiāngsū 江苏
Zhèjiāng 浙江
Ānhuī 安徽
Fújiàn 福建
Jiāngxī 江西
Shāndōng 山东
Hénán 河南
Húběi 湖北
Húnán 湖南
Guǎngdōng 广东
Guǎngxī 广西
Hǎinán 海南
Chóngqìng 重庆
Sìchuān 四川
Guìzhōu 贵州
Yúnnán 云南
Xīzàng 西藏
Shǎnxī 陕西
Gānsù 甘肃
Qīnghǎi 青海
Níngxià 宁夏
Xīnjiāng 新疆
Táiwān 台湾
Xiānggǎng 香港
Àomén 澳门
Qítā 其他
Hǎiwài 海外
Unknown (SW code: 0)
Unknown (SW code: 1033)
Unknown (SW code: 1035)
Unknown (SW code: 1044)
Unknown (SW code: 1100)
City
Any
Gender
Both
Male
Female
Lexical data
Single word
Word
Advanced
Meaning
Beginning with
Ending in