What's hot in databases? (updated for 2012)

Trending topics in VLDB, via keywords in the titles of publications from 2000 to 2012. I did a quick and dirty job of removing stop words and stemming. If you want, you can also download the SQLite database with all of the data.
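
For the curious, the keyword extraction could look roughly like the sketch below. This is an illustration rather than the script actually used: the stop-word list, the `crude_stem` suffix stripper, the choice to count a keyword once per title, and the sample titles are all made up for the example.

```python
import re
from collections import Counter, defaultdict

# Assumption: a tiny illustrative stop-word list, not the one used for the post.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def crude_stem(word):
    """Strip one common suffix; a rough stand-in for a real stemmer."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(title):
    """Lowercase a title, drop stop words, and stem what is left."""
    tokens = re.findall(r"[a-z]+", title.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def counts_by_year(papers):
    """papers: iterable of (year, title). Returns {year: Counter of keywords}."""
    by_year = defaultdict(Counter)
    for year, title in papers:
        # Assumption: each keyword counts once per title.
        by_year[year].update(set(keywords(title)))
    return by_year


# Hypothetical titles, for illustration only.
papers = [
    (2011, "Efficient Processing of Graph Streams"),
    (2012, "Indexing Large Graphs in the Cloud"),
    (2012, "Querying Graphs with MapReduce"),
]
counts = counts_by_year(papers)
print(counts[2012].most_common(3))
```

Calling `most_common(10)` on each year's counter would give a per-year top ten.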

Thanks to @samrmadden for the 2000 - 2011 titles.

Most popular keywords of all time

These trends are based on the most popular keywords across all years of VLDB publications.

Topics such as efficiency, large, indexing, and optimization have been increasing in popularity in the past few years.

What keywords have not been doing as well? We've all heard about how XML is dead, and by paper count it certainly is. However, so are traditional database topics such as management, information, and relational. Technology also seems to be taking a downturn. With relational on the decline, is this the turning point for NoSQL? It is strange to see OLAP slowing down, given the popularity of large data these days.

data: 39,27,41,44,50,60,71,49,45,58,65,48,60,60,39
querying: 10,12,15,19,11,29,31,37,35,33,35,19,31,29,23
databases: 20,12,19,15,19,16,30,24,16,16,19,18,26,18,19
systems: 3,8,6,12,14,10,20,18,11,15,17,8,8,16,7
efficient: 2,2,4,5,6,7,8,18,15,16,15,13,11,13,12
based: 7,4,6,6,4,13,14,14,9,5,10,10,14,15,14
xml: 0,4,4,15,12,21,17,21,14,9,11,4,7,2,2
web: 2,5,9,16,11,13,8,7,4,5,12,15,8,14,6
processing: 3,2,5,10,4,9,10,8,10,10,12,6,15,10,6
streaming: 0,0,1,1,7,13,19,12,9,10,8,10,11,3,6
indexing: 2,7,5,7,6,7,7,7,10,9,8,8,10,8,8
searching: 4,1,1,2,9,7,7,8,5,10,16,6,17,8,7
optimization: 6,7,3,2,6,7,8,5,11,6,8,6,9,8,12
management: 3,3,5,8,6,8,14,8,5,9,8,6,3,8,2
mining: 8,3,6,3,2,7,6,4,6,4,4,7,6,3,6
large: 5,2,1,5,3,5,6,5,3,4,6,4,8,9,7
networks: 1,3,2,1,1,4,5,8,6,3,5,10,4,9,4
information: 3,2,9,5,5,3,5,5,4,5,3,2,7,4,3
integration: 1,3,6,2,3,6,5,3,5,6,8,6,5,2,1
relational: 2,3,5,5,5,7,9,5,2,3,1,5,4,1,5

Trending keywords

Let's now look at keywords that have burst onto the scene in the past few years. These keywords are selected by computing the ratio of "the average number of times a keyword is used since 2009" to "the average before 2009".
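
As a rough sketch of that ratio (not the exact script behind the charts), the snippet below scores a few of the keyword series from this post. It assumes each 15-value series runs from 1998 to 2012, matching the per-year lists further down. The `smoothing` constant is my own addition for the sketch: keywords such as mapreduce have a pre-2009 average of zero, and the post does not say how that case was handled.

```python
from statistics import mean

FIRST_YEAR = 1998  # assumption: each series covers 1998..2012
SPLIT_YEAR = 2009  # "since 2009" versus "before 2009"


def trend_ratio(counts, first_year=FIRST_YEAR, split_year=SPLIT_YEAR, smoothing=0.5):
    """Average yearly count since split_year divided by the average before it.

    smoothing is added to both averages so that a keyword never seen
    before split_year does not cause a division by zero (an assumption).
    """
    split = split_year - first_year
    before, since = counts[:split], counts[split:]
    if not before or not since:
        raise ValueError("series must cover years on both sides of split_year")
    return (mean(since) + smoothing) / (mean(before) + smoothing)


# Yearly title counts copied from the series shown in this post.
series = {
    "xml": [0, 4, 4, 15, 12, 21, 17, 21, 14, 9, 11, 4, 7, 2, 2],
    "graphs": [0, 0, 1, 0, 1, 0, 0, 3, 1, 3, 7, 8, 11, 15, 14],
    "mapreduce": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 4, 5, 9, 9],
}

# Highest ratio first: these are the keywords that "burst onto the scene".
ranked = sorted(series, key=lambda k: trend_ratio(series[k]), reverse=True)
for keyword in ranked:
    print(f"{keyword}: {trend_ratio(series[keyword]):.2f}")
```

A ratio above 1 means the keyword shows up more often per year since 2009 than before, and a ratio below 1 (as for xml here) means it has faded.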

It makes sense that words like mapreduce, social, and cloud have made a splash, but I was surprised that graphs and subgraphs have been consistently increasing in popularity over the past half decade. As machine learning becomes more and more integrated into databases, probabilistic approaches have been gaining lots of traction. Similarly, probabilistic approaches to consensus and failures in distributed settings are promising. As always, being fast is what makes the big bucks.

It's nice to see that crowdsourcing has finally come to the VLDB community.

mapreduce: 0,0,0,0,0,0,0,0,0,0,0,4,5,9,9
graphs: 0,0,1,0,1,0,0,3,1,3,7,8,11,15,14
cloud: 0,0,0,0,0,0,0,0,0,0,1,1,6,3,3
uncertain: 0,0,0,0,0,0,1,2,2,2,6,1,8,7,4
aware: 0,0,0,0,0,0,1,0,2,3,2,2,9,4,3
entity: 0,0,0,1,0,0,0,1,0,1,1,3,6,3,3
socially: 0,0,1,0,0,0,0,0,0,1,2,4,1,7,3
differential: 0,0,0,0,0,0,0,0,0,0,0,1,1,1,4
probabilistically: 0,1,0,0,1,0,4,1,1,4,5,2,7,4,7
shortest: 0,0,0,0,0,0,0,0,0,0,0,0,1,1,3
subgraphs: 0,0,0,0,0,0,0,1,0,0,2,0,1,2,4
partitioning: 0,0,0,0,0,1,1,0,1,1,1,0,2,2,4
mechanism: 0,0,0,0,0,0,0,1,0,0,0,0,0,0,4
hybrid: 0,1,0,1,1,0,0,0,0,0,0,1,0,2,3
locality: 0,0,1,0,0,2,0,0,1,1,1,1,0,3,3
crowdsourcing: 0,0,0,0,0,0,0,0,0,0,0,0,0,1,2

Top topics by year

It's good to see that data, query and databases are never far from our minds.

2012
querying 23
databases 19
data 17
based 14
efficient 12
optimization 12
graphs 10
mapreduce 9
indexing 8
large 7

2011
data 40
querying 29
databases 18
systems 16
based 15
web 14
efficient 13
graphs 13
processing 10
large 9

2010
querying 31
data 30
databases 26
searching 17
processing 15
based 14
efficient 11
streaming 11
graphs 10
indexing 10

2009
data 27
querying 19
databases 18
web 15
efficient 13
based 10
networks 10
streaming 10
graphs 8
indexing 8

2008
data 44
querying 35
databases 19
systems 17
searching 16
efficient 15
processing 12
web 12
xml 11
based 10

2007
data 41
querying 33
databases 16
efficient 16
systems 15
processing 10
searching 10
streaming 10
indexing 9
management 9

2006
querying 35
data 26
databases 16
efficient 15
xml 14
optimization 11
systems 11
indexing 10
processing 10
based 9

2005
querying 37
data 25
databases 24
xml 21
efficient 18
systems 18
based 14
streaming 12
patterns 10
management 8

2004
data 40
querying 31
databases 30
systems 20
streaming 19
xml 17
based 14
management 14
processing 10
relational 9

2003
data 43
querying 29
xml 21
databases 16
based 13
streaming 13
web 13
systems 10
processing 9
storage 9

2002
data 31
databases 19
systems 14
xml 12
querying 11
web 11
searching 9
services 7
streaming 7
structure 7

2001
data 29
querying 19
web 16
databases 15
xml 15
systems 12
processing 10
management 8
cache 7
indexing 7

2000
data 21
databases 19
querying 15
information 9
web 9
based 6
integration 6
mining 6
systems 6
application 5

1999
data 15
databases 12
querying 12
systems 8
indexing 7
optimization 7
architecture 5
high 5
implementing 5
web 5

1998
databases 20
data 16
querying 10
mining 8
based 7
algorithms 6
optimization 6
joins 5
large 5
performance 5