This document is an exploratory analysis of all accepted short papers, full papers, and posters at the AGILE conference 2018 in Lund, Sweden (“Geospatial Technologies for All”).

The analysis is based on the work published in “Reproducible research and GIScience: an evaluation using AGILE conference papers” (see preprint and code repository). You can see the source code of the analysis in the source R Markdown file agile-2018-papers.Rmd.

Data preparation

The PDFs are read in from three separate directories. The full papers are not openly available. The short papers and posters can be downloaded from the conference website.

The following table lists the number of documents used, split by type.

type count
full 19
poster 32
short 74
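
A minimal sketch of how documents in three directories could be listed and counted by type. The directory names, the empty placeholder files, and the helper list_documents() are assumptions for illustration, not the layout used in the analysis. Reading the actual PDF text (presumably with pdftools, which is listed in the runtime environment) is not shown here.

```r
# Hypothetical layout: one sub-directory per document type inside a temp folder.
base_dir <- file.path(tempdir(), "agile-papers-demo")
types <- c("full", "short", "poster")
for (type in types) {
  dir.create(file.path(base_dir, type), recursive = TRUE, showWarnings = FALSE)
}

# Empty placeholder files stand in for the real PDFs.
invisible(file.create(
  file.path(base_dir, "full", "paper_a.pdf"),
  file.path(base_dir, "short", "paper_b.pdf"),
  file.path(base_dir, "short", "paper_c.pdf"),
  file.path(base_dir, "poster", "poster_d.pdf")
))

# Hypothetical helper: list all PDFs per type and return one row per file.
list_documents <- function(base_dir, types) {
  do.call(rbind, lapply(types, function(type) {
    files <- list.files(file.path(base_dir, type),
                        pattern = "\\.pdf$", full.names = TRUE)
    data.frame(file = files, type = rep(type, length(files)),
               stringsAsFactors = FALSE)
  }))
}

documents <- list_documents(base_dir, types)

# Count documents per type, giving a table with the columns type and count.
counts <- as.data.frame(table(type = documents$type), responseName = "count")
print(counts)
```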

The text is extracted from the PDFs and processed into a tidy data structure with the stop words removed. The stop words include corpus-specific words such as lund, which appears in the page header, as well as abbreviations and terms particular to scientific articles, such as figure.

About 49 % of the words are considered stop words.
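
The stop word filtering and the resulting share can be illustrated with a small base R sketch. The sample sentence and the very short stop word list are made up for illustration (only lund and figure come from the text above), and the tokenisation is a simple assumption rather than the one used in the analysis.

```r
# Toy input standing in for text extracted from a PDF.
text <- "Figure 1 shows the results of the analysis for Lund and the region"

# Illustrative stop word list: a few common words plus the corpus-specific
# words "lund" and "figure" mentioned in the text.
stop_words <- c("the", "of", "for", "and", "shows", "lund", "figure")

# Lowercase, split on anything that is not a letter, drop empty tokens.
tokens <- unlist(strsplit(tolower(text), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0]

# Flag stop words, keep the rest as a one-word-per-row (tidy) data frame.
is_stop <- tokens %in% stop_words
tidy_words <- data.frame(word = tokens[!is_stop], stringsAsFactors = FALSE)

# Share of all tokens that were removed as stop words.
stop_share <- mean(is_stop)

print(tidy_words)
print(stop_share)
```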

The following table shows how many non-stop words each document has, sorted by the number of non-stop words. The id is built from the file name plus a prefix: for full papers, it is the significant part (the last 3 numbers) of the DOI and the prefix fp_; for short papers and posters, it is the submission number included in the file name and the prefixes sp_ and po_ respectively.

id type non-stop words
fp_92 full 4820
fp_910 full 4575
fp_97 full 4355
fp_99 full 4243
fp_98 full 3641
fp_915 full 3586
fp_913 full 3496
fp_916 full 3431
fp_912 full 3374
fp_914 full 3334
fp_91 full 3123
fp_917 full 3019
fp_919 full 3015
fp_911 full 2965
fp_93 full 2787
fp_94 full 2736
fp_918 full 2643
fp_96 full 2596
fp_95 full 2570
sp_121 short 2406
sp_152 short 2373
sp_41 short 2290
sp_122 short 2274
sp_128 short 2218
sp_171 short 2175
sp_93 short 2044
sp_139 short 1953
sp_172 short 1933
sp_168 short 1881
sp_126 short 1876
po_70 poster 1870
sp_104 short 1866
sp_125 short 1854
sp_145 short 1834
sp_106 short 1816
sp_151 short 1814
sp_147 short 1811
sp_134 short 1809
sp_89 short 1794
sp_112 short 1787
po_170 poster 1755
sp_173 short 1751
sp_53 short 1739
sp_65 short 1730
sp_107 short 1703
sp_75 short 1702
sp_109 short 1699
sp_69 short 1697
sp_131 short 1685
sp_76 short 1673
sp_149 short 1670
sp_160 short 1669
sp_130 short 1666
sp_63 short 1657
sp_123 short 1652
sp_74 short 1633
sp_132 short 1631
sp_108 short 1630
sp_71 short 1630
sp_124 short 1608
sp_59 short 1599
sp_66 short 1594
sp_135 short 1576
sp_165 short 1569
sp_80 short 1563
sp_150 short 1559
sp_142 short 1553
sp_51 short 1540
sp_146 short 1531
sp_72 short 1527
sp_118 short 1526
po_48 poster 1516
sp_163 short 1509
sp_94 short 1509
sp_64 short 1500
sp_55 short 1460
sp_101 short 1453
sp_86 short 1453
sp_68 short 1441
sp_111 short 1425
sp_103 short 1420
po_57 poster 1411
sp_143 short 1400
sp_137 short 1391
sp_98 short 1364
sp_87 short 1362
sp_2018 short 1347
sp_67 short 1325
sp_114 short 1314
po_52 poster 1289
sp_2 short 1278
sp_140 short 1274
sp_84 short 1273
sp_73 short 1199
sp_157 short 1173
sp_153 short 1146
po_82 poster 1113
sp_166 short 1023
po_162 poster 971
po_81 poster 940
po_58 poster 931
sp_148 short 921
po_155 poster 903
po_99 poster 902
po_62 poster 869
po_105 poster 810
po_120 poster 794
po_136 poster 750
po_0605 poster 747
po_115 poster 711
po_161 poster 709
po_54 poster 708
po_164 poster 703
po_127 poster 701
po_102 poster 698
po_90 poster 670
po_144 poster 660
po_91 poster 618
po_77 poster 610
po_83 poster 598
po_129 poster 562
po_167 poster 558
po_110 poster 538
po_96 poster 516
po_49 poster 474
Total 212644
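
The id scheme for short papers and posters described above the table can be sketched as follows. The file names and the helper build_id() are hypothetical, and taking the first run of digits in the file name as the submission number is an assumption. The DOI-based numbering of full papers is not reproduced here.

```r
# Prefix per document type, as described in the text.
prefixes <- c(short = "sp_", poster = "po_")

# Hypothetical helper: take the first run of digits in the file name
# as the submission number and put the type prefix in front of it.
build_id <- function(file_name, type) {
  number <- regmatches(file_name, regexpr("[0-9]+", file_name))
  paste0(prefixes[[type]], number)
}

# Made-up file names, for illustration only.
print(build_id("121_short_paper.pdf", "short"))
print(build_id("poster_70.pdf", "poster"))
```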

Summary statistics of non-stop words of all documents:

id                type       non-stop words
Length:125        full  :19  Min.   : 474
Class :character  poster:32  1st Qu.:1199
Mode  :character  short :74  Median :1576
                             Mean   :1701
                             3rd Qu.:1866
                             Max.   :4820
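
The reported mean can be cross-checked against the total and the document counts, and the same kind of summary can be computed for a subset. The sketch below uses only numbers from the tables above; the variable names are illustrative.

```r
# Total non-stop words and document counts, taken from the tables above.
total_words <- 212644
n_documents <- 19 + 32 + 74

# Mean non-stop words per document; rounds to the reported mean of 1701.
mean_words <- total_words / n_documents
print(round(mean_words))

# Non-stop word counts of the 19 full papers, copied from the table above.
full_counts <- c(4820, 4575, 4355, 4243, 3641, 3586, 3496, 3431, 3374, 3334,
                 3123, 3019, 3015, 2965, 2787, 2736, 2643, 2596, 2570)

# Same kind of summary as above, restricted to full papers.
print(summary(full_counts))
```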

Word clouds and top words

The following word cloud is based on 347 unique words that each occur at least 100 times. Together they occur 73243 times, which is 34 % of all non-stop words.
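
How such a selection and its share could be computed is sketched below in base R. The toy word vector and the threshold of 3 are made up for illustration (the analysis uses a threshold of 100). The last line only rechecks the reported percentage from the two reported numbers.

```r
# Toy vector of non-stop words; real input would be all words of all documents.
words <- c(rep("spatial", 5), rep("data", 4), rep("model", 2), "map")

# Frequency of each unique word.
freq <- table(words)

# Keep words that occur at least min_count times (100 in the analysis).
min_count <- 3
frequent <- freq[freq >= min_count]

n_unique <- length(frequent)            # number of unique frequent words
n_occurrences <- sum(frequent)          # how often they occur in total
share <- n_occurrences / length(words)  # share of all non-stop words

print(c(unique = n_unique, occurrences = n_occurrences, share = share))

# Recheck of the reported share: 73243 of 212644 non-stop words.
reported_share <- round(100 * 73243 / 212644)
print(reported_share)
```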

For the following word cloud, the word stems were extracted using a stemming algorithm from the quanteda package.
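
A minimal sketch of word stemming with quanteda, assuming the package is installed. The example words are made up, and char_wordstem() is one of quanteda's stemming functions; whether the analysis uses this exact function is an assumption.

```r
# Made-up example words; related word forms should share a stem.
words <- c("technology", "technologies", "mapping", "maps")

stems <- NULL
if (requireNamespace("quanteda", quietly = TRUE)) {
  # Reduce each word to its stem with the English stemmer.
  stems <- quanteda::char_wordstem(words, language = "english")
  print(data.frame(word = words, stem = stems, stringsAsFactors = FALSE))
} else {
  message("Package 'quanteda' is not installed; skipping the stemming example.")
}
```

Counting stems instead of words merges such related forms into a single entry of the word cloud.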

License & Metadata

This document is licensed under a Creative Commons Attribution 4.0 International License. All contained code is licensed under the Apache License 2.0.

Runtime environment description:

devtools::session_info(include_base = TRUE)
## Session info -------------------------------------------------------------
##  setting  value                       
##  version  R version 3.4.4 (2018-03-15)
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language en                          
##  collate  en_US.UTF-8                 
##  tz       Europe/Berlin               
##  date     2018-06-13
## Packages -----------------------------------------------------------------
##  package      * version date       source        
##  assertthat     0.2.0   2017-04-11 CRAN (R 3.4.0)
##  backports      1.1.2   2017-12-13 CRAN (R 3.4.3)
##  base         * 3.4.4   2018-03-16 local         
##  bindr          0.1.1   2018-03-13 CRAN (R 3.4.4)
##  bindrcpp     * 0.2.2   2018-03-29 CRAN (R 3.4.4)
##  broom          0.4.4   2018-03-29 CRAN (R 3.4.4)
##  cellranger     1.1.0   2016-07-27 CRAN (R 3.4.0)
##  cli            1.0.0   2017-11-05 CRAN (R 3.4.2)
##  colorspace     1.3-2   2016-12-14 cran (@1.3-2) 
##  compiler       3.4.4   2018-03-16 local         
##  crayon         1.3.4   2017-09-16 CRAN (R 3.4.1)
##  data.table     1.11.4  2018-05-27 CRAN (R 3.4.4)
##  datasets     * 3.4.4   2018-03-16 local         
##  devtools       1.13.5  2018-02-18 CRAN (R 3.4.3)
##  digest         0.6.15  2018-01-28 CRAN (R 3.4.3)
##  dplyr        * 0.7.5   2018-05-19 CRAN (R 3.4.4)
##  evaluate       0.10.1  2017-06-24 CRAN (R 3.4.0)
##  fastmatch      1.1-0   2017-01-28 CRAN (R 3.4.4)
##  forcats      * 0.3.0   2018-02-19 CRAN (R 3.4.3)
##  foreign        0.8-70  2018-04-23 CRAN (R 3.4.4)
##  ggplot2      * 2.2.1   2016-12-30 CRAN (R 3.4.2)
##  glue           1.2.0   2017-10-29 CRAN (R 3.4.2)
##  graphics     * 3.4.4   2018-03-16 local         
##  grDevices    * 3.4.4   2018-03-16 local         
##  grid         * 3.4.4   2018-03-16 local         
##  gridBase     * 0.4-7   2014-02-24 CRAN (R 3.4.0)
##  gridExtra    * 2.3     2017-09-09 CRAN (R 3.4.1)
##  gtable         0.2.0   2016-02-26 CRAN (R 3.4.0)
##  haven          1.1.1   2018-01-18 CRAN (R 3.4.3)
##  here         * 0.1     2017-05-28 CRAN (R 3.4.4)
##  highr          0.7     2018-06-09 CRAN (R 3.4.4)
##  hms            0.4.2   2018-03-10 CRAN (R 3.4.3)
##  htmltools      0.3.6   2017-04-28 CRAN (R 3.4.0)
##  httr           1.3.1   2017-08-20 CRAN (R 3.4.1)
##  janeaustenr    0.1.5   2017-06-10 cran (@0.1.5) 
##  jsonlite       1.5     2017-06-01 cran (@1.5)   
##  kableExtra   * 0.9.0   2018-05-21 CRAN (R 3.4.4)
##  knitr          1.20    2018-02-20 CRAN (R 3.4.3)
##  lattice        0.20-35 2017-03-25 CRAN (R 3.3.3)
##  lazyeval       0.2.1   2017-10-29 CRAN (R 3.4.2)
##  lubridate      1.7.4   2018-04-11 CRAN (R 3.4.4)
##  magrittr       1.5     2014-11-22 CRAN (R 3.4.0)
##  Matrix         1.2-14  2018-04-09 CRAN (R 3.4.4)
##  memoise        1.1.0   2017-04-21 CRAN (R 3.4.3)
##  methods      * 3.4.4   2018-03-16 local         
##  mnormt         1.5-5   2016-10-15 cran (@1.5-5) 
##  modelr         0.1.2   2018-05-11 CRAN (R 3.4.4)
##  munsell        0.4.3   2016-02-13 cran (@0.4.3) 
##  nlme           3.1-137 2018-04-07 CRAN (R 3.4.4)
##  parallel       3.4.4   2018-03-16 local         
##  pdftools     * 1.8     2018-05-27 CRAN (R 3.4.4)
##  pillar         1.2.3   2018-05-25 CRAN (R 3.4.4)
##  pkgconfig      2.0.1   2017-03-21 cran (@2.0.1) 
##  plyr           1.8.4   2016-06-08 cran (@1.8.4) 
##  psych          1.8.4   2018-05-06 CRAN (R 3.4.4)
##  purrr        * 0.2.5   2018-05-29 CRAN (R 3.4.4)
##  quanteda     * 1.3.0   2018-06-05 CRAN (R 3.4.4)
##  R6             2.2.2   2017-06-17 CRAN (R 3.4.0)
##  RColorBrewer * 1.1-2   2014-12-07 cran (@1.1-2) 
##  Rcpp           0.12.17 2018-05-18 CRAN (R 3.4.4)
##  RcppParallel   4.4.0   2018-03-02 CRAN (R 3.4.4)
##  readr        * 1.1.1   2017-05-16 CRAN (R 3.4.0)
##  readxl         1.1.0   2018-04-20 CRAN (R 3.4.4)
##  reshape2       1.4.3   2017-12-11 CRAN (R 3.4.3)
##  rlang          0.2.1   2018-05-30 CRAN (R 3.4.4)
##  rmarkdown      1.9     2018-03-01 CRAN (R 3.4.3)
##  rprojroot      1.3-2   2018-01-03 CRAN (R 3.4.3)
##  rstudioapi     0.7     2017-09-07 CRAN (R 3.4.1)
##  rvest          0.3.2   2016-06-17 CRAN (R 3.4.2)
##  scales         0.5.0   2017-08-24 CRAN (R 3.4.1)
##  slam           0.1-43  2018-04-23 CRAN (R 3.4.4)
##  SnowballC      0.5.1   2014-08-09 cran (@0.5.1) 
##  spacyr         0.9.9   2018-04-17 CRAN (R 3.4.4)
##  stats        * 3.4.4   2018-03-16 local         
##  stopwords      0.9.0   2017-12-14 CRAN (R 3.4.3)
##  stringi        1.2.2   2018-05-02 CRAN (R 3.4.4)
##  stringr      * 1.3.1   2018-05-10 CRAN (R 3.4.4)
##  tibble       * 1.4.2   2018-01-22 CRAN (R 3.4.3)
##  tidyr        * 0.8.1   2018-05-18 CRAN (R 3.4.4)
##  tidyselect     0.2.4   2018-02-26 CRAN (R 3.4.3)
##  tidytext     * 0.1.9   2018-05-29 CRAN (R 3.4.4)
##  tidyverse    * 1.2.1   2017-11-14 CRAN (R 3.4.2)
##  tokenizers     0.2.1   2018-03-29 CRAN (R 3.4.4)
##  tools          3.4.4   2018-03-16 local         
##  utils        * 3.4.4   2018-03-16 local         
##  viridisLite    0.3.0   2018-02-01 CRAN (R 3.4.3)
##  withr          2.1.2   2018-03-15 cran (@2.1.2) 
##  wordcloud    * 2.5     2014-06-13 CRAN (R 3.4.1)
##  xml2           1.2.0   2018-01-24 CRAN (R 3.4.3)
##  yaml           2.1.19  2018-05-01 CRAN (R 3.4.4)