This document is an exploratory analysis of all accepted short papers, full papers, and posters at the AGILE conference 2018, in Lund, Sweden - “Geospatial Technologies for All”.
The analysis is based on the work published in “Reproducible research and GIScience: an evaluation using AGILE conference papers” (see preprint and code repository). The source code of the analysis is in the R Markdown file `agile-2018-papers.Rmd`.
The PDFs are read from three separate directories. The full papers are not openly available. The short papers and posters can be downloaded from the conference website.
The following table gives the number of documents used, split by type.
type | count |
---|---|
full | 19 |
poster | 32 |
short | 74 |
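A count like the one in the table can be obtained by listing the PDF files in each directory. The following is a minimal sketch in base R, not the code from the source R Markdown file: the function name `count_by_type`, the directory names, and the placeholder files are assumptions made for illustration.

```r
# Count PDF files per document type.
# `dirs` is a named character vector: names are the types, values the directories.
count_by_type <- function(dirs) {
  files <- lapply(dirs, list.files, pattern = "\\.pdf$", full.names = TRUE)
  data.frame(
    type = names(files),
    count = unname(lengths(files)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Demo with empty placeholder files in a temporary directory (not real papers)
root <- file.path(tempdir(), "agile-demo")
dirs <- c(
  full = file.path(root, "full"),
  poster = file.path(root, "poster"),
  short = file.path(root, "short")
)
for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
file.create(file.path(dirs[["full"]], c("a.pdf", "b.pdf")))
file.create(file.path(dirs[["short"]], "c.pdf"))

counts <- count_by_type(dirs)
print(counts)
```

Keeping the type as the name of each directory entry means the type label travels with the files and does not have to be parsed from the path later.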
The text is extracted from the PDFs and processed into a tidy data structure without stop words. The stop word list includes corpus-specific words such as `lund` (which appears in the page header), abbreviations, and terms particular to scientific articles such as `figure`.
About 49 % of the words are considered stop words.
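The stop word step can be illustrated with a small base R sketch. It is not the pipeline from the source file (the session info below lists `tidytext` and `quanteda`, which were presumably used there); the sample sentence, the short stop word list, and the id `sp_demo` are invented for this example.

```r
# Toy text standing in for the text extracted from one PDF (invented)
text <- "Figure 1 shows the spatial data collected in Lund and the spatial analysis of it."

# Tokenise: lower-case, split on anything that is not a letter, drop empty strings
words <- unlist(strsplit(tolower(text), "[^a-z]+"))
words <- words[nzchar(words)]

# Illustrative stop word list: a few general words plus corpus-specific ones
stop_words <- c(
  "the", "in", "and", "of", "it", "shows", # general
  "lund", "figure"                         # specific to this corpus
)

# Tidy structure: one row per remaining word, tagged with the document id
is_stop <- words %in% stop_words
tidy_words <- data.frame(
  id = "sp_demo",
  word = words[!is_stop],
  stringsAsFactors = FALSE
)

# Share of tokens removed as stop words, in percent
stop_share <- round(100 * mean(is_stop))

print(tidy_words)
print(stop_share)
```

The one-word-per-row layout is what makes the later per-document counts and word frequencies simple aggregations.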
The following table shows the number of non-stop words in each document, in descending order. The `id` is built from a prefix and part of the file name: for full papers, the prefix is `fp_` followed by the significant part (the last 3 numbers) of the DOI; for short papers and posters, the prefixes are `sp_` and `po_` respectively, followed by the submission number included in the file name.
id | type | non-stop words |
---|---|---|
fp_92 | full | 4820 |
fp_910 | full | 4575 |
fp_97 | full | 4355 |
fp_99 | full | 4243 |
fp_98 | full | 3641 |
fp_915 | full | 3586 |
fp_913 | full | 3496 |
fp_916 | full | 3431 |
fp_912 | full | 3374 |
fp_914 | full | 3334 |
fp_91 | full | 3123 |
fp_917 | full | 3019 |
fp_919 | full | 3015 |
fp_911 | full | 2965 |
fp_93 | full | 2787 |
fp_94 | full | 2736 |
fp_918 | full | 2643 |
fp_96 | full | 2596 |
fp_95 | full | 2570 |
sp_121 | short | 2406 |
sp_152 | short | 2373 |
sp_41 | short | 2290 |
sp_122 | short | 2274 |
sp_128 | short | 2218 |
sp_171 | short | 2175 |
sp_93 | short | 2044 |
sp_139 | short | 1953 |
sp_172 | short | 1933 |
sp_168 | short | 1881 |
sp_126 | short | 1876 |
po_70 | poster | 1870 |
sp_104 | short | 1866 |
sp_125 | short | 1854 |
sp_145 | short | 1834 |
sp_106 | short | 1816 |
sp_151 | short | 1814 |
sp_147 | short | 1811 |
sp_134 | short | 1809 |
sp_89 | short | 1794 |
sp_112 | short | 1787 |
po_170 | poster | 1755 |
sp_173 | short | 1751 |
sp_53 | short | 1739 |
sp_65 | short | 1730 |
sp_107 | short | 1703 |
sp_75 | short | 1702 |
sp_109 | short | 1699 |
sp_69 | short | 1697 |
sp_131 | short | 1685 |
sp_76 | short | 1673 |
sp_149 | short | 1670 |
sp_160 | short | 1669 |
sp_130 | short | 1666 |
sp_63 | short | 1657 |
sp_123 | short | 1652 |
sp_74 | short | 1633 |
sp_132 | short | 1631 |
sp_108 | short | 1630 |
sp_71 | short | 1630 |
sp_124 | short | 1608 |
sp_59 | short | 1599 |
sp_66 | short | 1594 |
sp_135 | short | 1576 |
sp_165 | short | 1569 |
sp_80 | short | 1563 |
sp_150 | short | 1559 |
sp_142 | short | 1553 |
sp_51 | short | 1540 |
sp_146 | short | 1531 |
sp_72 | short | 1527 |
sp_118 | short | 1526 |
po_48 | poster | 1516 |
sp_163 | short | 1509 |
sp_94 | short | 1509 |
sp_64 | short | 1500 |
sp_55 | short | 1460 |
sp_101 | short | 1453 |
sp_86 | short | 1453 |
sp_68 | short | 1441 |
sp_111 | short | 1425 |
sp_103 | short | 1420 |
po_57 | poster | 1411 |
sp_143 | short | 1400 |
sp_137 | short | 1391 |
sp_98 | short | 1364 |
sp_87 | short | 1362 |
sp_2018 | short | 1347 |
sp_67 | short | 1325 |
sp_114 | short | 1314 |
po_52 | poster | 1289 |
sp_2 | short | 1278 |
sp_140 | short | 1274 |
sp_84 | short | 1273 |
sp_73 | short | 1199 |
sp_157 | short | 1173 |
sp_153 | short | 1146 |
po_82 | poster | 1113 |
sp_166 | short | 1023 |
po_162 | poster | 971 |
po_81 | poster | 940 |
po_58 | poster | 931 |
sp_148 | short | 921 |
po_155 | poster | 903 |
po_99 | poster | 902 |
po_62 | poster | 869 |
po_105 | poster | 810 |
po_120 | poster | 794 |
po_136 | poster | 750 |
po_0605 | poster | 747 |
po_115 | poster | 711 |
po_161 | poster | 709 |
po_54 | poster | 708 |
po_164 | poster | 703 |
po_127 | poster | 701 |
po_102 | poster | 698 |
po_90 | poster | 670 |
po_144 | poster | 660 |
po_91 | poster | 618 |
po_77 | poster | 610 |
po_83 | poster | 598 |
po_129 | poster | 562 |
po_167 | poster | 558 |
po_110 | poster | 538 |
po_96 | poster | 516 |
po_49 | poster | 474 |
Total | | 212644 |
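The id scheme for short papers and posters described above can be sketched as follows. The actual file name patterns are not given in this report, so the names used here (`submission_121.pdf` and so on) and the helper `make_id` are hypothetical; the sketch also assumes that the submission number is the last run of digits in the file name. Full papers are left out because their ids come from the DOI.

```r
# Build an id from a type prefix and the trailing digits of the file name.
make_id <- function(file, type) {
  prefix <- c(short = "sp_", poster = "po_")[[type]]
  # Drop the directory and the extension, e.g. "short/submission_121.pdf" -> "submission_121"
  base <- tools::file_path_sans_ext(basename(file))
  # Extract the digits at the end of the name (assumed to be the submission number)
  number <- regmatches(base, regexpr("[0-9]+$", base))
  paste0(prefix, number)
}

short_id <- make_id("short/submission_121.pdf", "short")
poster_id <- make_id("poster/submission_70.pdf", "poster")
print(c(short_id, poster_id))
```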
Summary statistics of non-stop words of all documents:
id | type | non-stop words |
---|---|---|
Length:125 | full :19 | Min. : 474 |
Class :character | poster:32 | 1st Qu.:1199 |
Mode :character | short :74 | Median :1576 |
 | | Mean :1701 |
 | | 3rd Qu.:1866 |
 | | Max. :4820 |
The following word cloud is based on 347 unique words that each occur at least 100 times. Together they occur 73243 times, which is 34 % of all non-stop words.
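The selection behind these figures is a frequency threshold followed by a share calculation. The sketch below shows the idea in base R on an invented word vector with a threshold of 2 instead of 100; the variable names are illustrative and not taken from the source file. The last line repeats the arithmetic with the totals reported in this document.

```r
# Invented word vector standing in for all non-stop words of the corpus
words <- c("data", "data", "data", "spatial", "spatial", "map", "model")
min_count <- 2 # the report uses 100

# Frequency of each unique word, most frequent first
freq <- sort(table(words), decreasing = TRUE)

# Keep only words that reach the threshold
frequent <- freq[freq >= min_count]

n_unique <- length(frequent)    # number of unique words kept
n_occurrences <- sum(frequent)  # how often they occur in total
share <- round(100 * n_occurrences / length(words))

# Same arithmetic with the totals reported in this document
reported_share <- round(100 * 73243 / 212644)

print(c(n_unique = n_unique, n_occurrences = n_occurrences, share = share))
print(reported_share)
```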
For the following word cloud, word stems were extracted with a stemming algorithm from the package `quanteda`.
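Stemming reduces inflected forms to a common stem so that they are counted together. The report does not say which `quanteda` function was called; the sketch below assumes `quanteda::char_wordstem()` with its default stemmer language and uses an invented word list. It skips the demo if the package is not installed.

```r
# Invented example words; several are inflections of the same term
words <- c("maps", "mapping", "mapped", "analysis", "analyses")

if (requireNamespace("quanteda", quietly = TRUE)) {
  # Assumption: char_wordstem() with the default stemmer language
  stems <- quanteda::char_wordstem(words)
  print(data.frame(word = words, stem = stems, stringsAsFactors = FALSE))
} else {
  message("Package 'quanteda' is not installed; skipping the stemming demo.")
}
```

Because several words collapse to one stem, a word cloud of stems has fewer distinct entries, each with a higher count, than a word cloud of the unstemmed words.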
This document is licensed under a Creative Commons Attribution 4.0 International License. All contained code is licensed under the Apache License 2.0.
Runtime environment description:
devtools::session_info(include_base = TRUE)
## Session info -------------------------------------------------------------
## setting value
## version R version 3.4.4 (2018-03-15)
## system x86_64, linux-gnu
## ui X11
## language en
## collate en_US.UTF-8
## tz Europe/Berlin
## date 2018-06-13
## Packages -----------------------------------------------------------------
## package * version date source
## assertthat 0.2.0 2017-04-11 CRAN (R 3.4.0)
## backports 1.1.2 2017-12-13 CRAN (R 3.4.3)
## base * 3.4.4 2018-03-16 local
## bindr 0.1.1 2018-03-13 CRAN (R 3.4.4)
## bindrcpp * 0.2.2 2018-03-29 CRAN (R 3.4.4)
## broom 0.4.4 2018-03-29 CRAN (R 3.4.4)
## cellranger 1.1.0 2016-07-27 CRAN (R 3.4.0)
## cli 1.0.0 2017-11-05 CRAN (R 3.4.2)
## colorspace 1.3-2 2016-12-14 cran (@1.3-2)
## compiler 3.4.4 2018-03-16 local
## crayon 1.3.4 2017-09-16 CRAN (R 3.4.1)
## data.table 1.11.4 2018-05-27 CRAN (R 3.4.4)
## datasets * 3.4.4 2018-03-16 local
## devtools 1.13.5 2018-02-18 CRAN (R 3.4.3)
## digest 0.6.15 2018-01-28 CRAN (R 3.4.3)
## dplyr * 0.7.5 2018-05-19 CRAN (R 3.4.4)
## evaluate 0.10.1 2017-06-24 CRAN (R 3.4.0)
## fastmatch 1.1-0 2017-01-28 CRAN (R 3.4.4)
## forcats * 0.3.0 2018-02-19 CRAN (R 3.4.3)
## foreign 0.8-70 2018-04-23 CRAN (R 3.4.4)
## ggplot2 * 2.2.1 2016-12-30 CRAN (R 3.4.2)
## glue 1.2.0 2017-10-29 CRAN (R 3.4.2)
## graphics * 3.4.4 2018-03-16 local
## grDevices * 3.4.4 2018-03-16 local
## grid * 3.4.4 2018-03-16 local
## gridBase * 0.4-7 2014-02-24 CRAN (R 3.4.0)
## gridExtra * 2.3 2017-09-09 CRAN (R 3.4.1)
## gtable 0.2.0 2016-02-26 CRAN (R 3.4.0)
## haven 1.1.1 2018-01-18 CRAN (R 3.4.3)
## here * 0.1 2017-05-28 CRAN (R 3.4.4)
## highr 0.7 2018-06-09 CRAN (R 3.4.4)
## hms 0.4.2 2018-03-10 CRAN (R 3.4.3)
## htmltools 0.3.6 2017-04-28 CRAN (R 3.4.0)
## httr 1.3.1 2017-08-20 CRAN (R 3.4.1)
## janeaustenr 0.1.5 2017-06-10 cran (@0.1.5)
## jsonlite 1.5 2017-06-01 cran (@1.5)
## kableExtra * 0.9.0 2018-05-21 CRAN (R 3.4.4)
## knitr 1.20 2018-02-20 CRAN (R 3.4.3)
## lattice 0.20-35 2017-03-25 CRAN (R 3.3.3)
## lazyeval 0.2.1 2017-10-29 CRAN (R 3.4.2)
## lubridate 1.7.4 2018-04-11 CRAN (R 3.4.4)
## magrittr 1.5 2014-11-22 CRAN (R 3.4.0)
## Matrix 1.2-14 2018-04-09 CRAN (R 3.4.4)
## memoise 1.1.0 2017-04-21 CRAN (R 3.4.3)
## methods * 3.4.4 2018-03-16 local
## mnormt 1.5-5 2016-10-15 cran (@1.5-5)
## modelr 0.1.2 2018-05-11 CRAN (R 3.4.4)
## munsell 0.4.3 2016-02-13 cran (@0.4.3)
## nlme 3.1-137 2018-04-07 CRAN (R 3.4.4)
## parallel 3.4.4 2018-03-16 local
## pdftools * 1.8 2018-05-27 CRAN (R 3.4.4)
## pillar 1.2.3 2018-05-25 CRAN (R 3.4.4)
## pkgconfig 2.0.1 2017-03-21 cran (@2.0.1)
## plyr 1.8.4 2016-06-08 cran (@1.8.4)
## psych 1.8.4 2018-05-06 CRAN (R 3.4.4)
## purrr * 0.2.5 2018-05-29 CRAN (R 3.4.4)
## quanteda * 1.3.0 2018-06-05 CRAN (R 3.4.4)
## R6 2.2.2 2017-06-17 CRAN (R 3.4.0)
## RColorBrewer * 1.1-2 2014-12-07 cran (@1.1-2)
## Rcpp 0.12.17 2018-05-18 CRAN (R 3.4.4)
## RcppParallel 4.4.0 2018-03-02 CRAN (R 3.4.4)
## readr * 1.1.1 2017-05-16 CRAN (R 3.4.0)
## readxl 1.1.0 2018-04-20 CRAN (R 3.4.4)
## reshape2 1.4.3 2017-12-11 CRAN (R 3.4.3)
## rlang 0.2.1 2018-05-30 CRAN (R 3.4.4)
## rmarkdown 1.9 2018-03-01 CRAN (R 3.4.3)
## rprojroot 1.3-2 2018-01-03 CRAN (R 3.4.3)
## rstudioapi 0.7 2017-09-07 CRAN (R 3.4.1)
## rvest 0.3.2 2016-06-17 CRAN (R 3.4.2)
## scales 0.5.0 2017-08-24 CRAN (R 3.4.1)
## slam 0.1-43 2018-04-23 CRAN (R 3.4.4)
## SnowballC 0.5.1 2014-08-09 cran (@0.5.1)
## spacyr 0.9.9 2018-04-17 CRAN (R 3.4.4)
## stats * 3.4.4 2018-03-16 local
## stopwords 0.9.0 2017-12-14 CRAN (R 3.4.3)
## stringi 1.2.2 2018-05-02 CRAN (R 3.4.4)
## stringr * 1.3.1 2018-05-10 CRAN (R 3.4.4)
## tibble * 1.4.2 2018-01-22 CRAN (R 3.4.3)
## tidyr * 0.8.1 2018-05-18 CRAN (R 3.4.4)
## tidyselect 0.2.4 2018-02-26 CRAN (R 3.4.3)
## tidytext * 0.1.9 2018-05-29 CRAN (R 3.4.4)
## tidyverse * 1.2.1 2017-11-14 CRAN (R 3.4.2)
## tokenizers 0.2.1 2018-03-29 CRAN (R 3.4.4)
## tools 3.4.4 2018-03-16 local
## utils * 3.4.4 2018-03-16 local
## viridisLite 0.3.0 2018-02-01 CRAN (R 3.4.3)
## withr 2.1.2 2018-03-15 cran (@2.1.2)
## wordcloud * 2.5 2014-06-13 CRAN (R 3.4.1)
## xml2 1.2.0 2018-01-24 CRAN (R 3.4.3)
## yaml 2.1.19 2018-05-01 CRAN (R 3.4.4)