Introduction
Introduction.Rmd
[Updated 14.iv.21 partly to test uploading of vignettes]
Installation
You should always be able to install the latest version of the package with
remotes::install_github("cpsyctc/CECPfuns", build_vignettes = TRUE)
devtools::install_github("cpsyctc/CECPfuns", build_vignettes = TRUE)
Clearly, to make those commands work you will need to have installed
either the massive (and amazing) devtools package
(install.packages("devtools")
) or the much lighter remotes
package (install.packages("remotes")
) which is largely a
wrapper for some devtools functions.
Please check the MIT licence for liability disclaimer: use these functions entirely at your own risk.
How might you find this package useful?
I have functions for various areas of my own needs:
- data cleaning/checking and shaping for analysis,
e.g.
isOneToOne()
,getNOK()
- functions that provide statistical and psychometric analyses,
e.g.
getCSC()
,getBootCImean()
- functions that illustrate or help such analyses,
e.g.
plotBinconf()
- miscellaneous functions that have made my life easier!
Clara and Emily have each found some of these functions useful to them and the functions have helped us with the analyses in Emily’s enormous PhD and for papers that Clara and and I have published. We imagine that others working in the mental health, well-being, therapy worlds will also find them useful and they could even be useful in universities and other training settings where R is replacing SPSS.
How will it be best to use the package?
Operating system, R version and R package requirements
I am trying to make sure that all the functions run on the major operating systems: Linux, Windows and Mac and that it should work back to R 3.5. At the moment I think it works fine on all permutations of those systems and all major releases of R since 3.6. In theory it should work on 3.5 but the R-hub test machine gives an error saying that the package tibble on which CECPfuns depends is not available there though the tibble package page says it works for R ≥ 3.1.0 so there is something odd there that I may or may not be be able to sort out. If you find the package won’t work on your system for R ≥ 3.5.0 do contact me with details of your system and what errors you see and I’ll see if I can make changes to fix it.
Why tidyverse?
In the last year or so, since some time probably late 2019, I have been using the tidyverse realm within the rather diverse ecosystems of R, see here for more about why. That’s probable sensible reading if you are not already a tidyverse user and want to understand some of the design decisions I have made. The next post in my SAFAQ R blog More piping, and rowwise() is also probable sensible reading as it says a bit about why many of the functions here are designed so that they can be used either standalone or in dplyr pipes. That is particularly useful when you might want to run the same analyses for an entire dataset but also separately for subgroups, e.g. by gender, or by therapist.
My page Wisdom(!) contains, a little ruefully, many things I have learned the hard way about data checking, cleaning, mangling and analysing over the last three decades. Some of it now converges well with the reproducibility movement.
Design choices
Code
The code is all R: no C, C++, FORTRAN as they’re really beyond my programming abilities.
I have tried to indent and space the code following the conventions that lintr::lint() likes so a lot of space and consistent indenting. I have also tried to comment things liberally. Perhaps sometimes I have overdone that where the code might have been sufficiently clear not to need comments. However, I know my coding is not good (I’m essentially self-taught: a week of introduction to FORTRAN for students not doing computing in 1975). I do now sometimes find myself working with long thin format datasets with over a million rows and that will climb so some of these functions may get to be run on such sizes of data, however, most of what I work with is much, much smaller so I have almost never worried about speed or whether large numbers will break anything. I have put some error traps in on bootstrapping functions where I think the number of replications might tie things up for a long time. However, I think I have always allowed for those traps to be overridden.
Messages, error messages and warnings
I have tried to put in error messages that will stop functions rather than warnings. Sometimes, where it might conceivably be sensible to do override such a trap because you have a value you want to input to a function that I have tried to warn against, I have tried to give the function an argument that allows you to turn that error into a warning. I have put in warnings in the rarer cases when I really don’t think an error stopping things is correct, but where I think the input might be a mistake.
I have also put in messages from some functions but if I have, I think I have always also added a “verbose =” argument which if you set it to FALSE, will suppress the warning.
I have put in ridiculous numbers of sanity checks on inputs and other arguments, often there are > 10 of these even for fairly simple functions. My rationale was that I want this package to be fairly friendly for relative newcomers to R, or those only using it infrequently. I have done this to try to avoid the situation in which I put something silly into a function but it passes it on to a deeper function or step with the result that the error message I see can be very hard to understand and won’t immediately point me to my error. My aspiration is that everything you might not have meant to do gets trapped before it goes any further.
FunctionNamesAreVerlongAnd … names generally!
I have used long names for functions and for arguments. I’m sorry but I think the extra typing is offset by reduced confusion.
My function names tend to start with a verb: “classify”, “get” and
“plot” are recurring ones. I am also using a rather debased “camelCase”
where names start with a lower case letter and upper case letters mark
where a space has been lost from ordinary English. However, I can’t bear
to use getNNa for “get the Number of NA values” so it’s
getNNA
. Generally I have priviledged multiple capital
letters from English abbreviations over strict camelCase in that
way.
I have tried to use names inside functions that start with the class
of the object: vecN, valX, listNames, matCorr, tibData. However, I think
you won’t see those unless you want to see the code inside a function,
which you can always do with functionName
, i.e. the
function but without the usual “()” on the end.
Synonyms
Occasionally I have created a synonym for a function, for example
checkIsOneToOne()
is simply the same function as
isOneToOne()
and if you ask for the ### Code testing
I have used the excellent testthat library and tried to write tests
for all functions. Every function has tests on the sanity tests on the
inputs/arguments you give the fun, if the function can give warnings,
then I have tested them and generally I have put in some test on
expected output from sample input. However, at the moment I am not
testing the ggplot output from plotFunction()
functions as
I suspect that would then involve me having to reset the test targets
whenever ggplot()
is updated.
Those tests get run every time I update the package and are run on my local machine but also on the major operating systems and major R releases back to 3.5 (see above for requirements).
Help pages and vignettes
I am trying to make the help pages far more extensive than they would be for many R functions and have tried to write them so they not only give the basics to enable you to run the function, know what arguments do what and what it returns, but also to give a lot of examples of use of the function. Again unusually, I have tried to give at least some of the statistical, psychometric or pragmatic background to the function. I am also planning to build a lot of vignettes into the package which will often link a number of functions together and give more background and perhaps some illustrations of use and output. I am also aiming for the help pages, but particularly the vignettes to link to the web resources, particularly to my own (that’s not just narcissism: it’s also because then I am responsible for keeping the links working and logical and so links shouldn’t break).
At the moment the vignettes here are:
- this introduction
- some background on the package
(
vignette("Background", package = "CECPfuns")
) - information about the RCSC framework
(
vignette("RCSC_framework", package = "CECPfuns")
) - information on confidence intervals (CIs) and bootstrapping
(
vignette("Bootstrapping", package = "CECPfuns")
)
Background
You can find more background about the tiny team working on this, and some other background, in the background vignette. You should be able to get to that with this.
vignette("Background", package = "CECPfuns")
To do list!
Sadly, this list is long and certainly incomplete. I am sure it will continue to develop, the package too I hope, for years!
-
General enhancement of the package
Improve help pages for existing functions
Add tests: added for all functions so far, could do with expansion?
Complete the other vignettes that exist and then add more.
Add data
UK YP-CORE cutting points
-
Functions
- data cleaning/checking and shaping for analysis
- Add getScore()
- Add getCOREOMscores() (for the CORE-OM)
- Add getCORESFAOMscores() (for the CORE-SF/A)
- Add getCORESFBOMscores() (for the CORE-SF/B)
- Add classifyYPCOREScores() (for the YP-CORE, by age and gender)
- Add classifyScoresGeneric()
- functions that provide statistical and psychometric analyses
- Add getRCIfromItemData()
- Add getBootCIRCIfromScoresAndAlpha()
- Add getBootCIRCIfromItemData()
- Add getBootCIMeanDiff()
- Add getHedgesg()
- Add getCohend() (deal with “-2” issue)
- Add getCohendPaired()
- Add convCohend2Hedgesg() (deal with “-2”)
- Add countRCSCnumbers() (allow for gender?)
- functions that illustrate or help such analyses
- Add plotJacobsonRCSC() (ditto)
- miscellaneous functions that have made my life easier!
- Add findYourFunctions()
- Add getRmdFileStats()
- Add readCEdates()
- data cleaning/checking and shaping for analysis
then perhaps some functions for standard reports from (fairly) standard CORE datasets (should feed separate Rmd templates)