I run a large scale national survey. We download the data from our survey platfo...

wodenokoto · on July 6, 2019

You are right that tidy data is in a different form than many supplied tables are.

If I understand correctly, you want to know how many NAs there are in each row (that is, per respondent) of a wide-form dataset (as opposed to a tidy dataset).

    library(dplyr)
    library(tidyr)

    # One line to make the data tidy.
    # The result has 3 columns: id, question, answer. Only the name of the id column matters.
    tidydf <- df %>% gather("question", "answer", -id)

    # One line to do your check: count the NAs per id.
    tidydf %>% group_by(id) %>% summarise(n_NA = sum(is.na(answer)))

Tidyverse is highly opinionated about its data structure, and that is one of its limiting factors, as it basically treats every dataset as a sparse dataset. This actually fits your data very well: a datapoint is not a fixed questionnaire but a respondent's answer to a question (since questionnaires vary in their questions, a tall table layout is quite fitting).

From there on you have to think in groups and summaries, unless you wanna fight the library.

Tidyverse is an 80% data science solution. It solves what you need 80% of the time really, really well, and for the last 20% you either have to fall back to base R or really torture dplyr.

notafraudster · on July 9, 2019

I agree that an alternative way to do this would be to mutate an ID column (maybe row number), then gather, then summarize, except that this of course will throw away all the rest of the data, so not great if all you want to do is add a column. Hence, I normally map rows or use base R.
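
For illustration, here is a minimal base R sketch of the "just add a column" case. The data frame and its column names (`id`, `q1`, `q2`, `q3`) are made up for the example, not taken from the actual survey.

```r
# Hypothetical wide-form survey data: one row per respondent.
df <- data.frame(
  id = 1:3,
  q1 = c("yes", NA, "no"),
  q2 = c(5, NA, 3),
  q3 = c(NA, NA, "often"),
  stringsAsFactors = FALSE
)

# Count missing answers per row, ignoring the id column, and attach the
# result as a new column so none of the original data is thrown away.
question_cols <- setdiff(names(df), "id")
df$n_NA <- rowSums(is.na(df[question_cols]))

print(df$n_NA)  # 1 3 0
```

All original columns stay in place, which is the part the gather-then-summarise route loses unless you join the summary back on the id.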

robust · on July 6, 2019

Thank you for providing this example! I share your experience that rowwise operations seem more difficult to program using the tidyverse than using apply.

cwyers · on July 5, 2019

map() in purrr is functionally equivalent to lapply(). If you can do something in lapply, you can do it with map.
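
A small sketch of that equivalence, assuming purrr is installed. The `answers` list is a made-up example.

```r
library(purrr)

# Hypothetical list of answer vectors, one element per question.
answers <- list(
  q1 = c("yes", NA, "no"),
  q2 = c(5, NA, 3)
)

# Count NAs per element with base R and with purrr.
na_base  <- lapply(answers, function(x) sum(is.na(x)))
na_purrr <- map(answers, ~ sum(is.na(.x)))

# Both return a named list with the same contents.
same_result <- identical(na_base, na_purrr)
print(same_result)
```

Note that both iterate over the elements of a list, which for a data frame means its columns, not its rows.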

notafraudster · on July 6, 2019

Right, which is why my example used its cousin, apply, instead of lapply. apply over the row margins of a data frame does not have an equivalent in tidyverse.
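
To make the distinction concrete, this is roughly what I mean by apply over the row margin. The data frame is a made-up stand-in for the survey data.

```r
# Hypothetical wide-form survey data: one row per respondent.
df <- data.frame(
  id = 1:3,
  q1 = c("yes", NA, "no"),
  q2 = c(5, NA, 3),
  q3 = c(NA, NA, "often"),
  stringsAsFactors = FALSE
)

question_cols <- setdiff(names(df), "id")

# MARGIN = 1 calls the function once per row; each row arrives as a vector.
n_missing <- apply(df[question_cols], 1, function(row) sum(is.na(row)))

print(n_missing)
```

The function passed to apply can be anything that takes a vector, so the same pattern covers row-wise checks that have no ready-made helper like rowSums.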

hadley · on July 6, 2019

Beware that apply() coerces data frames to matrices, which is time consuming and forces all columns to have the same type.
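
A quick way to see the coercion for yourself, using a made-up data frame with mixed column types:

```r
# Hypothetical data frame with integer, double, and character columns.
df <- data.frame(
  id = 1:2,
  score = c(4.5, 3),
  answer = c("yes", "no"),
  stringsAsFactors = FALSE
)

# Each column has its own type in the data frame.
col_types <- sapply(df, class)
print(col_types)

# Converting to a matrix forces a single common type, here character.
m <- as.matrix(df)
print(typeof(m))

# apply() does the same conversion internally, so every row the
# function sees is a character vector, including the numeric columns.
row_types <- apply(df, 1, function(row) class(row))
print(row_types)
```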