Use of Asymmetric Models to Estimate the Distribution of Usual Nutrient Intakes

Estimating the distribution of usual nutrient intake is a challenge for dietitians and statisticians. The 24-hour recall is a common method of data collection, and the distributions of usual nutrient intakes are, in general, asymmetric. Thus, this study aims to use asymmetric models to estimate the distribution of usual nutrient intakes. Data were drawn from a Health Inquires Survey from São Paulo city, Brazil, a cross-sectional population-based study with 1662 individuals. Two 24-hour recalls were collected, and the Nutrition Data System for Research (NDSR) was used to obtain micronutrient intake data. A random intercept model, which accounts for repeated measurements on the same subject, was used to fit the distribution of micronutrient intake data. Asymmetric distributions were proposed for the response variable and a normal distribution was used for the random effect. The results based on asymmetric models were compared with the National Cancer Institute (NCI) method for amount. No important differences between the methods were observed, but the new approach has advantages: it does not require data transformation, and the results can be interpreted directly from the estimated parameters of the distribution considered.


Introduction
A common purpose of dietary assessment is to evaluate the dietary intake of a group or population against a standard, considering both nutrient adequacy and the prevention of chronic disease [1].
There are several methods to measure the intake of nutrients and foods. The most commonly used is the 24-hour dietary recall. A recall at a single point in time cannot accurately estimate usual intake, because the central characteristic of the diet is its day-to-day variability [2]. Factors such as the day of the week and seasonality, among others, contribute to this variability. Therefore, statistical methods are needed to remove the within-person variability and estimate usual dietary intake [3]. Accordingly, several statistical methods have been developed that fit a measurement error model and then calculate the prevalence of inadequate intake of several nutrients against a given standard, such as the Estimated Average Requirement (EAR) or the Adequate Intake (AI). These methods are: National Research Council (NRC), Iowa State University (ISU), Iowa State University for Foods (ISUF), Best Power (BP) and National Cancer Institute (NCI). Their framework is the same, and the differences between them arise from different assumptions about the measurement characteristics of the 24-hour dietary recall [1]. The main point is that the NCI method leads to a substantial improvement over the other existing methods for estimating the distribution of usual intake. Extensions of this model have also been proposed, including one for episodically consumed foods [4].
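To make the role of within-person variability concrete, the following minimal Python sketch splits the variance of repeated recalls into between-person and within-person components with a one-way analysis-of-variance estimator. It is only an illustration and is not one of the methods cited above; the function name `variance_components` and the intake values are hypothetical.

```python
from statistics import mean


def variance_components(recalls):
    """Estimate (between-person, within-person) variance from repeated recalls.

    `recalls` holds one inner list per person; every person is assumed to
    have the same number k >= 2 of recalls (balanced design).
    """
    n = len(recalls)
    k = len(recalls[0])
    if n < 2 or k < 2 or any(len(r) != k for r in recalls):
        raise ValueError("need a balanced design with n >= 2 and k >= 2")

    person_means = [mean(r) for r in recalls]
    grand_mean = mean(person_means)

    # Within-person mean square: day-to-day scatter around each person's mean.
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(recalls, person_means) for x in r
    ) / (n * (k - 1))

    # Between-person mean square: scatter of person means around the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in person_means) / (n - 1)

    # Method-of-moments estimate of the between-person variance, truncated at zero.
    var_between = max((ms_between - ms_within) / k, 0.0)
    return var_between, ms_within


# Hypothetical calcium intakes (mg): two recalls for each of four people.
example = [[500, 700], [900, 1100], [300, 500], [650, 850]]
var_between, var_within = variance_components(example)
print(f"between-person: {var_between:.1f}, within-person: {var_within:.1f}")
```

Under this simple decomposition, the variance of single-day intakes is roughly the sum of both components, so a distribution built from single recalls is wider than the distribution of usual intakes, whose spread corresponds to the between-person component alone.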
When the distribution of a nutrient is very asymmetric, the NCI method sometimes does not fit properly. An alternative method has been used to fit the distribution of nutrient intake directly, without considering the within-person variability [5]. The authors showed that the estimated prevalence of inadequacy for the nutrients considered was similar to that obtained with an empirical method. Based on this result, an asymmetric distribution may lead to better results. Another advantage could be the use of models that do not require data transformation.
Hence, the aim of this paper is to use some asymmetric models to estimate the density of the usual intake and to make some comparisons with the NCI method for amount.

Methodology
Data were drawn from the Health Inquires survey of São Paulo (ISA-Capital 2008). This is a cross-sectional population-based study of a probabilistic sample of individuals living in permanent homes located within the urban area of São Paulo city, Brazil. The sample comprised 1662 individuals of both sexes, of whom 508 were adolescents (12-18 years), 637 adults (19-59 years) and 517 elderly (60 years or older). More details about the sampling design can be found in [6].
Demographic, socioeconomic and lifestyle data were collected from households using a structured questionnaire administered by trained interviewers and two 24-hour recalls were obtained for dietary intake.
Gender was analyzed as a dichotomous qualitative variable (male or female). Age was measured in years and calculated as the difference between the date of data collection and the respondent's date of birth. Family income per capita was calculated by summing the monetary income reported by all family members and dividing by the number of family members, and was classified as ≤ 1 minimum wage or > 1 minimum wage (minimum wage in 2008 = US$ 260.00). The educational level of the head of the household was measured in years of schooling and categorized as ≤ 9 or ≥ 10 years of study.
The first 24-hour recall was collected during the home visit using the Multiple-Pass Method, in which the respondent is guided through five steps (quick list, forgotten foods list, time and occasion, detail and review, final probe) in a standardized process that helps to keep the respondent interested and engaged in the interview and to remember all the items consumed [7]. The second 24-hour recall was conducted by telephone using the interview system of the Nutrition Data System for Research (NDSR), version 2007, developed by the Nutrition Coordinating Center at the University of Minnesota, Minneapolis, MN, USA. This system resembles the Automated Multiple-Pass Method, as it uses the same five-step structure to collect dietary data [8].
The Nutrition Data System for Research (NDSR) software uses the American food composition database developed by the United States Department of Agriculture (USDA) to convert the information from the 24-hour recall into nutrient intake. The adequacy of the nutritional values of the foods included in the software was checked against the Brazilian Table of Food Composition. Folate and iron values were corrected to account for the mandatory fortification of wheat and corn flours in force in Brazil since 2004. A consistency analysis of the dietary data was performed to identify possible errors in data collection and processing.
Iron, calcium, magnesium, selenium, zinc, and folate intakes were stratified by age group, since consumption differs across these groups, and were described using central tendency measures (mean, median, minimum, maximum) and variability measures (standard deviation, quartiles and a variation coefficient based on the median of the data distribution [9]). Additionally, adjusted box-plots were presented to show the presence of outliers and the asymmetry of the data distribution [10].
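As an illustration of why a median-based variation coefficient is useful for skewed intake data, the Python sketch below compares the classical coefficient of variation with a robust version. The robust definition used here (scaled median absolute deviation divided by the median) is an assumption of this sketch and may differ from the definition in [9]; the function names and the intake values are hypothetical.

```python
from statistics import mean, median, stdev


def median_based_cv(values):
    """Robust variation coefficient: scaled MAD divided by the median.

    The factor 1.4826 makes the MAD comparable to a standard deviation
    under normality. This definition is an assumption of the sketch.
    """
    med = median(values)
    if med <= 0:
        raise ValueError("median must be positive")
    mad = median([abs(v - med) for v in values])
    return 1.4826 * mad / med


def classical_cv(values):
    """Classical variation coefficient: sample standard deviation over the mean."""
    return stdev(values) / mean(values)


# Hypothetical right-skewed intakes (mg) with one very large value.
intakes = [120, 150, 180, 200, 230, 260, 310, 400, 620, 1500]
print(f"classical CV:    {classical_cv(intakes):.3f}")
print(f"median-based CV: {median_based_cv(intakes):.3f}")
```

In this example the single extreme value inflates the classical coefficient far more than the median-based one, which is the behaviour that motivates robust summaries for asymmetric data.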
A random intercept model, which accounts for repeated measurements on the same subject, was used to fit the micronutrient intake data. The modelling idea is that the between-person variability is absorbed by the random effect, while the within-person variability is absorbed by the distribution chosen for the response variable, which is similar to the amount-only model [11].
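The structure of such a random intercept model can be sketched by simulating from it. The Python example below is a minimal illustration, not the fitting procedure used in the study: it assumes a gamma response with a log link for the mean, and the function name `simulate_recalls` and all parameter values are hypothetical.

```python
import math
import random


def simulate_recalls(n_subjects, n_recalls, beta0, kappa, shape, seed=0):
    """Simulate repeated intakes from a gamma random intercept model.

    For subject i:  b_i ~ Normal(0, kappa^2)          (between-person part)
                    mu_i = exp(beta0 + b_i)           (assumed log link)
                    y_ij | b_i ~ Gamma with mean mu_i (within-person part)
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        b = rng.gauss(0.0, kappa)
        mu = math.exp(beta0 + b)
        # random.gammavariate(alpha, beta) has mean alpha * beta,
        # so a scale of mu / shape gives a mean of mu.
        data.append([rng.gammavariate(shape, mu / shape) for _ in range(n_recalls)])
    return data


recalls = simulate_recalls(
    n_subjects=5, n_recalls=2, beta0=math.log(600.0), kappa=0.3, shape=4.0
)
for subject, values in enumerate(recalls, start=1):
    print(subject, [round(v, 1) for v in values])
```

Setting `kappa` close to zero makes all subjects share nearly the same mean, which corresponds to a situation where the random effect contributes little to the model.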
Asymmetric distributions were proposed for the response variable and, for the random effect, a normal distribution with zero mean and variance κ² was used. To select the asymmetric distributions, the fitDist and histDist functions of the GAMLSS (Generalized Additive Models for Location, Scale and Shape) routine in R software, v.3.0.1, were used [12]. Next, the asymmetric models were adjusted for energy to verify the influence of this covariate in the analysis. The penalized maximum likelihood method was used to estimate the parameters of the asymmetric models, and the estimation was carried out with the iterative RS and CG algorithms [13]. The variance of the random effect was estimated with gamlss.mx using the EM algorithm. The asymmetric models were fitted using R software, v.3.0.1 [14].
The National Cancer Institute (NCI) method for the amount-only model, implemented in the MIXTRAN macro for SAS version 9.3, was used for comparison with the proposed models, since its structure is similar to that of the asymmetric models in this study. The mean usual intake estimated by the NCI method via the DISTRIB macro is also shown for comparison with the parameters estimated by the asymmetric models.
Results
The adjusted box-plots presented in Figure 1 highlight the right asymmetry of the distribution of micronutrient intake, as well as the presence of discrepant points. For example, for calcium intake among adolescents (<19 years), the highest intake was 3380.40 mg while the median was 545.25 mg. Table 1 shows high standard deviation (SD) values for most micronutrients. Thus, the descriptive measures suggested statistical modelling based on asymmetric distributions and robust estimation.
As the distribution of the data can be asymmetric and leptokurtic, the fitDist and histDist functions were used to select the best distribution for each nutrient. The selected distributions were: gamma, reverse Gumbel, generalized inverse Gaussian, log normal, Box-Cox t and Box-Cox Cole-Green.
The gamma is an asymmetric distribution for positive variables and depends on a shape parameter α and a scale parameter β. These two parameters are associated with the mean and the variance of the distribution. The reverse Gumbel distribution is a particular case of the extreme value distribution that arises from the logarithmic form of the Weibull distribution. The parameters μ and σ of the reverse Gumbel, generalized inverse Gaussian and log normal distributions are associated with the mean and the variance, respectively. In the Box-Cox t distribution, the parameters μ, σ, ν and τ can be interpreted as scale (related to the median), relative dispersion (associated with the variation coefficient based on the median of the distribution), asymmetry (power transformation to symmetry) and kurtosis (degrees of freedom), respectively. In this case, the parameter estimation process is more robust. The Box-Cox Cole-Green is a special case of the Box-Cox t distribution when the degrees of freedom parameter tends to infinity (similar to the relationship between the Student's t and normal distributions when the number of degrees of freedom is high). It is also important to remember that the log normal distribution is a special case of the Box-Cox Cole-Green distribution when the asymmetry parameter is zero. In this way, their parameters can be interpreted as related to the median and to the variation coefficient based on the median of the data distribution [14]. For the random effect, a normal distribution with zero mean and variance κ² was proposed.
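The relationship between the Box-Cox Cole-Green and log normal distributions can be checked numerically. The Python sketch below computes the Box-Cox Cole-Green z-score, following the parameterization of this distribution as it is commonly presented for GAMLSS (stated here as an assumption), under which the z-score is approximately standard normal. The function name `bccg_z` and the parameter values are hypothetical.

```python
import math


def bccg_z(y, mu, sigma, nu):
    """Box-Cox Cole-Green z-score of a positive observation y.

    mu    : approximate median of y
    sigma : approximate variation coefficient
    nu    : Box-Cox power that brings y towards symmetry
    """
    if y <= 0 or mu <= 0 or sigma <= 0:
        raise ValueError("y, mu and sigma must be positive")
    if abs(nu) < 1e-12:
        # Limit nu -> 0: the log normal special case.
        return math.log(y / mu) / sigma
    return ((y / mu) ** nu - 1.0) / (nu * sigma)


mu, sigma = 500.0, 0.5  # hypothetical values, for illustration only
for nu in (1.0, 0.5, 1e-6, 0.0):
    print(nu, round(bccg_z(2.0 * mu, mu, sigma, nu), 4))
```

An observation equal to μ always has a z-score of zero, which is why μ is read as being related to the median, and the values for a very small ν and for ν = 0 agree, which is the log normal special case described above.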

Table 2 presents the fitted models and the estimated parameters, as well as the mean intake obtained by the NCI method for amount. The NCI estimates of the mean of the intake distribution, and the Box-Cox t and Box-Cox Cole-Green estimates related to its median (denoted by µ), are very close to the values observed in the descriptive analysis (Table 1). The asymmetric models whose parameter refers to the mean of the intake distribution also gave, in general, plausible estimates. For example, for calcium intake in the age group under 19 years, the mean estimated with the reverse Gumbel distribution was 231.96 mg, very close to that described by the raw data (233.60 mg). For the Box-Cox t and Box-Cox Cole-Green models, the estimates of the parameter related to data variability (σ), although close to the values observed in the raw distribution (VC*), were lower than them. This is expected, since part of the variability is explained by the variance of the random effect (Table 1). It is also important to note that the standard deviation associated with the random effect was very low, except for magnesium in the first age group. From this, one can infer that the between-person variability was not relevant, as also observed in the individual profiles.
The Akaike information criterion (AIC), presented in Table 3, was used to compare the NCI method and the asymmetric distributions for nutrient intake. From these values, one can observe that there is no important difference between the NCI method and the proposed asymmetric models. The same happens when energy is included in the model as a confounding factor. For the asymmetric models with energy adjustment, an interesting finding was that the variability of the random effect was close to zero, indicating that there is probably no need to consider this effect in the modelling, since it had no observable influence.
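For readers less familiar with the criterion, AIC is defined as minus twice the maximised log-likelihood plus twice the number of estimated parameters, and smaller values indicate a better balance between fit and complexity. The Python sketch below illustrates the calculation for two simple models with closed-form maximum likelihood estimates; it does not reproduce the models in Table 3, and the function names and intake values are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * logL + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def normal_loglik(y):
    """Maximised log-likelihood of a normal model (closed-form MLE)."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def lognormal_loglik(y):
    """Maximised log-likelihood of a log normal model on the original scale."""
    logs = [math.log(v) for v in y]
    # The Jacobian term -sum(log y) keeps the likelihood on the scale of y,
    # so it is comparable with models fitted to untransformed data.
    return normal_loglik(logs) - sum(logs)


# Hypothetical right-skewed intakes (mg).
intakes = [120, 150, 180, 200, 230, 260, 310, 400, 620, 1500]
aic_normal = aic(normal_loglik(intakes), n_params=2)
aic_lognormal = aic(lognormal_loglik(intakes), n_params=2)
print(f"AIC normal: {aic_normal:.2f}, AIC log normal: {aic_lognormal:.2f}")
```

For these skewed values the log normal model has the lower AIC. The comparison is only meaningful because both likelihoods are written for the intake on its original scale.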

Discussion
Several methods to estimate the distribution of usual intake have been proposed in the literature, such as the NCI method (considered the standard in this paper), the MSM method (Multiple Source Method), the ISU method (Iowa State University) and the SPADE method (Statistical Program for Age-Adjusted Dietary Assessment). A comparative study of these four methods was made using two 24-hour recalls [15,16]. The authors warn that care must be taken in cases of high variability or high asymmetry. In the present study, descriptive analyses showed that the nutrients considered had asymmetric distributions, and different models with different distributions could be fitted, providing goodness of fit similar to that of the NCI method for amount.
As already mentioned, asymmetric distributions have been used to model nutrient intake, but without considering the between- and within-person variability [5]. In another approach, these sources of variability were considered by modelling with a new class of distributions, named the Box-Cox symmetric class [17]. In that case, data from three 24-hour recalls of older people were used and, again, the AIC values obtained from the Box-Cox symmetric class were very close to those of the NCI method for amount. This indicates that asymmetric models are effective for estimating the distribution of nutrient intake and have the advantage that the parameters can be interpreted directly according to the distribution used, without the need to transform the data and then back-transform after the analysis to obtain the estimated mean on the original scale, as the NCI method does.
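The need for a back-transformation step can be seen in the simplest case, a log transformation. The Python sketch below is only an illustration under that assumption and does not reproduce the back-transformation used by the NCI macros; the function name `lognormal_summaries` and the parameter values are hypothetical.

```python
import math


def lognormal_summaries(m, s):
    """Median and mean on the original scale when log(Y) ~ Normal(m, s**2)."""
    median = math.exp(m)               # naive back-transformation of m
    mean = math.exp(m + 0.5 * s ** 2)  # the mean needs a variance correction
    return median, mean


# Hypothetical mean and standard deviation of log intake.
median, mean = lognormal_summaries(m=5.68, s=0.71)
print(f"median: {median:.1f}, mean: {mean:.1f}")
```

Exponentiating the mean of the transformed data recovers the median rather than the mean, so an extra correction is required to report a mean on the original scale. A model fitted directly on the original scale avoids this step.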
Another advantage of asymmetric models is the possibility of working with distributions that use the median, rather than the mean, as the central tendency parameter. In the statistical analysis of continuous data, the normal distribution is the most widely used because of its good properties, especially in the context of linear models. However, outliers affect symmetry and also affect inference based on this model, which has encouraged the development of robust procedures, defined as procedures that are less sensitive to departures from the assumptions on which they are based [18].
Another important point is the practicality of these models, owing to the tools available in the gamlss routine. One limitation observed when using this routine was, in some cases, the difficulty of obtaining parameter estimates for some of the distributions selected by the fitDist criterion. In such cases, distributions with AIC values close to that of the best fit for the raw data were used.
It is worth noting that, when confounding variables are included in the model for the distribution of nutrient intake, the parameters related to the random effect seem to decrease, and this effect is probably not important. This has already been observed when the NCI method was used to obtain the prevalence of inadequate nutrient intake after including confounding variables [19]. In that case, the estimated prevalence of inadequacy became close to that obtained from the empirical distribution, which does not take the between- and within-person variability into account. For the models considered here, although the prevalence of inadequacy has not yet been calculated, the results are very similar, indicating that the between- and within-person variability can lose their effect when confounding variables are included in the model.

Conclusion
This paper proposed the use of asymmetric distributions to estimate the distribution of nutrient intakes based on a random effect model. The main advantages of this new approach are that no data transformation is needed and that results can be interpreted directly in terms of the parameters of the distributions considered. Further studies can now be developed using simulated scenarios, in order to evaluate the precision of the estimates and how to estimate the prevalence of inadequacy with the proposed models, as well as to develop routines for implementing these distributions.

Table 2: Asymmetric fitted models for intake data, estimated parameters (standard error) and mean intake obtained by the NCI method for amount, ISA 2008.

Table 3: Akaike information criterion (AIC) for NCI and asymmetric models for nutrient intake data without and with energy adjustment, ISA 2008.