#Project Abstract

Make it easy for your audience to quickly determine what they’re about to digest. Use an abstract or introduction to recall your objectives and clearly state them for your readers. What is the problem that you’ve set out to solve? If you have a desired outcome or any expectations of your audience, say so, as this is the entire reason you’re presenting them with your analysis.

You then cover everything from your preamble in this section: the question you’ve been on a mission to answer, your hypothesis, and the methodology you’ve used. Finally, you will often provide a high-level summary of your results and key findings. Don’t worry about spoiler alerts or boring your readers to death with the content that’s about to follow. Trust that if they keep reading past the introduction, they are interested in how you achieved what you claim.

- It should state the main objective and rationale of your project.
- It should outline the methods you used to accomplish your objectives.
- It should list your project’s results or product (or projected or intended results or product, if your project is not yet complete), and it should draw conclusions about the implications of your project.

"In this project, we applied memory-based and model-based algorithms to collaborative filtering. For the memory-based algorithm, we first use Pearson correlation, vector similarity, and SimRank (Movie data only) to compute similarity weights; we then select neighborhoods using a weight threshold, best-n-estimator, or a combination of the two; finally, we predict ratings based on these elements. For the model-based algorithm, we use cluster models fitted with the EM algorithm. We choose the number of classes by selecting the model structure that yields the largest marginal likelihood of the data. After obtaining the best number of classes C, we evaluate the model with rank score and compare the results with the memory-based model. We use rank score to evaluate the MS data, and MAE and ROC to evaluate the Movie data."
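
To make the memory-based approach named in the example abstract more concrete, here is a minimal sketch in base R. It is not the group's actual code: the toy `ratings` matrix, the helper names `predict_rating` and `mae`, and the default threshold of 0 are all assumptions made for illustration.

```{r}
# Toy user-by-item rating matrix (hypothetical data); NA marks an unrated item.
ratings <- matrix(
  c(5, 3, 4, 4, NA,
    3, 1, 2, 3, 3,
    4, 3, 4, 3, 5,
    1, 5, 5, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("user", 1:4), paste0("item", 1:5))
)

# Pearson correlation between every pair of users, using only co-rated items.
sim <- cor(t(ratings), use = "pairwise.complete.obs", method = "pearson")

# Predict one rating as the user's mean plus a similarity-weighted average of
# the neighbors' deviations from their own means (hypothetical helper).
predict_rating <- function(ratings, sim, user, item, threshold = 0) {
  user_means <- rowMeans(ratings, na.rm = TRUE)
  w <- sim[user, ]
  w[user] <- NA                      # a user is not their own neighbor
  # Weight-threshold neighbor selection: keep users who rated the item and
  # whose absolute similarity exceeds the threshold.
  keep <- !is.na(w) & abs(w) > threshold & !is.na(ratings[, item])
  if (!any(keep)) return(unname(user_means[user]))
  dev <- ratings[keep, item] - user_means[keep]
  unname(user_means[user] + sum(w[keep] * dev) / sum(abs(w[keep])))
}

# Mean absolute error between observed and predicted ratings.
mae <- function(actual, predicted) mean(abs(actual - predicted), na.rm = TRUE)

pred <- predict_rating(ratings, sim, "user1", "item5")
pred
```

Raising `threshold` keeps only strongly correlated neighbors; a best-n selection would instead sort `abs(w)` and keep the top n entries.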
#Contribution Statement

- "Group Member 1: Completed the model-based algorithm with Linna Yu. We improved the algorithm, trained the model on the data, and determined the best number of clusters. Calculated the rank score and compared it with the memory-based algorithm."
- "Group Member 2: Processed the Movie data into a matrix that can be used to calculate similarity; wrote a function to compute Pearson correlation and vector similarity and applied it to the MS data and Movie data; wrote a function to select neighbors and applied it to the Movie data; wrote a function to predict ratings and applied it to the Movie data."
- "Group Member 3: Wrote the SimRank function for the Movie data. Wrote the MAE and ROC functions to evaluate the performance of the different methods."

#Introduction

The introduction should provide an overview of the motivation for the analysis. What problem were you attempting to address with the following analysis? Include important contextual information about the reason for the report; perhaps you recognize a trend that your organization needs to pay attention to, or perhaps you’ve identified a gap that could be filled by data science. State the problem clearly. State your main questions here: not the descriptive questions, but the ones you are answering with “advanced data science approaches”. You should also follow up with the major findings, for example:

By applying machine learning we found:

- Finding 1
- Finding 2
- Finding 3

Based on this we recommend….

#The Dataset

This section should contain a description of the dataset. If you want to list all the features and descriptions of the data, put them here. If you’ve merged several datasets, let the reader know and provide links to the datasets.

```{r}
mycars <- mtcars
str(mycars)
```

#Methodology

In this section, you should provide a detailed description of the method.
You don’t need to insert mathematical formulas, but you should write about the method as if you were explaining it to someone hearing about it for the first time (e.g., "to answer the questions we used k-means clustering. K-means clustering is a data science algorithm that…"). You should also add some information about how the algorithm is evaluated, so that people reading the report know the common ways to validate your analysis (e.g., the most critical metric is how well the model predicts the target variable on out-of-sample observations).

#Analysis

This is the bulk of your work. Think about this as an inverted triangle where you’re starting with the exploratory analysis and working your way towards the main question. Include figures and model results to illustrate your analysis. This section should start with exploratory data analysis and end with reporting results from the application of machine learning algorithms. For instance,

> Our dataset of mtcars contained data about `r nrow(mycars)` cars. Cars had an average mpg of `r round(mean(mycars$mpg), 1)`.

```{r}
```

#Conclusions

A good analysis is repetitive. You know the intricacies of your work inside and out, but your audience does not. You’ve told your readers in your abstract (or introduction, if you prefer) what you ventured to do and even what you ended up finding, and the content lays this all out for them. In the conclusions section, you hit them with it again. At this point, they’ve seen the relevant data you’ve carefully chosen to support your theory, so it’s time to formally draw your conclusions. Your readers can decide if they agree or not.

Speaking of being repetitive, after making your conclusions, you again remind your readers of the objective(s) of this report. Restate them and help your readers help you: what do you expect now? What feedback would you like? What decision-making can happen now that your report is presented and the insights have been shared?
In my work, I often collaborate with strategists to develop a set of recommendations for our clients. Typically I'll take a stab at it based on the expertise I've gained in working with the data, and a strategist will refine it using their business insights.

- Restate the questions from your introduction.
- Restate important results.
- Include any recommendations for additional data as needed.

#References

Sources you used in the work.

#Appendix

You should display the appropriate code here. If you have placed everything in a markdown file, please submit it with the final report and note it here. If you have a code repository, link to it here.
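
As an illustration of the kind of code an appendix might display, here is a minimal base R sketch of the two ideas used as examples in the Methodology section: k-means clustering and an out-of-sample check. The choice of features, `k = 3`, the held-out size of 8 rows, and the linear model are arbitrary assumptions for demonstration, not recommendations.

```{r}
set.seed(42)                     # make the random steps repeatable
mycars <- mtcars

# k-means: group cars into k clusters on standardized features.
# k = 3 is an arbitrary choice for illustration.
scaled <- scale(mycars[, c("mpg", "hp", "wt")])
km <- kmeans(scaled, centers = 3, nstart = 25)
table(km$cluster)                # number of cars per cluster

# Out-of-sample check: fit on a training split, score on held-out rows.
test_idx <- sample(nrow(mycars), size = 8)
train <- mycars[-test_idx, ]
test  <- mycars[test_idx, ]
fit <- lm(mpg ~ wt + hp, data = train)

# Root mean squared error on cars the model never saw during fitting.
rmse <- sqrt(mean((test$mpg - predict(fit, newdata = test))^2))
rmse
```

Setting the seed keeps the cluster assignments and the train/test split stable between knits, which makes the numbers quoted in the report reproducible.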