Wednesday, January 28, 2015

Data Today: Statistical randomness, machine learning pipelines + more

O'Reilly Data Newsletter

1. How to build large-scale machine learning pipelines

Ben Lorica says we need primitives, pipeline synthesis tools, and, most importantly, error analysis and verification.

2. Redesigning the news

The Upshot is the New York Times' attempt to redesign the news. Its hallmark: data visualizations. Related: An interesting AMA with the Washington Post's Christopher Ingraham, FiveThirtyEight's Ritchie King, and Quartz's David Yanofsky on data visualization and journalism. (King even answers the question that has been vexing data geeks for years: do we really have to say "data are" rather than "data is"?)

3. Machine learning at Pinterest

Pinterest has just acquired Kosei, a machine learning company. They describe how they are using machine learning now, and their plans for the future, here.

4. Medium data

Big data is easy. Medium data? Not so much.

5. Award for Initiatives in Research announced

Ben Recht has won the William O. Baker Initiatives in Research Award for "his significant contributions at the confluence of optimization, signal processing and statistics, including seminal work on matrix completion." You'll meet Ben if you attend Hard Core Data Science Day at Strata + Hadoop World in San Jose.

Sponsored Content

Become data-driven

Teradata: With exponentially proliferating data, companies that capture and analyze all of their data, discover insights, and take action on those insights are better positioned to compete. In these customer testimonials, watch how businesses today are enjoying sustainable competitive advantage by leveraging insights that deliver greater value to their customers and bottom line.

6. Amazon changed the price of a bible 100X in 5 years

Here’s what happened to a King James bible under Amazon’s dynamic pricing algorithm.

7. Three people you should meet in London

Strata + Hadoop World is coming to London in May. If you join us, you'll meet some of the most fascinating speakers in the industry, including:
  • Olivier Grisel (Inria & scikit-learn), who will discuss predictive modelling and machine learning with open source tools like scikit-learn and IPython.
  • Gwen Shapira (Cloudera), who will walk you through the architectural considerations for Hadoop applications.
  • Tyler Akidau (Google), who will show you how to ditch your Big Data batch pipelines and go all-streaming-all-the-time, without compromising latency, correctness, or the flexibility to deal with changes in upstream data.
And, of course, dozens more. Check out the rest of the speakers, and make plans to join us now.
See the program →

8. When big data projects go wrong

A recent survey shows that common themes emerge when big data projects go wrong, and when they go right.

2015 Data Preview: Spark, Data Viz, YARN + more

See what's big in big data this year in this free online conference on February 4, featuring some of data's hottest topics and most sought-after speakers, including: Paco Nathan, June Andrews, Ross Fubini, Danyel Fisher, Joe Hellerstein, Dave Holz, and Alistair Croll.
Register for free →

9. 7 ways to avoid being fooled by statistical randomness

Kirk Borne offers tips on how to avoid developing a theory to explain random data. (He’ll also be speaking about dynamic events in massive data streams at Strata + Hadoop World San Jose.)

10. Freebie of the week

This week, we have another free report for you: Data Visualization: A New Language for Storytelling, by Mike Barlow. It examines the ways that data scientists and analysts can use visualizations not only for presentation, but also as a key step in the early stages of data analysis.
Download my free report →

Thank You to Our Sponsors

Presented by: Cloudera, O'Reilly Media
Elite Sponsors: MapR Technologies, Microsoft
Strategic Sponsors: IBM, Intel, MemSQL, Pivotal



