Michael Cavaretta, Ford data science leader

At Ford Motor Company, “big data” means many things, because the company has ones and zeros piling up everywhere it looks. There’s data coming off the cars, data generated by the machinations of a Fortune 500 company and even data that customers generate out in the real world about how they view the company. Michael Cavaretta, Ford’s data science leader, is one of the people charged with helping the company sort all this data out.



These are highlights from a conversation with Cavaretta, but anyone interested in anything from automotive design to the definition of data science will want to hear the whole thing. Alternatively, you can make plans to attend our Structure Data conference March 19-20 in New York, where Cavaretta will be one of dozens of speakers talking about how their companies are using data to fundamentally improve their businesses or, in many cases, enable entirely new businesses. (And for more on how data has changed life at Ford, check out this post from 2013.)

Data can inform design

Perhaps you heard, during the North American International Auto Show a few weeks ago, that Ford had redesigned its F-150 pickup truck using lightweight aluminum instead of steel to help reduce fuel consumption. Well, Cavaretta’s team was part of that effort.

“We’ve done a lot of work on something we call the ‘CO2 glide path’ and the idea from this program is to take the technologies that are applicable now, as well as the ones that we think are gonna be coming in the future,” he explained. “We take a look at the costs, we take a look at the benefits, we take a look at when we think they’re gonna come in and whether we think they’re gonna make it at that time or not. And then we kind of put them all into this gigantic hopper and we run an optimization on them, and we kind of sort everything around and we find what’s the best combination that we can look at for the next few years, and then the few years after that. So, light weighting and even aluminum was one of the technologies that went into the hopper there and obviously then got picked.”

Cavaretta’s team used an entirely different data source, social media, to help the company figure out why Ford Fiesta owners in the United States didn’t like the three-blink turn signal (which generates three short blinks during lane changes) as much as European drivers did.

“Some of the people who had the three-blink feature on U.S. vehicles didn’t seem like they had the satisfaction we would have expected. So, the marketing team came up and said, ‘Can you find something for us that can give us a little more color, a little bit more story as to what people were talking about?’” he said. After doing some deep mining on social media, “It actually turned out that the satisfaction problems had more to do with the positioning on some of those vehicles of where that turn signal really was on the steering column. But when you read deeply into it, the people were very happy with it.”

Think of the value, not just the costs

Ford has been a pretty data-centric company since after World War II, Cavaretta said, but the company has taken it to a new level in the past several years. “[W]hen Alan Mulally came in, then one of his tenets was ‘the data will set you free,’ and has been always very, very focused on data,” he explained. “What we found was that just kind of doubled the amount of data and analytics the executives were asking for. So when important decisions were going to be made, it was, ‘OK, show me the data, prove to me that you’ve done the analysis and you’ve done it right and that that supports this decision.’”

That focus on data has helped the company view new technologies, such as Hadoop, as enablers of change rather than just cheaper ways to store data. Getting the right answers in a timely manner requires rethinking how the entire data environment is designed, Cavaretta said: “We really wanna move away from, ‘We’re gonna have this waterfall process where we’re gonna sit back and we’re gonna design this fantastic data warehouse and it’s gonna be operational in three years.’ And then three years go by and then half the datasets have changed and the business has moved on.”

Sometimes, though, focusing on value means knowing when to say enough is enough with new technologies.

“I know that Cloudera has come out with their data hub and they’ve been talking that idea that you don’t really need data warehouses anymore. And I think for some instances, if you’re looking at new development or you’re looking at a startup company, then that could be very appropriate,” Cavaretta explained. “But for a large enterprise — Ford — it’s gonna be very difficult to come in and say, ‘Oh, we’re just gonna rip all this stuff out and we’re gonna replace it with Hadoop.’ That’s a lot of time and a lot of money and, I think, for a lot of situations, there’s good reasons for the data to be analyzed the way it is and there’s no sense in ripping that out if that immediate value proposition is not there.”

Data scientists don’t have to be demigods

Even with all the improvements that data analysis is bringing to Ford, Cavaretta said he’s excited that the company doesn’t need to search out what has become the classic, and perhaps unrealistic, definition of a data scientist to fill its ranks. “You don’t have to look for these unicorns, these people that are incredibly difficult to find and you have to pay them incredible amounts of money,” he said. “The idea that you can build a team that has all these components in there is something that has been really exciting to me, because we’ve been able to go out and kind of be strategic about some of the people that we bring in, but then also look internal to our organization and supplement that team with some internal resources that have really worked out well.”

Still, a data scientist does need some discernible skills beyond being able to point data at a software platform. “While there have been a lot of vendors who say, ‘You don’t need a data scientist, just use our software,’” Cavaretta said, “that seems to me it’s gonna be a while before we get to that stage where it’s really taken over by the software itself.”

Via Gigaom