Data Scientists, Data Engineers: Developing Data Scientists and Engineers
Published: 2019-05-11

by David Venturi

Developing Data Scientists and Engineers

Free Code Camp asked 15,000 people who they are, and how they’re learning to code. I isolated those focused on data science and data engineering.

More than 15,000 people responded to Free Code Camp’s 2016 New Coder Survey, granting researchers (like me!) an unprecedented glimpse into how people are learning to code. They released the entire dataset on .

646 respondents answered “Data Scientist/Data Engineer” to the question: “Which one of these roles are you most interested in?”
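
A minimal Python sketch of how such a subset can be isolated. The author's actual analysis was written in R and is not reproduced here; this assumes the survey rows are loaded as dictionaries and that the role answer sits in a column named `JobRoleInterest`, which is a hypothetical name to check against the real dataset.

```python
from typing import Dict, List, Optional

# Hypothetical column name; check it against the published dataset.
ROLE_COLUMN = "JobRoleInterest"
# Answer label as quoted in the article.
DATA_ROLE = "Data Scientist/Data Engineer"

Row = Dict[str, Optional[str]]


def data_focused_subset(rows: List[Row]) -> List[Row]:
    """Keep only respondents whose role of interest is the data role.

    Blank or missing answers (None) count as "not answered" and are dropped.
    """
    return [
        row
        for row in rows
        if (row.get(ROLE_COLUMN) or "").strip() == DATA_ROLE
    ]


# Tiny made-up sample, for illustration only.
sample = [
    {"JobRoleInterest": "Data Scientist/Data Engineer", "Age": "26"},
    {"JobRoleInterest": "Front-End Web Developer", "Age": "30"},
    {"JobRoleInterest": None, "Age": "22"},
]
print(len(data_focused_subset(sample)))  # 1
```

Every statistic that follows is then computed on this filtered list instead of the full set of responses.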

Here are a few high-level statistics from this data-focused subset, which complements Free Code Camp’s .

I’ve borrowed the structure of Free Code Camp’s announcement article for ease of comparison. I’ve also included my comments where findings differ notably. And a few bonus plots, too!

Who participated?

Of the 646 developing data scientists and data engineers who responded to the survey (differences from new coders in general in parentheses):

  • 25% are women (4% more)

  • their median age is 26 years old (one year younger)

  • they started programming an average of 16 months ago (5 months earlier)

Learner goals and approaches

14 hours each week, on average, are spent learning.

This is one hour less than new coders in general.

0% want to freelance or start their own business.*

Compared to 40% for the full new coder survey, this is a bit shocking. I have a hunch these zero counts are caused by the . Every respondent that answered the job role of interest question has zero counts for “start your own business” and “freelance.”
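
If that hunch is right, the zeros should show up as a clean split when job preference is cross-tabulated against whether the role question was answered. A sketch of that check, assuming hypothetical columns `JobRoleInterest` and `JobPref` (illustrative names, not confirmed against the dataset) in a list of row dictionaries:

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical column names; the real dataset may label these differently.
ROLE_COLUMN = "JobRoleInterest"
PREF_COLUMN = "JobPref"
NON_EMPLOYEE_PREFS = ("start your own business", "freelance")

Row = Dict[str, Optional[str]]


def prefs_by_role_answered(rows: List[Row]) -> Dict[bool, Counter]:
    """Cross-tabulate job preference against whether the role question was answered."""
    table: Dict[bool, Counter] = {True: Counter(), False: Counter()}
    for row in rows:
        answered = bool((row.get(ROLE_COLUMN) or "").strip())
        table[answered][row.get(PREF_COLUMN) or "(blank)"] += 1
    return table


def zero_counts_when_answered(rows: List[Row]) -> bool:
    """True if no respondent who answered the role question chose either preference."""
    answered = prefs_by_role_answered(rows)[True]
    # Counter returns 0 for keys it has never seen.
    return all(answered[pref] == 0 for pref in NON_EMPLOYEE_PREFS)


# Made-up rows, for illustration only.
sample = [
    {"JobRoleInterest": "Data Scientist/Data Engineer", "JobPref": "work for a startup"},
    {"JobRoleInterest": None, "JobPref": "freelance"},
    {"JobRoleInterest": "", "JobPref": "start your own business"},
]
print(zero_counts_when_answered(sample))  # True
print(prefs_by_role_answered(sample)[False]["freelance"])  # 1
```

A split like this would be consistent with the survey showing the role question only to people who picked a different job preference, rather than with nobody wanting to freelance; that reading is an assumption, not something the dataset states.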

52% are already applying for jobs, or will start applying within the next year.

This is a longer time horizon than for new coders in general, 65% of whom are applying or will start applying within the next year.

Most of them want to work in an office, as opposed to remotely.

And a majority are willing to relocate.

Most of them have not yet attended any in-person coding events.

64% have used at least one of Coursera, edX, or Udacity.

Only 46% of new coders in general have used at least one of these resources. These companies cover a wider range of subject areas than some of the coding-specific resources listed.
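
The "at least one of" figures above are per-respondent shares, not sums of per-platform counts. A sketch of that calculation, assuming hypothetical flag columns named `ResourceCoursera`, `ResourceEdX`, and `ResourceUdacity` that hold "1" when a resource was used (illustrative names, not confirmed against the dataset):

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical flag columns: "1" if the respondent used the resource, blank otherwise.
MOOC_COLUMNS = ("ResourceCoursera", "ResourceEdX", "ResourceUdacity")

Row = Dict[str, Optional[str]]


def used_any(row: Row, columns: Sequence[str]) -> bool:
    """True if the respondent flagged at least one of the given resources."""
    return any((row.get(col) or "").strip() == "1" for col in columns)


def share_using_any(rows: List[Row], columns: Sequence[str] = MOOC_COLUMNS) -> float:
    """Percentage of respondents who used at least one resource in `columns`."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if used_any(row, columns))
    return 100.0 * hits / len(rows)


# Made-up rows: the first respondent used two platforms but is counted once.
sample = [
    {"ResourceCoursera": "1", "ResourceUdacity": "1"},
    {"ResourceEdX": "1"},
    {"ResourceCoursera": None},
    {},
]
print(share_using_any(sample))  # 50.0
```

Counting each respondent once, instead of summing the three columns, is what makes this an "at least one of" figure; a sum would double-count anyone who used more than one platform.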

Of them, , , and are the only data-specific podcasts noted.

Only 1% have attended a bootcamp.

By comparison, 6% of new coders in general have attended a bootcamp.

Demographics and Socioeconomics

Data-focused respondents represent 166 countries.

More than 90% are from North America, Europe, and Asia.

The large share of North Americans is to be expected, since Free Code Camp is based in the United States.

Their cities span a wide range of urbanization levels.

Just under a quarter of respondents are ethnic minorities in their country.

And nearly half are non-native English speakers. They grew up speaking one of 148 languages.

67% have earned at least a bachelor’s degree.

Compared to 58% for new coders in general, the data-focused subset is more skewed towards post-secondary studies.

Majors are more diverse than in the full survey, where Computer Science and Information Technology ranked #1 and #2 with 17% and 5%, respectively.

Just over one-half are currently working.

Two-thirds of the new coder population are currently working.

A quarter work in the tech industry.

Employment fields are more varied than in the full dataset, where 50% of respondents work in software development and IT.

Median current salary is $44k.

The median current salary for the full dataset is $37k.

And they expect to earn a median of $60k with their new data science/engineering skills.

The median for the full survey dataset is $50k. With data science/engineering being in 2016, some respondents might be seeking higher wages.

7% have served in their country’s military.

13% have children, and another 3% financially support an elderly or disabled relative. And one-fifth are doing this without the help of a spouse.

47% consider themselves underemployed (working a job that is below their education level).

This is 5% higher than for new coders in general.

If they have a home mortgage, they owe an average of $194k.

If they have student loans, they owe an average of $37k.

This average is $3k more than for the full survey dataset.

14% don’t yet have high-speed internet at home.

And 3% are currently receiving disability benefits from their government.

These are the people who are learning data science and engineering. Free, self-paced learning resources are definitely important.

What’s next?

You can find a of this analysis on Kaggle, where I outline my process.

Be sure to check out my initial exploration of , where I dive deeper into the characteristics of new coders:

If you have questions or concerns about this series or the R code that generated it, don’t hesitate to .
