Choosing k value in KNN classifier?

I'm working on a classification problem and decided to use a KNN classifier for it.

If k=131 gives me an AUC of 0.689 and k=71 gives me an AUC of 0.682, what should my ideal k be?

Does choosing a higher k mean using more computational resources? If so, can I go with k=71, or should I always use the k with the maximum score no matter what?

machine-learning k-nn

asked 8 hours ago by user214

So, are you calculating AUC using cross-validation?
– pythinker
8 hours ago

@pythinker Yes.
– user214
8 hours ago
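Since the AUC figures in the question are cross-validated, the comparison can be sketched as below. This is a minimal illustration assuming scikit-learn, with synthetic data from `make_classification` standing in for the real dataset; the helper name `cv_auc_by_k`, the k values and the 5 stratified folds are illustrative choices, not details from the question. Reporting the fold-to-fold standard deviation alongside the mean helps judge whether a gap such as 0.689 vs 0.682 is larger than the noise in the estimate.

```python
# Sketch: compare candidate k values by mean cross-validated ROC AUC.
# Assumption: synthetic data from make_classification stands in for the
# asker's dataset; the k values and n_splits=5 are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier


def cv_auc_by_k(X, y, k_values, n_splits=5, seed=0):
    """Return {k: (mean AUC, std AUC)} over stratified CV folds."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for k in k_values:
        # Same folds for every k, so the scores are directly comparable.
        scores = cross_val_score(
            KNeighborsClassifier(n_neighbors=k), X, y, cv=cv, scoring="roc_auc"
        )
        results[k] = (scores.mean(), scores.std())
    return results


if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    for k, (mean, std) in cv_auc_by_k(X, y, [11, 71, 131]).items():
        print(f"k={k:>3}  AUC={mean:.3f} +/- {std:.3f}")
```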

2 Answers

Because KNN is a non-parametric method, the computational cost of your choice of k depends heavily on the size of the training data: the model keeps the whole training set and searches it for neighbors at prediction time. If the training set is small, you can freely choose the k that achieves the best AUC on the validation data. With a large training set, choosing a large k can significantly increase the computational cost, which shows up as slow prediction on test data.
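To make the cost argument concrete, here is a minimal brute-force KNN sketch in NumPy. The function `knn_predict_proba` is hypothetical, it assumes binary labels in {0, 1} and Euclidean distance, and libraries such as scikit-learn may use tree-based indexes instead of brute force. Each query computes a distance to every stored training row, so the work grows with the number of training rows and features; k enters when selecting and averaging the labels of the k closest rows.

```python
# Minimal brute-force KNN prediction sketch (NumPy only); not scikit-learn's
# implementation. Assumes binary labels in {0, 1} and Euclidean distance.
import numpy as np


def knn_predict_proba(X_train, y_train, X_query, k):
    """Fraction of each query's k nearest training neighbors labelled 1.

    Cost per query: O(n_train * n_features) for the distances, plus
    roughly O(n_train) for np.argpartition to find the k smallest.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)
    if not 1 <= k <= len(X_train):
        raise ValueError("k must be between 1 and the number of training rows")
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training rows for each query (unordered).
    nn = np.argpartition(d2, kth=k - 1, axis=1)[:, :k]
    # Average the neighbors' labels: the share of positive neighbors.
    return y_train[nn].mean(axis=1)


if __name__ == "__main__":
    X_train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    y_train = [0, 0, 0, 1, 1, 1]
    print(knn_predict_proba(X_train, y_train, [[0.2, 0.2], [10.5, 10.5]], k=3))
```

`np.argpartition` avoids fully sorting each distance row. The sketch materialises an (n_query, n_train, n_features) array, so for data of the size discussed in the comments you would process queries in batches.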

answered 7 hours ago by pythinker

Do 100k rows and 8,000 features qualify as big training data? Also, choosing high k values tends to cause underfitting. How can I know that I'm not underfitting when choosing high k values?
– user214
7 hours ago

Yes, that's actually a big training dataset. To make sure you are neither underfitting nor overfitting, check the performance of your model on the training and validation datasets side by side. If the training score is low, you are underfitting. If the training score is much higher than the validation score, you are overfitting. The best case is when both scores are good and close to each other.
– pythinker
7 hours ago
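The train-versus-validation check can be sketched with scikit-learn's `validation_curve`, which returns training and validation scores for each value of a hyperparameter. Assumptions: synthetic data, 5-fold CV, an illustrative k grid and a hypothetical helper `knn_train_val_auc`. With k=1 each training point is its own nearest neighbor, so training AUC is essentially perfect; as k grows the two curves typically move closer together and both eventually drop.

```python
# Sketch: training vs validation ROC AUC across k, to spot under/overfitting.
# Assumption: synthetic data and the k grid are illustrative only.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.neighbors import KNeighborsClassifier


def knn_train_val_auc(X, y, k_values, cv=5):
    """Return arrays (k, mean train AUC, mean validation AUC)."""
    train_scores, val_scores = validation_curve(
        KNeighborsClassifier(),
        X,
        y,
        param_name="n_neighbors",
        param_range=k_values,
        cv=cv,
        scoring="roc_auc",
    )
    # Each score array has shape (len(k_values), cv); average over folds.
    return np.asarray(k_values), train_scores.mean(axis=1), val_scores.mean(axis=1)


if __name__ == "__main__":
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    ks, tr, va = knn_train_val_auc(X, y, [1, 5, 25, 71, 131, 301])
    for k, t, v in zip(ks, tr, va):
        # Large train-validation gap: overfitting (typically small k).
        # Low scores on both: underfitting (typically very large k).
        print(f"k={k:>3}  train AUC={t:.3f}  validation AUC={v:.3f}  gap={t - v:.3f}")
```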

I was taught that the best way is to compute the error for each k, plot the errors against k, and look for the "elbow" in the plot.
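If you want to pick the elbow programmatically rather than by eye, one common heuristic (an assumption here, not necessarily what the answerer was taught) is to take the point farthest from the straight line joining the first and last points of the curve. The function name `elbow_index` is hypothetical, and the heuristic assumes the error curve falls and then flattens, as described.

```python
# Sketch of one "elbow" heuristic: the point on the error-vs-k curve farthest
# from the chord joining its first and last points. Assumes the curve falls
# and then flattens, with at least two distinct k and error values.
import numpy as np


def elbow_index(ks, errors):
    """Return the index of the elbow in a (k, error) curve."""
    x = np.asarray(ks, dtype=float)
    y = np.asarray(errors, dtype=float)
    # Normalise both axes to [0, 1] so the units of k and error don't dominate.
    x = (x - x.min()) / (x.max() - x.min())
    y = (y - y.min()) / (y.max() - y.min())
    # Perpendicular distance from each point to the chord (first -> last).
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dy * (x - x[0]) - dx * (y - y[0])) / np.hypot(dx, dy)
    return int(np.argmax(dist))


if __name__ == "__main__":
    ks = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    errs = [0.40, 0.30, 0.20, 0.10, 0.09, 0.08, 0.07, 0.06, 0.05]
    print(ks[elbow_index(ks, errs)])
```

For example, `errors` could be one minus the mean cross-validated AUC for each k in `range(2, 201)`.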

answered 8 hours ago by Stephen Ewing

So should I go with k=131?
– user214
8 hours ago

It really depends. The higher your k, the higher your chance of underfitting; the lower your k, the higher your chance of overfitting. So if you try every k from 2 to 200 and plot the error for each, use the k where the curve starts to flatten out.
– Stephen Ewing
8 hours ago
